| 224 | |
| 225 | {{{#!comment |
| 226 | SUBJECT: Call for Participation: 8th Web as Corpus Workshop (22 July 2013, Lancaster, UK) |
| 227 | """ |
| 228 | CALL FOR PARTICIPATION |
| 229 | |
| 230 | 8th Web as Corpus Workshop (WAC-8) |
| 231 | Endorsed by ACL SIGWAC |
| 232 | Hosted by the Corpus Linguistics 2013 Conference |
| 233 | |
| 234 | Monday, 22 July 2013 (Lancaster, UK) |
| 235 | |
| 236 | ** Note that registration for the workshop and the main conference closes on SUNDAY, JUNE 30. ** |
| 237 | Registration URL: http://ucrel.lancs.ac.uk/cl2013/register.php |
| 238 | |
| 239 | Further details can be found on the workshop homepage at |
| 240 | |
| 241 | http://sigwac.org.uk/wiki/WAC8 |
| 242 | |
| 243 | ______________________________________________________________________ |
| 244 | |
| 245 | Web corpora and other Web-derived data have become a gold mine for corpus linguistics and natural language processing. The Web is an easy source of unprecedented amounts of linguistic data from a broad range of registers and text types. However, a collection of Web pages is not immediately suitable for exploration in the same way a traditional corpus is. |
| 246 | |
| 247 | Since the first Web as Corpus Workshop, organised at the Corpus Linguistics 2005 Conference, a highly successful series of yearly Web as Corpus workshops has provided a venue for interested researchers to meet, share ideas and discuss the problems and possibilities of compiling and using Web corpora. After a stronger focus on application-oriented natural language processing and Web technology in recent years – with workshops taking place at NAACL-HLT 2010, 2011 and WWW 2012 – the 8th Web as Corpus Workshop returns to its roots in the corpus linguistics community. |
| 248 | |
| 249 | Accordingly, the leading theme of this workshop is the application of Web data in language research, including linguistic evaluation of Web-derived corpora as well as strategies and tools for high-quality automatic annotation of Web text. The workshop brings together presentations on all aspects of building, using and evaluating Web corpora, with a particular focus on the following topics: |
| 250 | |
| 251 | * applications of Web corpora and other Web-derived data sets for language research |
| 252 | * automatic linguistic annotation of Web data such as tokenisation, part-of-speech tagging, lemmatisation and semantic tagging (the accuracy of currently available off-the-shelf tools is still unsatisfactory for many types of Web data) |
| 253 | * critical exploration of the characteristics of Web data from a linguistic perspective and its applicability to language research |
| 254 | * presentation of Web corpus collection projects or software tools required for some part of this process (crawling, filtering, de-duplication, language identification, indexing, ...) |
| 255 | |
| 256 | ______________________________________________________________________ |
| 257 | |
| 258 | PROGRAMME |
| 259 | |
| 260 | 09:00 Akshay Minocha, Siva Reddy and Adam Kilgarriff -- Feed Corpus: An Ever Growing Up-to-date Corpus |
| 261 | 09:30 Stephen Wattam, Paul Rayson and Damon Berridge -- LWAC: Longitudinal Web-as-Corpus Sampling |
| 262 | 10:00 Roland Schäfer, Adrien Barbaresi and Felix Bildhauer -- The Good, the Bad, and the Hazy: Design Decisions in Web Corpus Construction |
| 263 | 10:30 Jesse Egbert and Douglas Biber -- Developing a User-based Method of Web Register Classification |
| 264 | |
| 265 | 11:00 - 11:30 Tea Break |
| 266 | |
| 267 | 11:30 Adam Kilgarriff and Vít Suchomel -- Web Spam |
| 268 | 12:00 David Lutz, Parry Cadwallader and Mats Rooth -- A web application for filtering and annotating web speech data |
| 269 | 12:30 Sarah Schulz, Verena Lyding and Lionel Nicolas -- STirWaC - Compiling a diverse corpus based on texts from the web for South Tyrolean German |
| 270 | |
| 271 | 13:00 - 14:00 Lunch |
| 272 | |
| 273 | 14:00 Alexander Piperski, Vladimir Belikov, Nikolay Kopylov, Vladimir Selegey and Serge Sharoff -- Big and diverse is beautiful: A large corpus of Russian to study linguistic variation |
| 274 | 14:30 Adriano Ferraresi and Silvia Bernardini -- The academic Web-as-Corpus |
| 275 | 15:00 Silke Scheible and Sabine Schulte im Walde -- A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus |
| 276 | |
| 277 | 15:30 - 16:00 Tea Break |
| 278 | |
| 279 | 16:00 Andrew Brindle -- Thug breaks man's jaw: A Corpus Analysis of Responses to Interpersonal Street Violence |
| 280 | 16:30 Colleen Crangle -- A web-based model of semantic relatedness and the analysis of electroencephalographic (EEG) data |
| 281 | 17:00 Discussion and wrap-up |
| 282 | |
| 283 | 18:00 Pub |
| 284 | |
| 285 | ______________________________________________________________________ |
| 286 | |
| 287 | Looking forward to seeing you at the workshop, |
| 288 | The organising committee. |
| 289 | |
| 290 | Stefan Evert, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) |
| 291 | Egon Stemle, European Academy of Bozen/Bolzano (EURAC) |
| 292 | Paul Rayson, Lancaster University |
| 293 | """ |
| 294 | }}} |