7th Web as Corpus Workshop (WAC-7)
Lyon, France; 17th April 2012
To be held in association with WWW2012.
Sponsored by ACL SIGWAC and PRESEMT
More and more people are using Web data for linguistic and NLP research: the Web provides an easy source of linguistic data in a great variety of languages. However, a ‘crawl’ is not ready for exploration in the same way a traditional ‘corpus’ is. We need to turn a crawl into a corpus. The workshop, the seventh in an annual series, provides a venue for exploring what this conversion involves, how to do it, and what we find out when we do.
We invite submissions which:
- describe Web corpus collection projects, or modules for one part of the process (crawling, filtering, de-duplication, language-id, tokenising, indexing, ...)
- explore characteristics of Web data from a linguistics/NLP perspective, including registers, domains, frequency distributions, and comparisons between datasets
- use crawled Web data for NLP purposes (with emphasis on the data rather than the use)
The previous WAC workshops were co-located with various conferences in computational linguistics. This time the workshop is co-located with WWW2012, the main world conference on Web technologies and their impact on society.
Programme
Room Saint Clair 4 at Convention Centre, WWW2012
The proceedings are available as an attachment to this page (wac7-proc.pdf).
9.00 | Welcome
9.10 | Invited Talk: Benno Stein | Exploiting the Web for Text and Language Reuse Applications
10.00 | Marco Brunello | Understanding the composition of parallel corpora from the web
10.25 | Vit Suchomel, Jan Pomikalek | Efficient Web Crawling for Large Text Corpora
10.40 | Coffee
11.00 | Ed Chow, Dayne Freitag, Paul Kalmar, Tulay Muezzinoglu, John Niekrasz | A corpus of online discussions for research into linguistic memes
11.25 | Paul Rayson, Oliver Charles, Ian Auty | Can Google count? Estimating search engine result consistency
11.50 | Tobias Roth | Using Web Corpora for the Recognition of Regional Variation in Standard German Collocations
12.15 | Yannick Versley, Yana Panchenko | Not Just Bigger: Towards Better-Quality Web Corpora
12.40 | Discussion, wrap-up
13.00 | End
Organising committee
- Adam Kilgarriff (Lexical Computing Ltd.)
- Serge Sharoff (University of Leeds, Workshop Chair)
Programme committee
Organising committee plus:
- Silvia Bernardini, U of Bologna, Italy
- Stefan Evert, U of Osnabrück, Germany
- Cédrick Fairon, UCLouvain, Belgium
- William H. Fletcher, U.S. Naval Academy, USA
- Gregory Grefenstette, Exalead, France
- Igor Leturia, Elhuyar Fundazioa, Basque Country, Spain
- Preslav Nakov, National U of Singapore
- Jan Pomikalek, Masaryk U, Czech Republic
- Reinhard Rapp, U Mainz, Germany
- Kevin Scannell, Saint Louis U, USA
- Gilles-Maurice de Schryver, U Gent, Belgium
- Pierre Zweigenbaum, LIMSI, France
Attachments (1)
- wac7-proc.pdf (3.7 MB): Workshop proceedings