7th Web as Corpus Workshop (WAC-7)
To be held in association with WWW2012 in Lyon, France, 17th April 2012
Sponsored by ACL SIGWAC
More and more people are using Web data for linguistic and NLP research: the Web provides an easy source of linguistic data in a great variety of languages. However, a ‘crawl’ is not ready for exploration in the same way a traditional ‘corpus’ is. We need to turn a crawl into a corpus. The workshop, the seventh in an annual series, provides a venue for exploring what this conversion involves, how to do it, and what we find out if we do.
We invite submissions which:
- describe Web corpus collection projects, or modules for one part of the process (crawling, filtering, de-duplication, language-id, tokenising, indexing, ...)
- explore characteristics of Web data from a linguistics/NLP perspective including registers, domains, frequency distributions, comparisons between datasets
- use crawled Web data for NLP purposes (with emphasis on the data rather than the use)
The previous WAC workshops have been co-located with various conferences in computational linguistics. This time the workshop is co-located with WWW2012, the main world conference on Web technologies and their impact on society.
wiki:Programme
Organising committee
- Adam Kilgarriff (Lexical Computing Ltd.)
- Serge Sharoff (University of Leeds, Workshop Chair)
Programme committee
Organising committee plus:
- Silvia Bernardini, U of Bologna, Italy
- Stefan Evert, U of Osnabrück, Germany
- Cédrick Fairon, UCLouvain, Belgium
- William H. Fletcher, U.S. Naval Academy, USA
- Gregory Grefenstette, Exalead, France
- Igor Leturia, Elhuyar Fundazioa, Basque Country, Spain
- Preslav Nakov, National U of Singapore
- Jan Pomikalek, Masaryk U, Czech Republic
- Reinhard Rapp, U Mainz, Germany
- Kevin Scannell, Saint Louis U, USA
- Gilles-Maurice de Schryver, U Gent, Belgium
- Pierre Zweigenbaum, LIMSI, France
Attachments (1)
- wac7-proc.pdf (3.7 MB): Workshop proceedings