Version 21 (modified by Serge Sharoff, 17 years ago)

Call for Participation

The workshop will be held on 7 September, 2009, in San Sebastian, preceding SEPLN, the Spanish NLP conference: http://ixa2.si.ehu.es/sepln2009/

To register for the workshop, and for information on accommodation in San Sebastian, see the SEPLN conference page.

Preliminary programme

9.15 – 9.30	Welcome & Introduction
Session 1	Collecting Web corpora (1)
9.30 – 10.00	Jonathan Howell and Mats Rooth. Web Harvest of Minimal Intonational Pairs
10.00 – 10.30	Marco Brunello. The creation of free linguistic corpora from the web
10.30 – 11.00	Coffee break
Session 2	Aspects of Web processing (1)
11.00 – 11.30	Eugenie Giesbrecht and Stefan Evert. Part-of-Speech (POS) Tagging - a Solved Task? An Evaluation of POS Taggers for the Web as Corpus
11.30 – 12.00	Matthias Wendt, Christoph Büscher, Christian Herta, Steffen Kemmerer, Walter Tietze, Manuel Messner, Martin Gerlach and Holger Düwiger. Extracting domain terminologies from the World Wide Web
12.00 – 13.00	Invited talk: Dekang Lin. Unsupervised acquisition of lexical knowledge from the Web
13.00 – 15.00	Lunch break
Session 3	Collecting Web corpora (2)
15.00 – 15.30	Johannes M. Steger and Egon W. Stemle. The Architecture for Unified Processing of Web Content
15.30 – 16.00	Igor Leturia Azkarate, Iñaki San Vicente and Xabier Saralegi. Search engine based approaches for collecting domain-specific Basque-English comparable corpora from the Internet
16.00 – 16.30	Coffee break
Session 4	Aspects of Web processing (2)
16.30 – 17.00	Joel Tetreault and Martin Chodorow. Examining the Use of Region Web Counts for ESL Error Detection
17.00 – 17.30	Kristin Davidse and Emeline Doyen. Using Internet data for the study of language change: a comparative study of the grammaticalized uses of French genre in teenage and adult forum data
17.30 – 18.00	Nabil Hathout, Franck Sajous and Ludovic Tanguy. Looking for French deverbal nouns in an evolving Web (a short history of WAC)
18.00 – 18.30	General discussion, wrap-up & conclusion

Call for Papers

We invite papers on various topics concerning the use of Web resources for corpus research and NLP applications, including (but not limited to) the following:

linguistic Web crawler technology and Web corpus collection projects
applications of Web-derived corpora and other kinds of Web data
how far does the “easy way” get you? (using search engines, or Google's n-gram lists; we are particularly interested in a critical discussion of the usefulness and limitations of such approaches)
methods and tools for “cleaning” Web pages to turn them into a corpus
automatic linguistic annotation of Web data: tokenisation, POS tagging, lemmatisation, semantic tagging, etc. (established tools often perform very poorly on Web data)
search engine architectures for linguists: bringing linguistics to commercial search engines, or high-performance search technology to linguistics?
search engine-related topics such as result ranking (e.g. how to identify “typical” uses rather than returning 50 very similar matches on the first page), duplicate detection, interactive query refinement, etc.
reviews and clever uses of search engine APIs (Google, Yahoo, Altavista, and in particular Microsoft's current generous Live Search API)

We particularly welcome submissions on languages other than English. One of the bottlenecks in corpus linguistic research on a particular language is the availability of corpora for that language: translation studies for, say, Ukrainian or Vietnamese are limited by the lack of diverse corpora for these languages. The Web offers an opportunity to alleviate this bottleneck, as millions of Ukrainian or Vietnamese texts are available online, but we still know little about what is there and how useful it is for translation, language teaching, linguistics research, etc.

Submission information

Authors are invited to submit full papers on original, unpublished work in the topic area of this workshop. Submissions should follow the format of ACL proceedings and should not exceed eight (8) pages, including references. All submissions must be anonymous: leave the author and affiliation lines empty, and refer to your own work in the third person, e.g. write 'Smith previously showed' instead of 'We previously showed'. We strongly recommend using the ACL LaTeX or Microsoft Word style files tailored for this year's conference (http://www.acl-ijcnlp-2009.org/main/authors/stylefiles/).

Submissions are managed via EasyChair. To submit a paper, log in at http://www.easychair.org/conferences/?conf=wac5 (or register an EasyChair account if you do not have one yet), then click New Submission and fill in the standard fields.

Important dates

Submission deadline: 24 April, 2009
Decisions sent by: 12 June, 2009
Camera-ready submission deadline: 17 July, 2009
Welcome party: 6 September, 2009
Workshop: 7 September, 2009

Programme committee

Silvia Bernardini, U of Bologna, Italy
Jesse de Does, INL, Netherlands
Katrien Depuydt, INL, Netherlands
Stefan Evert, U of Osnabrück, Germany
Cédrick Fairon, UCLouvain, Belgium
William Fletcher, U.S. Naval Academy, USA
Gregory Grefenstette, Commissariat à l'Énergie Atomique, France
Katja Hofmann, U of Amsterdam, Netherlands
Adam Kilgarriff, Lexical Computing Ltd, UK
Igor Leturia, Elhuyar Fundazioa, Basque Country, Spain
Preslav Nakov, National U of Singapore
Phil Resnik, U of Maryland, College Park, USA
Kevin Scannell, Saint Louis U, USA
Gilles-Maurice de Schryver, U Gent, Belgium
Klaus Schulz, LMU München, Germany
Serge Sharoff, U of Leeds, UK
Eros Zanchetta, U of Bologna, Italy

Organising committee

Iñaki Alegria, University of the Basque Country
Adam Kilgarriff, Lexical Computing Ltd
Igor Leturia, Elhuyar Fundazioa
Serge Sharoff, University of Leeds

Attachments (1)

WAC5_proceedings.pdf (1.9 MB), added by Serge Sharoff 17 years ago. Workshop proceedings.

ACL SIGWAC