  * [ WAC3, Louvain-la-Neuve, Belgium, 15-16 September 2007 ]
  * [ WAC4 at LREC, Marrakech, Morocco, 1 June 2008 ]
  * WAC5 is scheduled for 8 September 2009 in San Sebastian, Spain

We invite papers on various topics concerning the use of Web resources for corpus research and NLP applications, including (but not limited to) the following:

    * linguistic Web crawler technology and Web corpus collection projects
    * applications of Web-derived corpora and other kinds of Web data
    * how far does the “easy way” get you? (e.g. using search engines or Google's n-gram lists; we are particularly interested in a critical discussion of the usefulness and limitations of such approaches)
    * methods and tools for “cleaning” Web pages to turn them into a corpus (contributors to this topic will be encouraged to participate in the second CLEANEVAL competition, to be held in 2009)
    * automatic linguistic annotation of Web data: tokenisation, POS tagging, lemmatisation, semantic tagging, etc. (established tools often perform very poorly on Web data)
    * search engine architectures for linguists: bringing linguistics to commercial search engines, or high-performance search technology to linguistics?
    * search-engine-related topics such as result ranking (e.g. how to identify “typical” uses rather than returning 50 very similar matches on the first page)
    * duplicate detection, interactive query refinement, etc.
    * reviews and clever uses of search engine APIs (Google, Yahoo, AltaVista, and in particular Microsoft's current generous LiveSearch API)

== Activities ==