ACL SIGWAC home page
The Special Interest Group of the Association for Computational Linguistics (ACL) on Web as Corpus.
Objectives
- to promote interest in the use of the web as a source of linguistic data, and as an object of study in its own right;
- to provide members of the ACL who have a special interest in the web as corpus with a means of exchanging news of recent research developments and other matters of interest;
- to sponsor meetings and workshops on the web as corpus that appear to be timely and worthwhile.
Meetings
- WAC1, at the Corpus Linguistics conference, Birmingham, UK, July 2005
- WAC2, at EACL, Trento, Italy, April 2006
- WAC3, Louvain-la-Neuve, Belgium, 15-16 September 2007
- WAC4, at LREC, Marrakech, Morocco, 1 June 2008
- WAC5 is scheduled for 8 September 2009, San Sebastian, Spain
We invite papers on various topics concerning the use of Web resources for corpus research and NLP applications, including (but not limited to) the following:
- linguistic Web crawler technology and Web corpus collection projects
- applications of Web-derived corpora and other kinds of Web data
- how far does the “easy way” get you? (using search engines, or Google's n-gram lists; we are particularly interested in a critical discussion of the usefulness and limitations of such approaches)
- methods and tools for “cleaning” Web pages to turn them into a corpus (contributors to this topic will be encouraged to participate in the second CLEANEVAL competition to be held in 2009)
- automatic linguistic annotation of Web data: tokenisation, POS tagging, lemmatisation, semantic tagging, etc. (established tools often perform very poorly on Web data)
- search engine architectures for linguists: bringing linguistics to commercial search engines, or high-performance search technology to linguistics?
- search engine-related topics such as result ranking (e.g. how to identify “typical” uses rather than returning 50 very similar matches on the first page), duplicate detection, interactive query refinement, etc.
- reviews and clever uses of search engine APIs (Google, Yahoo, AltaVista, and in particular Microsoft's current generous LiveSearch API)
Activities
- CLEANEVAL, a competition for cleaning webpages
- Mailing list:
- sign up here
- address to send mail to: sigwac@…
Officers
- Chair: Serge Sharoff
- Secretary: Marco Baroni
Useful resources
Attachments
- constitution.txt: Constitution of the SIGWAC