Changes between Version 2 and Version 3 of WAC8
- Timestamp: 01/05/13 22:07:50
WAC8
 = 8th Web as Corpus Workshop (WAC-8) @ [http://ucrel.lancs.ac.uk/cl2013/ Corpus Linguistics 2013]=
-== Lancaster, UK; Monday 22nd July 2013==
+== Monday, 22 July 2013 (Lancaster, UK) ==

-{{{#!comment
-Sponsored by [http://www.sigwac.org.uk ACL SIGWAC].
-}}}
+//Endorsed by [http://www.sigwac.org.uk ACL SIGWAC].//

 Web corpora and other Web-derived data have become a gold mine for corpus linguistics and natural language processing. The Web is an easy source of unprecedented amounts of linguistic data from a broad range of registers and text types. However, a collection of Web pages is not immediately suitable for exploration in the same way a traditional corpus is.

-Since the first Web as Corpus Workshop organised at the Corpus Linguistics 2005 Conference, a highly successful series of yearly Web as Corpus workshops provides a venue for interested researchers to meet, share ideas and discuss the problems and possibilities of compiling and using Web corpora. After a stronger focus on application-oriented natural language processing and Web technology in recent years -- with workshops taking place at the NAACL-HLT 2010, 2011 and WWW 2012 --the 8th Web as Corpus Workshop returns to its roots in the corpus linguistics community.
+Since the first Web as Corpus Workshop organised at the Corpus Linguistics 2005 Conference, a highly successful series of yearly Web as Corpus workshops provides a venue for interested researchers to meet, share ideas and discuss the problems and possibilities of compiling and using Web corpora. After a stronger focus on application-oriented natural language processing and Web technology in recent years – with workshops taking place at NAACL-HLT 2010, 2011 and WWW 2012 – the 8th Web as Corpus Workshop returns to its roots in the corpus linguistics community.

 Accordingly, the leading theme of this workshop is the application of Web data in language research, including linguistic evaluation of Web-derived corpora as well as strategies and tools for high-quality automatic annotation of Web text. We invite papers on all aspects of building and using Web corpora, with a particular focus on (but not limited to) the following:

  * applications of Web corpora and other Web-derived data sets for language research
- * automatic linguistic annotation of Web data such as tokenisation, part-of-speech tagging, lemmatisation and semantic tagging (the accuracy of established softwaretools is still unsatisfactory for many types of Web data)
- * critical exploration of characteristics of Web data from a linguistic perspective and its applicability to language research
- * presentation of Web corpus collection projects or software tools required for some part of th eprocess (crawling, filtering, de-duplication, language identification, indexing, ...)
+ * automatic linguistic annotation of Web data such as tokenisation, part-of-speech tagging, lemmatisation and semantic tagging\\ (the accuracy of currently available off-the-shelf tools is still unsatisfactory for many types of Web data)
+ * critical exploration of the characteristics of Web data from a linguistic perspective and its applicability to language research
+ * presentation of Web corpus collection projects or software tools required for some part of this process (crawling, filtering, de-duplication, language identification, indexing, ...)

 {{{#!comment