Changes between Version 2 and Version 3 of WAC8


Timestamp: 01/05/13 22:07:50
Author: stefan
Comment: small edits

    v2  v3
     1   1  = 8th Web as Corpus Workshop (WAC-8) @ [http://ucrel.lancs.ac.uk/cl2013/ Corpus Linguistics 2013]=
     2      == Lancaster, UK; Monday 22nd July 2013 ==
         2  == Monday, 22 July 2013 (Lancaster, UK) ==
     3   3
     4      {{{#!comment
     5      Sponsored by [http://www.sigwac.org.uk ACL SIGWAC].
     6      }}}
         4  //Endorsed by [http://www.sigwac.org.uk ACL SIGWAC].//
     7   5
     8   6  Web corpora and other Web-derived data have become a gold mine for corpus linguistics and natural language processing.  The Web is an easy source of unprecedented amounts of linguistic data from a broad range of registers and text types.  However, a collection of Web pages is not immediately suitable for exploration in the same way a traditional corpus is.
     9   7
    10      Since the first Web as Corpus Workshop organised at the Corpus Linguistics 2005 Conference, a highly successful series of yearly Web as Corpus workshops has provided a venue for interested researchers to meet, share ideas and discuss the problems and possibilities of compiling and using Web corpora.  After a stronger focus on application-oriented natural language processing and Web technology in recent years -- with workshops taking place at the NAACL-HLT 2010, 2011 and WWW 2012 -- the 8th Web as Corpus Workshop returns to its roots in the corpus linguistics community.
         8  Since the first Web as Corpus Workshop organised at the Corpus Linguistics 2005 Conference, a highly successful series of yearly Web as Corpus workshops has provided a venue for interested researchers to meet, share ideas and discuss the problems and possibilities of compiling and using Web corpora.  After a stronger focus on application-oriented natural language processing and Web technology in recent years – with workshops taking place at NAACL-HLT 2010, 2011 and WWW 2012 – the 8th Web as Corpus Workshop returns to its roots in the corpus linguistics community.
    11   9
    12  10  Accordingly, the leading theme of this workshop is the application of Web data in language research, including linguistic evaluation of Web-derived corpora as well as strategies and tools for high-quality automatic annotation of Web text. We invite papers on all aspects of building and using Web corpora, with a particular focus on (but not limited to) the following:
    13  11
    14  12   * applications of Web corpora and other Web-derived data sets for language research
    15       * automatic linguistic annotation of Web data such as tokenisation, part-of-speech tagging, lemmatisation and semantic tagging (the accuracy of established software tools is still unsatisfactory for many types of Web data)
    16       * critical exploration of characteristics of Web data from a linguistic perspective and its applicability to language research
    17       * presentation of Web corpus collection projects or software tools required for some part of the process (crawling, filtering, de-duplication, language identification, indexing, ...)
        13   * automatic linguistic annotation of Web data such as tokenisation, part-of-speech tagging, lemmatisation and semantic tagging\\ (the accuracy of currently available off-the-shelf tools is still unsatisfactory for many types of Web data)
        14   * critical exploration of the characteristics of Web data from a linguistic perspective and its applicability to language research
        15   * presentation of Web corpus collection projects or software tools required for some part of this process (crawling, filtering, de-duplication, language identification, indexing, ...)
    18  16
    19  17  {{{#!comment