Changes between Version 9 and Version 10 of WAC-XI


Timestamp:
01/18/17 14:12:56 (7 years ago)
Author:
Roland Schäfer
Comment:

--

  • WAC-XI

    v9 v10  
    22
    33= 11th Web as Corpus Workshop (WAC-XI) =
     4at [http://www.birmingham.ac.uk/research/activity/corpus/events/2017/cl2017/index.aspx Corpus Linguistics 2017, Birmingham]
    45featuring the First !CleanerEval Shared Task panel discussion
    56
     
    1617== Main workshop ==
    1718
    18 The World Wide Web has become increasingly popular as a source of linguistic data, not only within the NLP communities, but also with theoretical linguists facing problems such as data sparseness or the lack of variation in written data. Accordingly, web corpora continue to gain importance, given their size and diversity in terms of genres/text types. The field is still new, though, and a number of issues in web corpus construction need much additional research, both fundamental and applied. These issues range from questions of corpus design (e.g., the assessment of corpus composition or the handling of web spam and duplicated material) to more technical aspects (e.g., efficient implementation of individual post-processing steps in document cleansing and linguistic annotation, automatic generation of document-level meta data, or large-scale parallelization to achieve web-scale corpus construction). Similarly, the systematic evaluation of web corpora, for example in the form of task-based comparisons to traditional corpora, has only recently shifted into focus. Finally, other forms of computer-mediated communication (e.g., Twitter) have recently received a lot of attention from corpus designers.
    19 
    20 For almost a decade, the ACL SIGWAC (http://www.sigwac.org.uk/) and especially the highly successful Web as Corpus (WAC) workshops have served as a platform for researchers interested in the compilation, processing, and application of web-derived corpora and other types of CMC corpora. Past workshops were co-located with major conferences on computational linguistics and/or corpus linguistics (such as ACL, EACL, NAACL, LREC, WWW, Corpus Linguistics). As in previous years, the 11th Web as Corpus workshop (WAC-XI) invites contributions pertaining to all aspects of web corpus creation, including but not restricted to:
    21 
    22 * data collection (both large web corpora and other types of CMC corpora)
    23 * cleaning/handling of noise
    24 * duplicate removal/document filtering
    25 * linguistic post-processing (including non-standard data)
    26 * automatic generation of meta data (including register, genre, etc.)
    27 
    28 Furthermore, aspects of usability and availability of web-derived corpora are highly relevant in the context of WAC-XI:
    29 
    30 * development of user interfaces
    31 * visualization techniques
    32 * tools for statistical analysis of very large (e.g., web-derived) corpora
    33 * long-term archiving
    34 * documentation and standardization
    35 * legal issues
    36 
    37 Finally, reports of the use of web corpora in language technology and linguistics are welcome, for example:
    38 
    39 * linguistic studies of web-specific forms of communication
    40 * linguistic studies of rare phenomena in web data
    41 * web-specific lexicography, grammaticography, and language documentation
    42 * information extraction & opinion mining
    43 * language modeling, distributional semantics
    44 * machine translation
    45 
     19tba
    4620
    4721== Panel discussion: the !CleanerEval shared task == #cleanereval
     
    7953=== Important dates === #dates
    8054
    81 tba
     55Workshop day: between 24 and 27 July 2017
    8256
    8357=== Call for papers === #cfp