Changes between Version 10 and Version 11 of WAC9


Timestamp: 11/11/13 15:14:40
Author: Felix Bildhauer

== Description ==

The World Wide Web has become an increasingly popular source of linguistic data, not only within the NLP communities but also among theoretical linguists facing problems of data sparseness or data diversity.
Accordingly, web corpora continue to gain importance, given their size and their diversity in terms of genres/text types.
     
== Call for papers ==

As in previous years, the 9th Web as Corpus workshop (WaC9) invites original contributions pertaining to all aspects of web corpora, including data collection, cleaning, duplicate removal, document filtering, linguistic post-processing, and use of web corpora in language technology and linguistics.

However, a major challenge in the construction of web corpora is the question of how to evaluate the quality of both the software used to build web corpora and the corpora themselves.
     
* sampling strategies/crawling algorithms and their effect on corpus composition/corpus quality
* non-destructive cleaning and normalization of web data (Currently available web corpora have usually undergone radical cleaning procedures in order to produce "high-quality" data. At least for some uses of the data, aggressive and sometimes arbitrary removal of material, whether whole documents or parts thereof, can be problematic. The same is true for aggressive normalization of the data. To address such problems, ways of cleaning and normalizing the data transparently, i.e., while preserving the non-normalized forms, should be discussed.)

=== Submission details ===

Abstracts should be

* anonymous
* no longer than two pages (including figures and references)
* in PDF format
* formatted according to the EACL stylesheet (LaTeX and MS Word templates are available [http://www.eacl2014.org/files/eacl-2014-styles.zip here])
* submitted via the [https://www.softconf.com/eacl2014/WaC9/ START online submission system] no later than 23 January 2014
     
* Stephen Wattam, Lancaster University