Changes between Version 10 and Version 11 of WAC9
- Timestamp: 11/11/13 15:14:40
WAC9
--- v10
+++ v11
@@ -5,4 +5,5 @@
 
 == Description ==
 
+The World Wide Web has become increasingly popular as a source of linguistic data, not only within the NLP communities, but also with theoretical linguists facing problems of data sparseness or data diversity.
 Accordingly, web corpora continue to gain importance, given their size and diversity in terms of genres/ text types.
@@ -17,5 +18,5 @@
 == Call for papers ==
 
-As in previous years, the 9th Web as Corpus workshop (WaC9) invites contributions pertaining to all aspects of web corpora, including data collection, cleaning, duplicate removal, document filtering, linguistic post-processing, and use of web corpora in language technology and linguistics.
+As in previous years, the 9th Web as Corpus workshop (WaC9) invites original contributions pertaining to all aspects of web corpora, including data collection, cleaning, duplicate removal, document filtering, linguistic post-processing, and use of web corpora in language technology and linguistics.
 
 However, a major challenge in the construction of web corpora is the question of the quality and the evaluation of both the software used in the construction of web corpora as well as the corpora themselves.
@@ -27,4 +28,16 @@
  * sampling strategies/ crawling algorithms and their effect on corpus composition/ corpus quality
  * non-destructive cleaning and normalization of web data (Currently available web corpora have usually undergone radical cleaning procedures in order to produce "high-quality" data. At least for some uses of the data, aggressive and sometimes arbitrary removal of material in the form of whole documents or parts thereof can be problematic. The same is true for aggressive normalization of the data. To meet such problems, ways of cleaning and normalizing the data transparently, i.e., preserving the non-normalized forms, should be discussed.)
+
+=== Submission details ===
+
+Abstracts should be
+
+ * anonymous
+ * no longer than two pages (including figures and references)
+ * in PDF-format
+ * formatted according to the EACL stylesheet (LaTeX and MS Word templates are available [http://www.eacl2014.org/files/eacl-2014-styles.zip here])
+ * in PDF-format
+ * submitted via the [https://www.softconf.com/eacl2014/WaC9/ START online submission system] no later than 23 January 2014
+
 
 
@@ -58,12 +71,4 @@
  * Stephen Wattam, Lancaster University
 
-== Submission details ==
-
-Abstracts should be
-
- * anonymous
- * no longer than X words, including references
- * in PDF-format
- * submitted via START (to be made available in due time)
 
 