Changes between Version 12 and Version 13 of WAC9
- Timestamp:
- 11/11/13 23:03:02
WAC9
The World Wide Web has become increasingly popular as a source of linguistic data, not only within the NLP community but also among theoretical linguists facing problems of data sparseness or data diversity. Accordingly, web corpora continue to gain importance, given their size and their diversity in terms of genres/text types. However, the field is still new, and a number of issues in web corpus construction still require much research, both fundamental and applied. These range from questions of corpus design (e.g., assessment of corpus composition, sampling strategies and their relation to crawling algorithms, and the handling of duplicated material) to more technical aspects (e.g., efficient implementation of individual post-processing steps in document cleansing and linguistic annotation, or large-scale parallelization to achieve web-scale corpus construction). Similarly, the systematic evaluation of web corpora, for example in the form of task-based comparisons to traditional corpora, has only recently come into focus.