Changes between Version 12 and Version 13 of WAC9


Timestamp:
11/11/13 23:03:02
Author:
Roland Schäfer

  • WAC9

    v12 v13  
    7   7
    8   8   The World Wide Web has become increasingly popular as a source of linguistic data, not only within the NLP communities, but also with theoretical linguists facing problems of data sparseness or data diversity.
    9       Accordingly, web corpora continue to gain importance, given their size and diversity in terms of genres/ text types.
        9   Accordingly, web corpora continue to gain importance, given their size and diversity in terms of genres/text types.
    10  10  However, the field is still new, and a number of issues in web corpus construction still need much research (fundamental and applied), ranging from questions of corpus design (e.g., corpus composition assessment, sampling strategies and their relation to crawling algorithms, handling of duplicated material) to more technical aspects (e.g., efficient implementation of individual post-processing steps in document cleansing and linguistic annotation, or large-scale parallelization to achieve web-scale corpus construction).
    11  11  Similarly, the systematic evaluation of web corpora, for example in the form of task-based comparisons to traditional corpora, has only lately shifted into focus.