Changes between Version 9 and Version 10 of WAC9

11/11/13 14:56:53
Felix Bildhauer



  • WAC9

    v9  v10
    12  12  For almost a decade, the ACL SIGWAC, and especially the highly successful Web as Corpus (WaC) workshops have served as a platform for researchers interested in building and working with web-derived corpora.
    13  13  Past workshops have been co-located with major conferences on computational linguistics and/or corpus linguistics (such as EACL, LREC, WWW, Corpus Linguistics).
        14  As part of the workshop, we will have a panel discussion dedicated to the planning of a shared task for WaC10 (2015), including the nomination of organizers of the shared task.
        15  The tracks of the shared task will focus on the quality of web corpus creation tools, tools for linguistic annotation (at least lemmatization, possibly also POS tagging, etc.), and the quality of web corpora themselves.
    15  17  == Call for papers ==
    26  28  * non-destructive cleaning and normalization of web data (Currently available web corpora have usually undergone radical cleaning procedures in order to produce "high-quality" data. At least for some uses of the data, aggressive and sometimes arbitrary removal of material in the form of whole documents or parts thereof can be problematic. The same is true for aggressive normalization of the data. To meet such problems, ways of cleaning and normalizing the data transparently, i.e., preserving the non-normalized forms, should be discussed.)
    28      As part of the workshop, we will have a panel discussion dedicated to the planning of a shared task for WaC10 (2015), including the nomination of organizers of the shared task.
    29      The tracks of the shared task will focus on the quality of web corpus creation tools, tools for linguistic annotation (at least lemmatization, possibly also POS tagging, etc.), and the quality of web corpora themselves.
    31  31  == Organizing Committee ==