Changes between Version 2 and Version 3 of WAC9

Nov 5, 2013, 3:00:21 PM
Felix Bildhauer



  • WAC9

    v2  v3
    19  19  * task-based ("extrinsic") evaluation of web corpora, especially in comparison to traditional corpus resources and n-gram databases (Web 1T 5-Grams, Google Books)
    20  20  * missing metadata in web corpora: enriching web corpora with metadata by means of high-accuracy automatic classification
    21      * sampling strategies\slash crawling algorithms and their effect on corpus composition\slash corpus quality
        21  * sampling strategies/ crawling algorithms and their effect on corpus composition/ corpus quality
    22  22  * non-destructive cleaning and normalization of web data (Currently available web corpora have usually undergone radical cleaning procedures in order to produce "high-quality" data. At least for some uses of the data, aggressive and sometimes arbitrary removal of material in the form of whole documents or parts thereof can be problematic. The same is true for aggressive normalization of the data. To address such problems, ways of cleaning and normalizing the data transparently, i.e., preserving the non-normalized forms, should be discussed.)
    49  49  * Serge Sharoff, University of Leeds
    50  50  * Sabine Schulte im Walde, Universität Stuttgart
    51      * Egon Stemle, European Academy of Bozen/Bolzano
        51  * Egon Stemle, European Academy of Bolzano
    52  52  * Yannick Versley, Universität Heidelberg
    53  53  * Torsten Zesch, Universität Darmstadt
    54  54  * Stephen Wattam, Lancaster University
        56  == Important dates ==
        58  * 11 November 2013: First Call for Workshop Papers
        59  * 12 December 2013: Second Call for Workshop Papers
        60  * 4 January 2014: Final Call for Workshop Papers
        61  * 23 January 2014: Workshop Paper Due Date
        62  * 20 February 2014: Notification of Acceptance
        63  * 3 March 2014: Camera-ready papers due
        64  * 26-27 April 2014: EACL Workshop Dates