Changes between Version 23 and Version 24 of WAC-XI


Timestamp:
02/16/17 17:58:04 (7 years ago)
Author:
Roland Schäfer
Comment:

--

  • WAC-XI

    v23 v24
    63  63
    64  64
    65      === !CleanerEval first panel discussion ===
        65  == !CleanerEval first panel discussion ==
    66  66
    67  67  As part of the workshop and consistent with its general theme, we plan to organise a panel discussion as the first meeting of the !CleanerEval shared task on combined paragraph and document quality detection for (web) documents. The !CleanerEval shared task follows the successful !CleanEval shared task organised by SIGWAC in 2006. While !CleanEval focused specifically on boilerplate removal (the removal of automatically inserted and frequently repeated non-corpus material from web pages), !CleanerEval goes beyond this basic task. Participating systems should be able to determine the linguistic quality of paragraphs and whole documents automatically, so that corpus designers and/or users can decide whether or not to include them in their corpus. In the !CleanerEval setting, boilerplate paragraphs are low-quality paragraphs, but there might be other, non-boilerplate paragraphs of low quality as well. !CleanerEval was proposed by the organisers of WAC-XI during the final discussion of WAC-X, where the proposal was met with great interest. The WAC-XI panel discussion is intended to serve as a platform for developing the operationalisation of the notions of paragraph and document quality, the annotation guidelines, and the final schedule for the shared task. There can be no doubt that corpus linguists should define what counts as good corpus material and what does not. It would be misguided to treat this question as a purely technical one. The final meeting of the shared task is planned to be part of WAC-XII in 2018.
    68  68
    69  69
    70      === Programme committee ===
        70  == Programme committee ==
    71  71
    72  72