Changes between Version 34 and Version 35 of WAC-XI


Timestamp:
07/04/17 09:53:04
Author:
Roland Schäfer
  • WAC-XI

    v34 v35  
    1111Contact: `wacxi2017@gmail.com`
    1212
     13
     14'''Note: WAC-XI has been merged with: [http://corpora.ids-mannheim.de/cmlc-2017.html CMLC + BigNLP]'''
     15Please refer to the CMLC website for details.
     16
    1317== Organizers ==
    1418
     
    1620* [http://www1.ids-mannheim.de/gra/personal/bildhauer.html Felix Bildhauer (IDS Mannheim)]
    1721* [http://rolandschaefer.net Roland Schäfer (Freie Universität Berlin (DFG))]
     22
     23
     24== Accepted papers ==
     25
     26The accepted papers will appear in the proceedings of [http://corpora.ids-mannheim.de/cmlc-2017.html CMLC + BigNLP].
     27
     28* '''Edyta Jurkiewicz-Rohrbacher, Zrinka Kolaković, Björn Hansen''': ''Web Corpora – the best possible solution for tracking rare phenomena in underresourced languages – clitics in Bosnian, Croatian and Serbian''
     29* '''Vladimir Benko''': ''Are Web Corpora Inferior? The Case of Czech and Slovak''
     30* '''Vit Suchomel''': ''Removing Spam from Web Corpora Through Supervised Learning Using !FastText''
    1831
    1932
     
    4255* Are there differences with regard to the dispersion of linguistic entities in web corpora compared to traditionally compiled corpora? If so: Why? Does it matter? How can we deal with it or even profit from it?
    4356* How do very large web corpora compare to smaller, more intentionally stratified web corpora created for a specific task? How can it be decided which type of corpus is better for a given research question?
    44 
    45 === Submission format ===
    46 
    47 We call for '''anonymous''' extended abstracts of up to 1,500 words in length (excluding references, tables, and figures).
    48 Submissions must be in PDF format. Authors of accepted papers will receive minimal formatting instructions for the publication of the abstracts on the WAC-XI website in due time.
    49 There will be no proceedings volume, but a successful workshop might lead to a special issue/edited volume on web (and similar) data in linguistics (with a new round of peer reviewing), for which a separate call for (full) papers would be published after the workshop.
    50 
    51 === Submission website ===
    52 
    53 Please use our [https://easychair.org/conferences/?conf=wac11 EasyChair] installation exclusively.
    54 
    55 === Important dates ===#dates
    56 
    57 * 16 February 2017: First call for workshop papers
    58 * 30 March 2017: Second call for workshop papers
    59 * 24 April 2017: Abstract due date (23:59 GMT)
    60 * 12 June 2017: Notification of acceptance
    61 * 24 July 2017: Workshop day
    62 
    63 
    64 
    65 == !CleanerEval first panel discussion ==
    66 
    67 As part of the workshop and consistent with its general theme, we plan to organise a panel discussion as the first meeting of the !CleanerEval shared task on combined paragraph and document quality detection for (web) documents. The !CleanerEval shared task follows the successful !CleanEval shared task organised by SIGWAC in 2006. While !CleanEval focused specifically on boilerplate removal (the removal of automatically inserted and frequently repeated non-corpus material from web pages), !CleanerEval goes beyond this basic task. Participating systems should be able to determine the linguistic quality of paragraphs and whole documents automatically, such that corpus designers and/or users can decide whether to include them in their corpus or not. In the !CleanerEval setting, boilerplate paragraphs are paragraphs with low quality, but there might be other, non-boilerplate paragraphs with low quality as well. !CleanerEval was proposed by the organisers of WAC-XI during the final discussion of WAC-X, where the proposal was met with great interest. The WAC-XI panel discussion is intended to serve as a platform for the development of the operationalisation of the notions of paragraph and document quality, the annotation guidelines, and the final schedule for the shared task. There can be no doubt that corpus linguists should define what counts as good corpus material and what does not. It would be misguided to treat this question as a purely technical one. The final meeting of the shared task is planned to be part of WAC-XII in 2018.
    68 
    6957
    7058== Programme committee ==