[[PageOutline]]

= 11th Web as Corpus Workshop (WAC-XI) =

at [http://www.birmingham.ac.uk/research/activity/corpus/events/2017/cl2017/index.aspx Corpus Linguistics 2017, Birmingham][[br]]
featuring the First !CleanerEval Shared Task panel discussion

Endorsed by the Special Interest Group of the ACL on Web as Corpus (SIGWAC)

Contact: `wacxi2017@gmail.com`

=== Organizers ===

 * Adrien Barbaresi (BBAW Berlin/ÖAW Wien)
 * Felix Bildhauer (IDS Mannheim)
 * [http://rolandschaefer.net Roland Schäfer (Freie Universität Berlin)]

== Main workshop ==

tba

== Panel discussion: the !CleanerEval shared task == #cleanereval

As part of the workshop, we plan to organize a panel discussion as the first meeting of the !CleanerEval shared task on combined paragraph and document quality detection for (web) documents.

The !CleanerEval shared task follows the successful !CleanEval shared task organized by SIGWAC in 2006. While !CleanEval focussed specifically on so-called boilerplate removal, !CleanerEval goes beyond this and asks for systems that automatically determine the linguistic quality of paragraphs and whole documents, so that corpus designers can decide whether or not to include them in their corpus. In the !CleanerEval setting, boilerplate paragraphs are low-quality paragraphs, but there may be other, non-boilerplate paragraphs of low quality as well.

!CleanerEval was proposed by the organizers of WAC-XI during the final discussion of WAC-X, where the proposal was met with enthusiasm. The WAC-XI panel discussion is intended to serve as a platform for developing the operationalization of the notions of paragraph and document quality, the annotation guidelines, and the final schedule for the shared task. The final meeting of the shared task is planned to be part of WAC-XII in 2018.

== Program committee ==

Confirmed reviewers so far:

 * Masayuki Asahara, National Institute for Japanese Language and Linguistics
 * Silvia Bernardini, University of Bologna
 * Niels Brügger, University of Aarhus
 * Cédrick Fairon, UC Louvain
 * William H. Fletcher, U.S. Naval Academy
 * Jack Grieve, Aston University
 * Aurelie Herbelot, University of Trento
 * Miloš Jakubíček, Masaryk University Brno
 * Iztok Kosem, Trojina, Institute for Applied Slovene Studies
 * Steffen Remus, TU Darmstadt
 * Antonio Ruiz Tinoco, Sophia University
 * Kevin Scannell, Saint Louis University
 * Serge Sharoff, University of Leeds
 * Sabine Schulte im Walde, IMS Stuttgart
 * Klaus Schulz, LMU München
 * Egon Stemle, EURAC Bozen / Bolzano
 * Peter Uhrig, FAU Erlangen
 * Marieke van Erp, VU Amsterdam
 * Wajdi Zaghouani, CMU, Qatar
 * Amir Zeldes, Georgetown University, Washington
 * Arne Zeschel, Institut für Deutsche Sprache, Mannheim

== Details ==

=== Important dates === #dates

Workshop day: between 24 and 27 July 2017

=== Call for papers === #cfp

tba

=== Submission website ===

tba

=== Submission format ===

tba