44 | | |
45 | | === Submission format === |
46 | | |
47 | | We call for '''anonymous''' extended abstracts of up to 1,500 words in length (excluding references, tables, and figures). |
48 | | Submissions must be in PDF format. Authors of accepted papers will receive minimal formatting instructions in due course for the publication of the abstracts on the WAC-XI website. |
49 | | There will be no proceedings volume, but a successful workshop might lead to a special issue/edited volume on web (and similar) data in linguistics (with a new round of peer reviewing), for which a separate call for (full) papers would be published after the workshop. |
50 | | |
51 | | === Submission website === |
52 | | |
53 | | Please use our [https://easychair.org/conferences/?conf=wac11 EasyChair] installation exclusively. |
54 | | |
55 | | === Important dates === #dates |
56 | | |
57 | | * 16 February 2017: First call for workshop papers |
58 | | * 30 March 2017: Second call for workshop papers |
59 | | * 24 April 2017: Abstract due date (23:59 GMT) |
60 | | * 12 June 2017: Notification of acceptance |
61 | | * 24 July 2017: Workshop day |
62 | | |
63 | | |
64 | | |
65 | | == !CleanerEval first panel discussion == |
66 | | |
67 | | As part of the workshop and consistent with its general theme, we plan to organise a panel discussion as the first meeting of the !CleanerEval shared task on combined paragraph and document quality detection for (web) documents. The !CleanerEval shared task follows the successful !CleanEval shared task organised by SIGWAC in 2006. While !CleanEval focused specifically on boilerplate removal (the removal of automatically inserted and frequently repeated non-corpus material from web pages), !CleanerEval goes beyond this basic task. Participating systems should be able to determine the linguistic quality of paragraphs and whole documents automatically, such that corpus designers and/or users can decide whether or not to include them in their corpus. In the !CleanerEval setting, boilerplate paragraphs are low-quality paragraphs, but there might be other, non-boilerplate paragraphs of low quality as well. !CleanerEval was proposed by the organisers of WAC-XI during the final discussion of WAC-X, where the proposal was met with great interest. The WAC-XI panel discussion is intended to serve as a platform for developing the operationalisation of the notions of paragraph and document quality, the annotation guidelines, and the final schedule for the shared task. There can be no doubt that corpus linguists should define what counts as good corpus material and what does not. It would be misguided to treat this question as a purely technical one. The final meeting of the shared task is planned to be part of WAC-XII in 2018. |
68 | | |