| 2 | |
| 3 | We are happy to announce that WAC-X will be co-located with ACL 2016 in Berlin. More information and a call for papers will be published in due time. |
| 4 | |
| 5 | == Details == |
| 6 | |
| 7 | === Organizers === |
| 8 | |
| 9 | [http://cs.unb.ca/~ccook1/ Paul Cook (University of New Brunswick)] |
| 10 | [http://www.stefan-evert.de/ Stefan Evert (Friedrich-Alexander-Universität Erlangen-Nürnberg)] |
| 11 | [http://hpsg.fu-berlin.de/~rsling Roland Schäfer (Freie Universität Berlin)] |
| 12 | [http://iiegn.eu/work Egon Stemle (European Academy of Bozen/Bolzano)] |
| 13 | |
| 14 | |
| 15 | === Program committee (preliminary) === |
| 16 | |
| 17 | The workshop organizers plus: |
| 18 | |
| 19 | * Adrien Barbaresi, ÖAW (AT) |
| 20 | * Silvia Bernardini, University of Bologna (IT) |
| 21 | * Douglas Biber, Northern Arizona University (US) |
| 22 | * Felix Bildhauer, Institut für Deutsche Sprache Mannheim (DE) |
| 23 | * Katrien Depuydt, INL, Leiden (NL) |
| 24 | * Jesse de Does, INL, Leiden (NL) |
| 25 | * Cédrick Fairon, UC Louvain (BE) |
| 26 | * William H. Fletcher, U.S. Naval Academy (US) |
| 27 | * Iztok Kosem, Trojina, Institute for Applied Slovene Studies (SI) |
| 28 | * Simon Krek, Jožef Stefan Institute (SI) |
| 29 | * Lothar Lemnitzer, BBAW (DE) |
| 30 | * Nikola Ljubešić, Sveučilište u Zagrebu (HR) |
| 31 | * Siva Reddy, University of Edinburgh (UK) |
| 32 | * Steffen Remus, TU Darmstadt (DE) |
| 33 | * Pavel Rychlý, Masaryk University (CZ) |
| 34 | * Kevin Scannell, Saint Louis University (US) |
| 35 | * Serge Sharoff, University of Leeds (UK) |
| 36 | * Klaus Schulz, LMU München (DE) |
| 37 | * Kay-Michael Würzner, BBAW (DE) |
| 38 | * Torsten Zesch, University of Duisburg-Essen (DE) |
| 39 | * Pierre Zweigenbaum, LIMSI (FR) |
| 40 | |
| 74 | The [https://sites.google.com/site/empirist2015/ EmpiriST 2015 shared task] aims to encourage the developers of NLP applications to adapt their tools and resources to the processing of German discourse in genres of computer-mediated communication (CMC), including both dialogical (chat, SMS, social networks, etc.) and monological (web pages, blogs, etc.) texts. Since there has been relatively little work in this area for German so far, the shared task focuses on tokenization and part-of-speech tagging as the core annotation steps required by virtually all NLP applications. While we have a particular interest in robust tools that can be applied to dialogical CMC and web corpora alike, participants are allowed to use different systems for the two subsets or submit results for one subset only. |
| 78 | |
| 79 | == Panel discussion "Corpora, open science, and copyright reforms" == |
| 80 | |
| 81 | As part of the 10th Web as Corpus workshop (WAC-X), a panel discussion will be organized. Web corpus designers are probably those who are most affected by issues and uncertainties of copyright legislation and intellectual property rights, especially in the EU. While in some countries, such as the U.S., a Fair Use doctrine allows the use of data for non-commercial research purposes, the situation in Europe is more problematic. For example, German copyright law ("Urheberrecht") requires that any re-use of a work which reaches a certain threshold of creativity be explicitly approved by the author. Obtaining such approval is difficult for any corpus creator, and it is completely infeasible for large web corpora containing texts written by millions of different authors. Thus, corpora are re-distributed in crippled form as sentence shuffles (e.g. COW and the Leipzig Corpora Collection), and it is not even clear whether there really is a reliable legal exemption for single sentences. In the famous Infopaq case, which was referred by a Danish court, the European Court of Justice ruled that even snippets of 11 words might be protected under EU copyright law (http://bit.ly/1GYTDjR). |
| 82 | This situation is highly undesirable. Large web corpora have been shown to be indispensable for many tasks in computational linguistics, in the documentation of standard and non-standard language, and in empirically oriented theoretical linguistics. |
| 83 | Reports written by legal experts – such as the one recently commissioned by the German Research Council (http://bit.ly/1PG4Gq6) – only provide an interpretation of the given legal situation. Only active lobbying in favor of a reasonable copyright reform will eventually bring about the necessary changes such that researchers can build corpus resources and share them freely for academic purposes. Therefore, the goal of this panel discussion is to bring together corpus creators, active users of web corpora, and open science activists in order to share and discuss views on the copyright problem as a political rather than a legal problem. Ideally, a first draft of a joint declaration might come out of this discussion. With such a declaration, the (web) corpus community could make sure that its voice is heard, especially in the ongoing discussion about reforms of the European copyright legislation. |
| 84 | |