8th Web as Corpus Workshop (WAC-8) @ Corpus Linguistics 2013

Monday, 22 July 2013 (Lancaster, UK)

Endorsed by ACL SIGWAC.

Web corpora and other Web-derived data have become a gold mine for corpus linguistics and natural language processing. The Web is an easy source of unprecedented amounts of linguistic data from a broad range of registers and text types. However, a collection of Web pages is not immediately suitable for exploration in the same way a traditional corpus is.

Since the first Web as Corpus Workshop, organised at the Corpus Linguistics 2005 Conference, a highly successful series of yearly Web as Corpus workshops has provided a venue for interested researchers to meet, share ideas and discuss the problems and possibilities of compiling and using Web corpora. After a stronger focus on application-oriented natural language processing and Web technology in recent years – with workshops taking place at NAACL-HLT 2010 and 2011 and at WWW 2012 – the 8th Web as Corpus Workshop returns to its roots in the corpus linguistics community.

Accordingly, the leading theme of this workshop is the application of Web data in language research, including linguistic evaluation of Web-derived corpora as well as strategies and tools for high-quality automatic annotation of Web text. We invite papers on all aspects of building and using Web corpora, with a particular focus on (but not limited to) the following:

applications of Web corpora and other Web-derived data sets for language research
automatic linguistic annotation of Web data, such as tokenisation, part-of-speech tagging, lemmatisation and semantic tagging (the accuracy of currently available off-the-shelf tools is still unsatisfactory for many types of Web data)
critical exploration of the characteristics of Web data from a linguistic perspective and its applicability to language research
presentation of Web corpus collection projects or software tools required for some part of this process (crawling, filtering, de-duplication, language identification, indexing, ...)

Important dates

~~March 3~~ March 7: Submission of extended abstract to be made through EasyChair (closed)
~~March 17~~ ~~March 23~~ March 27: Notification of acceptance
June 23: Submission of full paper
July 22: Workshop

Accepted Papers

(Abstracts)

Andrew Brindle	Thug breaks man's jaw: A Corpus Analysis of Responses to Interpersonal Street Violence
Jesse Egbert and Douglas Biber	Developing a User-based Method of Web Register Classification
Adam Kilgarriff and Vít Suchomel	Web Spam
Sarah Schulz, Verena Lyding and Lionel Nicolas	STirWaC - Compiling a diverse corpus based on texts from the web for South Tyrolean German
Silke Scheible and Sabine Schulte Im Walde	A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus
Alexander Piperski, Vladimir Belikov, Nikolay Kopylov, Vladimir Selegey and Serge Sharoff	Big and diverse is beautiful: A large corpus of Russian to study linguistic variation
Adriano Ferraresi and Silvia Bernardini	The academic Web-as-Corpus
Akshay Minocha, Siva Reddy and Adam Kilgarriff	Feed Corpus: An Ever Growing Up-to-date Corpus
Stephen Wattam, Paul Rayson and Damon Berridge	LWAC: Longitudinal Web-as-Corpus Sampling
Roland Schäfer, Adrien Barbaresi and Felix Bildhauer	The Good, the Bad, and the Hazy: Design Decisions in Web Corpus Construction
David Lutz, Parry Cadwallader and Mats Rooth	A web application for filtering and annotating web speech data
Colleen Crangle	A web-based model of semantic relatedness and the analysis of electroencephalographic (EEG) data

(preliminary) Programme

9:00 - 11:00	Session 1
11:00 - 11:30	Tea Break
11:30 - 13:00	Session 2
13:00 - 14:00	Lunch
14:00 - 15:30	Session 3
15:30 - 16:00	Tea Break
16:00 - 18:00	Session 4
18:00	Pub
19:00	Dinner

Submission Information

Authors are invited to submit extended abstracts on original, unpublished work in the topic area of this workshop. Contributions must be submitted in PDF format and should not exceed two (2) pages, including references. Submissions should follow the format of the ACL 2013 proceedings.

LaTeX	MS Word
acl2013.tex	acl2013.doc
acl2013.sty	acl2013.pdf
acl2013.pdf	acl2013.dot
acl.bst

Authors of accepted abstracts will be invited to submit full papers (up to eight pages) before the workshop; these will appear in online proceedings.

Organising committee

Stefan Evert, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Egon Stemle, European Academy of Bozen/Bolzano (EURAC)
Paul Rayson, Lancaster University

Programme committee

Organising committee plus:

Silvia Bernardini, U of Bologna, Italy
Paul Cook, U of Melbourne, Australia
Cédrick Fairon, UCLouvain, Belgium
William H. Fletcher, U.S. Naval Academy, USA
Sebastian Hoffmann, U Trier, Germany
Adam Kilgarriff, Lexical Computing Ltd, UK
Preslav Nakov, QCRI, Qatar Foundation
Reinhard Rapp, U Aix-Marseille, France & U Mainz, Germany
Serge Sharoff, U of Leeds, UK
Stephen Wattam, Lancaster U, UK
Eros Zanchetta, U of Bologna, Italy
Pierre Zweigenbaum, LIMSI, France

Attachments (17)

acl2013.dot (33.0 KB)
acl2013.doc (65.5 KB)
acl2013.tex (16.6 KB)
acl2013.sty (14.6 KB)
acl.bst (24.5 KB)
acl2013.latex.pdf (83.8 KB)
acl2013.msword.pdf (83.8 KB)
wac8-proceedings.pdf (7.8 MB) - wac8 proceedings (2013071901)
wac8-proceedings.tex (7.6 KB)
talk01.pptx (1.4 MB)
talk02.pdf (614.6 KB)
talk06.pdf (413.7 KB)
talk08.pptx (144.0 KB)
talk04.pdf (462.1 KB)
talk03.pdf (6.9 MB)
talk09.pdf (322.2 KB)
talk07.pdf (1.6 MB)
