ACL SIGWAC home page
The Special Interest Group of the Association for Computational Linguistics (ACL) on Web as Corpus.
Join the SIG by signing up to the mailing list!
The Special Interest Group on Web as Corpus aims to research the opportunities and limitations of using textual web data for
- performing linguistic research
- modelling knowledge of language
- modelling extralinguistic knowledge
Objectives
- To build a community around web-as-corpus research
- To support and promote information exchange and the dissemination of results and best practices
- To organize workshops, hackathons and shared tasks
Download the constitution of ACL SIGWAC.
Officers
- Nikola Ljubešić (co-president)
- Benoît Sagot (co-president)
- Veronika Laippala (co-secretary)
- Pedro Ortiz Suarez (co-secretary)
Resources
Corpora
- CommonCrawl
- OSCAR
- ParaCrawl
- MaCoCu
- CLASSLA South Slavic web corpora
- Aranea web corpora
- CLARIN.SI web corpora
- University of Leeds (CTS) web corpora
- Web corpora on Sketchengine (commercial product)
- WaCKy corpora
Technologies
- A list of tools for harvesting and processing web data, by Masaryk University and Lexical Computing
- The XGENRE multilingual text genre classifier
- Massively Multilingual Modeling of Web Registers by TurkuNLP
Additional information
- Schäfer and Bildhauer's web corpus book
- Stephanie Evert's WAC website
- CLEANEVAL, a competition for cleaning webpages
Meetings
- WAC-XII at LREC 2020, Marseille, France, 16 May 2020 (cancelled due to the COVID-19 outbreak, but the proceedings have been published)
- WAC-XI at Corpus Linguistics 2017, Birmingham, UK, 24-27 July 2017
- WAC-X at ACL 2016, Berlin, Germany, 12 August 2016
- WAC@eLex2015, at eLex 2015, Herstmonceux Castle, UK, 10 August 2015
- WAC9, at EACL 2014, Gothenburg, Sweden, 26-27 April 2014
- WAC8, at Corpus Linguistics 2013, Lancaster, UK, 22 July 2013
- WAC7, at WWW12, Lyon, France, 17 April 2012
- BUCC (Building and Using Comparable Corpora) workshop, at ACL 2011, Portland, Oregon, USA, 24 June 2011
- WAC6, at NAACL-HLT, Los Angeles, USA, 5 June 2010
- WAC5, at SPLN, San Sebastian, Basque Country, Spain, 7 September 2009
- WAC4, at LREC, Marrakech, Morocco, 1 June 2008
- WAC3, Louvain-la-Neuve, Belgium, 15-16 September 2007
- WAC2, at EACL, Trento, Italy, April 2006
- WAC1, at Corpus Linguistics conference, Birmingham, UK, July 2005
ACL SIGWAC annual reports