Changes between Version 1 and Version 2 of WAC-XII


Timestamp:
12/05/19 03:16:22 (4 years ago)
Author:
Roland Schäfer
Comment:

--

  • WAC-XII

    v1 v2  
    88== Important dates ==
    99
    10   * submission deadline: Sunday, 16 February 2020, 24:00 GMT-12
    11   * notification of acceptance: Friday, 13 March 2020 22:00 GMT+1
     10  * submission deadline: Sunday, 16 February 2020 at 24:00 GMT-12
     11  * notification of acceptance: Friday, 13 March 2020 at 22:00 GMT+1
     12  * camera-ready manuscript due date: Friday, 27 March 2020 at 24:00 GMT-12
    1213  * workshop date: afternoon session of Saturday, 16 May 2020
    1314
     
    2324For almost fifteen years, the ACL SIGWAC, and most notably the Web as Corpus (WAC) workshops, have served as a platform for researchers interested in the compilation, processing and use of web-derived corpora as well as computer-mediated communication. Past workshops were co-located with major conferences on corpus linguistics and/or computational linguistics (such as ACL, EACL, Corpus Linguistics, LREC, NAACL, WWW).
    2425
    25 In corpus/theoretical linguistics, the World Wide Web has become increasingly popular as a source of linguistic evidence, especially in the face of data sparseness or the lack of variation in traditional corpora of written language. In lexicography, web data have become a major and well-established resource with dedicated research data and specialised tools such as the SketchEngine. In other areas of theoretical linguistics, the adoption rate of web corpora has been slower but steady. Furthermore, some completely new areas of linguistic research dealing exclusively with web (or similar) data have emerged, such as the construction and utilisation of corpora based on short messages. Another example is the (manual or automatic) classification of web texts by genre, register, or – more generally speaking – “text type”, as well as topic area. In computational linguistics, web corpora have become an established source of data for the creation of language models, word embeddings, and for all types of machine learning.
     26In corpus/theoretical linguistics, the World Wide Web has become increasingly popular as a source of linguistic evidence, especially in the face of data sparseness or the lack of variation in traditional corpora of written language. In lexicography, web data have become a major and well-established resource with dedicated research data and commercially available tools. In other areas of theoretical linguistics, the adoption rate of web corpora has been slower but steady. Furthermore, some completely new areas of linguistic research dealing exclusively with web (or similar) data have emerged, such as the construction and utilisation of corpora based on short messages. Another example is the (manual or automatic) classification of web texts by genre, register, or – more generally speaking – “text type”, as well as topic area. In computational linguistics, web corpora have become an established source of data for the creation of language models, word embeddings, and for all types of machine learning.
    2627
    2728The twelfth Web as Corpus workshop (WAC-XII) looks at the past, present, and future of web corpora given the fact that large web corpora are nowadays provided mostly by a few major initiatives and/or companies, and the diversity of the early years appears to have faded slightly. Also, we acknowledge the fact that alternative sources of data (such as data from Twitter and similar platforms) have emerged, some of them only available to large companies and their affiliates, such as linguistic data from social media and other forms of the deep web. At the same time, gathering interesting and/or relevant web data (web crawling) is becoming an ever more intricate task as the nature of the data offered on the web changes (for example the death of forums in favour of more closed platforms).