% wac8-proceedings.tex (WAC8), added by Egon W. Stemle
\newcommand{\thetitle}{Proceedings of the 8th Web as Corpus Workshop (WAC-8)
@Corpus Linguistics 2013}
\newcommand{\authora}{Stefan Evert}
\newcommand{\authorb}{Egon Stemle}
\newcommand{\authorc}{Paul Rayson}
\newcommand{\theauthors}{\authora, \authorb, \authorc}
% init geometry with these values to have them when fancyhdr loads
\PassOptionsToPackage{%
    twoside=false,
    top=1cm,
    bottom=1cm,
    left=2.5cm,
    right=2.5cm,
    includeheadfoot}
{geometry}
\PassOptionsToPackage{%
    pdftitle={\thetitle},
    pdfauthor={\theauthors},
    pdfsubject={},
    pdfkeywords={},
    colorlinks=true,
    linkcolor=blue,
    bookmarkstype=pdf
}
{hyperref}

% use the easychair style
\documentclass[a4paper, onesided]{easychair}

% This provides the \BibTeX macro
\usepackage{doc}
\usepackage{makeidx}

% allow for inclusion of pdf documents
\usepackage{pdfpages}

%\makeindex

% from toc.tex
\usepackage{titletoc}
\titlecontents{subsubsection}[2pt]{\addvspace{10pt}\bfseries\titlerule[0.5pt]\filright}{}{}{}[]
\titlecontents{section}[0pt]{\addvspace{5pt}\filright}{}{}{\dotfill\contentspage}[]
\titlecontents{subsection}[10pt]{\addvspace{1pt}\itshape\filright}{}{}{}[]
\newcommand{\tocSection}[1]{\contentsline{subsubsection}{#1\\*\titlerule[0.5pt]\vspace{-9pt plus 2pt minus 2pt}}{}{}\nopagebreak[4]}
\newcommand{\tocTitle}[2]{\contentsline{section}{#1}{#2}{}\nopagebreak[4]}
\newcommand{\tocAuthors}[1]{\contentsline{subsection}{#1}{}{}}
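
% \insertpdf{<file>}{<authors>}{<full title>}{<short title>} includes one paper:
%   #1  PDF file to include
%   #2  authors, used as TOC subsection entry and left page header
%   #3  full title, used as TOC section entry
%   #4  short title, used as PDF bookmark and right page header
% The first page gets the plain page style, all remaining pages the fancy one.
\DeclareRobustCommand{\insertpdf}[4]{
\phantomsection
\addcontentsline{pdf}{section}{#4}
\addcontentsline{toc}{section}{#3}
\addcontentsline{toc}{subsection}{#2}
\fancyhead[LO,LE]{#2}
\fancyhead[RO,RE]{#4}
\includepdf[pagecommand={\thispagestyle{plain}}, pages=1]{#1}
\includepdf[pagecommand={\thispagestyle{fancy}}, pages=2-]{#1}
}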

%% Document
%%
\begin{document}

%% Front Matter
%%
\pagenumbering{roman}
\title{\thetitle}

% Authors are joined by \and. Their affiliations are given by \inst, which indexes
% into the list defined using \institute
%
\author{\authora\inst{1} \and \authorb\inst{2} \and \authorc\inst{3}}

% Institutes for affiliations are also joined by \and,
\institute{
    Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),
    Erlangen, Germany\\
\and
    European Academy of Bozen/Bolzano (EURAC),
    Bolzano (BZ), Italy\\
\and
    Lancaster University,
    Lancaster, U.K.\\
}

\fancyfoot[LO,LE]
{S.Evert, E.Stemle, P.Rayson (eds.)}
\fancyfoot[CO,CE]
{WAC-8, 2013}
\fancyfoot[RO,RE]
{\thepage}

\fancypagestyle{plain}{%
\fancyhf{} % clear all header and footer fields
\fancyfoot[R]{{\normalsize\thepage}}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}}

% fine lines above footer and below header
\renewcommand{\headrulewidth}{0.4pt}\renewcommand{\footrulewidth}{0.4pt}

\clearpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\maketitle
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\thispagestyle{empty}
Web corpora and other Web-derived data have become a gold mine for corpus
linguistics and natural language processing. The Web is a readily accessible
source of unprecedented amounts of linguistic data from a broad range of
registers and text types. However, a collection of Web pages is not immediately
suitable for exploration in the same way a traditional corpus is.

Since the first Web as Corpus Workshop, organised at the Corpus Linguistics 2005
Conference, a highly successful series of yearly Web as Corpus workshops has
provided a venue for interested researchers to meet, share ideas and discuss
the problems and possibilities of compiling and using Web corpora. After a
stronger focus on application-oriented natural language processing and Web
technology in recent years – with workshops taking place at NAACL-HLT 2010,
2011 and WWW 2012 – the 8th Web as Corpus Workshop returns to its roots in the
corpus linguistics community.

Accordingly, the leading theme of this workshop is the application of Web data
in language research, including linguistic evaluation of Web-derived corpora as
well as strategies and tools for high-quality automatic annotation of Web text.
The workshop brings together presentations on all aspects of building, using
and evaluating Web corpora, with a particular focus on the following topics:

\begin{itemize}
    \item applications of Web corpora and other Web-derived data sets for
        language research
    \item automatic linguistic annotation of Web data such as tokenisation,
        part-of-speech tagging, lemmatisation and semantic tagging
        (the accuracy of currently available off-the-shelf tools is still
        unsatisfactory for many types of Web data)
    \item critical exploration of the characteristics of Web data from a
        linguistic perspective and its applicability to language research
    \item presentation of Web corpus collection projects or software tools
        required for some part of this process (crawling, filtering,
        de-duplication, language identification, indexing, \ldots)
\end{itemize}


\clearpage
\renewcommand\contentsname{Table of Contents}
\phantomsection
\addcontentsline{pdf}{section}{Table of Contents}
\tableofcontents
\thispagestyle{plain}
\clearpage

%% main matter
%%
\thispagestyle{fancy}
\pagenumbering{arabic}
% paper_9.pdf paper_10.pdf paper_11.pdf paper_2.pdf paper_3.pdf paper_13.pdf paper_5.pdf paper_7.pdf paper_8.pdf paper_6.pdf paper_1.pdf paper_14.pdf

\insertpdf{paper_9.pdf}{A.Minocha, S.Reddy, A.Kilgarriff}{Feed Corpus: An Ever
Growing Up-to-date Corpus}{Feed Corpus}

\insertpdf{paper_10.pdf}{S.Wattam, P.Rayson, D.Berridge}{LWAC: Longitudinal
Web-as-Corpus Sampling}{LWAC}

\insertpdf{paper_11.pdf}{R.Sch\"afer, A.Barbaresi, F.Bildhauer}{The Good, the
Bad, and the Hazy: Design Decisions in Web Corpus Construction}{The Good, the
Bad, and the Hazy}

\insertpdf{paper_2.pdf}{J.Egbert, D.Biber}{Developing a User-based Method of
Web Register Classification}{Developing a User-based Method of Web Register
Classification}

\insertpdf{paper_7-mod.pdf}{A.Piperski, V.Belikov, N.Kopylov, E.Morozov,
V.Selegey, S.Sharoff}{Big and diverse is beautiful: A large corpus of Russian
to study linguistic variation}{Big and diverse is beautiful}

\insertpdf{paper_13.pdf}{D.Lutz, P.Cadwallader, M.Rooth}{A web application for
filtering and annotating web speech data}{Web application for filtering and
annotating web speech data}

\insertpdf{paper_5.pdf}{S.Schulz, V.Lyding, L.Nicolas}{STirWaC -- Compiling a
diverse corpus based on texts from the web for South Tyrolean German}{STirWaC}

\insertpdf{paper_3.pdf}{A.Kilgarriff, V.Suchomel}{Web Spam}{Web Spam}

\insertpdf{paper_8.pdf}{A.Ferraresi, S.Bernardini}{The academic
Web-as-Corpus}{Academic Web-as-Corpus}

\insertpdf{paper_6.pdf}{S.Scheible, S.Schulte im Walde, M.Weller, M.Kisselew}{A
Compact but Linguistically Detailed Database for German Verb Subcategorisation
relying on Dependency Parses from Web Corpora: Tool, Guidelines and
Resource}{Database for German Verb Subcategorisation}

\insertpdf{paper_1.pdf}{A.Brindle}{Thug breaks man's jaw: A Corpus Analysis of
Responses to Interpersonal Street Violence}{Thug breaks man's jaw}

\insertpdf{paper_14-mod.pdf}{C.Crangle}{A web-based model of semantic
relatedness and the analysis of electroencephalographic (EEG) data}{Web-based
model of semantic relatedness and the analysis of EEG data}

%\insertpdf{}{}{}{}

%------------------------------------------------------------------------------
\end{document}

% EOF