\newcommand{\thetitle}{Proceedings of the 8th Web as Corpus Workshop (WAC-8)
@Corpus Linguistics 2013}
\newcommand{\authora}{Stefan Evert}
\newcommand{\authorb}{Egon Stemle}
\newcommand{\authorc}{Paul Rayson}
\newcommand{\theauthors}{\authora, \authorb, \authorc}
% init geometry with these values to have them when fancyhdr loads
\PassOptionsToPackage{%
twoside=false,
top=1cm,
bottom=1cm,
left=2.5cm,
right=2.5cm,
includeheadfoot}
{geometry}
\PassOptionsToPackage{%
pdftitle={\thetitle},
pdfauthor={\theauthors},
pdfsubject={},
pdfkeywords={},
colorlinks=true,
linkcolor=blue,
bookmarkstype=pdf
}
{hyperref}

% use the easychair style
\documentclass[a4paper, onesided]{easychair}

% This provides the \BibTeX macro
\usepackage{doc}
\usepackage{makeidx}

% allow for inclusion of pdf documents
\usepackage{pdfpages}

%\makeindex

% from toc.tex
\usepackage{titletoc}
\titlecontents{subsubsection}[2pt]{\addvspace{10pt}\bfseries\titlerule[0.5pt]\filright}{}{}{}[]
\titlecontents{section}[0pt]{\addvspace{5pt}\filright}{}{}{\dotfill\contentspage}[]
\titlecontents{subsection}[10pt]{\addvspace{1pt}\itshape\filright}{}{}{}[]
\newcommand{\tocSection}[1]{\contentsline{subsubsection}{#1\\*\titlerule[0.5pt]\vspace{-9pt plus 2pt minus 2pt}}{}{}\nopagebreak[4]}
\newcommand{\tocTitle}[2]{\contentsline{section}{#1}{#2}{}\nopagebreak[4]}
\newcommand{\tocAuthors}[1]{\contentsline{subsection}{#1}{}{}}

% \insertpdf{<pdf file>}{<authors>}{<full title>}{<short title>}
% Includes a paper PDF, adds its TOC and bookmark entries, and sets the
% running heads (authors on the left, short title on the right).
\DeclareRobustCommand{\insertpdf}[4]{
  \phantomsection
  \addcontentsline{pdf}{section}{#4}
  \addcontentsline{toc}{section}{#3}
  \addcontentsline{toc}{subsection}{#2}
  \fancyhead[LO,LE]{#2}
  \fancyhead[RO,RE]{#4}
  \includepdf[pagecommand={\thispagestyle{plain}}, pages=1]{#1}
  \includepdf[pagecommand={\thispagestyle{fancy}}, pages=2-]{#1}
}

%% Document
%%
\begin{document}

%% Front Matter
%%
\pagenumbering{roman}
\title{\thetitle}

% Authors are joined by \and. Their affiliations are given by \inst, which indexes
% into the list defined using \institute
%
\author{\authora\inst{1} \and \authorb\inst{2} \and \authorc\inst{3}}

% Institutes for affiliations are also joined by \and.
\institute{
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),
Erlangen, Germany\\
%\email{mokhov@cse.concordia.ca}
\and
European Academy of Bozen/Bolzano (EURAC),
Bolzano (BZ), Italy\\
%\email{geoff@cs.miami.edu}\\
\and
Lancaster University,
Lancaster, U.K.\\
%\email{andrei@voronkov.com, graham@cs.man.ac.uk}\\
}

\fancyfoot[LO,LE]
{S.Evert, E.Stemle, P.Rayson (eds.)}
\fancyfoot[CO,CE]
{WAC-8, 2013}
\fancyfoot[RO,RE]
{\thepage}

\fancypagestyle{plain}{%
\fancyhf{} % clear all header and footer fields
\fancyfoot[R]{{\normalsize\thepage}}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}}

% fine lines above footer and below header
\renewcommand{\headrulewidth}{0.4pt}\renewcommand{\footrulewidth}{0.4pt}

\clearpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\maketitle
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\thispagestyle{empty}
Web corpora and other Web-derived data have become a gold mine for corpus
linguistics and natural language processing. The Web is an easy source of
unprecedented amounts of linguistic data from a broad range of registers and
text types. However, a collection of Web pages is not immediately suitable for
exploration in the same way a traditional corpus is.

Since the first Web as Corpus Workshop, organised at the Corpus Linguistics
2005 Conference, a highly successful series of yearly Web as Corpus workshops
has provided a venue for interested researchers to meet, share ideas and
discuss the problems and possibilities of compiling and using Web corpora.
After a stronger focus on application-oriented natural language processing and
Web technology in recent years – with workshops taking place at NAACL-HLT 2010,
2011 and WWW 2012 – the 8th Web as Corpus Workshop returns to its roots in the
corpus linguistics community.

Accordingly, the leading theme of this workshop is the application of Web data
in language research, including linguistic evaluation of Web-derived corpora as
well as strategies and tools for high-quality automatic annotation of Web text.
The workshop brings together presentations on all aspects of building, using
and evaluating Web corpora, with a particular focus on the following topics:

\begin{itemize}
\item applications of Web corpora and other Web-derived data sets for
language research
\item automatic linguistic annotation of Web data such as tokenisation,
part-of-speech tagging, lemmatisation and semantic tagging
(the accuracy of currently available off-the-shelf tools is still
unsatisfactory for many types of Web data)
\item critical exploration of the characteristics of Web data from a
linguistic perspective and its applicability to language research
\item presentation of Web corpus collection projects or software tools
required for some part of this process (crawling, filtering,
de-duplication, language identification, indexing, \ldots)
\end{itemize}


\clearpage
\renewcommand\contentsname{Table of Contents}
\addcontentsline{pdf}{section}{Table of Contents}
\tableofcontents
\thispagestyle{plain}
\clearpage

%% main matter
%%
\thispagestyle{fancy}
\pagenumbering{arabic}
% paper_9.pdf paper_10.pdf paper_11.pdf paper_2.pdf paper_3.pdf paper_13.pdf paper_5.pdf paper_7.pdf paper_8.pdf paper_6.pdf paper_1.pdf paper_14.pdf

\insertpdf{paper_9.pdf}{A.Minocha, S.Reddy, A.Kilgarriff}{Feed Corpus: An Ever
Growing Up-to-date Corpus}{Feed Corpus}

\insertpdf{paper_10.pdf}{S.Wattam, P.Rayson, D.Berridge}{LWAC: Longitudinal
Web-as-Corpus Sampling}{LWAC}

\insertpdf{paper_11.pdf}{R.Sch\"afer, A.Barbaresi, F.Bildhauer}{The Good, the
Bad, and the Hazy: Design Decisions in Web Corpus Construction}{The Good, the
Bad, and the Hazy}

\insertpdf{paper_2.pdf}{J.Egbert, D.Biber}{Developing a User-based Method of
Web Register Classification}{Developing a User-based Method of Web Register
Classification}

\insertpdf{paper_7-mod.pdf}{A.Piperski, V.Belikov, N.Kopylov, E.Morozov,
V.Selegey, S.Sharoff}{Big and diverse is beautiful: A large corpus of Russian
to study linguistic variation}{Big and diverse is beautiful}

\insertpdf{paper_13.pdf}{D.Lutz, P.Cadwallader, M.Rooth}{A web application for
filtering and annotating web speech data}{Web application for filtering and
annotating web speech data}

\insertpdf{paper_5.pdf}{S.Schulz, V.Lyding, L.Nicolas}{STirWaC - Compiling a
diverse corpus based on texts from the web for South Tyrolean German}{STirWaC}

\insertpdf{paper_3.pdf}{A.Kilgarriff, V.Suchomel}{Web Spam}{Web Spam}

\insertpdf{paper_8.pdf}{A.Ferraresi, S.Bernardini}{The academic
Web-as-Corpus}{Academic Web-as-Corpus}

\insertpdf{paper_6.pdf}{S.Scheible, S.Schulte im Walde, M.Weller, M.Kisselew}{A
Compact but Linguistically Detailed Database for German Verb Subcategorisation
relying on Dependency Parses from Web Corpora: Tool, Guidelines and
Resource}{Database for German Verb Subcategorisation}

\insertpdf{paper_1.pdf}{A.Brindle}{Thug breaks man's jaw: A Corpus Analysis of
Responses to Interpersonal Street Violence}{Thug breaks man's jaw}

\insertpdf{paper_14-mod.pdf}{C.Crangle}{A web-based model of semantic
relatedness and the analysis of electroencephalographic (EEG) data}{Web-based
model of semantic relatedness and the analysis of EEG data}

%\insertpdf{}{}{}{}

%------------------------------------------------------------------------------
\end{document}

% EOF