\newcommand{\thetitle}{Proceedings of the 8th Web as Corpus Workshop (WAC-8)
@Corpus Linguistics 2013}
\newcommand{\authora}{Stefan Evert}
\newcommand{\authorb}{Egon Stemle}
\newcommand{\authorc}{Paul Rayson}
\newcommand{\theauthors}{\authora, \authorb, \authorc}
% init geometry with these values to have them when fancyhdr loads
\PassOptionsToPackage{%
    twoside=false, 
    top=1cm, 
    bottom=1cm, 
    left=2.5cm, 
    right=2.5cm, 
    includeheadfoot}
{geometry}
\PassOptionsToPackage{%
    pdftitle={\thetitle},
    pdfauthor={\theauthors},
    pdfsubject={},
    pdfkeywords={},
    colorlinks=true,
    linkcolor=blue,
    bookmarkstype=pdf
}
{hyperref}

% use the easychair style
\documentclass[a4paper, onesided]{easychair}

% This provides the \BibTeX macro
\usepackage{doc}
\usepackage{makeidx}

% allow for inclusion of pdf documents
\usepackage{pdfpages}

%\makeindex

% from toc.tex
\usepackage{titletoc}
\titlecontents{subsubsection}[2pt]{\addvspace{10pt}\bfseries\titlerule[0.5pt]\filright}{}{}{}[]
\titlecontents{section}[0pt]{\addvspace{5pt}\filright}{}{}{\dotfill\contentspage}[]
\titlecontents{subsection}[10pt]{\addvspace{1pt}\itshape\filright}{}{}{}[]
\newcommand{\tocSection}[1]{\contentsline{subsubsection}{#1\\*\titlerule[0.5pt]\vspace{-9pt plus 2pt minus 2pt}}{}{}\nopagebreak[4]}
\newcommand{\tocTitle}[2]{\contentsline{section}{#1}{#2}{}\nopagebreak[4]}
\newcommand{\tocAuthors}[1]{\contentsline{subsection}{#1}{}{}}

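% \insertpdf{<pdf file>}{<authors>}{<full title>}{<short title>}
%   #1: PDF file to include
%   #2: authors (TOC subsection entry and left page header)
%   #3: full title (TOC section entry)
%   #4: short title (PDF bookmark and right page header)
\DeclareRobustCommand{\insertpdf}[4]{%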
\phantomsection  
\addcontentsline{pdf}{section}{#4}
\addcontentsline{toc}{section}{#3}
\addcontentsline{toc}{subsection}{#2}
\fancyhead[LO,LE]{#2}
\fancyhead[RO,RE]{#4}
\includepdf[pagecommand={\thispagestyle{plain}}, pages=1]{#1}
\includepdf[pagecommand={\thispagestyle{fancy}}, pages=2-]{#1}
}

%% Document
%%
\begin{document}

%% Front Matter
%%
\pagenumbering{roman}
\title{\thetitle}

% Authors are joined by \and. Their affiliations are given by \inst, which indexes
% into the list defined using \institute
%
\author{\authora\inst{1} \and \authorb\inst{2} \and \authorc\inst{3}}

% Institutes for affiliations are also joined by \and,
\institute{
    Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),
    Erlangen, Germany\\
\and
    European Academy of Bozen/Bolzano (EURAC),
    Bolzano (BZ), Italy\\
\and
    Lancaster University,
    Lancaster, U.K.\\
}

\fancyfoot[LO,LE]
{S.Evert, E.Stemle, P.Rayson (eds.)}
\fancyfoot[CO,CE]
{WAC-8, 2013}
\fancyfoot[RO,RE]
{\thepage}

\fancypagestyle{plain}{%
\fancyhf{} % clear all header and footer fields
\fancyfoot[R]{{\normalsize\thepage}}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}}

% fine lines above footer and below header
\renewcommand{\headrulewidth}{0.4pt}\renewcommand{\footrulewidth}{0.4pt}

\clearpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\maketitle
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\thispagestyle{empty}
Web corpora and other Web-derived data have become a gold mine for corpus
linguistics and natural language processing. The Web is an easy source of
unprecedented amounts of linguistic data from a broad range of registers and
text types. However, a collection of Web pages is not immediately suitable for
exploration in the same way a traditional corpus is.

Since the first Web as Corpus Workshop, organised at the Corpus Linguistics 2005
Conference, a highly successful series of yearly Web as Corpus workshops has
provided a venue for interested researchers to meet, share ideas and discuss
the problems and possibilities of compiling and using Web corpora. After a
stronger focus on application-oriented natural language processing and Web
technology in recent years – with workshops taking place at NAACL-HLT 2010,
2011 and WWW 2012 – the 8th Web as Corpus Workshop returns to its roots in the
corpus linguistics community.

Accordingly, the leading theme of this workshop is the application of Web data
in language research, including linguistic evaluation of Web-derived corpora as
well as strategies and tools for high-quality automatic annotation of Web text.
The workshop brings together presentations on all aspects of building, using
and evaluating Web corpora, with a particular focus on the following topics:

\begin{itemize}
    \item applications of Web corpora and other Web-derived data sets for
        language research
    \item automatic linguistic annotation of Web data such as tokenisation,
        part-of-speech tagging, lemmatisation and semantic tagging
        (the accuracy of currently available off-the-shelf tools is still
        unsatisfactory for many types of Web data)
    \item critical exploration of the characteristics of Web data from a
        linguistic perspective and its applicability to language research
    \item presentation of Web corpus collection projects or software tools
        required for some part of this process (crawling, filtering,
        de-duplication, language identification, indexing, \ldots)
\end{itemize}


\clearpage
\renewcommand\contentsname{Table of Contents}
\phantomsection
\addcontentsline{pdf}{section}{Table of Contents}
\tableofcontents
\thispagestyle{plain}
\clearpage

%% main matter
%%
\thispagestyle{fancy}
\pagenumbering{arabic}
% paper_9.pdf paper_10.pdf paper_11.pdf paper_2.pdf paper_3.pdf paper_13.pdf paper_5.pdf paper_7.pdf paper_8.pdf paper_6.pdf paper_1.pdf paper_14.pdf

\insertpdf{paper_9.pdf}{A.Minocha, S.Reddy, A.Kilgarriff}{Feed Corpus: An Ever
Growing Up-to-date Corpus}{Feed Corpus}

\insertpdf{paper_10.pdf}{S.Wattam, P.Rayson, D.Berridge}{LWAC: Longitudinal
Web-as-Corpus Sampling}{LWAC}

\insertpdf{paper_11.pdf}{R.Sch\"afer, A.Barbaresi, F.Bildhauer}{The Good, the
Bad, and the Hazy: Design Decisions in Web Corpus Construction}{The Good, the
Bad, and the Hazy}

\insertpdf{paper_2.pdf}{J.Egbert, D.Biber}{Developing a User-based Method of
Web Register Classification}{Developing a User-based Method of Web Register
Classification}

\insertpdf{paper_7-mod.pdf}{A.Piperski, V.Belikov, N.Kopylov, E.Morozov,
V.Selegey, S.Sharoff}{Big and diverse is beautiful: A large corpus of Russian
to study linguistic variation}{Big and diverse is beautiful}

\insertpdf{paper_13.pdf}{D.Lutz, P.Cadwallader, M.Rooth}{A web application for
filtering and annotating web speech data}{Web application for filtering and
annotating web speech data}

\insertpdf{paper_5.pdf}{S.Schulz, V.Lyding, L.Nicolas}{STirWaC -- Compiling a
diverse corpus based on texts from the web for South Tyrolean German}{STirWaC}

\insertpdf{paper_3.pdf}{A.Kilgarriff, V.Suchomel}{Web Spam}{Web Spam}

\insertpdf{paper_8.pdf}{A.Ferraresi, S.Bernardini}{The academic
Web-as-Corpus}{Academic Web-as-Corpus}

\insertpdf{paper_6.pdf}{S.Scheible, S.Schulte im Walde, M.Weller, M.Kisselew}{A
Compact but Linguistically Detailed Database for German Verb Subcategorisation
relying on Dependency Parses from Web Corpora: Tool, Guidelines and
Resource}{Database for German Verb Subcategorisation}

\insertpdf{paper_1.pdf}{A.Brindle}{Thug breaks man's jaw: A Corpus Analysis of
Responses to Interpersonal Street Violence}{Thug breaks man's jaw}

\insertpdf{paper_14-mod.pdf}{C.Crangle}{A web-based model of semantic
relatedness and the analysis of electroencephalographic (EEG) data}{Web-based
model of semantic relatedness and the analysis of EEG data}

%\insertpdf{}{}{}{}

%------------------------------------------------------------------------------
\end{document}

% EOF
