CLEANEVAL is a shared task and competitive evaluation on cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus in linguistic and language technology research and development.
The first CLEANEVAL, covering Chinese and English, took place over the summer of 2007, with a workshop in Belgium in September: the 3rd Web as Corpus workshop (WAC3), whose proceedings are available.
CLEANEVAL is an activity of ACL-SIGWAC, the Association for Computational Linguistics (ACL) Special Interest Group on Web as Corpus.