Resource: CLEF Domain Specific Test Suites (2004-2008) – Evaluation Package

Reference CLEF Domain Specific Test Suites (2004-2008) – Evaluation Package
Date of Submission Jan. 24, 2014, 4:22 p.m.
Status accepted
ISLRN 609-362-685-537-2
Resource Type Primary Text
Media Type Text
Source
Language English, German, Russian
Size 617 Mb
Description

The Cross-Language Evaluation Forum (CLEF) promotes R&D in multilingual information access (MLIA) by (i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts, and (ii) creating test-suites of reusable data which can be employed by system developers for benchmarking purposes.

The CLEF Domain Specific Test Suites (2004-2008) contain the data used for the Domain Specific track of the CLEF campaigns carried out from 2004 to 2008. This track tested the performance of monolingual, bilingual and multilingual Information Retrieval (IR) systems on multilingual collections of scientific articles.

The CLEF Test Suite is composed of:
• Data Collections
• Topics
• Guidelines
• Relevance assessments
• Official campaign results
• Working notes papers

The Data Collections consist of the following datasets:
• German Indexing and Retrieval Test database (302,638 documents, 524 Mb):
Social sciences data collection comprising a German corpus (151,319 documents) and a pseudo-English corpus, which is a translation of the German corpus into English and contains less textual information than the German version.
• Cambridge Scientific Abstracts - Sociological Abstracts (20,000 documents, 38.5 Mb):
Database of Sociological Abstracts from Cambridge Scientific Abstracts.
• Russian Social Science Corpus (94,581 documents, 65 Mb):
Russian sociology data from the Russian Social Science Corpus.
• Institute of Scientific Information for Social Sciences (Russian Academy of Sciences) (145,802 documents, 12 Mb):
The INION-ISISS corpus consists of bibliographic data from the ISISS database (03.02.2006) covering economics (~99,000 documents) and social sciences (~46,000 documents).

The full package totals 617 Mb and is stored on one CD.

Version 1.0
Distributor ELRA