Resource: DEFT'08 Evaluation Package

Reference: DEFT'08 Evaluation Package
Date of Submission: Jan. 24, 2014, 4:22 p.m.
Status: accepted
ISLRN: 161-881-080-899-5
Resource Type: Primary Text
Media Type: Text
Language: French

DEFT (DEfi Fouille de Texte – Text Mining Challenge) organizes evaluation campaigns in the field of text mining. The 2008 edition of DEFT addresses the classification of texts by topic and genre.

Automatic classification has multiple applications in text mining. Many application fields have been explored, from email routing to strategic or scientific monitoring. Over the past few years, text genre classification has emerged as a new research question. Beyond recognizing a document's topic, recognizing its genre helps determine how the document will be used. This raises several questions: How can both the topic and the genre of a document be recognized? Can a difference in genre influence the recognition of a document's topical category and, conversely, can a difference in topic influence the recognition of a document's genre?

To evaluate classification software in this respect, the DEFT’08 Evaluation Package makes it possible to compare two corpora of different genres (a corpus of newspaper articles from Le Monde and a corpus of encyclopaedic articles from the free online encyclopaedia Wikipedia) on the basis of the same set of predefined categories. Although a newspaper article reports news whereas an encyclopaedic article disseminates knowledge, both share a number of general topical categories, called “column” for the former and “category” for the latter. The task consists in testing, on these corpora, the robustness of a topical classification model to variations in text genre on the one hand and, on the other hand, the possible improvement of topical classification through the recognition of text genre.

Version: 1.0
Distributor: ELRA