ISLRN

MAURDOR Evaluation Package

Full Official Name: MAURDOR Evaluation Package

Submission date: Feb. 26, 2015, 5:52 p.m.

The MAURDOR project evaluates systems for the automatic processing of written documents. The collected documents are scanned documents that are printed, typewritten or handwritten (called "manuscript" below). To obtain images for evaluating automatic analysis systems, 10,000 original documents were collected and annotated (5,000 in French, 2,500 in English and 2,500 in Arabic). This package contains 8,129 of the 10,000 documents originally collected.

Each of the 8,129 documents belongs to one of the following 5 categories:

C1: Printed form (completed in manuscript)
C2: Commercial, private or professional document, printed or photocopied
C3: Manuscript private correspondence
C4: Typewritten private or professional correspondence
C5: Others

Once collected, the documents were annotated manually. This human analysis serves as the reference, known as ground truth, for training and evaluating automatic processing systems. The annotations aim to capture the following information:

1. How is the document structured (text zones, images...)?
2. Which writings are present, and what are their type (manuscript/typewritten) and language (French, English, Arabic, other)?
3. What is the main information in the document (author, recipient, subject, date...)?

The MAURDOR evaluation campaign provides a common framework for reporting the current performance of systems for the automatic processing of digital documents. This package contains the material provided to the campaign participants:

- Consistent development and test data corresponding to the application concerned;
- Tools for the automatic measurement of system performance;
- A common assessment protocol applicable to each processing stage, along with a complete automatic processing chain for written documents.

The documents are provided in TIFF format and the annotations in XML format.
The aim of this evaluation package is to enable external users to evaluate their own systems and compare their results with those obtained during the campaign itself.

Creator(s)

Distributor(s)

ELRA

Right Holder(s)

Status: Accepted

ISLRN:

364-018-517-901-2

Version

1.0

Source

http://catalog.elra.info/product_info.php?products_id=1242

Resource Type

Primary Text

Media Type

Image

Language(s)

Arabic

English

French

Access Medium