Resource: TC-STAR English Training Corpora for ASR: Recordings of EPPS Speech

Reference TC-STAR English Training Corpora for ASR: Recordings of EPPS Speech
Date of Submission Jan. 24, 2014, 4:31 p.m.
Status accepted
ISLRN 428-162-628-204-7
Resource Type Primary Text
Media Type Audio
Source
Language English
Description

TC-STAR is a European integrated project focusing on all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text-to-Speech Synthesis (TTS).

This corpus consists of around 290 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches delivered or interpreted in European English (a mixture of native and non-native English). Of these, 92 hours were transcribed; the transcriptions are not included in the present package. The recordings were obtained from Europe by Satellite (EbS, http://europa.eu.it/comm/ebs) between May 2004 and May 2006.

The speech signals were provided by EbS over the internet in Real Media format and via satellite in MPEG-1 Layer II format. The signals were decoded, resampled, and stored as WAVE files in RIFF (Resource Interchange File Format). Each file contains a single channel with 16-bit resolution at a sample rate of 16 kHz.
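
The stated file format (single channel, 16-bit, 16 kHz) can be checked on a given recording before it is fed to an ASR pipeline. The following is a minimal sketch using Python's standard wave module; the function name check_wav_format and the constant names are illustrative assumptions and are not part of the corpus distribution.

```python
import wave

# Expected format, taken from the corpus description above.
EXPECTED_CHANNELS = 1             # single channel
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit resolution
EXPECTED_SAMPLE_RATE_HZ = 16000   # 16 kHz


def check_wav_format(path):
    """Return (ok, details) for a WAVE file.

    ok is True when the file matches the expected channel count,
    sample width, and sample rate. details holds the values that
    were actually read from the file header.
    (Hypothetical helper, not shipped with the corpus.)
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        details = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration in seconds, derived from the frame count.
            "duration_s": wav.getnframes() / rate if rate else 0.0,
        }
    ok = (
        details["channels"] == EXPECTED_CHANNELS
        and details["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
        and details["sample_rate_hz"] == EXPECTED_SAMPLE_RATE_HZ
    )
    return ok, details
```

Calling check_wav_format on a path returns a boolean together with the header values, so files that deviate from the stated format can be listed and inspected rather than silently skipped.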

The speech databases created within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications.

For corresponding transcriptions, see ELRA-S0249.

Version 1.0
Distributor ELRA