Resource: ISLE Speech Corpus

Reference ISLE Speech Corpus
Date of Submission Jan. 24, 2014, 4:29 p.m.
Status accepted
ISLRN 723-960-059-948-7
Resource Type Primary Text
Media Type Audio
Source
Language English
Description

Approximately 20 minutes of speech per speaker from 23 German and 23 Italian intermediate learners of English. Each speaker recorded sentences from several blocks of differing types (reading simple sentences, using minimal pairs, and answering multiple-choice questions). The prompts varied in perplexity.

About 2/3 of the data for each speaker was annotated by one of a team of linguists. The files were corrected first at the word level, and an automatic recognizer was then used to produce phone-level annotations. The annotator then re-annotated each sentence to mark phone and stress errors (e.g., substitutions, insertions, or deletions).

Corpus details:
* a total of 46 speakers (23 German and 23 Italian)
* 11484 utterances
* 1.92 gigabytes of WAV files (4 CDs)
* 17 hours, 54 minutes, and 44 seconds of speech data
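
As a rough consistency check, the totals listed above can be converted into per-speaker and per-utterance averages. The sketch below uses only the figures from this record; the averages it prints are derived values, not numbers reported in the corpus documentation, and the variable names are illustrative.

```python
# Figures copied from the "Corpus details" list in this record.
SPEAKERS = 46
UTTERANCES = 11484
HOURS, MINUTES, SECONDS = 17, 54, 44

# Total duration of the speech data in seconds.
total_seconds = HOURS * 3600 + MINUTES * 60 + SECONDS

# Derived averages (assumption: speech is spread roughly evenly across speakers).
minutes_per_speaker = total_seconds / SPEAKERS / 60
utterances_per_speaker = UTTERANCES / SPEAKERS
seconds_per_utterance = total_seconds / UTTERANCES

print(f"total duration:         {total_seconds} s")
print(f"average per speaker:    {minutes_per_speaker:.1f} min")
print(f"utterances per speaker: {utterances_per_speaker:.1f}")
print(f"average per utterance:  {seconds_per_utterance:.2f} s")
```

The per-speaker average comes out at a little over 23 minutes, which is in line with the "approximately 20 minutes per speaker" stated in the description.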

A much more detailed description of the ISLE corpus is given in the proceedings of LREC 2000. An electronic copy of this paper may be obtained by sending an email to Dr. Wolfgang Menzel at <menzel@nats.informatik.uni-hamburg.de>.

W. Menzel, E. Atwell, P. Bonaventura, D. Herron, P. Howarth, R. Morton, and C. Souter. "The ISLE corpus of non-native spoken English." Proc. Second LREC, 2000.

Version 1.0
Distributor ELRA