Resource: JV_TDM Corpus

Reference ELRA-S0379
Date of Submission Jan. 5, 2016, 5:48 p.m.
Status accepted
ISLRN 371-240-320-910-4
Resource Type Primary Text
Media Type Text
Source
Language French
Description

The JV_TDM corpus provides a phonetic annotation of 37 chapters of the original French version of "Around the World in 80 Days" by Jules Verne, read by a single speaker. Each chapter is annotated in a separate .TextGrid file. The audio files are not included in this release. They are available under a CC BY-NC-SA licence on the site www.litteratureaudio.com (www.litteratureaudio.com/livre-audio-gratuit-mp3/jules-verne-le-tour-du-monde-en-80-jours.html).

The total audio duration is 6 h 41 min 36 s, of which 5 h 2 min 41 s is speech. The speaker uttered 78,876 words at an average rate of 5.82 syllables and 13.49 phones per second, producing 244,908 phones and 11,352 pauses (short and long). All phonemes except glottal stops and palatal/velar nasals occur more than 1,000 times.
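
As a quick consistency check on the figures above, the following Python sketch converts the two durations to seconds and recomputes the phone rate. It suggests that the reported 13.49 phones per second is measured against speech time rather than total audio time. The variable and function names are illustrative and are not part of the corpus documentation.

```python
def to_seconds(hours, minutes, seconds):
    """Convert an h/min/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Figures quoted in the description above.
total_s = to_seconds(6, 41, 36)   # total audio duration
speech_s = to_seconds(5, 2, 41)   # duration of speech only
phones = 244_908
words = 78_876

# Rates derived from those figures (not stated in the record itself,
# apart from the phone rate).
phones_per_second = phones / speech_s
speech_fraction = speech_s / total_s
phones_per_word = phones / words

print(f"phones per second of speech: {phones_per_second:.2f}")
print(f"share of audio that is speech: {speech_fraction:.1%}")
print(f"average phones per word: {phones_per_word:.2f}")
```

Dividing the phone count by the total audio duration instead gives a rate close to 10 phones per second, which does not match the reported value.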

The .TextGrid files contain several annotation tiers: phoneme, number of alphanumeric characters corresponding to a phone, syllable, transcription, PoS, paragraph break, sentence break, prosodic annotations and breathing pauses.
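
To illustrate how such tiers might be read, here is a minimal Python sketch that extracts interval tiers from a TextGrid. It assumes the files use Praat's long text format and have already been decoded to a string. The tier names "phoneme" and "syllable" and the labels in the inline sample are hypothetical, so the names in the actual release may differ. The parser is deliberately simple: it ignores point tiers and labels that span several lines.

```python
import re

# Hypothetical two-tier sample in Praat's long text format.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.3
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "phoneme"
        xmin = 0
        xmax = 0.3
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.12
            text = "b"
        intervals [2]:
            xmin = 0.12
            xmax = 0.3
            text = "o~"
    item [2]:
        class = "IntervalTier"
        name = "syllable"
        xmin = 0
        xmax = 0.3
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 0.3
            text = "bo~"
'''


def read_interval_tiers(text):
    """Return {tier name: [(xmin, xmax, label), ...]} for interval tiers."""
    tiers = {}
    current = None   # list of intervals for the tier being read
    bounds = {}      # most recently seen xmin / xmax values
    for raw in text.splitlines():
        line = raw.strip()
        if m := re.fullmatch(r'name = "(.*)"', line):
            current = tiers.setdefault(m.group(1), [])
        elif m := re.fullmatch(r'(xmin|xmax) = (\S+)', line):
            bounds[m.group(1)] = float(m.group(2))
        elif (m := re.fullmatch(r'text = "(.*)"', line)) and current is not None:
            # Praat escapes a double quote inside a label by doubling it.
            label = m.group(1).replace('""', '"')
            current.append((bounds["xmin"], bounds["xmax"], label))
    return tiers


tiers = read_interval_tiers(SAMPLE)
for name, intervals in tiers.items():
    print(name, intervals)
```

Because an interval's bounds always precede its label in this format, the tier-level xmin and xmax are overwritten before the first label is read, so a single pass over the lines is sufficient.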

The original text was first processed with the COMPOST text-to-speech system, which PoS-tagged it, transcribed it phonetically, syllabified it and inserted plausible pauses. Text-to-speech alignment was then performed on paragraphs that had been manually delimited with Praat. The segmentation and all annotations were manually validated.

Version 1.0
Distributor ELRA