Resource: MULTEXT Prosodic database
|Reference||MULTEXT Prosodic database|
|Date of Submission||Jan. 24, 2014, 4:30 p.m.|
|Resource Type||Primary Text|
|Language||English, French, German, Italian, Spanish, Castilian|
This database comprises one CD-ROM for each of five languages (French, English, Italian, German and Spanish), totalling 4 hours and 20 minutes of speech from 50 different speakers (5 male and 5 female per language). The recordings on which the corpus is based consist of passages of about five sentences extracted from the EUROM.1 speech corpus (Esprit 2589 project "Multi-lingual Speech Input/output Assessment, Methodology and Standardisation"). The corpus was stylised automatically by an algorithm that factors out microprosodic effects and represents the intonation contour of each utterance as a series of target points. When these points are interpolated with a smooth curve (spline) and the result is re-synthesised, the contour is indistinguishable from the original, apart from a few detection errors. A symbolic coding of the 50,000 pitch movements in the corpus is also provided, along with a word-level time alignment of the orthographic transcription to the signal. The entire corpus was verified and manually corrected by experts for each language.
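
To illustrate how a contour can be recovered from a series of target points, here is a minimal sketch in Python. It is not the algorithm distributed with the corpus: the description above only says "a smooth curve (spline)", so the sketch assumes one simple choice, a piecewise quadratic spline that is flat at each target point and joins two parabolas at the midpoint between neighbouring targets. The function name `interpolate_f0` and the example target values are hypothetical.

```python
from bisect import bisect_right


def interpolate_f0(targets, t):
    """Return the interpolated pitch (Hz) at time t (seconds).

    targets: list of (time_s, f0_hz) pairs sorted by strictly increasing time.
    Assumption: a quadratic spline with zero slope at each target point,
    built from two parabolas that meet at the midpoint of each interval.
    """
    times = [time for time, _ in targets]
    # Outside the covered range, hold the first or last target value.
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]
    # Locate the interval [t1, t2] that contains t.
    i = bisect_right(times, t) - 1
    (t1, h1), (t2, h2) = targets[i], targets[i + 1]
    x = (t - t1) / (t2 - t1)  # position within the interval, from 0 to 1
    if x <= 0.5:
        # First half: parabola leaving h1 with zero slope.
        return h1 + 2.0 * (h2 - h1) * x * x
    # Second half: parabola arriving at h2 with zero slope.
    return h2 - 2.0 * (h2 - h1) * (1.0 - x) ** 2


# Hypothetical target points (time in seconds, pitch in Hz), not corpus data.
example_targets = [(0.0, 120.0), (0.4, 180.0), (1.0, 100.0)]
contour = [interpolate_f0(example_targets, k / 100.0) for k in range(101)]
```

With this choice the curve passes exactly through every target point, and both its value and its slope are continuous at the midpoints, which is what makes the reconstructed contour smooth.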