Resource: MULTEXT Prosodic database

Reference MULTEXT Prosodic database
Date of Submission Jan. 24, 2014, 4:30 p.m.
Status accepted
ISLRN 098-719-242-965-4
Resource Type Primary Text
Media Type Audio
Language English, French, German, Italian, Spanish, Castilian
Size 260 minutes

This database comprises one CD-ROM for each of five languages (French, English, Italian, German and Spanish), totalling 4 hours and 20 minutes of speech and involving 50 different speakers (5 male and 5 female per language). The recordings on which the corpus is based consist of passages of about five sentences extracted from the EUROM.1 speech corpus (Esprit 2589 project "Multi-lingual Speech Input/Output Assessment, Methodology and Standardisation"). The corpus was stylised automatically by an algorithm which factors out microprosodic effects and represents the intonation contour of each utterance as a series of target points. Once interpolated by a smooth curve (spline), these points produce a contour that, when re-synthesised, is indistinguishable from the original apart from a few detection errors. A symbolic coding of the 50,000 pitch movements of the corpus is also provided, along with the time-alignment of the orthographic transcription to the signal at word level. The entire corpus was verified and manually corrected by experts for each language.
The CD-ROMs contain for each passage:
· the signal file from EUROM.1,
· the alignment of orthographic transcription to signal at word level,
· the Fo file,
· the stylisation files,
· the re-synthesis using the stylised Fo,
· the symbolic coding file,
· the residual Fo, i.e. the difference between the Fo and the stylised curve,
· a description file for the recording.
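As a rough illustration of how the target points, the stylised curve and the residual Fo relate to one another, the following Python sketch joins neighbouring target points with two parabolas that meet at the midpoint and are flat at each target. This particular spline is an assumption made for the example, not necessarily the one used to build the corpus. The function names, the in-memory (time, Hz) pairs and the convention that unvoiced frames carry a value of 0 are likewise hypothetical; the actual file formats on the CD-ROMs are not described here.

```python
from bisect import bisect_right


def stylised_curve(targets, t):
    """Evaluate a smooth curve through (time_s, hz) target points at time t.

    Assumed spline: between two neighbouring targets, two parabolas are
    joined at the midpoint, each with zero slope at its own target.
    """
    times = [point[0] for point in targets]
    # Outside the range of targets, hold the nearest target value.
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]
    i = bisect_right(times, t) - 1          # index of the target at or before t
    (t0, hz0), (t1, hz1) = targets[i], targets[i + 1]
    x = (t - t0) / (t1 - t0)                # relative position in [0, 1)
    if x < 0.5:
        weight = 2 * x * x                  # first parabola, flat at t0
    else:
        weight = 1 - 2 * (1 - x) ** 2       # second parabola, flat at t1
    return hz0 + (hz1 - hz0) * weight


def residual(fo_track, targets):
    """Return measured Fo minus the stylised curve for voiced frames.

    Assumption: unvoiced frames are stored with a value of 0 and are skipped.
    """
    return [(t, hz - stylised_curve(targets, t)) for t, hz in fo_track if hz > 0]


# Hypothetical data: three target points and three measured frames.
targets = [(0.0, 100.0), (1.0, 200.0), (2.0, 150.0)]
fo_track = [(0.5, 155.0), (0.75, 0.0), (1.5, 170.0)]
print(residual(fo_track, targets))
```

Adding the residual back to the stylised curve recovers the measured Fo values for the voiced frames, which is what makes the residual file useful for studying the microprosodic detail that the stylisation factors out.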
Additional information: Campione, E., Véronis, J. (1998). A multilingual prosodic database. Proceedings of ICSLP'98, Sydney, Australia.

Version 1.0
Distributor ELRA