MULTEXT Prosodic database

Full Official Name: MULTEXT Prosodic database
Submission date: Jan. 24, 2014, 4:30 p.m.

This database comprises one CD-ROM for each of five languages (French, English, Italian, German and Spanish), totalling 4 hours and 20 minutes of speech from 50 different speakers (5 male and 5 female per language). The recordings on which the corpus is based consist of passages of about five sentences each, extracted from the EUROM.1 speech corpus (Esprit project 2589, "Multi-lingual Speech Input/output Assessment, Methodology and Standardisation").

The corpus was stylised automatically by an algorithm which factors out microprosodic effects and represents the intonation contour of each utterance as a series of target points. When these points are interpolated by a smooth curve (spline) and re-synthesised, the resulting contour is indistinguishable from the original, apart from a few detection errors. A symbolic coding of the 50,000 pitch movements in the corpus is also provided, along with the word-level time-alignment of the orthographic transcription to the signal. The entire corpus was verified and manually corrected by experts for each language.

For each passage, the CD-ROMs contain:
· the signal file from EUROM.1,
· the word-level alignment of the orthographic transcription to the signal,
· the F0 (fundamental frequency) file,
· the stylisation files,
· the re-synthesis using the stylised F0,
· the symbolic coding file,
· the residual F0, i.e. the difference between the F0 and the stylised curve,
· a description file for the recording.

Additional information: Campione, E., Véronis, J. (1998). A multilingual prosodic database. Proceedings of ICSLP'98, Sydney, Australia. PDF version: http://www.elda.org/catalogue/fr/speech/doc/icslp98_mult.pdf
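The stylisation step described in this entry turns a sequence of target points into a smooth F0 curve, and the residual F0 is what remains after subtracting that curve from the measured F0. The entry does not say which spline is used, so the sketch below is illustrative only. It assumes a simple piecewise-quadratic interpolation with zero slope at each target (two parabolas per interval, meeting at the midpoint) and a hypothetical data layout of (time in seconds, F0 in Hz) pairs. The function names `interpolate` and `residual` are hypothetical and are not part of the CD-ROM distribution.

```python
from bisect import bisect_right


def interpolate(targets, t):
    """Evaluate a smooth curve through F0 target points at time t.

    Assumption (not stated in the catalogue entry): between two targets the
    curve is two parabolas joined at the interval midpoint, with zero slope at
    each target, which gives a continuous first derivative. Outside the target
    range the curve is held flat at the nearest target.

    targets: list of (time_s, f0_hz) pairs with strictly increasing times
    (hypothetical format).
    """
    if t <= targets[0][0]:
        return targets[0][1]
    if t >= targets[-1][0]:
        return targets[-1][1]
    times = [p[0] for p in targets]
    i = bisect_right(times, t) - 1  # index of the target at or before t
    (t0, y0), (t1, y1) = targets[i], targets[i + 1]
    u = (t - t0) / (t1 - t0)  # normalised position within the interval, 0 <= u < 1
    if u < 0.5:
        # First parabola: flat at the left target, rising/falling towards the midpoint.
        return y0 + 2 * (y1 - y0) * u * u
    # Second parabola: reaches the right target with zero slope.
    return y1 - 2 * (y1 - y0) * (1 - u) ** 2


def residual(f0_frames, targets):
    """Residual F0: measured F0 minus the stylised curve, frame by frame.

    f0_frames: list of (time_s, f0_hz) pairs for voiced frames (hypothetical format).
    """
    return [(t, hz - interpolate(targets, t)) for t, hz in f0_frames]


if __name__ == "__main__":
    targets = [(0.0, 120.0), (0.4, 180.0), (1.0, 110.0)]
    print(interpolate(targets, 0.2))  # midpoint of the first interval
    print(residual([(0.2, 155.0), (0.4, 175.0)], targets))
```

The quadratic form is chosen here only because it passes exactly through every target and stays smooth without overshooting between them; a cubic spline would be an equally plausible reading of "spline" in the entry.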
