PAIDIALOGOS (NEOLOGOS Project)

Full Official Name: PAIDIALOGOS (NEOLOGOS Project)
Submission date: Jan. 24, 2014, 4:30 p.m.

The PAIDIALOGOS database was produced within the French national project NEOLOGOS, as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The databases produced in the framework of the NEOLOGOS project are designed for the development and assessment of French speech recognizers, speaker recognizers and speech synthesizers. They consist of:

1) the IDIOLOGOS databases, made of adults’ voices and available in 2 subsets:
   - the “Bootstrap” database (catalogue ref. ELRA-S0226-01),
   - the “Eigenspeakers” database (catalogue ref. ELRA-S0226-02);
2) the PAIDIALOGOS database (catalogue ref. ELRA-S0227), made of children’s and teenagers’ voices.

The PAIDIALOGOS database contains 37,364 utterances from 1,010 French child speakers (510 males and 500 females) recorded over the French fixed telephone network. It is distributed on 1 DVD-ROM. In accordance with the NEOLOGOS specifications, the speech files are stored as uncompressed sequences of 8-bit, 8 kHz A-law samples. Each prompted utterance is stored in a separate file and has an accompanying ASCII SAM label file. This speech database was validated by SPEX (the Netherlands) to assess its compliance with the NEOLOGOS format and content specifications.

Each speaker uttered the following items:
- 3 application words (set of 42)
- 4 connected digits: 2 sequences of 3 isolated digits, 1 sheet number (7 digits), 1 telephone number (10 digits)
- 3 dates (1 spontaneous date, e.g. birthday, 1 word-style prompted date, 1 relative and general date expression)
- 2 isolated digits
- 3 spelled words (1 surname, 1 directory assistance city name, 1 real/artificial name for coverage)
- 1 currency amount
- 1 natural number
- 4 directory assistance names (1 spontaneous, e.g. own surname, 1 city from which the call is made, 1 most frequent French city out of a set of 40, 1 “forename surname”)
- 2 yes/no questions (1 predominantly “yes” question, 1 predominantly “no” question)
- 6 phonetically rich sentences
- 2 time phrases (1 time of call, 1 word-style time phrase)
- 2 phonetically rich words

The following age distribution has been obtained: 6 speakers are under 7, 541 speakers are between 7 and 11, 308 speakers are between 12 and 14, 154 speakers are between 15 and 16, and 1 speaker is over 16. A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
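
As a rough illustration of the audio format described above (8-bit, 8 kHz A-law), the following Python sketch expands A-law bytes to 16-bit linear samples using the standard ITU-T G.711 expansion. It assumes the speech files are headerless, with one A-law byte per sample; that layout is an assumption, not something this description confirms, and should be checked against the documentation shipped with the database. The function names are hypothetical and are not part of any NEOLOGOS tooling.

```python
# Sketch only: assumes headerless files holding one A-law byte per sample.
SAMPLE_RATE_HZ = 8000  # 8 kHz, as stated in the database description


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed 16-bit linear sample."""
    code ^= 0x55                      # undo the even-bit inversion used by A-law
    magnitude = (code & 0x0F) << 4    # 4-bit mantissa
    segment = (code & 0x70) >> 4      # 3-bit segment (exponent)
    if segment == 0:
        magnitude += 0x008
    else:
        magnitude += 0x108
        magnitude <<= segment - 1
    # In A-law, a set sign bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(data: bytes) -> list[int]:
    """Decode a buffer of A-law bytes into linear PCM sample values."""
    return [alaw_to_linear(b) for b in data]


def duration_seconds(data: bytes, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of a headerless 8-bit recording: one byte per sample."""
    return len(data) / sample_rate_hz
```

For example, `decode_alaw(bytes([0xD5, 0x55, 0xAA, 0x2A]))` yields `[8, -8, 32256, -32256]`, the smallest and largest magnitudes A-law can represent, and a 16,000-byte file corresponds to 2 seconds of speech.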

Creator(s)
Distributor(s)
Right Holder(s)