Resource: Acoustic database for Polish concatenative speech synthesis

Reference Acoustic database for Polish concatenative speech synthesis
Date of Submission Jan. 24, 2014, 4:17 p.m.
Status accepted
ISLRN 305-222-372-690-4
Resource Type Primary Text
Media Type Audio
Language Polish
Size 159 MB

This database consists of 1443 nonsense words covering all the diphones of the Polish language. Each diphone is always placed in an unstressed syllable. The neighbouring context does not influence the co-articulation of the diphone.

The database includes information such as: the name of the diphone, context of the diphone, phonetic transcription in SAMPA, identifier of the wave file where it is placed, and three numbers corresponding to the beginning, the middle and the end of the diphone.
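As an illustration only, one entry with the fields listed above could be modelled as follows. The field names, the tab-separated layout and the sample values are hypothetical: the record does not specify the on-disk format or the unit of the three boundary numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiphoneEntry:
    """One database entry; field names are illustrative, not from the resource."""
    name: str      # name of the diphone
    context: str   # context of the diphone
    sampa: str     # phonetic transcription in SAMPA
    wave_id: str   # identifier of the wave file containing the diphone
    begin: int     # beginning of the diphone (unit not stated in the record)
    middle: int    # middle of the diphone
    end: int       # end of the diphone

    def __post_init__(self) -> None:
        # The three markers are expected to be ordered in time.
        if not (self.begin <= self.middle <= self.end):
            raise ValueError(f"markers out of order for diphone {self.name!r}")


def parse_entry(line: str, sep: str = "\t") -> DiphoneEntry:
    """Parse one line of an ASSUMED seven-column, tab-separated layout."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    name, context, sampa, wave_id, begin, middle, end = fields
    return DiphoneEntry(name, context, sampa, wave_id,
                        int(begin), int(middle), int(end))


# Made-up sample line, for demonstration only.
entry = parse_entry("a-b\ttabata\tt a b a t a\tw0001\t100\t150\t200")
print(entry.name, entry.wave_id, entry.begin, entry.middle, entry.end)
```

The ordering check in `__post_init__` rejects entries whose markers are not in time order, which is the kind of error a manual validation pass would look for.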

The recordings were made by a female speaker in an anechoic chamber using a single table-stand dynamic microphone (Sennheiser M104). A 16 kHz sampling frequency and 16-bit resolution were used. The total duration of the recordings is 1.27 hours, with prompts varying in length from 2 to 6 seconds.
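As a rough consistency check on these figures, the sketch below estimates the uncompressed audio size and the mean prompt length. It assumes single-channel (mono) PCM and one recording per nonsense word. Neither assumption is stated in the record.

```python
SAMPLE_RATE_HZ = 16_000        # from the record
BYTES_PER_SAMPLE = 16 // 8     # 16-bit resolution
CHANNELS = 1                   # assumption: mono (one microphone)
DURATION_HOURS = 1.27          # from the record
NUM_PROMPTS = 1443             # assumption: one recording per nonsense word

duration_s = DURATION_HOURS * 3600


def pcm_bytes(seconds: float, rate: int, sample_bytes: int, channels: int) -> int:
    """Size in bytes of uncompressed PCM audio, excluding any file headers."""
    return round(seconds * rate * sample_bytes * channels)


total_bytes = pcm_bytes(duration_s, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)
mean_prompt_s = duration_s / NUM_PROMPTS

print(f"raw audio: {total_bytes / 1e6:.1f} MB")
print(f"mean prompt length: {mean_prompt_s:.2f} s")
```

Under these assumptions the raw audio comes to about 146 MB, the same order of magnitude as the listed size of 159 MB, and the mean prompt length is a little over 3 seconds, inside the stated range of 2 to 6 seconds. The record does not say what accounts for the remaining difference in size.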

The signal was manually aligned with the position of the diphone, i.e. each prompt contains the boundary of the chosen diphone. The database was validated manually.

For a more detailed description, see:
- Szklanny K. (2002) MBROLA – Creating Polish diphone database for speech synthesis, 3rd European Master School on Language and Speech, Leuven, Belgium
- Szklanny K. (2003) Preparing the Polish diphone database for speech synthesis in MBROLA. 50. Otwarte Seminarium z Akustyki, Szczyrk, Poland

Version 1.0
Distributor ELRA