The "SIVA" Speech Database for Speaker Verification and Identification

The Italian speech database SIVA ("Speaker Identification and Verification Archives") comprises more than two thousand calls collected over the public switched telephone network, and will soon be available through ELRA. The SIVA database consists of four speaker categories: male users, female users, male impostors, and female impostors. Speakers were contacted by mail before the test and asked to read the information and instructions provided carefully before making the call. About 500 speakers were recruited through a company specializing in the selection of population samples. The others were volunteers contacted by the institute concerned. Speakers access the recording system by calling a toll-free number. An automatic answering system guides them through the three sessions that make up a recording. In the first session, a list of 28 words (including digits and some commands) is recorded using a standard enumerated prompt. The second session is a simple unidirectional dialogue (the caller answers prompted questions) in which personal information is requested (name, age, etc.). In the third session, the speaker is asked to read a continuous passage of phonetically balanced text that resembles a short curriculum vitae. The signal is sampled at 8 kHz and coded in 8-bit mu-law format. The data collected so far consists of:

· MU (male users): 18 speakers, 20 repetitions
· FU (female users): 16 speakers, 26 repetitions
· MI (male impostors): 189 speakers, 2 repetitions, and 128 speakers, 1 repetition
· FI (female impostors): 213 speakers, 2 repetitions, and 107 speakers, 1 repetition
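As an illustration of the signal format described above, the following minimal sketch expands 8-bit mu-law bytes into 16-bit linear PCM values using the standard G.711 expansion, and derives a duration from the 8 kHz sampling rate. The function names are hypothetical, and the sketch assumes headerless (raw) mu-law data with one byte per sample; the actual file layout of the SIVA distribution is not specified here.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate stated in the database description
ULAW_BIAS = 0x84       # bias constant of the G.711 mu-law code


def ulaw_to_linear(byte):
    """Expand one 8-bit mu-law code into a signed 16-bit linear value."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80               # top bit: sign
    exponent = (u >> 4) & 0x07    # next three bits: segment number
    mantissa = u & 0x0F           # low four bits: step within the segment
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if sign else magnitude


def decode_ulaw(data):
    """Decode a bytes object of raw mu-law samples into a list of ints."""
    return [ulaw_to_linear(b) for b in data]


def duration_seconds(data):
    """Duration of raw mu-law data, assuming one byte per sample at 8 kHz."""
    return len(data) / SAMPLE_RATE_HZ
```

A recording read into memory as bytes (for example with open(path, "rb").read(), where path is a placeholder) could be passed to decode_ulaw to obtain linear samples for further processing.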

