Resource: The HIWIRE database, a noisy and non-native English speech corpus for cockpit communication

Reference The HIWIRE database, a noisy and non-native English speech corpus for cockpit communication
Date of Submission Jan. 24, 2014, 4:31 p.m.
Status accepted
ISLRN 934-733-835-065-0
Resource Type Primary Text
Media Type Audio
Source
Language English
Description

This database was collected and packaged under the auspices of the IST-EU STREP project HIWIRE (Human Input that Works In Real Environments). It was designed as a tool for developing and testing speech processing and recognition techniques for robust recognition of non-native speech.

The database contains 8,099 English utterances pronounced by non-native speakers (31 French, 20 Greek, 20 Italian, and 10 Spanish speakers). The collected utterances correspond to human input in a command and control aeronautics application. The data was recorded in a studio with a close-talking microphone, and real noise recorded in an airplane cockpit was artificially added to the data. The signals are provided in clean (studio recordings with close-talking microphone), low, mid and high noise conditions. The three noise levels correspond approximately to signal-to-noise ratios of 10 dB, 5 dB and -5 dB respectively.

Clean audio data was recorded in different office rooms using a close-talking microphone (Plantronics USB-45) to minimize ambient acoustic effects. The sampling frequency is 16 kHz and the data is stored in Windows PCM WAV format (16 bits, mono).

Recordings correspond to prompts extracted from an aeronautic command and control application. A total of 8,099 utterances were recorded from 81 speakers, nominally 100 utterances each; the Spanish subset holds 999 rather than 1,000. The speaker distribution is as follows:

<table border="0" width="100%" cellspacing="0" cellpadding="2" class="infoBoxContents">
<tr align=center><td>Country</td><td># Speakers</td><td># Utterances</td></tr>
<tr align=center><td>France</td><td>31 (38.3%)</td><td>3100</td></tr>
<tr align=center><td>Greece</td><td>20 (24.7%)</td><td>2000</td></tr>
<tr align=center><td>Italy</td><td>20 (24.7%)</td><td>2000</td></tr>
<tr align=center><td>Spain</td><td>10 (12.3%)</td><td>999</td></tr>
<tr align=center><td>Total</td><td>81</td><td>8099</td></tr>
</table>
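
The figures in the table can be cross-checked with a few lines of Python. This is only an illustrative sketch: the dictionary is transcribed by hand from the table, and the names `DISTRIBUTION`, `totals` and `speaker_shares` are hypothetical and not part of the distribution.

```python
# (speakers, utterances) per country, transcribed from the table above.
DISTRIBUTION = {
    "France": (31, 3100),
    "Greece": (20, 2000),
    "Italy": (20, 2000),
    "Spain": (10, 999),
}


def totals(distribution):
    """Return (total speakers, total utterances) over all countries."""
    speakers = sum(s for s, _ in distribution.values())
    utterances = sum(u for _, u in distribution.values())
    return speakers, utterances


def speaker_shares(distribution):
    """Return each country's share of speakers, in percent, to one decimal."""
    total_speakers, _ = totals(distribution)
    return {
        country: round(100 * s / total_speakers, 1)
        for country, (s, _) in distribution.items()
    }


if __name__ == "__main__":
    print(totals(DISTRIBUTION))
    print(speaker_shares(DISTRIBUTION))
```

The utterance total falls one short of 81 × 100 because the Spanish row lists 999 utterances.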

To generate the noisy utterances, the speech level is maintained and only the noise amplitude is modified to obtain the desired SNR. The noise amplitude is adjusted to obtain three different averaged SNR values of 10 dB, 5 dB and -5 dB, which are referred to as the low noise (LN), mid noise (MN) and high noise (HN) conditions. Within each condition the noise level remains constant.
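
The mixing procedure described above can be sketched as follows. This is a minimal illustration and not the tool used to build the corpus. It assumes that the "averaged SNR" is the ratio of speech power pooled over all utterances to noise power, and that a short noise recording is looped to cover each utterance; the record confirms neither detail. The function names are hypothetical.

```python
import math

# Target SNR per condition, as given in the description.
CONDITIONS_DB = {"LN": 10.0, "MN": 5.0, "HN": -5.0}


def mean_power(samples):
    """Average power: mean of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def condition_gain(utterances, noise, target_snr_db):
    """One noise gain per condition, constant across utterances.

    Assumption: speech power is pooled over all utterances, so the target
    SNR holds on average rather than for every single utterance.
    """
    total_energy = sum(s * s for utt in utterances for s in utt)
    total_length = sum(len(utt) for utt in utterances)
    p_speech = total_energy / total_length
    p_noise = mean_power(noise)
    # SNR_dB = 10 * log10(p_speech / (gain**2 * p_noise)), solved for gain.
    return math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))


def add_noise(utterance, noise, gain):
    """Leave the speech untouched and add the scaled noise.

    Assumption: the noise is looped if it is shorter than the utterance.
    """
    return [s + gain * noise[i % len(noise)] for i, s in enumerate(utterance)]
```

Because the gain is fixed per condition, quiet utterances end up with a lower SNR than loud ones, which is consistent with the SNR values being described as averages.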

The speech data are PCM WAV files (16 kHz / 16 bits / mono) stored on one DVD. The total size is 3.03 GB for 33,053 files.
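
A file can be checked against the stated format with Python's standard `wave` module. The sketch below is illustrative only; `matches_stated_format` is a hypothetical helper, and the demonstration uses a synthetic in-memory file rather than a file from the DVD.

```python
import io
import wave

# Format stated in the description: 16 kHz, 16 bits, mono.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2
EXPECTED_CHANNELS = 1


def matches_stated_format(source):
    """Return True if `source` (a path or binary file object) is a PCM WAV
    file with the sampling rate, sample width and channel count above."""
    with wave.open(source, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def make_silent_wav(rate_hz, n_frames=160):
    """Build a small mono 16-bit WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * n_frames)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(matches_stated_format(make_silent_wav(16000)))
    print(matches_stated_format(make_silent_wav(8000)))
```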

Version 1.0
Distributor ELRA