Resource: SmartWeb Motorbike Corpus (SMC)

Reference: SmartWeb Motorbike Corpus (SMC)
Date of Submission: Jan. 24, 2014, 4:31 p.m.
Status: accepted
ISLRN: 500-054-408-686-6
Resource Type: Primary Text
Media Type: Audio
Language: German

The SMARTWEB UMTS data collection was created within the publicly funded German SmartWeb project in the years 2004-2006. It comprises a collection of user queries to a spoken natural-language Web interface, with the main focus on the 2006 soccer World Cup. The collection consists of three sets of recordings: field recordings using a hand-held UMTS device (one speaker, SmartWeb Handheld Corpus SHC, ref. ELRA-S0278), field recordings with video capture of the primary speaker and a secondary speaker (SmartWeb Video Corpus SVC, ref. ELRA-S0279), and mobile recordings performed on a BMW motorbike (one speaker, SmartWeb Motorbike Corpus SMC, ref. ELRA-S0280).

This corpus comprises the mobile recordings performed on a BMW motorbike (SmartWeb Motorbike Corpus SMC). It contains recordings of 36 speakers in a human-machine query situation on a running BMW motorcycle. Bikers were asked to solve several tasks with a spoken query system for the WWW, using an integrated system connected to a speech server via a UMTS connection. The recorded channels are the Bluetooth helmet microphone over UMTS (telephone quality) and, for part of the recordings, the Bluetooth helmet microphone and an additional neck microphone in high quality.

The corpus contains:
- Total number of recorded queries: 2,315
- Total duration of segmented speech: 377 minutes
- Formats: WAV 44.1 kHz 16 bit, A-law 8 kHz 8 bit, Verbmobil transliteration, BAS Partitur Format (BPF)
- Segmentation: automatic segmentation into queries by the recording server
- Distribution: 3 DVD-R
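
As a rough orientation, the figures above imply an average query length and an approximate audio payload per format. The following is a minimal sketch, not part of the corpus documentation. It assumes single-channel (mono) files and ignores file headers, and it treats all 377 minutes as if they were available in each format, which overstates the WAV figure because the high-quality channels were recorded only in part. The function names are illustrative.

```python
# Figures taken from the corpus description above.
TOTAL_QUERIES = 2315
SEGMENTED_SPEECH_MINUTES = 377


def mean_query_seconds(queries: int, minutes: float) -> float:
    """Average duration of one segmented query, in seconds."""
    return minutes * 60 / queries


def audio_payload_bytes(seconds: float, sample_rate_hz: int,
                        bytes_per_sample: int, channels: int = 1) -> int:
    """Uncompressed audio payload size, excluding any file header.

    Assumption: mono by default; the channel layout is not stated
    in the corpus description.
    """
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


total_seconds = SEGMENTED_SPEECH_MINUTES * 60
mean_seconds = mean_query_seconds(TOTAL_QUERIES, SEGMENTED_SPEECH_MINUTES)

# WAV: 44.1 kHz, 16 bit (2 bytes per sample).
wav_bytes = audio_payload_bytes(total_seconds, 44_100, 2)
# A-law: 8 kHz, 8 bit (1 byte per sample).
alaw_bytes = audio_payload_bytes(total_seconds, 8_000, 1)

print(f"mean query length: {mean_seconds:.2f} s")
print(f"WAV payload per channel (upper bound): {wav_bytes / 1e9:.2f} GB")
print(f"A-law payload: {alaw_bytes / 1e6:.1f} MB")
```

Under these assumptions a query lasts a little under ten seconds on average, and the estimated audio payload is well within the capacity of the 3 DVD-R distribution.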

See also ELRA-S0278 and ELRA-S0279.

Version: 1.0
Distributor: ELRA