Resource: IBNC - An Italian Broadcast News Corpus

Reference IBNC - An Italian Broadcast News Corpus
Date of Submission Jan. 24, 2014, 4:29 p.m.
Status accepted
ISLRN 133-155-327-792-1
Resource Type Primary Text
Media Type Audio
Source
Language Italian
Size 30 hours
Description

The Italian Broadcast News Corpus (IBNC) was produced by ITC-IRST (Italy) with funding from ELRA, within the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335). RAI, the major Italian broadcasting company, supplied studio-quality recordings of radio news programs sampled from its internal digital archive. The collection consists of 150 programs, totalling about 30 hours, broadcast on 36 different days between 1992 and 1999. RAI supplied the recordings on Digital Audio Tapes (DAT), with a 44kHz sampling rate and 16-bit resolution. Each DAT was processed manually so that each program issue was transferred into its own file. During this operation, the signal was down-sampled to 16kHz at 16-bit resolution and encoded in the NIST Sphere PCM format. The speech recordings vary in topic, speaker, acoustic channel, speaking mode, etc. The corpus was segmented, labelled and transcribed manually using "Transcriber", a tool developed by DGA (Délégation Générale pour l'Armement, France) and LDC (Linguistic Data Consortium, USA), following conventions similar to those adopted by LDC for the DARPA HUB-4 corpora. The transcription text consists of mixed-case ASCII characters of the ISO-8859-1 extended set. The corpus was validated by an external validator, who checked the audio files, documentation and transcriptions.
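
As a rough sanity check on the figures above, the following sketch estimates the average program length and the size of the uncompressed 16kHz, 16-bit audio. It assumes single-channel (mono) audio, which this record does not state, and it ignores the NIST Sphere file headers; the function and constant names are illustrative only.

```python
# Figures taken from the description above.
TOTAL_HOURS = 30          # "about 30 hours"
NUM_PROGRAMS = 150        # number of program issues
SAMPLE_RATE_HZ = 16_000   # sampling rate after down-sampling
BYTES_PER_SAMPLE = 2      # 16-bit resolution
CHANNELS = 1              # ASSUMPTION: mono; the record gives no channel count


def average_program_minutes(total_hours, num_programs):
    """Mean duration of one program issue, in minutes."""
    return total_hours * 60 / num_programs


def pcm_payload_bytes(total_hours, sample_rate_hz, bytes_per_sample, channels):
    """Size of the raw PCM samples, excluding any file headers."""
    seconds = total_hours * 3600
    return seconds * sample_rate_hz * bytes_per_sample * channels


avg_minutes = average_program_minutes(TOTAL_HOURS, NUM_PROGRAMS)
payload = pcm_payload_bytes(TOTAL_HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)

print(f"Average program length: {avg_minutes:.1f} minutes")
print(f"Approximate PCM payload: {payload / 1e9:.3f} GB")
```

Under these assumptions the programs average about 12 minutes each and the audio payload comes to roughly 3.5 GB. Since the total duration is itself approximate, both numbers are estimates rather than properties of the distributed files.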

Version 1.0
Distributor ELRA