CHIEDE Corpus: a spontaneous child language corpus of Spanish

Submission date: Jan. 24, 2014, 4:22 p.m.

CHIEDE, a corpus of spontaneous child language, consists of 58,163 words in 30 texts, drawn from 7 hours and 53 minutes of recordings with 59 child participants. About a third of the corpus consists of child language and the remaining two thirds of adult speech. The main feature of CHIEDE is the spontaneity of its interactions: the texts are recordings of communicative situations in their natural context. The resource is provided in several formats: an orthographic transcription, an automatic phonological transcription, an XML-tagged version, and a text-sound alignment. Results obtained by statistical analysis of data extracted from the annotated texts are also provided. The final design of the corpus comprises two kinds of interaction: spontaneous group conversations, recorded during daily classroom activities, and personal interviews, conducted by an adult with a single child, in which the conversation is less spontaneous because it is guided by questions.

