Resource: Fundamental Portuguese Corpus

Reference: Fundamental Portuguese Corpus
Date of Submission: Jan. 24, 2014, 4:29 p.m.
Status: accepted
ISLRN: 812-337-422-842-3
Resource Type: Primary Text
Media Type: Audio
Language: Portuguese

The Fundamental Portuguese Corpus is a corpus of spoken language collected between 1970 and 1974. It comprises 1800 recordings (500 hours) made in continental Portugal and the islands. A sample of these 1800 conversations was selected and transcribed.

The corpus consists of audio files in .wav format, time-aligned transcriptions in EXMARaLDA XML format, and transcriptions in plain text. The plain-text files also carry automatically assigned part-of-speech (POS) tags. The transcriptions are additionally available in HTML format. All text is encoded in UTF-8.
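
As an illustration of how the aligned transcriptions might be read, here is a minimal Python sketch. It assumes the files follow the usual EXMARaLDA basic-transcription layout (a common timeline of `tli` elements and tiers containing `event` elements); the sample document, speaker ID, and function name `read_events` are invented for the example and are not taken from the corpus, whose actual files may be structured differently.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed EXMARaLDA basic-transcription layout.
# It is NOT an excerpt from the corpus.
SAMPLE_EXB = """<basic-transcription>
  <basic-body>
    <common-timeline>
      <tli id="T0" time="0.0"/>
      <tli id="T1" time="1.5"/>
      <tli id="T2" time="3.2"/>
    </common-timeline>
    <tier id="TIE0" speaker="SPK0" category="v" type="t">
      <event start="T0" end="T1">bom dia</event>
      <event start="T1" end="T2">então</event>
    </tier>
  </basic-body>
</basic-transcription>
"""


def read_events(root):
    """Return (speaker, start_sec, end_sec, text) tuples for every event."""
    # Map timeline IDs to times in seconds; some timeline items may lack a time.
    times = {
        tli.get("id"): float(tli.get("time"))
        for tli in root.iter("tli")
        if tli.get("time") is not None
    }
    events = []
    for tier in root.iter("tier"):
        for ev in tier.iter("event"):
            events.append((
                tier.get("speaker"),
                times.get(ev.get("start")),  # None if the point has no time
                times.get(ev.get("end")),
                (ev.text or "").strip(),
            ))
    return events


events = read_events(ET.fromstring(SAMPLE_EXB))
for speaker, start, end, text in events:
    print(speaker, start, end, text)
```

For a file on disk, `ET.parse(path).getroot()` could be passed to `read_events` instead; the XML parser handles the UTF-8 encoding itself.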

Version: 1.0
Distributor: ELRA