Resource: Spoken Portuguese Corpus
|Reference||Spoken Portuguese Corpus|
|Date of Submission||Jan. 24, 2014, 4:31 p.m.|
|Resource Type||Primary Text|
The Spoken Portuguese Corpus was collected from sociolinguistically diverse speakers who have Portuguese as their mother tongue or as a second language. Its 86 recordings exemplify the Portuguese spoken in Portugal (30), in Brazil (20), in the African countries with Portuguese as an official language, namely Angola, Cape Verde, Guinea-Bissau, Mozambique and Sao Tome and Principe (5 each), in Macao (5), in Goa (3) and in East Timor (3), for a total of 8h44m of recording.
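As a quick consistency check on the figures above, the per-variety counts can be tallied and compared with the stated total. This is only an illustrative sketch: the dictionary keys and variable names are chosen here and are not part of the corpus distribution.

```python
# Recording counts per variety, copied from the corpus description.
recordings = {
    "Portugal": 30,
    "Brazil": 20,
    "Angola": 5,
    "Cape Verde": 5,
    "Guinea-Bissau": 5,
    "Mozambique": 5,
    "Sao Tome and Principe": 5,
    "Macao": 5,
    "Goa": 3,
    "East Timor": 3,
}

# Sum of all per-variety counts; the description states 86.
total_recordings = sum(recordings.values())

# Stated total duration of 8h44m, converted to minutes.
total_minutes = 8 * 60 + 44

# Average length of one recording, in minutes.
mean_minutes = total_minutes / total_recordings

print(total_recordings, total_minutes, round(mean_minutes, 1))
```

The tally matches the stated total of 86 recordings, and the stated duration works out to roughly six minutes per recording on average.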
The corpus consists of audio files in .wav format, aligned transcriptions in EXMARaLDA XML format and transcriptions in plain text. The plain-text files also carry automatically assigned POS tags. The transcriptions are also available in HTML format. The characters are encoded in UTF-8.
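A minimal sketch of how the aligned transcriptions might be read, assuming the files follow the usual EXMARaLDA basic-transcription layout (a `common-timeline` of `tli` points and `tier` elements holding `event` spans). The sample document, its speaker and tier ids, and the function name `aligned_events` are hypothetical; the actual files in this corpus may use additional elements or attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the EXMARaLDA basic-transcription layout;
# not taken from the corpus itself.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<basic-transcription>
  <basic-body>
    <common-timeline>
      <tli id="T0" time="0.0"/>
      <tli id="T1" time="1.8"/>
      <tli id="T2" time="3.5"/>
    </common-timeline>
    <tier id="TIE0" speaker="SPK0" category="v" type="t">
      <event start="T0" end="T1">bom dia</event>
      <event start="T1" end="T2">como está?</event>
    </tier>
  </basic-body>
</basic-transcription>
"""


def aligned_events(xml_text):
    """Return (start_seconds, end_seconds, text) for every event.

    Timeline points without a `time` attribute map to None.
    """
    # Parse from bytes so the UTF-8 declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))

    # Map each timeline id to its offset in seconds.
    times = {}
    for tli in root.iter("tli"):
        raw = tli.get("time")
        times[tli.get("id")] = float(raw) if raw is not None else None

    # Resolve each event's start and end ids against the timeline.
    events = []
    for tier in root.iter("tier"):
        for event in tier.iter("event"):
            start = times.get(event.get("start"))
            end = times.get(event.get("end"))
            events.append((start, end, (event.text or "").strip()))
    return events


for start, end, text in aligned_events(SAMPLE):
    print(start, end, text)
```

For a file on disk, the same function could be fed the result of `open(path, encoding="utf-8").read()`, since the description states that the characters are encoded in UTF-8.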