Resource: LECTRA (LECture TRAnscriptions in European Portuguese)
Reference | LECTRA (LECture TRAnscriptions in European Portuguese) |
Date of Submission | Oct. 14, 2015, 1:38 p.m. |
Status | accepted |
ISLRN | 298-379-572-530-5 |
Resource Type | Primary Text |
Media Type | Audio |
Source | |
Language | Portuguese |
Format/MIME Type | raw audio |
Size | 21 hours |
Description | This corpus consists of the audio and the manual transcriptions of the LECTRA Corpus: classroom LECture TRAnscriptions in European Portuguese. The corpus includes seven one-semester university courses. All lectures were taught at the Technical University of Lisbon (IST) and recorded in the presence of students, except IICT, which was recorded at another university in a quiet office environment and targeted an Internet audience. The corpus contains a total of 28 hours of speech, manually transcribed by several trained annotators. It comprises the following technical university courses: Production of Multimedia Contents (PMC), Economic Theory I (ETI), Linear Algebra (LA), Introduction to Informatics and Communication Techniques (IICT), Object Oriented Programming (OOP), Accounting (CONT), and Graphical Interfaces (GI). Two files per lecture are provided: the TRS files contain a total of 220K word tokens (training set: 179K, development set: 21K, test set: 20K). The whole resource occupies 3.3 GB. For a complete description of the corpus and the automatic speech recognition (ASR) results, the reader may refer to: |
Version | 1.0 |
Distributor | ELRA |