Resource: COPLE2

Reference Learner Corpus of Portuguese L2/FL - COPLE2
Date of Submission Oct. 24, 2016, 3:53 p.m.
Status accepted
ISLRN 642-718-655-040-0
Resource Type Primary Text
Media Type Text, Audio
Source
Language Portuguese
Format/MIME Type XML, WAV
Size 978 free essays; 182,474 tokens
Description

COPLE2 is a learner corpus of Portuguese comprising written and spoken texts produced by learners of Portuguese as a second or foreign language. The corpus currently contains 978 texts totaling 182,474 tokens, classified according to the CEFR scales. The original handwritten productions are transcribed in TEI-compliant XML and preserve all the original information, such as reformulations, insertions, and corrections made by the teacher; the recordings are transcribed and aligned with EXMARaLDA. The TEITOK environment provides different views of the same document (XML, student version, corrected version), a CQP-based search interface, and POS tagging, lemmatization, and normalization of the tokens. It will soon also be used for error annotation in stand-off format.

Version 1.1
Creator Amália Mendes - Centro de Linguística da Universidade de Lisboa
Distributor Amália Mendes - Centro de Linguística da Universidade de Lisboa
Rights Holder Amália Mendes - Centro de Linguística da Universidade de Lisboa