Resource: Reference Corpus of Contemporary Portuguese (CRPC)

Reference Corpus of Contemporary Portuguese (CRPC)
Date of Submission May 5, 2014, 9:34 a.m.
Status accepted
ISLRN 151-982-545-991-0
Resource Type Primary Text
Media Type Text, Audio
Language Portuguese
Format/MIME Type text/txt, audio/Exmaralda format
Size 311.4 million words
Access Medium Accessible online.

The CRPC is a large electronic corpus of European Portuguese and other varieties of Portuguese (from Brazil, Angola, Cape Verde, Guinea-Bissau, Mozambique, São Tomé and Príncipe, Goa, Macao and East Timor). It contains 311.4 million words (309.8M from written texts and 1.6M from spoken recordings and transcriptions) drawn from texts dating from the second half of the 19th century to 2006, although most of the texts were produced after 1970. The CRPC covers several types of written text, such as fiction, newspaper, technical, scientific and didactic texts, leaflets, decisions of the Supreme Court of Justice, and parliamentary sessions, and it includes a spoken subpart of both formal and informal speech (monologues, dialogues, conversations, phone conversations, lectures, homilies, etc.). The corpus is tokenized, lemmatized, and annotated with PoS and NP chunk information.
The written subpart of the CRPC can be searched online through CQPweb at:

Version 2.3
Creator Amália Mendes - Centro de Linguística da Universidade de Lisboa, Fernanda Bacelar do Nascimento - CLUL