Reference Corpus of Contemporary Portuguese (CRPC)

Full Official Name: Corpus of Contemporary Portuguese (CRPC)
Submission date: May 5, 2014, 9:34 a.m.

The CRPC is a large electronic corpus of European Portuguese and other varieties of Portuguese (from Brazil, Angola, Cape Verde, Guinea-Bissau, Mozambique, São Tomé and Príncipe, Goa, Macao and East Timor). It contains 311.4 million words (309.8 million in written texts and 1.6 million in spoken recordings and transcriptions) drawn from texts dating from the second half of the 19th century to 2006, although most of the texts were produced after 1970. The CRPC covers several types of written text, including fiction, newspaper, technical, scientific and didactic texts, leaflets, decisions of the Supreme Court of Justice and parliamentary sessions, and it includes a spoken subpart of both formal and informal speech (monologues, dialogues, conversations, phone conversations, lectures, homilies, etc.). The corpus is tokenized, lemmatized and annotated with part-of-speech (PoS) and noun phrase (NP) chunk information. The written subpart of the CRPC can be searched online through CQPweb at: http://alfclul.clul.ul.pt/CQPweb/

Creator(s)
Distributor(s)
Right Holder(s)