Resource: CINTIL-PropBank

Reference CINTIL-PropBank
Date of Submission Jan. 24, 2014, 4:22 p.m.
Status accepted
ISLRN 723-486-478-286-6
Resource Type Primary Text
Media Type Text
Language Portuguese

The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags. It comprises 10,039 sentences and 110,166 tokens drawn from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens). These totals also include 779 sentences (5,654 tokens) used for regression testing of the computational grammar that supported the annotation of the corpus.
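As a quick consistency check on the figures above, the following minimal Python sketch sums the per-source counts. The dictionary keys and variable names are illustrative only and are not part of the resource. The sums indicate that the stated totals count the regression-testing sentences together with the news and novel sentences.

```python
# Per-source counts as stated in the description: (sentences, tokens).
# The subset labels are illustrative names, not identifiers from the corpus.
subsets = {
    "news": (8861, 101430),
    "novels": (399, 3082),
    "regression_tests": (779, 5654),
}

# Sum sentences and tokens across all three subsets.
total_sentences = sum(sentences for sentences, _ in subsets.values())
total_tokens = sum(tokens for _, tokens in subsets.values())

print(total_sentences, total_tokens)  # prints: 10039 110166
```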
To create this PropBank, we used semi-automatic analysis with double-blind annotation followed by adjudication. The resulting dataset contains three levels of information: phrase constituency, grammatical functions, and phrase semantic roles.
The main motivation for creating this resource was to build a high-quality dataset with semantic information that could support the development of automatic semantic role labelers for Portuguese.

Version 1.0
Distributor ELRA