Resource: Italian Syntactic-Semantic Treebank (ISST)
Reference: Italian Syntactic-Semantic Treebank (ISST)
Date of Submission: Jan. 24, 2014, 4:29 p.m.
Resource Type: Primary Text
Format/MIME Type: Plain text
ISST comprises 305,547 tokens in total: 89,941 tokens in the financial-domain part and 215,606 tokens in the general part. It is encoded in XML.
ISST has a five-level structure covering the orthographic, morpho-syntactic, syntactic and semantic levels of linguistic description. Syntactic annotation is distributed over two separate levels: the constituent structure level and the functional relations level. The fifth level is lexico-semantic. Lexical heads (nouns, verbs and adjectives) are sense-tagged, and the sense tags are augmented with other types of semantic information. ItalWordNet (see ELRA-M0018) is the reference lexical resource for the sense-tagging task. Both the syntactic and the lexico-semantic annotations refer to the morpho-syntactically annotated text. That text is in turn linked to the orthographic file, which contains the text and the mark-up of its macrotextual organisation (e.g. titles, subtitles, summary, article body, paragraphs).
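The sketch below illustrates how linked annotation levels of this kind can be held together in memory. It makes several assumptions. It is a hypothetical representation, not the ISST XML schema. The tag names, the relation labels `subj`/`dobj` and the `SENSE-...` identifiers are placeholders and are not real ItalWordNet senses. Tokens (the morpho-syntactic level) point into the orthographic text by character offsets. Constituents, functional relations and sense tags (the upper levels) point only to token ids.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Token:
    """Morpho-syntactic level: one token anchored to the text by character offsets."""
    tid: str
    start: int
    end: int
    lemma: str
    pos: str  # hypothetical coarse tag, not the actual ISST tagset


@dataclass
class Constituent:
    """Constituent structure level: a labelled group of token ids."""
    label: str
    token_ids: list[str]


@dataclass
class Relation:
    """Functional relations level: a labelled head/dependent pair of token ids."""
    label: str
    head: str
    dependent: str


@dataclass
class Sense:
    """Lexico-semantic level: a sense tag on a lexical head (placeholder ids)."""
    token_id: str
    sense_id: str


@dataclass
class Sentence:
    text: str  # orthographic level
    tokens: list[Token]
    constituents: list[Constituent] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)

    def _index(self) -> dict[str, Token]:
        return {t.tid: t for t in self.tokens}

    def surface(self, tid: str) -> str:
        """Resolve a token id back to its orthographic form."""
        tok = self._index()[tid]
        return self.text[tok.start:tok.end]

    def dangling_references(self) -> list[str]:
        """Ids used by the upper levels that no token defines."""
        known = self._index()
        used = [i for c in self.constituents for i in c.token_ids]
        used += [i for r in self.relations for i in (r.head, r.dependent)]
        used += [s.token_id for s in self.senses]
        return [i for i in used if i not in known]

    def readable_relations(self) -> list[tuple[str, str, str]]:
        """Functional relations rendered with surface forms instead of ids."""
        return [(r.label, self.surface(r.head), self.surface(r.dependent))
                for r in self.relations]


s = Sentence(
    text="Il governo approva la legge.",
    tokens=[
        Token("t1", 0, 2, "il", "ART"),
        Token("t2", 3, 10, "governo", "NOUN"),
        Token("t3", 11, 18, "approvare", "VERB"),
        Token("t4", 19, 21, "il", "ART"),
        Token("t5", 22, 27, "legge", "NOUN"),
        Token("t6", 27, 28, ".", "PUNCT"),
    ],
    constituents=[Constituent("NP", ["t1", "t2"]), Constituent("NP", ["t4", "t5"])],
    relations=[Relation("subj", "t3", "t2"), Relation("dobj", "t3", "t5")],
    senses=[Sense("t2", "SENSE-governo-1"), Sense("t3", "SENSE-approvare-1")],
)
print(s.readable_relations())
```

In this design the upper levels never copy text; they resolve through token ids. As a result, a correction at the orthographic or morpho-syntactic level only has to update the offsets in one place.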
The multi-level structure of ISST shows two main novelties with respect to other treebanks:
The adopted morpho-syntactic annotation scheme conforms to the EAGLES international standard.
The ISST functional annotation scheme is based on FAME (Lenci et al. 1999, 2000). Its main features can be summarised as follows: a) a hierarchical organisation of functional relations, which allows underspecified representations of highly ambiguous functional analyses; b) a modular coding architecture articulated over different information layers, each factoring out a different, though possibly interrelated, facet of syntactic annotation. FAME originated as a revision of a de facto standard, namely the functional annotation scheme developed within the LE-2111 SPARKLE project. The scheme was revised first to better meet the basic requirements of parsing evaluation (within the LE-8340 ELSE project), and then to make it suitable for annotating unrestricted Italian text.
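The following is a minimal sketch of how a hierarchy of functional relations can support underspecification. It assumes a small, hypothetical label inventory loosely modelled on SPARKLE-style grammatical relations, not the actual FAME inventory. When an annotator cannot choose between candidate relations, the sketch records their most specific common ancestor instead. For evaluation, a parser's label counts as consistent with an underspecified gold label if it refines that label.

```python
from __future__ import annotations

# Hypothetical label hierarchy (child -> parent); illustrative only,
# not the FAME relation inventory.
PARENT = {
    "arg": "dep",
    "mod": "dep",
    "subj_or_obj": "arg",
    "comp": "arg",
    "subj": "subj_or_obj",
    "obj": "subj_or_obj",
}


def ancestors(label: str) -> list[str]:
    """The label followed by all its ancestors, most specific first."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def subsumes(general: str, specific: str) -> bool:
    """True if `general` is `specific` itself or one of its ancestors."""
    return general in ancestors(specific)


def underspecify(*labels: str) -> str:
    """Most specific label subsuming all candidates (least common ancestor)."""
    common = ancestors(labels[0])
    for lab in labels[1:]:
        chain = set(ancestors(lab))
        common = [a for a in common if a in chain]
    return common[0]


def matches(gold: str, predicted: str) -> bool:
    """A predicted label is consistent with a gold label it refines (or equals)."""
    return subsumes(gold, predicted)


gold = underspecify("subj", "obj")  # annotator cannot decide subject vs object
print(gold, matches(gold, "obj"), matches("subj", "obj"))
# prints: subj_or_obj True False
```

The hierarchy lets one annotation serve two purposes. It records genuine ambiguity without forcing an arbitrary choice, and it still lets an evaluator reward any parser output compatible with that ambiguity.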
Lenci A., Montemagni S., Pirrelli V., Soria C., 2000, “Where opposites meet. A Syntactic Meta-scheme for Corpus Annotation and Parsing Evaluation”, in Proceedings of LREC-2000, Athens, 31 May - 2 June 2000, pp. 625-632.
Articles describing ISST:
Simonetta Montemagni, Francesco Barsotti, Marco Battista, Nicoletta Calzolari, Ornella Corazzari, Alessandro Lenci, Vito Pirrelli, Antonio Zampolli, Francesca Fanciulli, Maria Massetani, Remo Raffaelli, Roberto Basili, Maria Teresa Pazienza, Dario Saracino, Fabio Zanzotto, Nadia Mana, Fabio Pianesi, Rodolfo Delmonte, 2003, “The syntactic-semantic treebank of Italian. An overview”, Linguistica Computazionale XVI-XVII, pp. 461-492