Resource: PAROLE English lexicon

Reference: PAROLE English lexicon
Date of Submission: Jan. 24, 2014, 4:30 p.m.
Status: accepted
ISLRN: 382-062-733-311-0
Resource Type: Lexicon
Media Type: Text
Source:
Language: English
Description

The English PAROLE Lexicon has been compiled by two partners, Sheffield University and the Corpus Linguistic Group (CLG) at Birmingham University.

The lexicon was compiled from two existing resources, CRL-LKB and the COBUILD dictionary database. Both have restricted availability and contain extensive syntactic, semantic and morphological information.

The lexicon contains 22,000 morphological units, including 12,998 common nouns, 40 proper nouns, 4,195 verbs, 3,208 adjectives, 606 adverbs, 71 adpositions, 2 articles, 21 conjunctions, 25 determiners and 53 pronouns.
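The category counts listed in the description can be tallied directly. They sum to 21,219, so the 22,000 figure is presumably either rounded or includes units from categories not listed here; the description does not say which. The snippet below only adds up the published numbers and prints each category's share of that listed total.

```python
# Category counts exactly as listed in the PAROLE English lexicon description.
counts = {
    "common nouns": 12998,
    "proper nouns": 40,
    "verbs": 4195,
    "adjectives": 3208,
    "adverbs": 606,
    "adpositions": 71,
    "articles": 2,
    "conjunctions": 21,
    "determiners": 25,
    "pronouns": 53,
}

# Total of the listed categories only (not the 22,000 headline figure).
listed_total = sum(counts.values())
print(f"listed categories: {listed_total:,}")  # → listed categories: 21,219

# Share of each category in the listed total, largest first.
for category, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<14} {n:>6,} {100 * n / listed_total:5.1f}%")
```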

The English PAROLE lexicon comprises the following information:

- morphological encoding for all nouns, verbs, adverbs, adjectives and function words;

- syntactic encoding of all verbs, nouns, adjectives and adverbs.

The compilation procedure was as follows:

1. Selection: Lemmata were selected mainly on the basis of their frequency in the COBUILD corpus. Most proper nouns were deselected, and some verbs were added because of the decision to encode deverbal nominalisations and compound information.
2. Coverage: The headword list was checked against the source resources to ensure adequate coverage of syntactic and morphological information.
3. Composition: The nominal lemmata were checked for derivations and compounds. These were extracted and analysed into their constituent parts, and compounds were also checked for lexicalisation. Each component was flagged with its base form and grammatical class.
4. Conversion: Morphosyntactic information was either transferred directly from the existing resources or, in the case of inflectional information and subcategorisation patterns, extracted by purpose-written programs and converted into the PAROLE format.
5. Cross-reference: All components of nominal derivations and compounds were cross-referenced with their base part of speech (PoS).
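Steps 1, 2, 3 and 5 above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the procedure actually used by Sheffield and CLG: the frequency table, the tag names `N`, `V` and `PROPN`, the hand-supplied `segmentation` table and every function name are hypothetical, and step 4 (conversion to the PAROLE format) is left out because its markup is not described here.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


# Hypothetical record types; the real lexicon is encoded in SGML, not Python objects.
@dataclass
class Component:
    base: str  # base form of the constituent, e.g. "teach"
    pos: str   # grammatical class of the base, e.g. "V"


@dataclass
class Entry:
    lemma: str
    pos: str
    components: list[Component] = field(default_factory=list)


def select_lemmata(freq: Counter, n: int, extra_verbs: set[str]) -> list[tuple[str, str]]:
    """Step 1: take the n most frequent (lemma, pos) pairs, drop proper nouns,
    then add verbs needed as bases of deverbal nominalisations."""
    chosen = [key for key, _ in freq.most_common(n) if key[1] != "PROPN"]
    chosen += [(v, "V") for v in sorted(extra_verbs) if (v, "V") not in chosen]
    return chosen


def missing_coverage(headwords: list[tuple[str, str]],
                     resource: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Step 2: headwords for which the source resource holds no information."""
    return [hw for hw in headwords if hw not in resource]


def analyse(lemma: str, segmentation: dict[str, list[tuple[str, str]]]) -> Entry:
    """Step 3: split a noun into constituents flagged with base form and class.
    `segmentation` stands in for the real (manual or automatic) analysis;
    affixes such as "-er" are not modelled here."""
    return Entry(lemma, "N", [Component(b, p) for b, p in segmentation.get(lemma, [])])


# Step 4 (conversion to the PAROLE format) is omitted in this sketch.


def dangling_components(entries: list[Entry]) -> list[tuple[str, Component]]:
    """Step 5: components whose (base, pos) pair has no entry of its own."""
    known = {(e.lemma, e.pos) for e in entries}
    return [(e.lemma, c) for e in entries for c in e.components
            if (c.base, c.pos) not in known]


# Toy data, invented for illustration only.
freq = Counter({("teacher", "N"): 900, ("London", "PROPN"): 800,
                ("book", "N"): 700, ("shop", "N"): 650, ("bookshop", "N"): 300})
heads = select_lemmata(freq, n=5, extra_verbs={"teach"})
resource = {("teacher", "N"), ("book", "N"), ("shop", "N"), ("teach", "V")}
segmentation = {"teacher": [("teach", "V")],
                "bookshop": [("book", "N"), ("shop", "N")]}
entries = [analyse(lemma, segmentation) if pos == "N" else Entry(lemma, pos)
           for lemma, pos in heads]

print(heads)  # → [('teacher', 'N'), ('book', 'N'), ('shop', 'N'), ('bookshop', 'N'), ('teach', 'V')]
print(missing_coverage(heads, resource))  # → [('bookshop', 'N')]
print(dangling_components(entries))       # → []
```

In this toy run the proper noun is deselected, the verb "teach" is added because "teacher" needs it as a base, and every compound or derivation component resolves to an entry. An entry such as "bookcase" analysed as book + case would be reported by `dangling_components`, since "case" has no entry of its own.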

Integrity checks were carried out, and the lexicon was parsed with the SGML parser nsgmls.

Version: 1.0
Distributor: ELRA