Resource: Persian Lexicon

Reference Persian Lexicon
Date of Submission Jan. 24, 2014, 4:30 p.m.
Status accepted
ISLRN 547-614-436-004-7
Resource Type Lexicon
Media Type Text
Language Persian

This is a Persian (Farsi) lexicon of more than 40,000 entries, each a non-inflected word form. Each word is transliterated following the framework proposed by MBROLA (a text-to-speech synthesizer). The database includes a wide variety of descriptors for each entry (plural, homograph, ...).

This lexicon was built from a corpus of newspaper publications collected over a period of six months from the Shargh Newspaper, a publication containing articles on diverse topics: art, culture, politics, society, sport, etc. Because of its coverage, this lexicon may be of particular interest for Persian TTS systems, since the pronunciation of Persian words cannot be derived directly from their written form: short vowels are omitted in the Persian writing system.

The number of records is distributed as follows:
Adjectives: 11,955
Adverbs: 2,047
Classifiers: 164
Conjunctions: 129
Indexes: 85
Names: 36,651
Numbers: 88
Verb-Past Stem: 455
Verb-Present Stem: 435
Prepositions: 223
Pronouns: 141
Semi-Sentence: 352
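
As a quick check on the distribution above, the per-category counts can be tallied. The following is a minimal Python sketch; the dictionary name record_counts is only illustrative, and the figures are copied from the list above rather than read from the database itself.

```python
# Record counts per category, copied from the distribution listed above.
# The name "record_counts" is illustrative, not part of the resource.
record_counts = {
    "Adjectives": 11955,
    "Adverbs": 2047,
    "Classifiers": 164,
    "Conjunctions": 129,
    "Indexes": 85,
    "Names": 36651,
    "Numbers": 88,
    "Verb-Past Stem": 455,
    "Verb-Present Stem": 435,
    "Prepositions": 223,
    "Pronouns": 141,
    "Semi-Sentence": 352,
}

total = sum(record_counts.values())
print(f"Total records: {total:,}")  # Total records: 52,725

# Print each category with its share of the total, largest first.
for category, count in sorted(record_counts.items(), key=lambda kv: kv[1], reverse=True):
    share = 100 * count / total
    print(f"{category:<18} {count:>6,} {share:5.1f}%")
```

The listed records sum to 52,725, with Names and Adjectives together accounting for most of them.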

The lexicon is provided as an MS Access database.

Version 1.0
Distributor ELRA