Resource: GlobalPhone Bulgarian Pronunciation Dictionary 260k entries (extended version)
Reference | GlobalPhone Bulgarian Pronunciation Dictionary 260k entries (extended version) |
Date of Submission | April 6, 2017, 5:50 p.m. |
Status | accepted |
ISLRN | 799-402-906-876-5 |
Resource Type | Lexicon |
Media Type | Audio |
Source | |
Language | Bulgarian |
Access Medium | Downloadable |
Description | This extended version of the Bulgarian Pronunciation Dictionary, called Bulgarian-Dict260k, contains pronunciations of more than 260,000 word forms. It matches the phone set and format of the original GlobalPhone Bulgarian Pronunciation Dictionary (see ELRA-S0351), which covers 20,000 word forms. Bulgarian-Dict260k was built from an extended Bulgarian GlobalPhone text database. The extension was meant to improve language modeling and to reduce the high out-of-vocabulary (OOV) rate that results from the rich morphology of Bulgarian. For this purpose, roughly 9 million word tokens of national, international, and economic news were collected from the online newspapers "Banker" (http://www.banker.bg/), "Kesh" (http://www.cash.bg), and "Sega" (http://www.segabg.com/). After text cleaning and normalization, all word forms were extracted. Pronunciations were then generated automatically using hand-crafted grapheme-to-phoneme rules, and native speakers manually cross-checked the output to correct potential errors introduced by the automatic generation. |
Version | 1.0 |
Distributor | ELRA |
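The automatic step in the description (word-form extraction followed by hand-crafted grapheme-to-phoneme rules) can be sketched as follows. This is a minimal illustration, not the procedure actually used to build Bulgarian-Dict260k. The letter-to-phone table, the IPA-like symbols, the single word-final devoicing rule, and the function names `extract_word_forms`, `g2p`, and `build_dictionary` are all hypothetical. They reproduce neither the GlobalPhone phone set nor its dictionary format.

```python
import re

# Hypothetical, simplified letter-to-phone table (IPA-like symbols).
# This is NOT the GlobalPhone Bulgarian phone set.
LETTER_TO_PHONES = {
    "а": ["a"], "б": ["b"], "в": ["v"], "г": ["g"], "д": ["d"], "е": ["e"],
    "ж": ["ʒ"], "з": ["z"], "и": ["i"], "й": ["j"], "к": ["k"], "л": ["l"],
    "м": ["m"], "н": ["n"], "о": ["o"], "п": ["p"], "р": ["r"], "с": ["s"],
    "т": ["t"], "у": ["u"], "ф": ["f"], "х": ["x"], "ц": ["ts"], "ч": ["tʃ"],
    "ш": ["ʃ"], "щ": ["ʃ", "t"], "ъ": ["ɤ"], "ь": ["j"], "ю": ["j", "u"],
    "я": ["j", "a"],
}

# Example context rule: word-final voiced obstruents are devoiced.
FINAL_DEVOICING = {"b": "p", "d": "t", "g": "k", "v": "f", "ʒ": "ʃ", "z": "s"}

# A word form is any run of letters covered by the table above.
WORD_RE = re.compile("[" + "".join(LETTER_TO_PHONES) + "]+")


def extract_word_forms(text):
    """Normalize case and return the sorted set of distinct word forms."""
    return sorted(set(WORD_RE.findall(text.lower())))


def g2p(word):
    """Map each letter to phones, then apply the word-final devoicing rule."""
    phones = []
    for letter in word:
        phones.extend(LETTER_TO_PHONES[letter])
    if phones and phones[-1] in FINAL_DEVOICING:
        phones[-1] = FINAL_DEVOICING[phones[-1]]
    return phones


def build_dictionary(text):
    """Build a {word form: phone list} dictionary from raw text."""
    return {word: g2p(word) for word in extract_word_forms(text)}


if __name__ == "__main__":
    corpus = "Градът е голям. Град!"
    for word, phones in build_dictionary(corpus).items():
        print(word + "\t" + " ".join(phones))
```

With this toy rule set, "град" comes out as `g r a t` because of final devoicing, while "градът" keeps its `d`. A real rule set would also need to handle things like palatalization, vowel reduction, and assimilation. That is one reason the generated entries were cross-checked by native speakers.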