DiaLEX – Emirati (DiaLEX-UA)

Full Official Name: DiaLEX – Emirati (DiaLEX-UA)
Submission date: Dec. 4, 2023, 5:14 p.m.

The Emirati Arabic Full-Form Lexicon (DiaLEX-UA) is a comprehensive computational lexicon of the Emirati Arabic dialect. With over 28,000,000 forms for 29,000 lemmas, this full-form lexicon provides exhaustive treatment of all inflected forms. DiaLEX-UA has several features that make it well suited to natural language processing applications for Emirati Arabic, especially morphological analysis and speech technology:

1. Extremely comprehensive coverage: over 28 million entries.
2. Comprehensive treatment of all inflected forms, enclitics, proclitics, case endings, declensions, and conjugated forms.
3. Full and accurate diacritization (vocalization), essential for speech technology.
4. Extensive coverage of variants, which is necessary because dialects lack a standard orthography.

Please note: phonetic transcriptions (IPA and/or SAMPA), fine-tuned to the licensor's specifications, are available upon request.

Quantity and size: 28,513,888 lines / 3,841 MB (3.8 GB)
File format: flat TSV text files

Samples and a specifications document are available upon request.
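As a rough illustration of how a fully diacritized full-form lexicon might support lookup, the Python sketch that follows indexes TSV entries by their undiacritized form, so that unvocalized input (the usual case for written dialect) can be matched to vocalized entries. The three-column layout (form, lemma, features) and the sample rows and feature tags are hypothetical; the real column layout is defined in the specifications document. The sketch keeps everything in memory, which suits small sample files; the full 3.8 GB lexicon would more likely need a disk-backed key-value store or database.

```python
import csv
import io
from collections import defaultdict

# Arabic diacritics: tanween, short vowels, shadda, sukun (U+064B..U+0652)
# plus the superscript (dagger) alef U+0670.
_DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(word: str) -> str:
    """Remove Arabic vowel and related marks so undiacritized text can be matched."""
    return "".join(ch for ch in word if ch not in _DIACRITICS)


def load_lexicon(lines):
    """Index rows by undiacritized form -> list of (diacritized form, lemma, features).

    ASSUMPTION (hypothetical layout): each TSV row is form<TAB>lemma<TAB>features.
    The actual columns are specified in the licensor's documentation.
    """
    index = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        form, lemma, features = row[0], row[1], row[2]
        index[strip_diacritics(form)].append((form, lemma, features))
    return index


if __name__ == "__main__":
    # Hypothetical two-row sample with made-up feature tags.
    sample = io.StringIO("كَتَب\tكتب\tVERB\nكِتَاب\tكتاب\tNOUN\n")
    index = load_lexicon(sample)
    print(index["كتب"])
```

Keying on the bare form means one unvocalized token can return several vocalized analyses, which is the ambiguity a morphological analyzer or speech front end would then need to resolve.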
