DiaLEX – Saudi Arabian Hijazi (DiaLEX-HA)

Full Official Name: DiaLEX – Saudi Arabian Hijazi (DiaLEX-HA)
Submission date: Dec. 4, 2023, 5:14 p.m.

The Hijazi Arabic Full-Form Lexicon (DiaLEX-HA) is a comprehensive computational lexicon covering the Hijazi Arabic dialect. Featuring over 21,000,000 forms for 30,000 lemmas, this full-form lexicon provides exhaustive treatment of all inflected forms.

DiaLEX-HA has several features that make it well suited to natural language processing applications for Hijazi Arabic, especially morphological analysis and speech technology:

1. Extremely comprehensive coverage: over 21 million entries.
2. Comprehensive treatment of all inflected forms, enclitics, proclitics, case endings, declensions, and conjugated forms.
3. Full and accurate diacriticization (vocalization), essential for speech technology.
4. Extensive coverage of variants, which is necessary because dialects do not have a standard orthography.

Please note: Phonetic transcriptions in IPA and/or SAMPA, fine-tuned to the licensor’s specifications, are available upon request.

Quantity and size: 20,247,655 lines / 2,835 MB (2.8 GB)
File format: flat TSV text files

Samples and a specifications document are available upon request.
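Since the lexicon is delivered as flat TSV text files of roughly 20 million lines, it is usually more practical to stream the data than to load it whole. The following is a minimal sketch of such a reader in Python. The column layout is not described here, so the two-column sample (`form`, `lemma`) and the helper names `iter_rows` and `count_rows` are purely hypothetical; the specifications document should be consulted for the actual fields.

```python
import csv
import io
from typing import Iterator, List, TextIO


def iter_rows(stream: TextIO) -> Iterator[List[str]]:
    """Yield one list of tab-separated fields per non-empty line."""
    # QUOTE_NONE treats quote characters as ordinary text, which suits
    # a plain lexicon dump with no CSV-style quoting (an assumption).
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # skip blank lines
            yield row


def count_rows(stream: TextIO) -> int:
    """Count data rows without holding the file in memory."""
    return sum(1 for _ in iter_rows(stream))


# Tiny in-memory stand-in for a lexicon file; the columns are hypothetical.
SAMPLE = "form_a\tlemma_1\nform_b\tlemma_1\n\nform_c\tlemma_2\n"

rows = list(iter_rows(io.StringIO(SAMPLE)))
total = count_rows(io.StringIO(SAMPLE))
```

For a real file, the same functions could be applied to a handle opened with `open(path, encoding="utf-8", newline="")`, assuming the files are UTF-8 encoded, which is not stated here.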

Creator(s)
Distributor(s)
Right Holder(s)