Resource: LC-STAR English-Hebrew (Israel) Bilingual Aligned Phrasal lexicon

Reference LC-STAR English-Hebrew (Israel) Bilingual Aligned Phrasal lexicon
Date of Submission Jan. 24, 2014, 4:30 p.m.
Status accepted
ISLRN 054-368-636-873-4
Resource Type Lexicon
Media Type Text, Audio
Source
Language English, Hebrew
Description

The LC-STAR English-Hebrew (Israel) Bilingual Aligned Phrasal lexicon was created within the scope of the LC-STAR project (IST 2001-32216), which was sponsored by the European Commission. It was designed for SST (Speech-to-Speech Translation) and ASR (Automatic Speech Recognition) applications.

The lexicon comprises 10,520 phrases from the tourist domain. It is based on a list of short sentences obtained by translation from a corpus of 10,449 US-English phrases. The total number of distinct words is 13,320.

The lexicon contains the following information:
- US-English phrase (orthography),
- its translation into Hebrew (orthography),
and, for each token of the Hebrew phrase, the following:
- orthography of a word,
- part of speech,
- lemma,
- whether the phrase is idiomatic or not,
- whether the word is a foreign word. In this lexicon, foreign words were tagged only if they were written in foreign orthography (e.g. English characters).
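
As a rough illustration of the information listed above, the following Python sketch models one lexicon entry in memory. The class names, field names, part-of-speech labels and the sample phrase are all assumptions made for illustration; they are not taken from the LC-STAR DTD or from the distributed XML files. The sketch also assumes that the idiomatic flag applies to the phrase as a whole.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HebrewToken:
    """One token of the Hebrew phrase (hypothetical field names)."""
    orthography: str
    part_of_speech: str       # illustrative label, not the LC-STAR tag set
    lemma: str
    is_foreign: bool = False  # set only for words written in foreign orthography


@dataclass
class PhraseEntry:
    """One aligned phrase pair (hypothetical field names)."""
    english: str                # US-English phrase (orthography)
    hebrew: str                 # Hebrew translation (orthography)
    is_idiomatic: bool = False  # assumed here to be a phrase-level flag
    tokens: List[HebrewToken] = field(default_factory=list)

    def foreign_tokens(self) -> List[HebrewToken]:
        """Return the tokens flagged as foreign words."""
        return [t for t in self.tokens if t.is_foreign]


# Made-up example entry; it does not come from the lexicon itself.
entry = PhraseEntry(
    english="Do you have WiFi?",
    hebrew="יש לכם WiFi?",
    tokens=[
        HebrewToken("יש", "particle", "יש"),
        HebrewToken("לכם", "preposition", "ל"),
        HebrewToken("WiFi", "noun", "WiFi", is_foreign=True),
    ],
)

foreign = [t.orthography for t in entry.foreign_tokens()]
```

In this made-up entry only the token written in Latin characters carries the foreign-word flag, mirroring the tagging rule described above.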

The lexicon is provided in XML format. The database is stored on 1 CD.

Version 1.0
Distributor ELRA