Resource: Collins Multilingual database (MLD) - WordBank
|Reference||Collins Multilingual database (MLD) - WordBank|
|Date of Submission||July 18, 2016, 5:50 p.m.|
|Resource Type||Language description|
|Language||Arabic, Bengali, Chinese, Croatian, Czech, Danish, Dutch, English, Finnish, French, German, Modern Greek (1453-), Hindi, Italian, Japanese, Korean, Malayalam, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, Thai, Turkish, Ukrainian, Vietnamese|
The Collins Multilingual database covers real-life, everyday vocabulary. It consists of a multilingual lexicon in 32 languages (the WordBank) and a multilingual set of sentences in 28 languages (the PhraseBank, distributed separately under reference ELRA-T0377).
The WordBank contains 10,000 words for each language (Arabic, Chinese, Croatian, Czech, Danish, Dutch, American English, British English, Finnish, French, German, Greek, Italian, Japanese, Korean, Norwegian, Polish, Portuguese (Iberian), Portuguese (Brazilian), Russian, Spanish (Iberian), Spanish (Latin American), Swedish, Thai, Turkish, Vietnamese, Hindi, Tamil, Bengali, Malayalam, Romanian, Ukrainian). Entries are annotated in XML with part of speech, gender, irregular forms and disambiguating information for homographs. An additional dataset of 10,000 headwords is included for 12 languages (Chinese, American and British English, French, German, Italian, Japanese, Korean, Iberian and Brazilian Portuguese, Iberian and Latin American Spanish).
All English headwords include definitions in the style of the Cobuild learner’s dictionary and one or more examples of the word in context.
Lemmatized lists and verb tables are available for English, French, German, Spanish and Italian. Romanization is provided for Chinese, Japanese, Korean and Thai.
The corresponding audio files are available for 26 of the 32 languages (all except Hindi, Tamil, Bengali, Malayalam, Romanian and Ukrainian) and are distributed in a separate package under reference ELRA-S0382.