Resource: Maltese-English website parallel corpus (Processed)

Reference Maltese-English website parallel corpus (Processed)
Date of Submission March 2, 2020, 11:46 a.m.
Status accepted
ISLRN 693-091-524-649-2
Resource Type Primary Text
Media Type Text
Source
Language English, Maltese
Format/MIME Type application/x-tmx+xml
Size 26622 translationUnits
Access Medium downloadable
Description

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu.
This is a parallel corpus of bilingual texts crawled from multilingual websites. It contains 26,622 translation units (TUs).
Date of crawling: 16/12/2016
A strict validation process has been followed, which resulted in discarding:
- TUs from crawled websites that do not comply with the PSI directive,
- TUs identified during the manual validation process, and all TUs from websites whose error rate in the sample extracted for manual validation is strictly above the following thresholds:
50% of TUs with language identification errors,
50% of TUs with alignment errors,
50% of TUs with tokenization errors,
20% of TUs identified as machine translated content,
50% of TUs with translation errors.
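
The site-level rejection rule above can be sketched as follows. This is only an illustration, not the ELRC validation tooling: the function name, the category keys, and the sample counts are hypothetical, and the sketch assumes that exceeding any single threshold is enough to discard a website. Only the threshold values come from the description.

```python
# Thresholds taken from the description. The comparison is strict, so a rate
# exactly equal to a threshold does not trigger rejection.
THRESHOLDS = {
    "language_identification": 0.50,
    "alignment": 0.50,
    "tokenization": 0.50,
    "machine_translated": 0.20,
    "translation": 0.50,
}


def site_is_discarded(sample_size, error_counts):
    """Return True if any error rate in the sample strictly exceeds its threshold.

    sample_size: number of TUs in the manually validated sample (hypothetical input).
    error_counts: mapping from error category to the number of flagged TUs.
    Assumption: one exceeded threshold is sufficient to discard the website.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    for category, limit in THRESHOLDS.items():
        rate = error_counts.get(category, 0) / sample_size
        if rate > limit:
            return True
    return False


# Made-up sample counts, for illustration only.
print(site_is_discarded(100, {"alignment": 50}))           # exactly 50%: kept
print(site_is_discarded(100, {"machine_translated": 21}))  # 21% > 20%: discarded
```

Note that the machine-translation threshold (20%) is stricter than the other four (50%).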

Version 2.0
Distributor ELRA
Rights Holder Government of Malta; Treasury Department of the Government of Malta; Ministry for Justice, Culture and Local Government, Government of Malta; Malta - EU Steering & Action Committee; Parlament Ta' Malta; Agenzija Zghazagh, Government of Malta; Commerce Department, Government of Malta; Malta Medicine Authority; Broadcasting Authority, Malta; Ministry for Finance, Malta; Malta Police Force - Government of Malta; Public Administration HR Office, Government of Malta; Ministry for Education and Employment, Malta; Malta Communications Authority; Ministry for Gozo, Malta