Resource: An-Nahar Newspaper Text Corpus

Reference An-Nahar Newspaper Text Corpus
Date of Submission Jan. 24, 2014, 4:17 p.m.
Status accepted
ISLRN 083-457-618-309-8
Resource Type Primary Text
Media Type Text
Source
Language Arabic
Description

The An-Nahar Lebanon Newspaper Text Corpus comprises articles in standard Arabic from 1995 to 2000 (6 years), stored as HTML files on CD-ROM. Each year contains 45,000 articles and 24 million words. Each article includes metadata such as title, newspaper name, date, country, type, and page. The size of each year's data is as follows:
1995: 128 MB
1996: 138 MB
1997: 152 MB
1998: 140 MB
1999: 130 MB
2000: 118 MB

Version 1.0
Distributor ELRA