Resource: NoReC

Reference Norwegian Review Corpus
Date of Submission Oct. 3, 2017, 2:47 p.m.
Status accepted
ISLRN 376-574-408-554-7
Resource Type Primary Text
Media Type Text
Source
Language Norwegian Nynorsk; Nynorsk, Norwegian
Size 35,194 documents / 14,819,248 tokens
Access Medium Web download
Description

The Norwegian Review Corpus (NoReC) was created for training and evaluating models for document-level sentiment analysis. It contains more than 35,000 full-text reviews (approx. 15 million tokens) collected from several major Norwegian news sources. The reviews cover a range of domains, including literature, movies, video games, restaurants, music and theater, as well as product reviews across a range of categories. Each review is labeled with a score from 1 to 6, taken from the rating assigned manually by the review's original author. The reviews are pre-processed with UDPipe and distributed in the CoNLL-U format.
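
Because the reviews are distributed in CoNLL-U, they can be read with a few lines of standard-library Python. The following is a minimal sketch, not the corpus's official tooling: the function name `read_conllu` and the sample sentence are illustrative assumptions, and the sample is invented rather than taken from NoReC. It relies only on the general CoNLL-U conventions (ten tab-separated columns, `#` comment lines, blank lines between sentences).

```python
from typing import Dict, Iterable, Iterator, List


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, object]]]:
    """Yield one sentence at a time as a list of token dicts.

    Hypothetical helper: keeps only ID, FORM, LEMMA and UPOS.
    """
    sentence: List[Dict[str, object]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment lines (e.g. "# text = ...") are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges (1-2) and empty nodes (1.1).
            continue
        sentence.append(
            {"id": int(token_id), "form": cols[1], "lemma": cols[2], "upos": cols[3]}
        )
    if sentence:
        # Emit the last sentence if the input lacks a trailing blank line.
        yield sentence


# Invented example sentence, used only to exercise the parser.
SAMPLE = (
    "# text = En god film.\n"
    "1\tEn\ten\tDET\t_\t_\t3\tdet\t_\t_\n"
    "2\tgod\tgod\tADJ\t_\t_\t3\tamod\t_\t_\n"
    "3\tfilm\tfilm\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
forms = [token["form"] for token in sentences[0]]
print(forms)
```

To read a file from the corpus, an open file handle can be passed in place of `SAMPLE.splitlines()`. How the 1–6 score is stored alongside each document is not described in this record, so the sketch does not attempt to read it.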

Version 1.0.0
Creator Language Technology Group, Department of Informatics, University of Oslo
Distributor Language Technology Group, Department of Informatics, University of Oslo
Rights Holder Language Technology Group, Department of Informatics, University of Oslo