GUM

Full Official Name: GUM: The Georgetown University Multilayer Corpus
Submission date: Jan. 17, 2022, 9:54 a.m.

GUM is an open source multilayer corpus of richly annotated texts from twelve text types (interviews, news stories, travel guides, how-to guides, academic writing, biographies, fiction, forum discussions, conversations, political speeches, CC Vlogs, textbooks). Annotations include:
* Multiple POS tags, morphological features and lemmatization
* Sentence segmentation and rough speech act
* Document structure in TEI XML (paragraphs, headings, figures, etc.)
* Normalized ISO date/time annotations
* Speaker and addressee information (where relevant)
* Constituent and (enhanced) Universal Dependencies syntax
* Information status (given, accessible, new, split antecedent)
* Entity and coreference annotation, including bridging anaphora
* Entity linking (Wikification)
* Discourse parses in Rhetorical Structure Theory and discourse dependencies

Creator(s)
Distributor(s)
Right Holder(s)