Resource: BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

Reference BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech
Date of Submission Nov. 15, 2021, 8:12 p.m.
Status accepted
ISLRN 038-972-205-386-7
Resource Type Primary Text
Media Type Text
Source
Language Egyptian Arabic
Format/MIME Type text/plain
Size 104462 KB
Access Medium Web Download
Description

<h3>Introduction</h3>
<p>BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the <a href="https://www.colorado.edu/lab/clear/">University of Colorado Boulder - CLEAR (Computational Language and Education Research)</a> and consists of PropBank annotation of Egyptian Arabic discussion forum (DF), SMS/Chat, and conversational telephone speech (CTS) data.</p>
<p>The DARPA <a href="https://www.ldc.upenn.edu/collaborations/current-projects/bolt">BOLT</a> (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources -- discussion forums, text messaging and chat -- in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference.</p>
<h3>Data</h3>
<p>DF data was collected from the web using a manual process. SMS/Chat material was donated or collected via live platforms. CTS data was taken from LDC's Egyptian Arabic CALLHOME and CALLFRIEND telephone collections.</p>
<p>PropBank annotation adds a layer of semantic annotation on top of a treebank. In this release, it was applied to the BOLT phrase structure treebank annotation and was carried out in two phases: (1) a frame file was created for each predicate, and (2) the predicate-argument structure was annotated using the frame file as a reference.</p>
<p>Annotation files are UTF-8 encoded and are in either plain text or XML format.</p>
<h3>Sponsorship</h3>
<p>This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.</p>
<h3>Samples</h3>
<p>Please view this <a href="desc/addenda/LDC2021T18.prop.txt">PropBank sample (TXT)</a> and <a href="desc/addenda/LDC2021T18.frame.xml">frame sample (XML)</a>.</p>
<h3>Updates</h3>
<p>None at this time.</p>

Version 1.0
Creator Martha Palmer, Tim O'Gorman, Claire Bonial, Jena D. Hwang, James Gung, Aous Mansouri
Distributor Linguistic Data Consortium
Rights Holder Portions © 1996, 1997, 2002, 2012-2017, 2019, 2021 Trustees of the University of Pennsylvania