OLAC Record oai:www.ldc.upenn.edu:LDC99T42 |
| Metadata | ||
| Title: | Treebank-3 | |
| Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
| Bibliographic Citation: | Marcus, Mitchell P., et al. Treebank-3 LDC99T42. Web Download. Philadelphia: Linguistic Data Consortium, 1999 | |
| Contributor: | Marcus, Mitchell P. | |
| Santorini, Beatrice | ||
| Marcinkiewicz, Mary Ann | ||
| Taylor, Ann | ||
| Date (W3CDTF): | 1999 | |
| Description: | *Introduction* This release contains the following Treebank-2 material: * One million words of 1989 Wall Street Journal material annotated in Treebank II style. * A small sample of ATIS-3 material annotated in Treebank II style. * A fully tagged version of the Brown Corpus. It also contains the following new material: * Switchboard tagged, dysfluency-annotated, and parsed text * Brown parsed text The Treebank bracketing style is designed to allow the extraction of simple predicate/argument structure. Over one million words of text are provided with this bracketing applied. *Data* The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation. These 2,499 stories have been distributed in both the Treebank-2 (LDC95T7) and Treebank-3 (LDC99T42) releases of PTB. Treebank-2 includes the raw text for each story. Three "map" files, available in a compressed file (pennTB_tipster_wsj_map.tar.gz) as an additional download for users who have licensed Treebank-2, provide the relation between the 2,499 PTB filenames and the corresponding WSJ DOCNO strings in TIPSTER. *Samples* Please view the following samples: * Part-of-Speech Tags * Dysfluency Annotation * Dysfluency Annotation & Part-of-Speech Tags * Dysfluency Annotation, Part-of-Speech Tags & Turns Joined * Syntactic Annotation * Syntactic Annotation & Part-of-Speech Tags *Updates* After publication, it was discovered that not all of the PostScript (*.ps) files had been converted to PDFs and that some of the converted PDFs contained errors. For PDF copies of the documentation files, please go to addenda for a list of the files available. As of October 5, 2016, 252 wsj files from Treebank-2 that were previously missing were added. As of February 2017, 2,499 "raw" wsj files were added from Treebank-2 (LDC95T7). Corpus downloads after these dates will include these missing files. | |
| Extent: | Corpus size: 264192 KB | |
| Identifier: | LDC99T42 | |
| https://catalog.ldc.upenn.edu/LDC99T42 | ||
| ISBN: 1-58563-163-9 | ||
| ISLRN: 141-282-691-413-2 | ||
| DOI: 10.35111/gq1x-j780 | ||
| Language: | English | |
| Language (ISO639): | eng | |
| License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
| Medium: | Distribution: Web Download | |
| Publisher: | Linguistic Data Consortium | |
| Publisher (URI): | https://www.ldc.upenn.edu | |
| Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC99T42 | |
| Rights Holder: | Portions © 1987-1989 Dow Jones & Company, Inc., © 1993-1995, 1999 Trustees of the University of Pennsylvania | |
| Type (DCMI): | Text | |
| Type (OLAC): | primary_text | |
OLAC Info |
||
| Archive: | The LDC Corpus Catalog | |
| Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
OAI Info |
||
| OaiIdentifier: | oai:www.ldc.upenn.edu:LDC99T42 | |
| DateStamp: | 2020-11-30 | |
| GetRecord: | OAI-PMH request for simple DC format | |
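The GetRecord entries above correspond to OAI-PMH requests for this record in two metadata formats. As a minimal sketch, assuming a placeholder endpoint, such a request URL could be assembled as follows. `BASE_URL` and the helper `build_get_record_url` are hypothetical names introduced for illustration; the actual endpoint behind the links is not given in this record. The prefixes `olac` and `oai_dc` are the conventional names for the OLAC and simple Dublin Core formats, and are assumed here rather than confirmed by this page.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: this record does not state the real base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_ID = "oai:www.ldc.upenn.edu:LDC99T42"


def build_get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Return an OAI-PMH GetRecord request URL for one record and one format."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# One request per format listed in this record (prefix names are assumptions).
olac_url = build_get_record_url(BASE_URL, OAI_ID, "olac")
dc_url = build_get_record_url(BASE_URL, OAI_ID, "oai_dc")

print(olac_url)
print(dc_url)
```

`urlencode` percent-encodes the colons in the identifier, so the value survives as a single query parameter.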
Search Info | ||
| Citation: | Marcus, Mitchell P.; Santorini, Beatrice; Marcinkiewicz, Mary Ann; Taylor, Ann. 1999. Linguistic Data Consortium. | |
| Terms: | area_Europe country_GB dcmi_Text iso639_eng olac_primary_text | |