OLAC Record oai:www.ldc.upenn.edu:LDC2020T23 |
Metadata | ||
Title: | Corpus of Law, Academic, and News | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Mohammadi, Ariana Negar. Corpus of Law, Academic, and News LDC2020T23. Web Download. Philadelphia: Linguistic Data Consortium, 2020 | |
Contributor: | Mohammadi, Ariana Negar | |
Date (W3CDTF): | 2020 | |
Date Issued (W3CDTF): | 2020-10-15 | |
Description: | *Introduction* Corpus of Law, Academic, and News consists of 400 Persian documents divided into three genres: legal, academic, and news. The legal section contains texts from official publications, including the civil penal code, the criminal penal code, and the constitution of the Islamic Republic of Iran. The academic sub-corpus comprises published academic abstracts in various disciplinary areas, such as Art and Humanities, Social Sciences, and Natural Sciences. The news sub-corpus was extracted from an archive of ten Iranian news outlets spanning the period 2010-2020. *Data* The document and token counts are as follows: 48 legal documents, 88,170 tokens; 274 academic documents, 85,765 tokens; and 78 news documents, 101,055 tokens. Each document contains metadata in the file's header with information such as specific text type, dates, and source, and also contains annotations marking title and body paragraphs. All documents are presented as UTF-8 encoded XML with internal DTDs. *Samples* Please view this sample (XML). *Updates* None at this time. | |
Extent: | Corpus size: 4780 KB | |
Identifier: | LDC2020T23 | |
https://catalog.ldc.upenn.edu/LDC2020T23 | ||
ISBN: 1-58563-947-8 | ||
ISLRN: 903-821-836-195-4 | ||
DOI: 10.35111/wcbv-pj21 | ||
Language: | Persian | |
Language (ISO639): | fas | |
License: | Corpus of Law, Academic, and News Agreement: https://catalog.ldc.upenn.edu/license/corpus-of-law-academic-and-news-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2020T23 | |
Rights Holder: | Portions © 2020 Ariana N. Mohammadi, © 2020 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2020T23 | |
DateStamp: | 2021-01-01 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Mohammadi, Ariana Negar. 2020. Corpus of Law, Academic, and News. Linguistic Data Consortium. | |
Terms: | dcmi_Text iso639_fas olac_primary_text |
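
The GetRecord entries in this record correspond to OAI-PMH requests for the identifier oai:www.ldc.upenn.edu:LDC2020T23. As a minimal sketch, such a request URL could be assembled as shown here. The `verb`, `identifier`, and `metadataPrefix` arguments come from the OAI-PMH protocol, with `olac` and `oai_dc` as the prefixes for the OLAC and simple DC formats. The base URL is a placeholder and an assumption, since this record does not state the repository endpoint, and the function name `get_record_url` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint (assumption): the record does not give the
# repository's OAI-PMH base URL, so substitute the real one before use.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    oai_id = "oai:www.ldc.upenn.edu:LDC2020T23"
    # One request per format listed in the record: OLAC and simple DC.
    for prefix in ("olac", "oai_dc"):
        print(get_record_url(oai_id, prefix))
```

The colons in the identifier are percent-encoded by `urlencode`, which keeps the query string valid regardless of the characters an identifier contains.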