OLAC Record oai:www.ldc.upenn.edu:LDC95T9 |
Metadata | ||
Title: | Spanish News Text | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Graff, David, and Gustavo Gallegos. Spanish News Text LDC95T9. Web Download. Philadelphia: Linguistic Data Consortium, 1995 | |
Contributor: | Graff, David | |
Gallegos, Gustavo | ||
Date (W3CDTF): | 1995 | |
Description: | The Spanish News Corpus consists of journalistic text data from one newspaper (El Norte, Mexico) and from the Spanish-language services of three newswire sources: Agence France Presse, Associated Press Worldstream, and Reuters. (The Reuters collection comprises two distinct services: Reuters Spanish Language News Service and Reuters Latin American Business Report.) All text data are stored in a standard compressed form. The four sets of newswire data (AFP, APWS and two Reuters services) are each organized as one data file per day of collection. The period covered by these collections runs from December 1993 (for APWS and Reuters) or May 1994 (for AFP) through December 1995. (The El Norte data, provided to us by INFOSEL Mexico, are arbitrarily grouped into files of about 1 megabyte in size when uncompressed; date information is not available for individual articles, but the general period of the collection is 1993.) The approximate amounts of data per source (when uncompressed), in total megabytes (MB) and millions of words of text (MW), are as follows: AFP: 345 MB, 44 MW; APWS: 253 MB, 33 MW; REUSL: 333 MB, 41 MW; REULA: 233 MB, 23 MW; INFOSEL: 209 MB, 31 MW. The presentation of text data in these collections is modeled on the TIPSTER corpus. Within each data file, SGML tagging is used (1) to mark article boundaries, (2) to delimit the text portion within each article and (3) to label various pieces of information about the article that are external to the text content (e.g. headlines, bylines and so on). The copyright holders of this text have requested that it be made available to LDC members only. Due to the release date, this corpus is available to 1995 and 1996 members. In order to obtain this corpus, current LDC members must submit a signed User Agreement Form. | |
Identifier: | LDC95T9 | |
https://catalog.ldc.upenn.edu/LDC95T9 | ||
ISBN: 1-58563-056-X | ||
ISLRN: 673-814-501-585-5 | ||
DOI: 10.35111/4dex-xm86 | ||
Language: | Spanish | |
Language (ISO639): | spa | |
License: | LATINO 40 Agreement: https://catalog.ldc.upenn.edu/license/latino-40-license-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC95T9 | |
Rights Holder: | Portions © 1994 Agence France Presse, © 1993-1994 The Associated Press, © 1987-1993 INFOSEL, © 1993-1994 Reuters Latin American Business Report, © 1993-1994 Reuters Spanish Language News Service, © 1995 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC95T9 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Graff, David; Gallegos, Gustavo. 1995. Spanish News Text. Linguistic Data Consortium. |
Terms: | area_Europe country_ES dcmi_Text iso639_spa olac_primary_text |