OLAC Record: SenSem Databank

OLAC Record
oai:www.ldc.upenn.edu:LDC2015T02

Metadata

Title: SenSem Databank

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Fernández, Ana, and Gloria Vázquez. SenSem Databank LDC2015T02. Web Download. Philadelphia: Linguistic Data Consortium, 2015

Contributor: Fernández, Ana

Vázquez, Gloria

Date (W3CDTF): 2015

Date Issued (W3CDTF): 2015-01-15

Description: *Introduction* SenSem (Sentence Semantics) Databank was developed by GRIAL, the Linguistic Applications Inter-University Research Group, which comprises the following Spanish institutions: the Universitat Autònoma de Barcelona, the Universitat de Barcelona, the Universitat de Lleida and the Universitat Oberta de Catalunya. GRIAL's work focuses on resources for applied linguistics, including lexicography, translation and natural language processing. The databank contains syntactic and semantic annotation for over 35,000 sentences, approximately one million words of Spanish and approximately 700,000 words of Catalan translated from the Spanish. Each sentence in SenSem Databank was labeled according to the verb sense it exemplifies, the type of complement it takes (arguments or adjuncts) and the syntactic category and function. Each argument was also labeled with a semantic role. Further information about the SenSem project can be obtained from the GRIAL website at http://grial.uab.es/sensem/corpus.

*Data* The Spanish source data includes texts from news journals (30,000 sentences) and novels (5,299 sentences). Those sentences represent around 1,000 different verb meanings that correspond to the 250 most frequent Spanish verbs. Verb frequencies were retrieved from a quantitative analysis of around 13 million words. The Catalan corpus was developed by translating the news journal portion of the Spanish data set, resulting in a resource of over 700,000 words, of which 391,267 words were annotated. Sentences were automatically translated and manually post-edited; some were re-annotated for sentence complements. Semantic information was the same for both languages. The Catalan sentences represent close to 1,300 different verbs. Data is presented in a single XML file per language.

*Updates* None at this time.

Extent: Corpus size: 98736 KB

Identifier: LDC2015T02

https://catalog.ldc.upenn.edu/LDC2015T02

ISBN: 1-58563-701-7

ISLRN: 969-347-223-333-4

DOI: 10.35111/94px-a654

Language: Spanish

Catalan

Language (ISO639): spa

cat

License: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 (NFP, Non-Member): https://catalog.ldc.upenn.edu/license/creative-comons-attribution-noncommercial-sharealike-3-dot-0-unported.pdf

LDC For-Profit Membership Agreement: https://catalog.ldc.upenn.edu/license/ldc-for-profit-membership.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2015T02

Rights Holder: Portions © 2015 Dr. Ana Fernandez Montraveta, Dr. Gloria Vázquez-Garcia, Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2015T02

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
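
The two GetRecord entries above (OLAC format and simple DC format) correspond to standard OAI-PMH requests that differ only in their metadataPrefix argument. The following is a minimal sketch of how such request URLs could be built. The base URL is a placeholder, since this record does not state the repository's OAI-PMH endpoint, and the prefix "olac" is assumed from OLAC convention ("oai_dc" is the prefix that OAI-PMH itself defines for simple Dublin Core).

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2015T02"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for fetching a single record
        "identifier": identifier,           # colons are percent-encoded by urlencode
        "metadataPrefix": metadata_prefix,  # selects the metadata format
    })
    return f"{base_url}?{query}"


# "olac" is assumed to be the prefix for the OLAC format.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
# "oai_dc" is the simple Dublin Core prefix defined by OAI-PMH.
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")
```

Fetching either URL from a real endpoint would return an XML response wrapping the record in the requested format; substitute the actual base URL before use.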

Search Info
Citation: Fernández, Ana; Vázquez, Gloria. 2015. Linguistic Data Consortium.
Terms: area_Europe country_ES dcmi_Text iso639_cat iso639_spa olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2015T02
Up-to-date as of: Wed Oct 29 7:01:30 EDT 2025
