OLAC Record: CELEX2

OLAC Record
oai:www.ldc.upenn.edu:LDC96L14

Metadata

Title: CELEX2

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Baayen, R H., R Piepenbrock, and L Gulikers. CELEX2 LDC96L14. Web Download. Philadelphia: Linguistic Data Consortium, 1995

Contributor: Baayen, R H.

Piepenbrock, R

Gulikers, L

Date (W3CDTF): 1995

Description: *Introduction*

CELEX2 contains updated versions of the CELEX lexical databases of English (Version 2.5), Dutch (Version 3.1) and German (Version 2.0) developed by the University of Nijmegen, the Institute for Dutch Lexicology in Leiden, the Max Planck Institute for Psycholinguistics in Nijmegen, and the Institute for Perception Research in Eindhoven. For each language, this data set contains detailed information on:

* orthography (variations in spelling, hyphenation)
* phonology (phonetic transcriptions, variations in pronunciation, syllable structure, primary stress)
* morphology (derivational and compositional structure, inflectional paradigms)
* syntax (word class, word class-specific subcategorizations, argument structures)
* word frequency (summed word and lemma counts, based on recent and representative text corpora)

The databases were not tailored to fit any particular database management program. They are presented as ASCII files in a UNIX directory tree that can be queried with tools such as AWK or ICON. Unique identity numbers allow information from different files to be linked. Some information must be computed online; where necessary, AWK functions are provided to recover this information. README files specify the details of their use.

A detailed User Guide describing the lexical information available is included in the documentation accompanying this release. All sections of this guide are PostScript files except for some additional ASCII notes on the German lexicon.

*Data*

This release contains an enhanced, expanded version of the German lexical database (2.5). Approximately 1,000 new lemmas were added for a total of 51,728; their inflected forms number 365,530. Also included are revised morphological parses, verb argument structures, inflectional paradigm codes and a corpus type lexicon. A complete PostScript version of the Germanic Linguistic Guide is included in the documentation accompanying this release.

Phonetic syllable frequencies were added to the English and Dutch databases, along with frequency information alongside every lexical feature. No other changes were made to these lexicons. Complete AWK scripts are provided to compute representations not found in the ASCII lexical data files corresponding to the features described in the CELEX User Guide.

*Samples*

Please view these samples:

* German Orthography, Lemmas
* German Phonology, Lemmas
* German Morphology, Lemmas
* German Syntax, Lemmas
* German Frequency, Lemmas
* German Orthography, Wordforms
* German Phonology, Wordforms
* German Morphology, Wordforms
* German Frequency, Wordforms
* German Corpus Types

*Updates*

Petra Steiner has developed a number of scripts to modify and update CELEX2 to a modern format. They are available on her GitHub page. LREC papers related to these updates are accessible at the following URLs: http://aclweb.org/anthology/W17-7619 and http://www.lrec-conf.org/proceedings/lrec2016/summaries/761.html.
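
The description notes that unique identity numbers allow information from different files to be linked. A minimal sketch of such a join is given here in Python, although the release itself ships AWK functions. It assumes that each record is one line, that fields are separated by a backslash, and that the identity number is the first field. The sample rows and the helper names `load_table` and `join_on_id` are invented for illustration and are not taken from the CELEX2 data or its scripts.

```python
def load_table(lines, sep="\\"):
    """Map each record's identity number (assumed first field) to its other fields."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        ident, *rest = line.split(sep)
        table[ident] = rest
    return table


def join_on_id(left, right):
    """Pair up the records that share an identity number in both tables."""
    return {ident: (left[ident], right[ident]) for ident in left if ident in right}


# Hypothetical rows, NOT real CELEX2 records: id\field\field...
orthography = ["1\\Aal\\16", "2\\Aas\\4"]
phonology = ["1\\a:l", "3\\a:s"]

linked = join_on_id(load_table(orthography), load_table(phonology))
print(linked)
```

Only identity numbers present in both inputs survive the join, so a record missing from either file is dropped rather than padded with empty fields. The README files in the release remain the authority on the actual file layout.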

Extent: Corpus size: 287744 KB

Identifier: LDC96L14

https://catalog.ldc.upenn.edu/LDC96L14

ISBN: 1-58563-085-3

ISLRN: 204-698-863-053-1

DOI: 10.35111/gs6s-gm48

Language: English

German

Dutch

Language (ISO639): eng

deu

nld

License: CELEX Agreement: https://catalog.ldc.upenn.edu/license/celex-user-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC96L14

Rights Holder: Portions © 1995, 1996 Centre for Lexical Information, © 1995, 1996 Trustees of the University of Pennsylvania

Subject: German language

English language

Dutch language

Subject (ISO639): deu

eng

nld

Type (DCMI): Text

Type (OLAC): lexicon

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC96L14

DateStamp: 2026-05-07

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Baayen, R H.; Piepenbrock, R; Gulikers, L. 1995. Linguistic Data Consortium.
Terms: area_Europe country_DE country_GB country_NL dcmi_Text iso639_deu iso639_eng iso639_nld olac_lexicon

Inferred Metadata
Country: Germany United Kingdom Netherlands
Area: Europe

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC96L14
Up-to-date as of: Wed Jul 8 7:30:24 EDT 2026
