OLAC Record
oai:catalogue.elra.info:ELRA-W0325

Metadata
Title: Wojood - A corpus for nested Arabic Named Entity Recognition
Access Rights: Rights available for: nonCommercialUse, commercialUse
Date Available (W3CDTF): 2022-09-27
Date Issued (W3CDTF): 2022-09-27
Description: Wojood consists of about 550,000 tokens (Modern Standard Arabic and dialectal Arabic) that are manually annotated with 21 entity types (person, group of people, occupation, organization, geopolitical entity, location, facility, event, date, time, language, website, law, product, cardinal number, ordinal number, percent, quantity, unit, money, currency). It covers multiple domains (Media, History, Culture, Health, Finance, ICT, Law, Elections, Politics, Migration, Terrorism, Social Media) and includes nested entity annotations. The corpus contains about 75K entities, 22.5% of which are nested. It was annotated using the IOB2 tagging scheme and is available in CSV format.
Identifier: ELRA-W0325
ISLRN: 688-718-284-176-0
Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-W0325/
Language: Arabic
Language (ISO639): ara
Medium: Not specified
Publisher: ELRA (European Language Resources Association)
Type (DCMI): Text
Type (OLAC): primary_text
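
Note on the IOB2 scheme mentioned in the description: in IOB2, every entity starts with a B-<label> tag, continues with I-<label> tags, and tokens outside any entity are tagged O. The sketch below shows one way to turn such a tag sequence into entity spans. It is illustrative only: this record does not describe the column layout of the Wojood CSV files, so the example works on in-memory lists, the function name iob2_to_spans is hypothetical, and the labels PERS, GPE and ORG are placeholders rather than confirmed Wojood tag names. Nested entities are assumed here to be stored as one tag sequence per nesting level, which may not match the actual file format.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode one IOB2 tag sequence into (label, start, end) spans.

    `end` is exclusive. Hypothetical helper, not part of any Wojood tooling.
    """
    spans = []
    start = None
    label = None
    for i, tag in enumerate(tags):
        # The open entity continues only on an I- tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            start, label = None, None
        # In IOB2 a new entity can only begin with a B- tag.
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    # Close an entity that runs to the end of the sequence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Flat example with placeholder labels.
flat = iob2_to_spans(["B-PERS", "I-PERS", "O", "B-GPE", "O"])

# Nested example, assuming one tag sequence per nesting level.
outer = iob2_to_spans(["B-ORG", "I-ORG", "I-ORG"])
inner = iob2_to_spans(["O", "O", "B-GPE"])
```

Decoding each nesting level separately keeps the function simple; an inner span is nested when its token range falls inside an outer span.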

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0325
DateStamp:  2022-09-27
GetRecord:  OAI-PMH request for simple DC format
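
The GetRecord entries above correspond to OAI-PMH requests, which take three query arguments: verb, identifier and metadataPrefix. A minimal sketch of building such a request URL follows. The base URL in the example is a placeholder, since this record does not state the repository endpoint; the prefixes "olac" and "oai_dc" are the usual names for the OLAC and simple Dublin Core formats, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Placeholder endpoint: replace with the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

dc_url = build_get_record_url(BASE_URL, "oai:catalogue.elra.info:ELRA-W0325", "oai_dc")
olac_url = build_get_record_url(BASE_URL, "oai:catalogue.elra.info:ELRA-W0325", "olac")
```

urlencode percent-encodes the colons in the identifier, which the OAI-PMH specification requires for reserved characters in request arguments.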

Search Info

Citation: n.a. 2022. ELRA (European Language Resources Association).
Terms: dcmi_Text iso639_ara olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0325
Up-to-date as of: Fri Apr 19 6:30:17 EDT 2024