OLAC Record: MUSCIMA++

OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-2372

Metadata

Title: MUSCIMA++

Bibliographic Citation: http://hdl.handle.net/11372/LRT-2372

Creator: Hajič jr., Jan

Date (W3CDTF): 2017-08-25T08:18:16Z

Date Available: 2017-08-25T08:18:16Z

Description: MUSCIMA++ is a dataset of handwritten music notation for musical symbol detection. It contains 91255 symbols, consisting of both notation primitives and higher-level notation objects, such as key signatures or time signatures. There are 23352 notes in the dataset, of which 21356 have a full notehead, 1648 have an empty notehead, and 348 are grace notes. For each annotated object in an image, we provide both the bounding box, and a pixel mask that defines exactly which pixels within the bounding box belong to the given object. Composite constructions, such as notes, are captured through explicitly annotated relationships of the notation primitives (noteheads, stems, beams...). This way, the annotation provides an explicit bridge between the low-level and high-level symbols described in Optical Music Recognition literature. MUSCIMA++ has annotations for 140 images from the CVC-MUSCIMA dataset [2], used for handwritten music notation writer identification and staff removal. CVC-MUSCIMA consists of 1000 binary images: 20 pages of music were each re-written by 50 musicians, binarized, and staves were removed. We had 7 different annotators marking musical symbols: each annotator marked one of each 20 CVC-MUSCIMA pages, with the writers selected so that the 140 images cover 2-3 images from each of the 50 CVC-MUSCIMA writers. This setup ensures maximal variability of handwriting, given the limitations in annotation resources. The MUSCIMA++ dataset is intended for musical symbol detection and classification, and for music notation reconstruction. A thorough description of its design is published on arXiv [2]: https://arxiv.org/abs/1703.04824 The full definition of the ground truth is given in the form of annotator instructions.

Identifier (URI): http://hdl.handle.net/11372/LRT-2372

Language: No linguistic content

Language (ISO639): zxx

Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)

Rights: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

http://creativecommons.org/licenses/by-nc-sa/4.0/

Subject: Optical Music Recognition

Music Notation

Graph-Based Representation

Symbol Detection

Type: corpus

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Description: http://www.language-archives.org/archive/lindat.mff.cuni.cz

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:lindat.mff.cuni.cz:11372/LRT-2372

DateStamp: 2021-06-29

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Hajič jr., Jan. 2017. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: dcmi_Text iso639_zxx olac_primary_text

http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-2372
Up-to-date as of: Sun May 4 0:11:18 EDT 2025

Metadata
Title:		MUSCIMA++
Bibliographic Citation:		http://hdl.handle.net/11372/LRT-2372
Creator:		Hajič jr., Jan
Date (W3CDTF):		2017-08-25T08:18:16Z
Date Available:		2017-08-25T08:18:16Z
Description:		MUSCIMA++ is a dataset of handwritten music notation for musical symbol detection. It contains 91255 symbols, consisting of both notation primitives and higher-level notation objects, such as key signatures or time signatures. There are 23352 notes in the dataset, of which 21356 have a full notehead, 1648 have an empty notehead, and 348 are grace notes. For each annotated object in an image, we provide both the bounding box, and a pixel mask that defines exactly which pixels within the bounding box belong to the given object. Composite constructions, such as notes, are captured through explicitly annotated relationships of the notation primitives (noteheads, stems, beams...). This way, the annotation provides an explicit bridge between the low-level and high-level symbols described in Optical Music Recognition literature. MUSCIMA++ has annotations for 140 images from the CVC-MUSCIMA dataset [2], used for handwritten music notation writer identification and staff removal. CVC-MUSCIMA consists of 1000 binary images: 20 pages of music were each re-written by 50 musicians, binarized, and staves were removed. We had 7 different annotators marking musical symbols: each annotator marked one of each 20 CVC-MUSCIMA pages, with the writers selected so that the 140 images cover 2-3 images from each of the 50 CVC-MUSCIMA writers. This setup ensures maximal variability of handwriting, given the limitations in annotation resources. The MUSCIMA++ dataset is intended for musical symbol detection and classification, and for music notation reconstruction. A thorough description of its design is published on arXiv [2]: https://arxiv.org/abs/1703.04824 The full definition of the ground truth is given in the form of annotator instructions.
Identifier (URI):		http://hdl.handle.net/11372/LRT-2372
Language:		No linguistic content
Language (ISO639):		zxx
Publisher:		Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:		Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Rights:		http://creativecommons.org/licenses/by-nc-sa/4.0/
Subject:		Optical Music Recognition
		Music Notation
		Graph-Based Representation
		Symbol Detection
Type:		corpus
Type (DCMI):		Text
Type (OLAC):		primary_text
OLAC Info
Archive:		LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:		http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:		OAI-PMH request for OLAC format
GetRecord:		Pre-generated XML file
OAI Info
OaiIdentifier:		oai:lindat.mff.cuni.cz:11372/LRT-2372
DateStamp:		2021-06-29
GetRecord:		OAI-PMH request for simple DC format
Search Info
Citation:		Hajič jr., Jan. 2017. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms:		dcmi_Text iso639_zxx olac_primary_text