This document explains the statistical information contained in the Archive Report Cards, generated by archiveReportCard.php.
The archive star rating is a representation of the average item score for
the archive. It is calculated as:
round( (Average item score out of 10)/2 )
to give a star rating out of five.
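As a minimal sketch of the star rating calculation above, written in Python rather than the PHP of archiveReportCard.php: the function name star_rating is hypothetical, and rounding halves upward is an assumption chosen to mirror PHP's round() for non-negative values (Python's built-in round() rounds halves to the nearest even number instead).

```python
import math


def star_rating(average_item_score: float) -> int:
    """Map an average item score (0 to 10) to a star rating (0 to 5)."""
    if not 0 <= average_item_score <= 10:
        raise ValueError("average item score must lie between 0 and 10")
    # Assumption: halves round upward, as PHP's round() does for
    # non-negative numbers.
    return math.floor(average_item_score / 2 + 0.5)


if __name__ == "__main__":
    for score in (0, 4.9, 5, 7.4, 10):
        print(score, "->", star_rating(score))
```

Under this assumption an archive with an average item score of 5 receives three stars, not two.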
For the subject and type fields, these percentages show:
Diversity = (Number of distinct code values / Number of instances of the element) * 100
This gives an indication of the diversity of the information held by the
archive.
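The diversity percentage can be sketched as follows. This is an illustration only: the function name and the input shape (one entry per element instance, with None for an instance that carries no code) are assumptions, as is returning 0 when the element never occurs.

```python
from typing import Optional, Sequence


def diversity(codes: Sequence[Optional[str]]) -> float:
    """Percentage of distinct code values among all instances of an element.

    `codes` holds one entry per instance of the element (for example every
    subject element in the archive). None marks an instance without a code.
    """
    if not codes:
        # Assumption: an element that never occurs has zero diversity.
        return 0.0
    distinct = {code for code in codes if code is not None}
    return len(distinct) / len(codes) * 100


if __name__ == "__main__":
    # Four instances of an element using three distinct codes.
    print(diversity(["a", "b", "a", "c"]))
```

A value near 100 means almost every instance uses a different code; a value near 0 means the archive reuses a small set of codes across many instances.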
Graph showing the frequency of record scores within the archive.
The quality of metadata is assessed against best practice guidelines as at http://www.language-archives.org/REC/olac-extensions.html as well as the existence of certain XML elements according to their usage statistics. Each item receives a score between 0 and 10, used for results ordering.
The scoring of metadata is contained in the source file metadataScoring.php.
For each element which has an associated extension code from a controlled vocabulary, one point is scored if a code attribute is used. This is converted into a proportion of elements which use codes against the total elements in a record which have an associated controlled vocabulary.
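Code exists score = (Number of elements using a code attribute) / (Number of elements with an associated controlled vocabulary)
This returns a fraction of code usage between 0 and 1.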
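The proportion of coded elements in a record might be computed as in the sketch below. It is not taken from metadataScoring.php: the function name, the (element name, has code) input pairs, and the element names in the example are all hypothetical, and returning 0 for a record with no eligible elements is an assumption.

```python
from typing import Iterable, Set, Tuple


def code_exists_score(
    elements: Iterable[Tuple[str, bool]],
    vocabulary_elements: Set[str],
) -> float:
    """Fraction of eligible elements in one record that use a code attribute.

    `elements` lists (element name, has_code) for every element instance in
    the record. `vocabulary_elements` names the elements that have an
    associated controlled vocabulary; other elements are ignored.
    """
    eligible = [has_code for name, has_code in elements
                if name in vocabulary_elements]
    if not eligible:
        # Assumption: a record with no eligible elements scores 0.
        return 0.0
    return sum(eligible) / len(eligible)


if __name__ == "__main__":
    record = [("subject", True), ("type", False),
              ("title", False), ("language", True)]
    # Two of the three eligible elements carry a code.
    print(code_exists_score(record, {"subject", "type", "language"}))
```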
Points are deducted when a record does not contain any instances of elements which are deemed important to any metadata record. The following elements have been deemed necessary in every record based upon element usage:
For each of these elements which is absent, a score of (1/5) is deducted from the record score. This implies equal weighting of the deduction of points for absence of any of the core elements.
Element absent deductions = (Number of absent core elements) * (1/5)
This results in a score between 0 and 1.
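The deduction of 1/5 per absent core element can be sketched as below. The core element names used here (element_1 to element_5) are placeholders, not the actual list, and the function name is hypothetical.

```python
from typing import Iterable, Set

# Placeholder names only: the real core elements are not reproduced here.
CORE_ELEMENTS = {"element_1", "element_2", "element_3",
                 "element_4", "element_5"}


def element_absent_deductions(
    present: Iterable[str],
    core_elements: Set[str] = CORE_ELEMENTS,
) -> float:
    """Deduct 1/5 for each core element missing from a record."""
    absent = core_elements - set(present)
    return len(absent) * (1 / 5)


if __name__ == "__main__":
    # A record containing three of the five core elements plus one other.
    print(element_absent_deductions(
        ["element_1", "element_2", "element_3", "other"]))
```

With five core elements the deduction ranges from 0 (all present) to 1 (all absent), each element weighing equally.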
These scores are then weighted:
Score = 10 * ( (1/1) * (code exists score) - (1/5) * (element absent deductions) )
to return an integer score out of 10 for each record. These scores are held in
a table relating each item to a score out of 10. At the time of searching, this
score is combined with the element usage score to order search results.
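A minimal sketch of the weighted record score follows. The weights are the ones in the formula above; how the result becomes an integer is not stated, so rounding halves upward and clamping the result to the range 0 to 10 are assumptions, as is the function name.

```python
import math


def record_score(code_exists: float, absent_deductions: float) -> int:
    """Combine the two component scores into an integer out of 10."""
    raw = 10 * ((1 / 1) * code_exists - (1 / 5) * absent_deductions)
    # Assumption: round halves upward, then clamp, so that a record with
    # no codes and missing core elements scores 0 rather than a negative.
    rounded = math.floor(raw + 0.5)
    return max(0, min(10, rounded))


if __name__ == "__main__":
    # 80% of eligible elements coded, two of five core elements absent.
    print(record_score(0.8, 0.4))
```

Because the deduction is weighted by 1/5, even a record missing every core element loses at most 2 of the 10 points; code usage dominates the score.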
See archiveReportCard.php for a summary of record quality scores across OLAC archives.
The percentage of records which have n of the core elements present at least once.
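One way to tabulate this distribution is sketched below. The input shape (one collection of element names per record), the function name, and the placeholder core element names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Set


def core_element_coverage(
    records: Sequence[Iterable[str]],
    core_elements: Set[str],
) -> Dict[int, float]:
    """Percentage of records having exactly n core elements present."""
    if not records:
        return {}
    # For each record, count how many distinct core elements it contains.
    counts = Counter(len(core_elements & set(present)) for present in records)
    total = len(records)
    return {n: counts.get(n, 0) / total * 100
            for n in range(len(core_elements) + 1)}


if __name__ == "__main__":
    records = [
        {"element_1", "element_2"},
        {"element_1"},
        {"element_1", "element_2", "other"},
        set(),
    ]
    print(core_element_coverage(records, {"element_1", "element_2"}))
```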
Percentage of records which contain the named elements at least once. Red highlights elements which are not used in all records from this archive.
Displays the number of times an element (which has an associated code attribute) was used by the archive, and the percentage of those instances which used a code attribute. Red highlights elements which did not contain code attributes in all instances of that element.
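The per-element figures in this table could be derived as in the sketch below. The function name, the (element name, has code) input pairs, and the returned tuple layout are hypothetical; the final flag marks the rows that the report card would highlight in red.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def code_usage(
    instances: Iterable[Tuple[str, bool]],
) -> Dict[str, Tuple[int, float, bool]]:
    """Summarise code attribute usage per element.

    Returns, for each element name, a tuple of
    (times used, percentage of uses with a code, highlight flag).
    """
    totals: Dict[str, int] = defaultdict(int)
    coded: Dict[str, int] = defaultdict(int)
    for name, has_code in instances:
        totals[name] += 1
        if has_code:
            coded[name] += 1
    return {
        name: (totals[name],
               coded[name] / totals[name] * 100,
               coded[name] < totals[name])  # True when some use lacks a code
        for name in totals
    }


if __name__ == "__main__":
    usage = code_usage([("subject", True), ("subject", False), ("type", True)])
    for name, row in usage.items():
        print(name, row)
```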
Number of times an element is used and, where applicable, the number of times that a code attribute is used with that element. Red highlights elements which do not use attributes in all instances of that element.
Recommended metadata extensions http://www.language-archives.org/REC/olac-extension.html
Baden Hughes, 2004. Metadata Quality Evaluation: Experience from the Open Language Archives Community. Proceedings of the 7th International Conference on Asian Digital Libraries (ICADL 2004). Lecture Notes in Computer Science 3334. pp 320-329. Springer-Verlag.
Baden Hughes and Amol Kamat, 2005. A Metadata Search Engine for Digital Language Archives. D-Lib Magazine 11(2), February 2005. [Online]