MARC Report and subject validation

Subject heading validation is a new, somewhat experimental feature added in version 242. It is controlled by a single checkbox, 'Validate subjects', on the right-hand side of the Program Automation tab of the options.

If this option is selected, the program will attempt to validate subject headings in edit sessions.

By 'experimental' we mean that we are testing the functionality and feasibility of real-time validation of data via HTTP. Subjects were chosen because the amount of data needed is relatively small and manageable. (The initial goal was to provide real-time validation of name headings, but the dataset needed for that project is too large for our current resources. Hence this 'experiment'.)

The processing used for this validation is as follows: the program contacts the TMQ server (www.marcofquality.com) with a list of subject headings in the record currently being viewed or edited. The server then queries its database for the presence of these headings, and returns the result of each lookup. When the program receives these results, it then marks the TAG column (in the main record view) of each subject checked, as follows:

Bold font: True, the subject heading validated
Red font:  False, the subject heading did not validate
Gray font: the subject heading matched an alternate label

Here is a (fabricated) example that displays these three status types:

Childrens stories should be

Children's stories

and is thus rendered in red font as invalid. Fairytales is a 450 to

Fairy tales

and is thus rendered in gray font as an alternate heading.
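
The marking scheme can be sketched in a few lines of Python. This is only an illustration of the three status types described above, not the program's actual code: the names Status, TAG_STYLE and mark_tags are hypothetical, and the lookup results are written by hand rather than fetched from the server.

```python
from enum import Enum


class Status(Enum):
    VALID = "valid"          # heading found in the main table
    INVALID = "invalid"      # heading not found anywhere
    ALTERNATE = "alternate"  # heading matched an alternate label


# Font applied to the TAG column for each status, following the list above.
TAG_STYLE = {
    Status.VALID: "bold",
    Status.INVALID: "red",
    Status.ALTERNATE: "gray",
}


def mark_tags(results):
    """Turn (tag, heading, status) lookup results into (tag, heading, font) triples."""
    return [(tag, heading, TAG_STYLE[status]) for tag, heading, status in results]


# Hand-written results for the fabricated example headings.
results = [
    ("650", "Fairy tales", Status.VALID),
    ("650", "Childrens stories", Status.INVALID),
    ("650", "Fairytales", Status.ALTERNATE),
]

for tag, heading, font in mark_tags(results):
    print(tag, heading, font)
```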

At present, the following subject sources are supported:

650 I2 = 0 $a    Validates against LCSH (from the skos-rdf file at id.loc.gov)
650 I2 = 1 $a    Validates against the above and the LC Children's Subjects (from id.loc.gov)
650 I2 = 2       Validates against the MESH descriptor file (from NLM)
655 $2=gsafd     Validates against the gsafd.mrc (from Northwestern University)
655 $2=lcgft     Validates against the LC Genre/Form terms (from id.loc.gov)
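
As a rough illustration of how the table above selects a source, here is a minimal Python sketch. The function name select_source and the source labels are hypothetical and only mirror the table; the $2 code for the LC Genre/Form terms is written here as lcgft, its registered MARC source code. Anything not listed in the table returns an empty list, meaning the heading is not checked.

```python
def select_source(tag, ind2=None, sf2=None):
    """Return the sources a heading is validated against, or [] if unsupported.

    tag  -- MARC tag as a string, e.g. "650"
    ind2 -- second indicator (used for 650 fields)
    sf2  -- contents of subfield $2 (used for 655 fields)
    """
    if tag == "650":
        by_indicator = {
            "0": ["LCSH"],
            "1": ["LCSH", "LC Children's Subjects"],
            "2": ["MESH"],
        }
        return by_indicator.get(ind2, [])
    if tag == "655":
        by_source_code = {
            "gsafd": ["GSAFD"],
            "lcgft": ["LC Genre/Form terms"],
        }
        return by_source_code.get(sf2, [])
    return []


print(select_source("650", ind2="1"))
print(select_source("655", sf2="gsafd"))
print(select_source("650", ind2="7"))
```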

NOTES

This feature is not active in Batch Mode.

It's important to note that this feature can be quickly turned on and off inside an Edit Session by clicking the 'S' button at the bottom right of the navigation panel.

If you cannot see the 'S' button, you may need to increase the width of your Edit Session window.

If you want to quickly scroll through a file of records, you should toggle subject validation off, because the program will not advance to the next record until it receives the search results from the server for the current record.

Re: LCSH, we validate only $a. Validating this subject source is problematic because of its design. There is no way to know if a given heading has an authority record or not. It would be much better if all possible valid subject headings were backed by authorities, and hopefully that will be the case in the future. Failing that though, we do not wish to make validating LCSH a full-time occupation, so we are just going to stick with $a for now.

Re: LCSH alternate labels (aka 'See-From'). If a 650 I2=0 heading does not validate, the same heading is then searched a second time in a table of alternate headings. Thus, subject validation will take longer if your I2=0 subjects do not validate (since each heading will be searched twice).
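
The two-pass search can be sketched as follows. This is a simplified local model, assuming the main headings and the alternate labels are held in two separate tables; the function validate_lcsh and the sample tables are hypothetical, and the real lookups happen on the server. The returned search count shows why non-validating headings cost more.

```python
def validate_lcsh(heading, main_table, alternate_table):
    """Return (status, number_of_searches) for one 650 I2=0 heading."""
    searches = 1
    if heading in main_table:
        return "valid", searches
    # The alternate table is searched only when the first search fails.
    searches += 1
    if heading in alternate_table:
        return "alternate", searches
    return "invalid", searches


# Tiny sample tables built from the fabricated example headings.
main_table = {"Fairy tales", "Children's stories"}
alternate_table = {"Fairytales": "Fairy tales"}

for heading in ("Fairy tales", "Fairytales", "Childrens stories"):
    print(heading, validate_lcsh(heading, main_table, alternate_table))
```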

Re: MESH, we are able to provide a more comprehensive result, validating the $a and all $x subfields, with the exception of the MESH 'publication types' (similar to the LCSH form subdivisions but coded as $x instead of $v). The latter are removed from the subject string before validation.
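
A minimal sketch of the MESH preprocessing described above, assuming a field is available as a list of (subfield code, value) pairs. The function name mesh_lookup_parts is hypothetical, and PUBLICATION_TYPES holds two illustrative entries only, not the list the program actually uses.

```python
# Illustrative placeholders only; the real list of publication types is not shown here.
PUBLICATION_TYPES = {"congresses", "periodicals"}


def mesh_lookup_parts(subfields):
    """Return the $a and $x values to validate, with publication types removed.

    subfields -- list of (code, value) pairs for one 650 I2=2 field
    """
    parts = []
    for code, value in subfields:
        if code == "a":
            parts.append(value)
        elif code == "x":
            # Compare without surrounding spaces, trailing period, or case.
            normalized = value.strip().rstrip(".").lower()
            if normalized not in PUBLICATION_TYPES:
                parts.append(value)
    return parts


field = [("a", "Neoplasms"), ("x", "therapy"), ("x", "Congresses.")]
print(mesh_lookup_parts(field))
```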

If you have suggestions for files to add to the list above, please send them to us.

246:subject_validation

242/subject_validation.txt · Last modified: 2016/08/24 22:28 by richard