MATCHING ITEMS FROM A LIST IN MARC REVIEW

MARC Review can match MARC data (specified on the pattern form) against a list of items stored in a text file.

The first step is to decide what kind of match you want to perform: simple string matching or value list matching.

SIMPLE STRING MATCH

Simple string matching is the default MARC Review match behavior. This type of matching simply checks for the presence of a string anywhere in the MARC data specified on the pattern form.

For example, if the pattern form specifies

TAG=650
SUBF=a
DATA=librar
CASE=False

then the program will find all records that contain a 650 $a with the term 'librar' in it:

$aDigital libraries
$aFriends of the library
$aInternational librarianship
$aLibrarians
$aLibraries and people with disabilities
$aLibrary catalogs
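The default behavior can be approximated outside the program with a plain substring test. The following is only a minimal sketch in Python, not MARC Review's own code: the function name simple_match is hypothetical, and the heading 'Books and reading' is added here purely as a non-matching contrast.

```python
def simple_match(subfield_value, data, case_sensitive=False):
    """Return True if `data` occurs anywhere in `subfield_value`."""
    if not case_sensitive:
        # Mirrors CASE=False: compare both sides in lower case.
        return data.lower() in subfield_value.lower()
    return data in subfield_value


headings = [
    "Digital libraries",
    "Friends of the library",
    "International librarianship",
    "Librarians",
    "Libraries and people with disabilities",
    "Library catalogs",
    "Books and reading",  # added for contrast; not in the original example
]

# Keep every heading that contains 'librar' anywhere, ignoring case.
matches = [h for h in headings if simple_match(h, "librar")]
```

With CASE=False, every heading except the added contrast entry is kept.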

We can anchor a match to the beginning or end of the MARC data by using regular expressions. Continuing with the 650 $a example, if we change DATA to:

DATA=^librar
CASE=False
RegEx=True

it then matches only subfields beginning with the string:

$aLibrarians
$aLibraries and people with disabilities
$aLibrary catalogs
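The effect of the '^' anchor can be reproduced with Python's re module. This is a sketch under the assumption that the program's regular expression support treats '^' the same way Python does; the variable names are illustrative only.

```python
import re

headings = [
    "Digital libraries",
    "Friends of the library",
    "International librarianship",
    "Librarians",
    "Libraries and people with disabilities",
    "Library catalogs",
]

# '^' anchors the match to the start of the subfield data;
# re.IGNORECASE plays the role of CASE=False.
pattern = re.compile(r"^librar", re.IGNORECASE)

anchored = [h for h in headings if pattern.search(h)]
```

Only the three headings that begin with the string remain in anchored.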

Matching to the end of a field is a bit harder, because a pattern like:

DATA=librar.*$
CASE=False
RegEx=True

will still match a heading like:

$aLibraries and people with disabilities

because of the 'greediness' of the regular expression support in the program: '.*' matches every character after 'librar' up to the end of the subfield. So what we would have to do is something like this:

DATA=libraries$||library$||librarians$||librarianship$
CASE=True
RegEx=True

Here case sensitivity is turned on so that we do not match '$aLibraries' (additionally, we could add a blank space in front of each search term).
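The difference between the two patterns can be checked with Python's re module. This sketch assumes that '||' on the pattern form acts as an OR between alternatives; Python writes the same thing with a single '|' inside a group, so the pattern below is a translation, not the program's own syntax.

```python
import re

headings = [
    "Digital libraries",
    "Friends of the library",
    "International librarianship",
    "Librarians",
    "Libraries and people with disabilities",
    "Library catalogs",
]

# 'librar.*$' still matches everywhere: '.*' consumes the rest of the data.
greedy = re.compile(r"librar.*$", re.IGNORECASE)

# Case-sensitive alternatives, each anchored to the end of the data.
end_anchored = re.compile(r"(?:libraries|library|librarians|librarianship)$")

greedy_hits = [h for h in headings if greedy.search(h)]
end_hits = [h for h in headings if end_anchored.search(h)]
```

Here greedy_hits contains all six headings, while end_hits contains only those that end in one of the lower-case terms.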

VALUE LIST MATCH

Value list string matching was added to the program in version 236. The purpose of a value list is to support a controlled vocabulary. Examples of value lists range from the 'MARC Code List for Languages' to the Library of Congress Subject Headings.

Ideally, MARC fields that are to contain data from a controlled vocabulary should be entered using dropdown menus that contain all available values. For example, in MARC Report we may click on the 008 element for 'Language' and press <F1> to select from a list of all valid Language codes. However, this type of data entry may not be feasible for a large list of subject headings, which may contain many thousands (or hundreds of thousands) of items.

Searching value lists in MARC Review is somewhat different from the default string matching described above. Whereas above we asked the question 'is the specified data ("librar") present in the field we are searching ("650 $a")?', value list matching asks the question 'is the field we are searching present in the value list we have specified?'.

Value list matching is always left-anchored in MARC Review. We assume, by definition, that a subject heading of

650 $aLibraries.

should never match a value list item like

Technical services (Libraries)

Thus, we do not match strings within strings when validating a term in a MARC record against a value list. A benefit of left-anchoring is that very large lists can be supported programmatically.
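One way to see why left-anchoring scales to large lists: in a sorted list, all items sharing a prefix sit next to each other, so a binary search can jump straight to them. The sketch below uses Python's bisect module to illustrate the idea; it is an assumption about how such a lookup could work, not a description of how MARC Review is implemented, and the name prefix_matches is hypothetical.

```python
from bisect import bisect_left


def prefix_matches(sorted_values, term):
    """Return the items in `sorted_values` that begin with `term`."""
    # Binary search finds the first item that is >= term.
    start = bisect_left(sorted_values, term)
    hits = []
    for i in range(start, len(sorted_values)):
        if not sorted_values[i].startswith(term):
            break  # items sharing the prefix are contiguous, so stop here
        hits.append(sorted_values[i])
    return hits


values = sorted([
    "Technical services (Libraries)",
    "Libraries, Medical",
    "Libraries and adult education",
    "Libraries (Rooms)",
    "Libraries",
])

hits = prefix_matches(values, "Libraries")
```

The item 'Technical services (Libraries)' is never examined as a candidate, because the search only looks at items that start with the term.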

Also, in value list matching there is no Regular expression option, as matching is always left-anchored.

Instead, there are the options 'Partial' and 'Complete'. If the 'Partial' option is selected, then the MARC field

650 $aLibraries.

would match all of the following items from a LCSH value list:

Libraries
Libraries (Rooms)
Libraries and adult education
Libraries and booksellers
Libraries and colleges
Libraries and community
Libraries and distance education
Libraries and education
Libraries and electronic publishing
Libraries and families
...
Libraries, Medical

If the 'Complete' option is selected instead, the MARC field above would match only the value list item

Libraries

Note: the Case sensitive option is also supported in value list matching.
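The two options can be sketched in Python as follows. This is an illustration only: the function name value_list_matches and its parameters are hypothetical, and stripping the trailing period from the MARC data before comparing is an assumption made so that '650 $aLibraries.' can match the list item 'Libraries' as described above.

```python
def value_list_matches(field_value, value_list, mode="Partial", case_sensitive=False):
    """Return the value list items matched by the MARC field data."""
    # Assumption: trailing periods and spaces are ignored when comparing.
    term = field_value.rstrip(" .")
    if not case_sensitive:
        term = term.lower()
    hits = []
    for item in value_list:
        candidate = item if case_sensitive else item.lower()
        if mode == "Complete":
            matched = candidate == term  # whole item must equal the field data
        else:
            matched = candidate.startswith(term)  # left-anchored partial match
        if matched:
            hits.append(item)
    return hits


lcsh_sample = [
    "Libraries",
    "Libraries (Rooms)",
    "Libraries and education",
    "Libraries, Medical",
    "Technical services (Libraries)",
]

partial = value_list_matches("Libraries.", lcsh_sample, mode="Partial")
complete = value_list_matches("Libraries.", lcsh_sample, mode="Complete")
```

In this sample, partial holds the four items that begin with 'Libraries', and complete holds only 'Libraries'.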

phelp/helplistmatch.txt · Last modified: 2011/03/24 16:57 (external edit)