MCU: Basic Concepts

The MCU operates on a file of MARC records that have been processed by PLP or Self-PLP. Among other things, PLP sequences the records in the file so that MCU can display them in logical groups. By default, each group consists of records that matched on a primary key (ISBN, LCCN, TITLE, etc.) but failed one or more crosschecks.

Important concept 1: A group of matching records can contain more than two records. Although we are used to thinking of duplicates as pairs of records, this is not always the case; depending on the size and nature of the database being searched, pairs may be the exception rather than the rule. The number of records in a match group is limited only by the corresponding limits set in the PLP match rules (see note 1).

Important concept 2: The more you know about PLP, the better you will be able to use MCU. The two programs are complementary. Although PLP generates the files that MCU works on, all of your work in MCU should feed back into the rules that PLP uses, so that the next PLP run creates a more refined set of records for review in MCU, and so on.

Record sequencing

The mechanism PLP uses to order the file for MCU is straightforward. PLP adds a 952 field to every record it processes. This field contains the following five subfields:

  • $a: The tag and subfield of the primary key; e.g. $a010a
  • $b: The data used in the query; e.g. ' 2002044056'
  • $c: The number of records in the match group; e.g. '003'
  • $d: The sequence number of the current record in the group; e.g. '001'
  • $e: A unique identifier for each record in the file
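If you want to read these subfields programmatically, the layout above is simple to parse. The Python sketch below splits a 952 written the way it appears in the examples on this page (tag, a space, two indicator characters, then `$`-prefixed subfields) into its parts. The display format and the name `parse_952` are assumptions for illustration, not part of PLP or MCU; a real tool would more likely read the field from the MARC record itself, and this naive split would break if a subfield value ever contained a literal `$`.

```python
import re

# Assumed display form, copied from the examples on this page:
# "952 9|$a020a$b...$c003$d000$e..."
FIELD_RE = re.compile(r"^(\d{3}) (..)(\$.*)$")

def parse_952(line: str) -> dict:
    """Split a 952 display line into tag, indicators and a subfield dict."""
    m = FIELD_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a field line: {line!r}")
    tag, indicators, body = m.groups()
    subfields = {}
    # Each subfield starts with '$' followed by a one-character code.
    for chunk in body.split("$")[1:]:
        if chunk:
            subfields[chunk[0]] = chunk[1:]
    return {"tag": tag, "indicators": indicators, "subfields": subfields}

field = parse_952(
    "952 9|$a020a$b0340751274; 0340751282; 1559706872"
    "$c003$d000$e1003250100000135000"
)
print(field["subfields"]["c"], field["subfields"]["d"])  # 003 000
```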

Notes and Examples 

Subfield $b may contain a list of values. For example, if a record contains more than one ISBN, PLP searches the database for all of them. In this case, the 952 $b contains a string of numbers separated by semicolons:

952 9|$a020a$b0340751274; 0340751282; 1559706872$c003$d000$e1003250100000135000 
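Because the values in a multi-value $b are separated by semicolons, they are easy to recover. A minimal sketch, assuming the separator is always ';' optionally followed by a single space, as in the example above (`query_values` is a hypothetical name):

```python
import re

def query_values(subfield_b: str) -> list[str]:
    """Split a 952 $b into the individual values PLP searched.

    Only the ';' separator and one following space are removed, so the
    leading space in a single value like the LCCN ' 2002044056' is kept.
    """
    return [v for v in re.split(r"; ?", subfield_b) if v]

print(query_values("0340751274; 0340751282; 1559706872"))
# ['0340751274', '0340751282', '1559706872']
print(query_values(" 2002044056"))  # [' 2002044056']
```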

Subfields $c and $d contain normalized, zero-padded numbers (like a MARC tag, they always contain three digits). The counter in $c begins with 1; the counter in $d begins with 0. In the example 952 above, we are looking at the first record ($d000) of three ($c003).
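The two counting conventions are easy to mix up. This sketch turns $c and $d into a "record X of Y" label and shows how a three-digit, zero-padded counter can be produced; `describe_position` and `pad3` are hypothetical helper names, not PLP or MCU functions.

```python
def describe_position(c: str, d: str) -> str:
    """Label a record from 952 $c (group size, counted from 1)
    and $d (position in the group, counted from 0)."""
    # Both subfields are three digits, zero-padded like a MARC tag.
    for value in (c, d):
        if len(value) != 3 or not value.isdecimal():
            raise ValueError(f"expected three digits, got {value!r}")
    size, position = int(c), int(d)
    if not 0 <= position < size:
        raise ValueError(f"$d{d} is out of range for $c{c}")
    return f"record {position + 1} of {size}"

def pad3(n: int) -> str:
    """Format a counter as three zero-padded digits, e.g. 3 -> '003'."""
    return f"{n:03d}"

print(describe_position("003", "000"))  # record 1 of 3
print(pad3(3))  # 003
```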

In the 952 below, we are looking at the third record ($d002; see note 2) of three ($c003):

952 9|$a020a$b0340751274; 0340751282; 1559706872$c003$d002$e1003250100000135002 
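Taken together, $c and $d are enough to rebuild the match groups from a sequenced file. The sketch below assumes, based on the description of PLP sequencing on this page, that the members of each group appear consecutively starting at $d000. It takes already-extracted subfields as plain dictionaries (a hypothetical input format), checks each group against its $c, and uses invented sample values.

```python
def group_records(fields: list[dict[str, str]]) -> list[list[dict[str, str]]]:
    """Rebuild match groups from 952 $c/$d values, in file order.

    Assumes each group's records are consecutive and start at $d == '000'.
    """
    groups: list[list[dict[str, str]]] = []
    for f in fields:
        if int(f["d"]) == 0:
            groups.append([])  # $d000 opens a new match group
        elif not groups:
            raise ValueError("file does not start with a $d000 record")
        groups[-1].append(f)
    # Each group must hold exactly $c records, numbered $d000 .. $c - 1.
    for i, g in enumerate(groups):
        size = int(g[0]["c"])
        positions = [int(f["d"]) for f in g]
        if positions != list(range(size)):
            raise ValueError(f"group {i}: $c{g[0]['c']} but $d values {positions}")
    return groups

sample = [
    {"c": "003", "d": "000"}, {"c": "003", "d": "001"}, {"c": "003", "d": "002"},
    {"c": "002", "d": "000"}, {"c": "002", "d": "001"},
]
print([len(g) for g in group_records(sample)])  # [3, 2]
```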

Note: to avoid conflicting with a locally defined 952 field, PLP uses indicators '9|'.
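If a record already carries a local 952, a program reading the file can tell the two apart by their indicators. A minimal sketch, representing each field as a hypothetical (tag, indicators, data) tuple; the local 952 in the sample is invented for illustration.

```python
# Hypothetical representation of a field: (tag, indicators, data).
Field = tuple[str, str, str]

def plp_952s(fields: list[Field]) -> list[Field]:
    """Keep only the 952 fields PLP added, identified by indicators '9|'."""
    return [f for f in fields if f[0] == "952" and f[1] == "9|"]

record = [
    ("952", "  ", "$aMain library$bStacks"),  # invented local 952
    ("952", "9|", "$a020a$b0340751274; 0340751282; 1559706872"
                  "$c003$d000$e1003250100000135000"),
]
print(len(plp_952s(record)))  # 1
```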

The unique identifier in the 952 $e is composed of three parts:

  • the PLP/MCU Session ID: 10032501 
  • a normalized record counter: 00000135 
  • the sequence number from $d: 002 
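Since each part has a fixed width in the example above (8 + 8 + 3 digits), $e can be taken apart by position. The widths below are inferred from that single example rather than from a PLP specification, and `RecordId`, `split_e` and `join_e` are hypothetical names.

```python
from typing import NamedTuple

class RecordId(NamedTuple):
    session_id: str      # 8 digits, e.g. '10032501' (width assumed)
    record_counter: str  # 8 digits, zero-padded, e.g. '00000135' (width assumed)
    sequence: str        # 3 digits, same value as 952 $d, e.g. '002'

def split_e(e: str) -> RecordId:
    """Split a 952 $e into session ID, record counter and $d sequence."""
    if len(e) != 19 or not e.isdecimal():
        raise ValueError(f"unexpected $e value: {e!r}")
    return RecordId(e[:8], e[8:16], e[16:])

def join_e(rid: RecordId) -> str:
    """Rebuild the $e string from its three parts."""
    return rid.session_id + rid.record_counter + rid.sequence

rid = split_e("1003250100000135002")
print(rid.session_id, rid.record_counter, rid.sequence)  # 10032501 00000135 002
```

A cheap consistency check when reading a file is that the last three digits of $e match the $d of the same field.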

The Session ID is the fundamental piece of data used to organize PLP/MCU data on your computer.

Next: MCU Sessions 

1) In theory, there can be up to 999 records in a match group; in the current version of PLP, record groups are limited to 99. The purpose of this limit is to protect the database: if a SQL query retrieves more than 99 records, the record that generated the query is dumped to a 'Too Many' hits file, and the query is abandoned.
2) i.e. counting 0, 1, 2.
mcu/concepts.txt · Last modified: 2013/04/27 09:09 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported