Avoiding XC Fails in Self-PLP §

There are times when we need to split a file cleanly, so that all records that are dupes end up in one file, and all remaining records are directed to another file. 

The 'Self-PLP: Avoid XC Fails' run option will make this possible. 

If this option is selected, Self-PLP runs more or less normally on a file: each record is processed according to the match rules, records that match and pass all crosschecks are merged together, etc.  

The difference is that Self-PLP redirects records that fail crosschecks to the NoMatch file instead of the XC Fail file. As a result, there are only two principal files to deal with at the end of the run 1):

  1. An 'Splp-Matches' file: containing the records that matched  
  2. An 'Splp-NoMatches' file: containing both no matches and XC Fails 

When using the 'Avoid' option, statistics about XC Fails are still logged to the report file, but no records are output to the XC Fail file. No assumptions should therefore be made about the records found in the NoMatch file: it's likely that a fair few of them are duplicates that can be resolved in the MCU. 
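
The routing described above can be sketched in Python. This is only an illustration of the behavior, not Self-PLP's actual code: the function names `route_record` and `run`, the outcome labels, and the "XC Fail" file label are hypothetical. Only the 'Splp-Matches' and 'Splp-NoMatches' names come from this page.

```python
from collections import Counter

# Hypothetical labels for what happened to a record during matching.
MATCH, NO_MATCH, XC_FAIL = "match", "no_match", "xc_fail"


def route_record(outcome, avoid_xc_fails):
    """Return the (assumed) output file label for one record."""
    if outcome == MATCH:
        return "Splp-Matches"
    if outcome == XC_FAIL and not avoid_xc_fails:
        return "XC Fail"  # placeholder label, not a real file name
    # Plain no-matches, plus XC Fails when the 'Avoid' option is selected.
    return "Splp-NoMatches"


def run(outcomes, avoid_xc_fails):
    """Route every outcome and return (records per file, report statistics)."""
    files = Counter()
    stats = Counter()
    for outcome in outcomes:
        stats[outcome] += 1  # XC Fails are counted even when redirected
        files[route_record(outcome, avoid_xc_fails)] += 1
    return files, stats
```

With the option on, the XC Fail count still appears in the statistics, but the XC Fail file receives nothing, which is why the NoMatch file may contain resolvable duplicates.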


Identify 'Complex' XC Fails in Self-PLP §

A second run option, 'Self-PLP: Identify Complex XC Fails', will behave very similarly to the one above: all XC Fails, except those covered by the situation described below, will be redirected to the NoMatch file. 

Consider the following example: four records share the same LCCN, and two of them also have an OCLC number, but the two OCLC numbers differ. (The records pass all other crosschecks.) Schematically, this example may be illustrated as: 

Record A: LCCN 99123456
Record B: LCCN 99123456   OCLC 12345678
Record C: LCCN 99123456   OCLC 23456789
Record D: LCCN 99123456 

Self-PLP processes the file sequentially. For each record, Self-PLP searches its database and groups matching records together; and if there are matches, it processes the crosschecks.  

In the example above, Record A is crosschecked against Record B, then against Record C, and finally against Record D. Record A's lack of an OCLC number does not fail the match with Records B and C, so all of Record A's crosschecks succeed and all records are queued to be merged. 
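
A minimal sketch of why Record A passes, assuming an OCLC crosscheck fails only when both records carry an OCLC number and the numbers differ. The function `crosscheck_oclc` and the dictionary layout are hypothetical. Real crosschecks are defined by the match rules and may cover other fields.

```python
# The four records from the example above, keyed by record name.
records = {
    "A": {"lccn": "99123456"},
    "B": {"lccn": "99123456", "oclc": "12345678"},
    "C": {"lccn": "99123456", "oclc": "23456789"},
    "D": {"lccn": "99123456"},
}


def crosscheck_oclc(rec_a, rec_b):
    """Pass unless both records have an OCLC number and the numbers differ."""
    a = rec_a.get("oclc")
    b = rec_b.get("oclc")
    return a is None or b is None or a == b


# Record A is checked against each of its matches in turn.
first_phase = {name: crosscheck_oclc(records["A"], records[name])
               for name in ("B", "C", "D")}
```

Under this assumption every entry in `first_phase` is true, so B, C and D are all queued to merge with A, even though B and C would not pass a crosscheck against each other.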

The problem here is that although it may be OK to merge Record B to Record A, or to merge Record C to Record A, etc., it is not a good idea to merge both Records B and C to Record A, as they have different OCLC numbers (which might be important, you never know :-).  

Fortunately, Self-PLP does know about this problem, so before it runs the merge code, it crosschecks all of the 'siblings' of the match against one another: Record B will be crosschecked against Record C and Record D, and Record C will be crosschecked against Record D. 

If any records fail this second phase of the crosschecks, then Self-PLP will invalidate these records. For example, Record B and Record C will not pass crosschecks against each other, because of the OCLC number difference; but Record D will still succeed, and be merged to Record A. 

But what of B and C?  

Since the library may want to know about these types of issues, Self-PLP will, after Record D is merged to Record A, output the remaining group of matching records (A, B, C) to the XC Fail file, so the problem can be reviewed in the MCU. 
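
The sibling phase described above can be sketched as follows. This is an illustration under stated assumptions rather than Self-PLP's implementation: the names `crosscheck` and `resolve_group` are hypothetical, only the OCLC number is compared, and a sibling is treated as invalidated if it fails against any other sibling.

```python
from itertools import combinations

records = {
    "A": {"lccn": "99123456"},
    "B": {"lccn": "99123456", "oclc": "12345678"},
    "C": {"lccn": "99123456", "oclc": "23456789"},
    "D": {"lccn": "99123456"},
}


def crosscheck(rec_a, rec_b):
    """Assumed rule: fail only if both OCLC numbers exist and differ."""
    a, b = rec_a.get("oclc"), rec_b.get("oclc")
    return a is None or b is None or a == b


def resolve_group(target, siblings, recs):
    """Crosscheck siblings pairwise; return (merged, xc_fail_group)."""
    invalid = set()
    for x, y in combinations(siblings, 2):
        if not crosscheck(recs[x], recs[y]):
            invalid.update((x, y))  # both sides of a failed pair are invalidated
    merged = [s for s in siblings if s not in invalid]
    # The target is reported with the invalidated siblings for later review.
    xc_fail_group = ([target] + [s for s in siblings if s in invalid]
                     if invalid else [])
    return merged, xc_fail_group


merged, xc_fail_group = resolve_group("A", ["B", "C", "D"], records)
```

For the example records, `merged` holds only D, and `xc_fail_group` holds A, B and C, matching the outcome described above.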

This is the default behavior. 

If you do not want matching records to be redirected to the XC Fail file in cases like the above, then you should select the 'Self-PLP: Avoid XC Fails' option. 

If, on the other hand, you want to find only the matching record groups with the problem described above, then select the run option 'Self-PLP: Identify Complex XC Fails'. 

1) Apart from the 'Splp-Merge-Froms' file, which is always created as a diagnostic when records match and merge.
self-plp/avoiding_xcfails.txt · Last modified: 2013/04/27 09:09 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported