Verify option to check for 'good records' within 'bad'


The Verify utility processes a MARC file one byte at a time 1). When it sees the MARC end_of_record delimiter (x1D), it creates a stream of data that begins with the byte following the last successfully verified record and ends with the x1D just found. The utility then tries to verify the MARC structure of that stream.
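The scanning loop described above can be sketched in Python. This is only an illustration, not Verify's actual code: the names scan, looks_like_valid_record, and GOOD are hypothetical, the structural check is a much-simplified stand-in for Verify's real tests, and the sketch assumes that scanning resumes after a stream that fails verification.

```python
EOR = 0x1D  # MARC end_of_record delimiter
FT = 0x1E   # MARC field terminator


def looks_like_valid_record(stream: bytes) -> bool:
    """Simplified structural check (an assumption, not Verify's real logic)."""
    # Smallest possible record: 24-byte leader + directory terminator + EOR.
    if len(stream) < 26 or stream[-1] != EOR:
        return False
    length, base = stream[0:5], stream[12:17]
    if not (length.isdigit() and base.isdigit()):
        return False
    # Leader bytes 0-4 must give the length of the whole record.
    if int(length) != len(stream):
        return False
    base_addr = int(base)
    if not (25 <= base_addr < len(stream)):
        return False
    # The directory holds 12-byte entries and ends with a field terminator.
    return stream[base_addr - 1] == FT and (base_addr - 25) % 12 == 0


def scan(data: bytes):
    """Yield (status, start, end) for every stream that ends in an EOR byte."""
    start = 0  # byte following the previous stream
    for pos, byte in enumerate(data):
        if byte != EOR:
            continue
        stream = data[start:pos + 1]
        status = "ok" if looks_like_valid_record(stream) else "error"
        yield (status, start, pos + 1)
        # Assumption: after an error, scanning resumes after the bad stream.
        start = pos + 1


# Toy record with an empty directory and no fields, for illustration only.
GOOD = b"00026nam a2200025 a 4500\x1e\x1d"

for result in scan(GOOD + b"garbage\x1d" + GOOD):
    print(result)
```

In this sketch a record whose end_of_record delimiter is missing is merged with the record that follows it, because the stream is only cut at the next x1D.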

This approach has worked well over the years, but it can be fooled by the following scenario. Suppose that, after a 'good' record has been parsed, the directory of the next record is truncated, so that a new record begins within the directory of the truncated record. Because the truncated record has no end_of_record delimiter of its own, both records fall into the same stream, and Verify will report the 'second' record as an error, even if it proves to be valid.

To deal with this problem (which we hope is rare!), we have added a new option to Verify called 'Check for good within bad'.


When this option is selected, Verify will, in the above scenario, rescan the data stream looking for a piece of data that resembles a MARC leader and, if one is found, check whether that 'second' leader marks the beginning of a valid MARC record. If it does, the truncated directory will still be reported as an error, but the valid record will not be lost.
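A minimal Python sketch of such a rescan follows. It is an illustration under stated assumptions, not the utility's implementation: the function names are hypothetical, the leader test (five digits, a five-digit base address, and the entry map "4500") is only one plausible heuristic for "resembles a MARC leader", and looks_valid is a deliberately simple stand-in for full verification.

```python
from typing import Optional

EOR = 0x1D  # MARC end_of_record delimiter


def looks_valid(stream: bytes) -> bool:
    """Stand-in check (assumption): leader length matches and stream ends in EOR."""
    return (len(stream) >= 26 and stream[-1] == EOR
            and stream[:5].isdigit() and int(stream[:5]) == len(stream))


def resembles_leader(leader: bytes) -> bool:
    """Heuristic (assumption): digits at 0-4 and 12-16, and '4500' at 20-23."""
    return (len(leader) == 24 and leader[0:5].isdigit()
            and leader[12:17].isdigit() and leader[20:24] == b"4500")


def find_good_within_bad(stream: bytes) -> Optional[int]:
    """Return the offset of a valid record hidden inside a failed stream, if any."""
    # Offset 0 is skipped: the stream as a whole has already failed verification.
    for offset in range(1, len(stream) - 23):
        if resembles_leader(stream[offset:offset + 24]) and looks_valid(stream[offset:]):
            return offset
    return None


# Toy data: a leader plus a cut-off directory entry, followed by a minimal record.
truncated = b"00100nam a2200061 a 4500245001500"
good = b"00026nam a2200025 a 4500\x1e\x1d"
bad_stream = truncated + good

offset = find_good_within_bad(bad_stream)
if offset is not None:
    error_part, recovered = bad_stream[:offset], bad_stream[offset:]
    print(len(error_part), "bytes reported as error;", len(recovered), "bytes recovered")
```

Checking the candidate with the full verification step matters here, because byte sequences that merely look like a leader can also occur inside ordinary field data.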

The new option is turned off by default; we recommend that you enable it only when you are dealing with a problematic file, that is, a file with MARC errors that, on inspection, have no other obvious cause. A telltale sign of this problem is an empty 'MarcErr' record, like the following:

Filename: D:\Marc\test records.mrc
Record Number: 20
File Offset: 35019
Last Good EOR: 33596

Usually, what can be read of the record is printed above the three dashes.

1) This isn't completely true: Verify reads the file from the disk in large chunks, and then processes each chunk one byte at a time.
235/verify_good-inside-bad.txt · Last modified: 2013/04/27 09:09 (external edit)