Differences
This shows you the differences between two versions of the page.
phelp:helpverify [2015/10/27 00:39] |
phelp:helpverify [2021/12/29 16:21] (current) |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | MARC VERIFY | ||
+ | MARC Verify provides a quick way to remedy common errors in MARC files, along with a few utilities for users interested in batch processing. Under normal circumstances, | ||
+ | |||
+ | WHEN TO USE MARC VERIFY | ||
+ | |||
+ | Use this utility on your file if you get a 'MARC error' when running MARC Report. | ||
+ | |||
+ | WHAT IT DOES | ||
+ | |||
+ | The main result of running MARC Verify will be a clean USMARC file that can be read by any software that reads USMARC data. This file will be written to your Batch Report directory and named ' | ||
+ | |||
+ | HOW IT WORKS | ||
+ | |||
+ | MARC Verify reads each byte of the input file and writes it to an output file. During this process, it attempts to remove any non-MARC formatting from the input file. | ||
+ | |||
+ | If the incoming byte is a null character (x00) or DOS EOF character (x1A), it is converted to a blank space (x20) before being written to the output file. | ||
+ | |||
+ | If the program encounters a MARC record terminator in a position that does not agree with where the leader says the terminator should be, and the leader appears to be correct, the ' | ||
+ | |||
+ | Any bytes that are found in between records (i.e. bytes that occur after the MARC record terminator (x1D) and before the next leader), or bytes that occur at the end of the file following the last MARC record, are not written to the output file. | ||
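The byte-level pass described above can be sketched roughly as follows. This is only an illustration in Python, not MARC Verify's actual code: the function name `clean_stream` is hypothetical, it assumes the five-digit record length in each leader can be trusted, and it simply skips anything it cannot read instead of dumping it to an error file.

```python
NULL, DOS_EOF, BLANK, RECORD_TERMINATOR = 0x00, 0x1A, 0x20, 0x1D

def clean_stream(data: bytes) -> bytes:
    """Copy well-formed records, blanking x00/x1A and dropping stray bytes."""
    out = bytearray()
    pos = 0
    while pos + 24 <= len(data):          # a leader is 24 bytes long
        length_field = data[pos:pos + 5]
        if not length_field.isdigit():
            pos += 1                      # stray byte between records: skip it
            continue
        length = int(length_field)
        record = data[pos:pos + length]
        if length < 24 or len(record) < length or record[-1] != RECORD_TERMINATOR:
            pos += 1                      # no usable record starts here; keep scanning
            continue
        # Nulls and DOS EOF characters become blank spaces on output.
        out += bytes(BLANK if b in (NULL, DOS_EOF) else b for b in record)
        pos += length
    return bytes(out)
```

Bytes that follow a record terminator, including trailing bytes at the end of the file, never reach the output because the loop only copies spans that start with a plausible leader and end with x1D.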
+ | |||
+ | Also, while reading the file, if a record is found that cannot be processed (i.e. the record was truncated, or the MARC structure of the record is in error), the record, or part thereof, is written to a text file instead of the output file. We refer to this as a ' | ||
+ | |||
+ | VERIFICATION OPTIONS | ||
+ | |||
+ | Remove records with MARC errors: This option appears grayed out because it is always in effect. As noted above, a MARC error occurs when the MARC structure (leader, directory, data) of the record is incomplete or in error. | ||
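To make "MARC structure (leader, directory, data)" concrete, here is a minimal sketch of the kind of checks involved. It is an assumption about what such a check could look like, not a description of MARC Verify's internals; the function name `structure_errors` and the error messages are hypothetical. It relies only on the standard record layout: a 24-byte leader with the record length in positions 00-04 and the base address of data in positions 12-16, followed by 12-byte directory entries (tag, field length, starting position).

```python
FIELD_TERMINATOR, RECORD_TERMINATOR = 0x1E, 0x1D

def structure_errors(record: bytes) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    errors = []
    if len(record) < 24 or not record[:5].isdigit() or not record[12:17].isdigit():
        return ["leader is incomplete or non-numeric"]
    length, base = int(record[:5]), int(record[12:17])
    if length != len(record):
        errors.append("leader length does not match record size")
    if not record.endswith(bytes([RECORD_TERMINATOR])):
        errors.append("record terminator missing")
    if base < 25 or base > len(record) or record[base - 1] != FIELD_TERMINATOR:
        errors.append("directory is not terminated at the base address")
        return errors
    directory = record[24:base - 1]
    if len(directory) % 12 != 0:
        errors.append("directory length is not a multiple of 12")
    for i in range(0, len(directory) - 11, 12):
        entry = directory[i:i + 12]
        if not entry.isdigit():           # also rejects alphabetic tags
            errors.append(f"non-numeric directory entry {entry!r}")
            continue
        field_length, start = int(entry[3:7]), int(entry[7:12])
        end = base + start + field_length
        if end > len(record) or record[end - 1] != FIELD_TERMINATOR:
            errors.append(f"field {entry[:3].decode()} does not end with x1E")
    return errors
```

A truncated record fails the first length comparison, while a damaged directory typically surfaces as a field that does not end where its entry says it should.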
+ | |||
+ | Each record with a MARC error is dumped to a separate textfile named MarcErr### | ||
+ | |||
+ | Remove records with null characters: Select this option to remove records that contain null characters or multiple MARC record terminators from the file. The records will be written to a file called ' | ||
+ | |||
+ | Remove record status = ' | ||
+ | |||
+ | Remove non-bibliographic records: Select this option to remove all records where the leader Record Type value is one of ' | ||
+ | |||
+ | Remove, or Count, records with invalid indicators/ | ||
+ | |||
+ | Remove, or Count, records without an 001: Select the ' | ||
+ | |||
+ | Remove, or Count, records without a Local Holdings tag: Select the ' | ||
+ | |||
+ | Remove, or Count, records without a System Control tag: Select the ' | ||
+ | |||
+ | Remove, or Count, records with Record Length >= x, where x represents a valid MARC record length from 40 to 99999. This option allows you to remove or count records of a certain size or larger; removed records will be written to a file called ' | ||
+ | |||
+ | To specify a tag in the Local Holdings or System Control tag options above, enter the three-digit MARC tag, (optionally) followed by the subfield code. For example: ' | ||
+ | |||
+ | NOTE: For each of the above options, records are removed from the source file in the order that the options are listed. For example, if you have selected both Remove records without an 001, and Remove records without the Local Holdings tag 852, if a record lacks both the 001 and the 852, it will be written to ' | ||
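The ordering rule in this note amounts to "first matching option wins". A small sketch of that idea follows, under loud assumptions: the `Record` shape, the filter labels, and the 90000 length threshold are all hypothetical and exist only for illustration; they are not MARC Verify's file names or settings.

```python
from typing import Callable, Optional

Record = dict  # hypothetical stand-in: {"tags": set of tag strings, "length": int}

# Filters in the order the options are listed; labels are illustrative only.
FILTERS: list[tuple[str, Callable[[Record], bool]]] = [
    ("no-001", lambda r: "001" not in r["tags"]),
    ("no-holdings-852", lambda r: "852" not in r["tags"]),
    ("too-long", lambda r: r["length"] >= 90000),
]

def first_matching_filter(record: Record) -> Optional[str]:
    """Return the label of the first filter that removes the record, if any."""
    for label, matches in FILTERS:
        if matches(record):
            return label
    return None
```

A record lacking both the 001 and the 852 is claimed by the 001 filter and never reaches the 852 filter, so it is written out only once.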
+ | |||
+ | Reset Leader: Vendors often ' | ||
+ | |||
+ | By default, MARC Verify does not change any byte in the leader (with the exceptions of a null character, or a DOS EOF character, each of which will automatically be converted to a blank). | ||
+ | |||
+ | However, if you want to clean-up ' | ||
+ | |||
+ | The second ' | ||
+ | 000/08: Type of control (usually blank) | ||
+ | 000/09: Character coding scheme (usually blank, unless Unicode) | ||
+ | 000/17: Encoding level (various codes apply; Note that blank=Full Level record) | ||
+ | 000/18: Descriptive cataloging form (usually ' | ||
+ | 000/19: Linked record requirement (usually blank) | ||
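For reference, the five leader positions listed above can be read out of a 24-character leader as in the sketch below. It only inspects the leader; which values 'Reset Leader' actually writes is not modeled here, and the function name `describe_leader` is hypothetical.

```python
LEADER_POSITIONS = {
    8: "Type of control",
    9: "Character coding scheme",
    17: "Encoding level",
    18: "Descriptive cataloging form",
    19: "Linked record requirement",
}

def describe_leader(leader: str) -> dict[str, str]:
    """Map each listed leader position to the character found there."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters")
    return {name: leader[pos] for pos, name in LEADER_POSITIONS.items()}
```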
+ | |||
+ | Note that the 'Reset Leader' | ||
+ | |||
+ | Identify unused local fields: This option will list, at the end of the Verify run, every 9xx tag that has not been used in the file. This may be helpful during database processing, where sometimes data must be added/ | ||
+ | |||
+ | NOTE: You can also get a list of unused 9XX fields (and a whole lot more) by running the MARC Analysis utility with the default options. | ||
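The idea behind the unused-9XX list is simple set subtraction. A minimal sketch, assuming you already have the tags that occur in the file (the function name `unused_local_tags` is hypothetical):

```python
def unused_local_tags(tags_seen) -> list[str]:
    """Return every 9XX tag (900-999) that does not occur in tags_seen."""
    used = set(tags_seen)
    return [str(n) for n in range(900, 1000) if str(n) not in used]
```

Any tag in the returned list is free to hold temporary data without colliding with fields already present in the file.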
+ | |||
+ | Control Code translation: | ||
+ | |||
+ | NOTE: This option is safe for UTF-8 data because Control Code values are not used in UTF-8. This option would NOT ordinarily be safe for true Unicode data (e.g. UTF-16 as opposed to UTF-8). But because the current version of MARC Report does not recognize a Unicode file header (if you try to select a file with a BOM header using MARC Report, the 'This does not seem to be a MARC file' error will pop up), this potential problem is averted. | ||
+ | |||
+ | Repair Terminator Problems: If selected, the utility will repair fields that include multiple field terminators (x1E). It will also try to fix records where the field terminator has been omitted from the last field, and records where the record terminator was counted as part of the length of the last field. By default, this option is not selected. NOTE: This option is incompatible with ' | ||
+ | |||
+ | Repair Delimiter Problems: In this section, the term ' | ||
+ | control fields that begin with subfield $a (the $a will be deleted) | ||
+ | control fields that contain a subfield $a in position 3 and 4 (the first four bytes will be deleted) | ||
+ | control fields that contain the delimiter byte (it will either be deleted or replaced with a blank) | ||
+ | variable fields that begin with subfield $a (two blank spaces (indicators) will be inserted) | ||
+ | variable fields that contain an ' | ||
+ | variable fields that begin with alphabetic data (two blank spaces and subfield $a will be inserted) | ||
+ | variable fields that contain two consecutive delimiters (the second delimiter will be deleted) | ||
+ | variable fields that contain a dollar sign after the indicators (the dollar sign will be replaced by a delimiter byte) | ||
+ | variable fields for numbers that begin with subfield $A ($A is replaced with $a) | ||
+ | indicator positions that contain invalid indicators (replace the invalid indicator(s) with blank space(s); a valid indicator is defined as a blank space, the numbers 0-9, or the fill character) | ||
+ | Any changes made via this option are logged to a file called vrfy_fix_log.txt (in the same directory as the results). By default, this option is not selected. | ||
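A few of the variable-field repairs in the list above can be sketched as follows. This is an illustrative approximation, not the utility's code: the function name `repair_variable_field` and the log messages are hypothetical, only four of the listed cases are covered (missing indicators before a delimiter, invalid indicators, a dollar sign after the indicators, doubled delimiters), and any tag-based exemption from the indicator fix is not modeled. The fill character is taken to be the vertical bar.

```python
DELIM = "\x1f"
VALID_INDICATORS = set(" 0123456789|")  # blank, digits, fill character

def repair_variable_field(field: str) -> tuple[str, list[str]]:
    """field = two indicators followed by subfield data, without the x1E."""
    log = []
    if field.startswith(DELIM):
        field = "  " + field              # indicators were missing entirely
        log.append("inserted blank indicators")
    indicators, rest = field[:2], field[2:]
    fixed = "".join(c if c in VALID_INDICATORS else " " for c in indicators)
    if fixed != indicators:
        log.append("blanked invalid indicator(s)")
    if rest.startswith("$"):
        rest = DELIM + rest[1:]           # literal dollar sign used as delimiter
        log.append("replaced dollar sign with delimiter")
    while DELIM + DELIM in rest:
        rest = rest.replace(DELIM + DELIM, DELIM)
        log.append("removed doubled delimiter")
    return fixed.ljust(2) + rest, log
```

Returning the list of changes alongside the repaired field mirrors the idea of logging every fix, so a run can be audited afterwards.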
+ | |||
+ | NOTE: As some systems use alphabetic indicators to mark local fields, the indicator fix for invalid indicators is no longer applied to any tag containing a ' | ||
+ | |||
+ | Pause on MARC error: If selected, MARC Verify will stop on any MARC error and open a message window which contains the following information: | ||
+ | |||
+ | Stop processing if MARC error count exceeds [10]: MARC Verify will stop reading the file if more than this number (default is 10) of MARC errors are encountered. The reason for this limit is that each record with a MARC error is dumped to a separate text file, and this limit ordinarily prevents thousands of records from being dumped in this manner if there is a problem with the file. | ||
+ | |||
+ | TMQ Option 1 | ||
+ | |||
+ | This option is not selected by default. If selected, this option changes the behavior of two items in Verify: 1) how the default results filename is generated, and 2) whether or not a machine-readable statistics report is generated. | ||
+ | |||
+ | ALPHABETIC TAG SUPPORT | ||
+ | |||
+ | MARC21 does not support alphabetic tags, although the ISO standard (2709) upon which MARC21 is based does make a provision for this. The default action in MARC Verify is to treat any non-numeric character in the directory as an error, and to dump the record to a text error file. Therefore, if you have a file containing alphabetic tags, you will need to translate the alphabetic tags to numeric tags in order to use the file with MARC Report. | ||
+ | |||
+ | NOTE: We are using ' | ||
+ | |||
+ | The ' | ||
+ | |||
+ | ALPHABETIC TAG TABLE FORMAT | ||
+ | |||
+ | The format of the alphabetic tag conversion table must be (exactly) as follows: a 3-byte alphabetic tag in uppercase, followed by an ' | ||
+ | |||
+ | A01=961 | ||
+ | CAT=962 | ||
+ | FIN=963 | ||
+ | FMT=964 | ||
+ | LCS=965 | ||
+ | SRC=966 | ||
+ | |||
+ | Anything that does not match this pattern will be ignored. Thus, you can add comments before or after each line in the table; however, do not add anything on a line containing a valid entry (unless you wish to disable it). | ||
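A reader for a table in this format might look like the sketch below. The exact entry shape is an assumption inferred from the sample entries (three uppercase letters or digits, an equals sign, then a three-digit numeric tag, with nothing else on the line); the function name `load_tag_table` is hypothetical.

```python
import re

# Assumed entry shape, inferred from the sample table: three uppercase
# letters or digits, '=', then a three-digit numeric tag, alone on the line.
ENTRY = re.compile(r"^([A-Z0-9]{3})=([0-9]{3})$")

def load_tag_table(text: str) -> dict[str, str]:
    """Return {alphabetic tag: numeric tag}; non-matching lines are ignored."""
    table = {}
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m and not m.group(1).isdigit():   # left side must not be purely numeric
            table[m.group(1)] = m.group(2)
    return table
```

Because the pattern is anchored at both ends, adding any text to an entry line disables that entry, which matches the behavior described above for comments.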
+ | |||
+ | Only uppercase alphabetic tags are supported at present. The table does not need to be in alphabetical order, although that is probably a good idea for ease of maintenance. | ||
+ | |||
+ | The numeric tags used on the right side of this table do not need to be 9XX tags--any numeric tags greater than ' | ||
+ | |||
+ | NOTES | ||
+ | |||
+ | You can create as many conversion tables as you want. Save these conversion tables to the folder: MyDocuments\MarcReport\Options. The program will use whatever conversion table was most recently selected (if any). To switch from one conversion table to another, you must toggle the ' | ||
+ | |||
+ | If the data field referenced by the directory entry for an alphabetic tag lacks indicators or subfields, then two blank spaces and a subfield $a will be inserted into the tag. | ||
+ | |||
+ | When the program concludes, brief statistics on the alpha-to-numeric conversion will be added to the verify log. These statistics are added to the logfile only--they do not appear in the window that appears when the verify session completes. Therefore, check the log after every run, since any alphabetic tag in a record that is not in the conversion table will force the record to be regarded as an error and removed from the file. | ||
+ | |||
+ | If you need help getting a list of alphabetic tags for a file, check the MARC Report program folder for the utility called ' | ||
+ | |||
+ | COPY ALPHABETIC TAGS TO $9 | ||
+ | |||
+ | If selected, any 3-byte alphabetic tags that are converted will be copied to a subfield $9 and inserted at the start of the corresponding numeric field (i.e. the first subfield, following the indicators) in the record when it is output. This option has no effect unless ' | ||
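As a rough illustration of where the $9 lands, the sketch below inserts it directly after the two indicator bytes. The function name `add_original_tag` is hypothetical, and the field is represented as a plain string of indicators plus subfields with x1F as the delimiter.

```python
DELIM = "\x1f"

def add_original_tag(field: str, alpha_tag: str) -> str:
    """field = two indicators + subfields; returns it with $9<alpha_tag> first."""
    indicators, subfields = field[:2], field[2:]
    return f"{indicators}{DELIM}9{alpha_tag}{subfields}"
```

For example, a field converted from tag FMT with content `  $aBook` would come out as `  $9FMT$aBook`, with `$` standing for the delimiter byte.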
+ | |||
+ | |||
+ | FIXING MARC ERRORS | ||
+ | |||
+ | The text file that is created for a MARC error has the following format: | ||
+ | |||
+ | 000 00921nam | ||
+ | 001 | ||
+ | 003 DLC | ||
+ | 005 20000227211754.0 | ||
+ | 008 880509s1988 | ||
+ | 010 | ||
+ | 020 | ||
+ | 040 | ||
+ | 042 | ||
+ | 050 00$aQL737.M3$bB46 1988 | ||
+ | 082 00$a599.2$220 | ||
+ | 100 1 $aBender, Lionel. | ||
+ | 245 10$aKangaroos and other marsupials /$cLionel Bender ; illustrations, | ||
+ | TAG $aNew York : | ||
+ | TAG $a31 p. :$bill. (some col.) ;$c30 cm. | ||
+ | TAG 0$aFirst sight | ||
+ | TAG 0$aMarsupials$vJuvenile literature. | ||
+ | TAG 0$aKangaroos$vJuvenile literature. | ||
+ | TAG 1$aMarsupials. | ||
+ | TAG 1$aKangaroos. | ||
+ | TAG 1 $aThompson, George, | ||
+ | TAG 1 $aRobson, Denny. | ||
+ | TAG 1 $aStidworthy, | ||
+ | |||
+ | --- | ||
+ | Record Number: 4 | ||
+ | File Offset: 3965 | ||
+ | Last Good EOR: 3042 | ||
+ | |||
+ | In this example, all of the data in the record was recovered, but not all of the tags. The word ' | ||
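If you want to inspect these error files programmatically, a reader along the following lines may help. It is a sketch under assumptions: each field line is taken to be a three-character tag, one space, then the field content, and the footer after the `---` line is taken to consist of `Name: number` pairs as in the example. The function name `parse_error_dump` is hypothetical.

```python
def parse_error_dump(text: str) -> tuple[list[tuple[str, str]], dict[str, int]]:
    """Split a MarcErr-style dump into (tag, content) pairs and footer numbers."""
    fields, footer = [], {}
    in_footer = False
    for line in text.splitlines():
        if line.strip() == "---":
            in_footer = True              # everything after this is the footer
        elif in_footer and ":" in line:
            key, _, value = line.partition(":")
            if value.strip().isdigit():
                footer[key.strip()] = int(value)
        elif not in_footer and line.strip():
            # Assumed layout: tag in columns 1-3, one space, then the content.
            fields.append((line[:3], line[4:]))
    return fields, footer
```

Counting the entries whose tag is the literal word TAG gives a quick measure of how many tags need to be restored by hand before importing the record.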
+ | |||
+ | It is quite easy to import this record into a MARC Report Edit Session. To do this, open the MarcErr textfile, edit the record (if necessary, but being careful to preserve the simple formatting outlined above) and copy it to your clipboard (select the record with your mouse, then press < | ||
+ | |||
+ | Next, run an Edit Session in MARC Report on the MARC file you want to add this record to, and press <F9>. The record will be added to the session. You can then edit the record again in MARC Report if necessary. | ||
+ | |||
+ | Alternately, | ||
+ | |||
+ | |||
+ | PROBLEMS WITH RECORDS GREATER THAN 99999 BYTES | ||
+ | |||
+ | Some software that exports MARC records from library systems does not seem to realize that the largest number that is valid in the record length defined by MARC21 is 99999. We have seen leaders with record lengths that are 6 digits (not a good idea, as all offsets then become misaligned), | ||
+ | |||
+ | This happens more and more as systems export serial records with huge numbers of holdings. Alas, if only we had switched to MarcXml five years ago :-) | ||
+ | |||
+ | Best wishes aside, Verify will not be able to make sense of these records, and it is going to try to skip the record and dump it to a text ' | ||
+ | |||
+ | In this case, after picking up the pieces, you might try the following. Start Verify, fill out the options form, and save it--a suggested name is ' | ||
+ | |||
+ | customBlocksize=1024000 | ||
+ | |||
+ | Save the file. Start Verify, click Options|Load, | ||
+ | |||
+ | customBlocksize=5120000 | ||
+ | |||
+ | Strange, but we've found this technique actually gets Verify to process files to completion that it would not process otherwise. |