BATCH REPORTS

The 'Batch Reports' (or Batch Mode) option runs all of MARC Report's error-checking capabilities against a file of MARC records. It processes the currently selected MARC file and, on completion, generates a report of the problems found. This report can be configured using the options that follow.

Notes

Version 247 added a number of options, which required a second options form for batch mode. This form is opened by pressing “More display/output options” (on the left, in the 'Report Display' section); help for all of the options on this form is found on the form itself.

Also in version 247, the following three options, previously found on the main Batch options page, were moved to the new 'More display/output options' form:

Include Tag Occurrence numbers, 
Add Message Ids to Notes, and 
Show Full Pathnames of 'Sets'

REPORT ORGANIZATION

There are two ways to organize a Batch Report: 'By Record' and 'By Problem'.

'By Record' reports problems as they are encountered, reading the file one record at a time.

'By Problem' first reads the file and finds all the problems, and then sorts the results according to the type(s) of problems found.
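To make the difference concrete, here is a minimal Python sketch, not MARC Report's actual code. It assumes validation produces (record number, brief message) pairs in file order; the pair format, the function names, the sample messages, and the alphabetical sort standing in for "sorted by type" are all illustrative assumptions. 'By Record' can emit each record's messages as soon as that record is finished, while 'By Problem' has to hold everything until the whole file has been read.

```python
from collections import defaultdict
from itertools import groupby
from typing import Iterable, Iterator

# Hypothetical validation output: (record number, brief message) in file order.
Problem = tuple[int, str]


def by_record(problems: Iterable[Problem]) -> Iterator[tuple[int, list[str]]]:
    """'By Record': yield each record's messages as soon as that record is done."""
    for record_no, group in groupby(problems, key=lambda p: p[0]):
        yield record_no, [message for _, message in group]


def by_problem(problems: Iterable[Problem]) -> dict[str, list[int]]:
    """'By Problem': collect every problem first, then sort by message."""
    collected: dict[str, list[int]] = defaultdict(list)
    for record_no, message in problems:
        collected[message].append(record_no)  # held in memory until the end
    # Assumption: "sorting by type of problem" is modelled as an alphabetical sort.
    return {message: collected[message] for message in sorted(collected)}


# Sample (made-up) messages for two records.
problems = [
    (1, "245: Indicator 2 is invalid"),
    (1, "100: Subfield $e value is not RDA"),
    (2, "100: Subfield $e value is not RDA"),
]
for record_no, messages in by_record(problems):
    print(record_no, messages)
print(by_problem(problems))
```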

REPORT DISPLAY OPTIONS

There are three types of displays that are available in a Batch Report: 'Statistics Only', 'Brief Message', and 'Full Message'.

'Statistics Only' displays a summary of the results. For each problem found, it displays the brief message for the problem followed by a count of the number of times that problem occurred in the file.
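The 'Statistics Only' summary amounts to counting identical brief messages. The Python sketch below shows that idea; the output layout (message, two spaces, count) and the most-frequent-first ordering are assumptions for illustration, not the program's exact format.

```python
from collections import Counter


def statistics_only(messages: list[str]) -> list[str]:
    """One line per distinct brief message, followed by its count."""
    counts = Counter(messages)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [f"{message}  {count}" for message, count in counts.most_common()]


sample = [
    "100: Subfield $e value is not RDA",
    "700: Subfield $e value is not RDA",
    "100: Subfield $e value is not RDA",
]
print("\n".join(statistics_only(sample)))
```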

'Brief Message' displays, for each problem, a brief description of the problem (this is the message that displays in the top right window when running in Record by Record mode).

'Full Message' displays both the brief message and a more descriptive Note. When a data element does not validate, the Note contains a list of valid values; if the data element is obsolete (or has a similar problem), the Note contains a full explanation of the problem.

CONTROL NUMBER

In the 'Control Number' box, enter the tag/subfield that represents a unique system identifier for your database. For example, to have your reports use the LCCN as the control number, enter '010a' here. The default is 001.

Note: If this field is set to a tag that is not present in a record, the program will (attempt to) display the contents of the 001. Also, if you specify a variable field but forget to specify a subfield (e.g. '010' instead of '010a'), the program will revert to its default, the 001.
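The fallback rules above can be sketched in Python as follows. The record model (control fields as strings, variable fields as a dict of first-occurrence subfields), the function name, and the sample values are hypothetical; so is the behaviour when the tag exists but the requested subfield does not, which the text above does not cover and which is assumed here to fall back to the 001 as well.

```python
from __future__ import annotations


def control_number(record: dict, spec: str = "001") -> str | None:
    """Resolve a 'Control Number' setting such as '001' or '010a'."""
    tag, subfield = spec[:3], spec[3:4]
    is_control_field = tag < "010"  # tags 001-009 have no subfields
    if not is_control_field and not subfield:
        # Variable field given without a subfield ('010'): revert to the 001.
        return record.get("001")
    field = record.get(tag)
    if field is None:
        # Tag not present in this record: fall back to the 001.
        return record.get("001")
    if is_control_field:
        return field
    # Assumption: a missing subfield also falls back to the 001.
    return field.get(subfield, record.get("001"))


# Hypothetical record: control fields map to strings,
# variable fields map to {subfield code: value}.
record = {"001": "ocm12345678", "010": {"a": "  2001012345"}}
print(control_number(record))          # default: the 001
print(control_number(record, "010a"))  # the LCCN
print(control_number(record, "010"))   # no subfield given: reverts to the 001
```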

DISPLAY MARC FIELD

The default is to not display MARC data. To display the data from the MARC field that is being reported as a problem, check the 'Display Field Being Reported' option. Note that long MARC fields will be truncated; the default field length is 128 characters, but you can make this larger or smaller on the 'More display/output options' form. A tilde character (~) will be appended to any field in the report that is truncated.
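A minimal Python sketch of the truncation rule, assuming that exactly the configured number of characters is kept and the tilde is added after them (the text above says only that a tilde is appended); the function name is hypothetical.

```python
def truncate_field(text: str, max_length: int = 128) -> str:
    """Cut a field for display and flag the cut with a trailing tilde."""
    if len(text) <= max_length:
        return text  # short enough: shown unchanged
    # Assumption: exactly max_length characters are kept before the '~'.
    return text[:max_length] + "~"


print(truncate_field("245 10 $aShort title."))
print(truncate_field("abcdef", max_length=3))
```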

TAG AND MESSAGE FILTER

This filter is applied against the brief messages that the program returns when it validates a record. It behaves in exactly the opposite way to the Cancelled Message feature: it returns only messages that MATCH the filter and discards all others.

The purpose of this filter is to make it easy to target specific problem areas without having to perform major surgery on the cataloging check options, whether the objective is simply to review or research usage, or to undertake a subsequent cleanup project.

In its simplest case, if you specify a single tag (for example, '100'), Batch Mode will run as it usually does, but only error messages that begin with '100' will be reported.

LIST OF TAGS

In addition to single tags, this filter also accepts multiple tags separated by commas:

Example: 600,610,611,630,650,651

Do not enter a blank space after the commas.

This filter also accepts tag ranges and XX tag specifications:

Example: 600-651
Example: 6XX

And you can bundle all of the above in a single filter:

Example: 1XX,700-710
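How such a tag list might be interpreted can be sketched in Python. This is an illustration, not MARC Report's parser: the function names are hypothetical, 'X' is treated as a wildcard for one digit, ranges are compared as three-character strings, and a message is assumed to qualify when its first three characters (its tag) pass the test.

```python
import re
from typing import Callable


def parse_tag_filter(spec: str) -> Callable[[str], bool]:
    """Turn a filter like '1XX,700-710' into a test for three-character tags."""
    tests: list[Callable[[str], bool]] = []
    for item in spec.split(","):  # no blank after the commas
        if "-" in item:
            low, high = item.split("-", 1)
            # Tags are three digits, so string comparison matches numeric order.
            tests.append(lambda tag, low=low, high=high: low <= tag <= high)
        else:
            # 'X' stands for any digit, so '6XX' covers 600-699.
            pattern = re.compile(re.escape(item).replace("X", r"\d"))
            tests.append(lambda tag, pattern=pattern: pattern.fullmatch(tag) is not None)
    return lambda tag: any(test(tag) for test in tests)


def keep_message(brief_message: str, tag_test: Callable[[str], bool]) -> bool:
    # Brief messages begin with the tag, e.g. '100: Subfield $e ...'.
    return tag_test(brief_message[:3])


tag_test = parse_tag_filter("1XX,700-710")
print(keep_message("110: Subfield $e value is not RDA", tag_test))
print(keep_message("711: Subfield $j value is not RDA", tag_test))
```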

MESSAGE STRINGS

The 'Tag' filter will accept strings (i.e. partial brief messages). For example, if you enter “value is not RDA” in the Tag filter (without the quotes), the batch run will only return errors that contain the specified text:

100: Subfield $e value is not RDA 
700: Subfield $e value is not RDA
700: Subfield $i value is not RDA
710: Subfield $e value is not RDA

etc.

This makes it easy to find all messages with the same problem, even though they are in different tags.

Another good use of this type of filter is finding all messages that report a character-encoding problem: simply enter the phrase 'invalid character' into the filter box.

Strings are not case-sensitive, but should otherwise match the brief message text (including punctuation).

Occurrence specifications are ignored when matching brief message strings. So if you enter

  700: Subfield $e value is not RDA

it will match the following brief messages

  700-01: Subfield $e value is not RDA
  700-02: Subfield $e value is not RDA

but not this one:

  710: Subfield $e value is not RDA
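The matching rules above (case-insensitive, punctuation significant, occurrence numbers ignored) can be sketched in Python as a substring test on normalized text. The regular expression for the 'tag-occurrence:' prefix and the function names are assumptions based on the examples shown, not the program's internals.

```python
import re

# A tag at the start of a brief message plus an occurrence number, e.g. '700-01:'.
TAG_OCCURRENCE = re.compile(r"^(\w{3})-\d+:")


def normalize(message: str) -> str:
    # Drop the occurrence number and case: '700-01: X' -> '700: x'.
    return TAG_OCCURRENCE.sub(r"\1:", message).lower()


def matches_string(filter_text: str, brief_message: str) -> bool:
    """Case-insensitive substring match that ignores occurrence numbers."""
    return normalize(filter_text) in normalize(brief_message)


wanted = "700: Subfield $e value is not RDA"
for message in ["700-01: Subfield $e value is not RDA",
                "700-02: Subfield $e value is not RDA",
                "710: Subfield $e value is not RDA"]:
    print(message, "->", matches_string(wanted, message))
```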

Beginning in version 250, multiple strings can be combined by separating them with the fill character ('|'), e.g.:

needs matching 00|needs matching 33|value is not rda

MESSAGE IDS

Also, beginning with version 246 of MARC Report, the 'Tag' filter accepts a list of comma-delimited Message Ids, e.g.:

47000002,47000009,47600002,47600004

This makes it possible to target a very specific error message or messages.

NOTES

If you mix message Ids and message strings in the same filter, be sure to use '|' as the delimiter, not a comma, e.g.:

47000002|value is not rda|47600004
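A Python sketch of how a filter mixing Message Ids and strings might be evaluated. It covers only Ids and strings (not tag lists), and several details are assumptions: Ids are recognized as eight-digit numbers like the examples above, a comma list is read as Ids only when every item is an Id, and the Id/text pairing in the sample message is made up.

```python
from dataclasses import dataclass


@dataclass
class BriefMessage:
    message_id: str  # e.g. '47000002'
    text: str        # e.g. '100: Subfield $e value is not RDA'


def is_message_id(term: str) -> bool:
    # Assumption: Ids look like the eight-digit examples on this page.
    return len(term) == 8 and term.isdigit()


def filter_terms(spec: str) -> list[str]:
    # A comma list is read as Ids only if every item is an Id;
    # otherwise terms are separated by '|'.
    items = spec.split(",")
    if "|" not in spec and all(is_message_id(item) for item in items):
        return items
    return spec.split("|")


def passes_filter(spec: str, message: BriefMessage) -> bool:
    """Keep a message if any term equals its Id or occurs in its text."""
    for term in filter_terms(spec):
        if is_message_id(term):
            if term == message.message_id:
                return True
        elif term.lower() in message.text.lower():
            return True
    return False


msg = BriefMessage("47600002", "700: Subfield $e value is not RDA")  # made-up pairing
print(passes_filter("47000002|value is not rda|47600004", msg))
print(passes_filter("47000002,47000009,47600002,47600004", msg))
```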

It is easy to set up something like this in the options and later forget that you have done so. For this reason, when you start a Batch Mode run and the program finds a 'Tag filter' has been defined, you will be asked if you want to activate it or not. If this prompt becomes annoying, simply clear the filter in the options.

The functionality of this filter is also available inside an Edit session–for more information, see the Brief Message Filter.

MARC FILE OUTPUT

There are two types of MARC file output available in a Batch Report run: 'Write records with errors', and 'Split'. MARC output is turned off by setting this option to 'None' (which is the default).

Click 'Write Records With Errors' to write every MARC record with errors to a MARC file. To additionally write all records without errors to a separate file, click 'Split'. The 'Split' option takes the source file and writes it into two new files, one containing records with errors, and the other containing records without errors.
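The three output settings can be modelled in Python as a partition of the input records. The mode strings, the function name, and the `has_errors` callback are hypothetical stand-ins for the option buttons and the validation step.

```python
from typing import Callable, Iterable, TypeVar

Record = TypeVar("Record")


def marc_output(records: Iterable[Record], has_errors: Callable[[Record], bool],
                mode: str = "None") -> tuple[list[Record], list[Record]]:
    """Return (records for the 'errors' file, records for the 'no errors' file)."""
    with_errors: list[Record] = []
    without_errors: list[Record] = []
    if mode == "None":  # the default: no MARC output at all
        return with_errors, without_errors
    for record in records:
        if has_errors(record):
            with_errors.append(record)
        elif mode == "Split":  # only 'Split' also keeps the clean records
            without_errors.append(record)
    return with_errors, without_errors


records = ["rec1", "bad2", "rec3", "bad4"]
print(marc_output(records, lambda r: r.startswith("bad"), "Split"))
```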

MARC FILENAMES

Default names for the two MARC files described above can be specified in the edit boxes below. The program will write these files in the Batch Reports directory (which is set under the File and Directories tab).

ASSORTED PROCESSING OPTIONS

IGNORE RECORDS WITHOUT ERRORS

Check this box if you do not wish the reports to list records without errors. Checking this can create a more useful report when running on a relatively clean file.

NOTE: This option affects only the report itself. Checking this option will not affect the statistics produced by MARC Report, nor will it cancel the writing of good records to a MARC file when running the Split Option for MARC Output.

AUTOMATICALLY GENERATE REPORT FILENAME

If this option is selected, which is the default, the program will automatically generate a unique filename and proceed directly to the Batch Mode task. When the run completes, either the name of the report file, or the report itself, will be displayed, depending on the setting of the next option. If you want to set the name of the report file before Batch Mode begins, then disable this option.

NOTE: The format of the automatically generated filename is as follows:

marcrept-771701.txt

where '7' represents the year, '7' represents the month, '17' represents the day, and '01' is an automatically incrementing serial number. The months are given in hexadecimal, so October is 'A', November is 'B', and December is 'C'. Whenever the day changes, the serial number is reset to '01'.
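A Python sketch of the naming scheme. The text does not say how the year is reduced to one character, so taking the last digit of the year is an assumption, as are the function names and the sample date.

```python
from datetime import date
from typing import Optional


def report_filename(day: date, serial: int) -> str:
    """Build a name like 'marcrept-771701.txt' (year, month, day, serial)."""
    year = str(day.year % 10)       # assumption: last digit of the year
    month = format(day.month, "X")  # one hex digit: October-December -> A-C
    return f"marcrept-{year}{month}{day.day:02d}{serial:02d}.txt"


def next_serial(last_day: Optional[date], last_serial: int, today: date) -> int:
    # The serial number restarts at 01 whenever the day changes.
    return last_serial + 1 if last_day == today else 1


print(report_filename(date(2017, 7, 17), 1))
```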

AUTOMATICALLY SHOW REPORT ON COMPLETION

If this option is selected, which is the default, the program will automatically try to open the report when Batch Mode completes. The report will be opened in whatever program you have associated with the .txt file type (Notepad by default). If this behavior causes problems, then you should disable this option and see if the problems go away.

AUTOMATICALLY GENERATE REPORT SUITABLE FOR EXCEL

If this option is selected, a second version of the report is created in the row-and-column format generally associated with programs that support tables, like Excel, but equally amenable to any SQL software. For details on configuring this output, see the help for the “More display/output options” form. Note that the traditional report (whether By Problem or By Record) is still generated even if the Excel option is selected.
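To illustrate what a row-and-column report looks like, here is a Python sketch that writes tab-delimited rows with the standard `csv` module. The three columns, the header labels, and the choice of tabs are hypothetical; the actual layout is configured on the “More display/output options” form.

```python
import csv
import io


def tabular_report(rows: list[tuple[str, str, str]]) -> str:
    """Write (control number, tag, brief message) rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Control number", "Tag", "Brief message"])  # hypothetical columns
    writer.writerows(rows)
    return buffer.getvalue()


print(tabular_report([("ocm12345678", "100", "100: Subfield $e value is not RDA")]))
```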

LIMITATIONS AND SUGGESTIONS FOR VERY LARGE FILES

In 'By Record' mode or 'Statistics Only' mode, there are no limits: Batch Mode will run to completion on files of any size.

In 'By Problem' mode, a physical limit may be reached after a certain number of error messages accumulate (it is the number of messages, not the number of records in the file, that matters). This is because all of the error tracking performed in this mode must be held in memory until the file has been completely processed. The limit is difficult to specify because it depends to some degree on the machine being used. In our tests on an x64 laptop running Windows 10, very large, relatively clean files (e.g. 8 million error messages from 5 million records) were fully processed without reaching this limit.

When all memory is exhausted, the program should display an exception message giving the number of the last record that was validated. If this happens, use the Split utility to split the file into smaller pieces, and then run batch mode on each piece.
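MARC Report's Split utility is the tool to use here; the Python sketch below only illustrates the idea of splitting on record boundaries. It assumes raw MARC 21 (ISO 2709) input, where every record ends with the record terminator byte 0x1D, and reads the file in chunks so the whole file never has to be in memory. The function name and piece size are illustrative, and any trailing bytes after the last terminator are treated as an incomplete record and dropped.

```python
import io
from typing import BinaryIO, Iterator

RECORD_TERMINATOR = b"\x1d"  # MARC 21 / ISO 2709 end-of-record byte


def split_marc_stream(stream: BinaryIO, records_per_piece: int,
                      chunk_size: int = 1 << 16) -> Iterator[bytes]:
    """Yield pieces holding at most `records_per_piece` complete records each."""
    if records_per_piece < 1:
        raise ValueError("records_per_piece must be at least 1")
    piece, count, pending = bytearray(), 0, b""
    while chunk := stream.read(chunk_size):
        pending += chunk
        # Everything before the last terminator is complete records;
        # the tail (possibly empty) waits for the next chunk.
        *complete, pending = pending.split(RECORD_TERMINATOR)
        for record in complete:
            if not record:
                continue  # skip empty entries from doubled terminators
            piece += record + RECORD_TERMINATOR
            count += 1
            if count == records_per_piece:
                yield bytes(piece)
                piece, count = bytearray(), 0
    # Bytes left in `pending` have no terminator (incomplete record) and are dropped.
    if piece:
        yield bytes(piece)


if __name__ == "__main__":
    demo = io.BytesIO(b"rec1\x1drec2\x1drec3\x1d")
    print(list(split_marc_stream(demo, 2)))
```

With a real file, open it in binary mode (`open(path, "rb")`) and write each yielded piece to its own output file.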

Also note that, even in 'By Record' mode, a report can be generated (by turning on every display option) that is so large that it is nearly impossible to open in the Windows programs usually used for this purpose (there are, however, third-party utilities that open large text files without difficulty). Thus, apart from the 'Statistics Only' version, the usefulness of reports generated from huge MARC files is questionable.

Therefore, when running batch mode on very large files, turn off messages for problems you are not interested in, because the configuration of the validation and cataloging checks has a direct impact on the number of messages generated. For example, start a batch mode run on your file, interrupt it after 100,000 records or so, and look at the results. There will probably be large numbers of messages you are not interested in (such as obsolete indicator messages, local tag conventions, etc.). Simply turning these off, using the options on the validation and cataloging check pages, will create a much more meaningful and manageable report.

phelp/helpbatchreports.txt · Last modified: 2021/12/29 16:21 (external edit)
CC Attribution-Share Alike 4.0 International