'List Search': Using MARC Review to search a list of items

If you have a long list of items that you want to search, it's possible to search them all in a single step instead of creating a new pattern for each item or entering each item by hand.

For example, you could search your database for a list of LCCNs or ISBNs, or a list of OCLC symbols, or a list of call numbers, subject headings, and so on. Or, you could validate the contents of a field against a value list or vocabulary.

To search a list of items in your MARC file, you need a text file that meets the following criteria:

  • Each item, or string, in the list must be on a separate line.
  • Each item should be entered exactly as it would be entered in the DATA box of a MARC Review pattern, including regular expressions where applicable.
  • The list must not contain any null (empty) lines, as a null line represents the end of the list to the program.
  • The maximum list size is 10,000 items. If your list is larger than this, an error message will appear and the list will be rejected (see note 1).
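If you build your lists with a script, it can be handy to check them before loading them into MARC Review. Here is a minimal Python sketch of such a check, based only on the rules above; it is not part of MARC Review, and the names `check_list` and `MAX_ITEMS` are hypothetical.

```python
MAX_ITEMS = 10_000  # maximum list size stated on this page


def check_list(text: str) -> list[str]:
    """Report problems in the text of a list file; an empty result means none found."""
    problems = []
    lines = text.splitlines()  # one item per line; \n and \r\n endings both work
    for number, line in enumerate(lines, start=1):
        if line == "":
            # A null line ends the list, so every item after it would be ignored.
            problems.append(f"line {number} is empty")
    if len(lines) > MAX_ITEMS:
        problems.append(f"list has {len(lines)} items; the maximum is {MAX_ITEMS}")
    return problems
```

For example, a list whose second line is blank is reported as "line 2 is empty". Only truly empty lines are flagged, because an item may legitimately contain spaces.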


In addition, if you plan to save this review, keep the file containing the list in a folder that is not likely to be moved. A saved review stores the name of the list file, so if the file is moved, the saved review becomes invalid.

Once you have created the text file containing your list, start MARC Review, go to the Pattern form, and enter the TAG and SUBFIELD where the data from the list will be found.

Tab down to the DATA box and right-click on it. Navigate to the text file that contains your list and select it. MARC Review will validate the list and, if there are no problems, display the filename in the 'Data' box and turn the 'Data' caption blue (a signal that the box holds a filename, not literal data).

Now set up the output options, then run the job as you would any other review.

A list search is probably not as fast as an indexed lookup in a database would be, but it should not take much longer than a typical review.

'List search' options

The List search feature supports the normal Case sensitive option. If it is selected, items in the list are matched against the MARC data with regard to case (e.g. 'cancer' will not match '$aCancer'); if it is not selected, the program disregards case during matching.

The List search feature does not support regular expressions per se. Instead, the 'regular expression' checkbox is replaced by one that reads Whole words only. If this option is selected, items in your list must match the MARC data exactly (e.g. 'Cancer' will not match 'Cancer research'); if it is not selected, matching is left-anchored: items in your list must appear at the beginning of the MARC fields specified in the pattern (e.g. 'Cancer' matches 'Cancer research' but not 'Lung Cancer').

Keep in mind that whatever options are selected (or not), the selection is applied to every item in the list.
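To make the two options concrete, here is a minimal Python sketch of how we read them, applied to each item in turn. It is an illustration, not MARC Review's code: the function names are hypothetical, `field_data` is assumed to be the subfield content without its '$a' code, and we assume one matching item is enough for a field to match.

```python
def item_matches(item: str, field_data: str, case_sensitive: bool, whole_words: bool) -> bool:
    """Test one list item against the data from one field/subfield."""
    if not case_sensitive:
        # Case sensitive off: compare both strings case-insensitively.
        item, field_data = item.casefold(), field_data.casefold()
    if whole_words:
        return field_data == item  # Whole words only: the item must equal the data
    return field_data.startswith(item)  # otherwise: left-anchored match


def list_matches(items: list[str], field_data: str, case_sensitive: bool, whole_words: bool) -> bool:
    """Apply the same options to every item; any single matching item is enough."""
    return any(item_matches(i, field_data, case_sensitive, whole_words) for i in items)
```

With both options off, `item_matches("Cancer", "Cancer research", False, False)` is true and `item_matches("Cancer", "Lung Cancer", False, False)` is false, mirroring the examples above.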

Only two Rules are supported in a List search: 'And' and 'Not'. Use 'And' when you are looking for records that match items in your list (searching); use 'Not' when you are looking for records that do not match items in your list (validation).
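The difference between the two Rules can be sketched in a few lines of Python. This assumes a single field value per record and exact matching (Whole words only, case sensitive) so that a set lookup can stand in for the list; `record_selected` is a hypothetical name, not a MARC Review function.

```python
def record_selected(rule: str, field_value: str, items: set[str]) -> bool:
    """Decide whether a record is output under the 'And' or 'Not' rule."""
    hit = field_value in items  # exact match against any item in the list
    if rule == "And":
        return hit  # searching: keep records whose data is in the list
    if rule == "Not":
        return not hit  # validation: keep records whose data is not in the list
    raise ValueError("List search supports only the 'And' and 'Not' rules")
```

For validation, the list acts as a vocabulary: with the items {"eng", "fre"}, a record containing 'enq' is selected under 'Not', flagging it as invalid.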

If you have a list and need to use regular expressions, a workaround is to use 'Embedded booleans'; see the MARC Review help for how to do this.



Case Study: Using the MARC Review 'List search' feature to find errors reported by MARC Analysis

Here is a practical application of MARC Review's list search feature.

MARC Analysis (one of the free utilities included with MARC Report) is a useful tool for discovering information about the MARC databases we work with. It can also highlight problem areas, and even very specific errors, in our records.

For example, the following excerpt from a MARC Analysis run on a newspaper index database of 300,000 records revealed these strings and occurrence counts in the 008 Date 1 element:

        29 O:   1
        3   :   1
        33  :   1
        4   :   28
        4-JA:   1
        5   :   71
        5 Ma:   1
        6   :   21
        7   :   120
        7/JA:   1
        8   :   83
        8997:   1
        9   :   1
        91  :   1
        94  :   17
        95  :   11
        9550:   1
        96  :   6
        97  :   4
        98  :   3
        994 :   13
        995 :   13
        996 :   4
        997 :   3
        AR 1:   1
        E 19:   5
        ER 1:   2
        June:   1
        Sept:   1
        Spri:   2
        T 19:   2
        UG 1:   3
        UL 1:   2
        V 19:   1
        Y 19:   3
        er 1:   7
        erfo:   1
        ly -:   1
        r 19:   2
        t 19:   1
        th c:   1
        ug 1:   1
        une :   7
        y 19:   5
        y/Ma:   1

Although it was enlightening for the customer to find out about these problems, they (of course) then wanted to get a file containing only these records so they could fix them.

Here are the steps we followed to match and extract only the records with the bad 008/Date1 fields:

  1. Cut out the rows containing the 008 Date 1 errors from the MARC Analysis results and paste them into a new text file.
  2. Open that file in a text editor, remove the leading spaces and everything from the colon onward, and save the file.
  3. Start MARC Review, press Next, enter '008', press TAB, click on 'Date 1', then press Save.
  4. Back on the pattern form, right-click on the Data box and select the text file created in step 2.
  5. Press Next, click 'MARC Output', then 'Matching records only', then Next, then Run.
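If you would rather not do step 2 by hand, a short script can do it. Here is a minimal Python sketch, assuming the pasted rows look like the excerpt above ('VALUE:   count'); the function name is hypothetical. Trailing spaces are kept because they are part of the four-byte Date 1 value, and blank rows are skipped so the list never contains a null line.

```python
def analysis_rows_to_list(rows: str) -> str:
    """Turn pasted MARC Analysis rows ('VALUE:   count') into list-file text."""
    items = []
    for row in rows.splitlines():
        if not row.strip():
            continue  # never write a null line: it would end the list early
        # Drop the leading spaces, then everything from the last colon onward.
        value, sep, _count = row.lstrip(" ").rpartition(":")
        if not sep:
            raise ValueError(f"no colon in row: {row!r}")
        items.append(value)
    return "\n".join(items) + "\n"
```

Write the result to a text file and select that file in step 4. For example, the rows `29 O:   1`, `3   :   1` and `June:   1` become the items `29 O`, `3   ` and `June`.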

That's it!

Note: because the data being searched in this example is fixed-length (each item is four bytes), no regular expression is necessary in the pattern.
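To see why a plain match is enough, recall that in MARC 21 Date 1 always occupies 008/07-10. Here is a minimal Python sketch of the comparison the case study relies on; the record representation (a list of record ID and 008 data pairs) and the function names are hypothetical.

```python
def date1(field_008: str) -> str:
    """Return 008/07-10, the four-byte Date 1 element (positions counted from 00)."""
    return field_008[7:11]


def records_to_fix(records: list[tuple[str, str]], bad_values: set[str]) -> list[str]:
    """Return the IDs of records whose 008 Date 1 is one of the listed bad values."""
    return [rec_id for rec_id, field_008 in records if date1(field_008) in bad_values]
```

Because every Date 1 value is exactly four characters, each item in the list can be compared with a single equality test, with no pattern matching needed.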

1) If this happens, open the list in your text editor, make it smaller, and load it again.
help/mr_list_search.txt · Last modified: 2013/04/27 09:09 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported