phelp:helpsamerecorddupes

SAME RECORD DUPLICATE DATA

It is possible to use MARC Global to identify duplicate data within the same record.

Note: If you want to identify records in a file that have a duplicate title, match key (ISBN, LCCN, OCLC), or any other MARC Field, use the SORT Utility instead.

To run this type of task, start MARC Global, press Skip, select the 'Identify duplicate data' task, then press Next.

On the Options page, enter the tag that you want to check for duplicate data. You can specify a single tag, or an 'X' tag here (for example, 65X, 6XX, XXX).
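
The 'X' tag notation can be illustrated with a short Python sketch. This is not the program's own code; the function name tag_matches is hypothetical, and the sketch assumes that each 'X' stands for any single character in that position of a three-character tag.

```python
def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if a 3-character tag matches the pattern.

    Assumption: an 'X' in the pattern matches any character in the
    same position; every other character must match exactly.
    """
    if len(pattern) != 3 or len(tag) != 3:
        return False
    return all(p in ("X", "x") or p == t for p, t in zip(pattern, tag))


if __name__ == "__main__":
    for pattern in ("650", "65X", "6XX", "XXX"):
        matched = [t for t in ("245", "600", "650", "655") if tag_matches(pattern, t)]
        print(pattern, "->", matched)
```

Under that assumption, 65X covers 650 through 659, 6XX covers every 6-prefixed tag, and XXX covers every tag in the record.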

NORMALIZATION

Several normalization options are also available on this form; these normalizations will be applied to each field before it is compared.

Ignore blanks–remove leading and trailing blanks, and compress two blanks to a single blank.

Ignore case–all data (except for MARC subfield codes) will be shifted to uppercase.

Ignore indicators–if whole tags are being compared, the indicators will be removed.

Ignore punctuation–all punctuation marks will be converted to blank spaces, and then multiple blanks will be compressed to a single blank.

Ignore subfield codes–all MARC subfield delimiters (x1F) and the byte that immediately follows them (hopefully a subfield code) will be replaced with a single blank space.
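
The combined effect of these options can be sketched in Python. This is only an illustration of the descriptions above, not the program's actual implementation. The function name normalize_field, the order in which the options are applied, the use of ASCII punctuation only, and the treatment of any run of two or more blanks as compressible are all assumptions.

```python
import re
import string

# ASCII punctuation only; the program's own punctuation set is not documented here.
_PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "]")
# Subfield delimiter (x1F) plus the byte that immediately follows it.
_DELIM_AND_CODE = re.compile(r"\x1f.", re.DOTALL)
# Either a delimiter+code pair (left untouched) or a run of ordinary text.
_CODE_OR_TEXT = re.compile(r"(\x1f.)|([^\x1f]+)", re.DOTALL)


def normalize_field(indicators, data, *, ignore_blanks=False, ignore_case=False,
                    ignore_indicators=False, ignore_punctuation=False,
                    ignore_subfield_codes=False):
    """Return an (indicators, data) pair to use as the comparison key."""
    if ignore_subfield_codes:
        # Replace each delimiter and the following byte with one blank.
        data = _DELIM_AND_CODE.sub(" ", data)
    if ignore_punctuation:
        # Punctuation becomes blanks, then multiple blanks are compressed.
        data = _PUNCTUATION.sub(" ", data)
        data = re.sub(r" {2,}", " ", data)
    if ignore_case:
        # Uppercase everything except the subfield codes themselves.
        data = _CODE_OR_TEXT.sub(
            lambda m: m.group(1) or m.group(2).upper(), data)
    if ignore_blanks:
        # Drop leading and trailing blanks, compress repeated blanks.
        data = re.sub(r" {2,}", " ", data.strip(" "))
    return ("" if ignore_indicators else indicators, data)


if __name__ == "__main__":
    a = normalize_field(" 0", "\x1faCompact discs.", ignore_blanks=True,
                        ignore_case=True, ignore_indicators=True,
                        ignore_punctuation=True, ignore_subfield_codes=True)
    b = normalize_field(" 7", "\x1faCOMPACT  DISCS", ignore_blanks=True,
                        ignore_case=True, ignore_indicators=True,
                        ignore_punctuation=True, ignore_subfield_codes=True)
    print(a == b)
```

With every option enabled, the two sample fields in the usage section reduce to the same key even though they differ in indicators, case, spacing, and final punctuation.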

ADDITIONAL OPTIONS

Two additional options are available for this type of job. 'Find duplicate data only in matching tags' is applicable when the TAG is 'XXX'. If not selected, which is the default, all duplicate data within the record is identified. If selected, then the search will discard all duplicate data fields that do not have the same tag number (for example, a 650 that is repeated as a 655 will not be reported as a duplicate because the tags are different).

The other option, 'delete duplicate data', will remove any duplicate data identified by the preceding options. There are some restrictions on this. First, only duplicate data in the same tag will be removed; therefore, if the TAG entered above includes an 'X', then the 'Find duplicate data only in matching tags' option must also be selected. Second, there is no way to tell the program which duplicate tag to delete–it will always delete the one that occurs second in the record. If a tag repeats three times, then the second and third duplicate copies of the tag are deleted, and so forth.
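
The behavior of these two options can be sketched in Python as follows. This is a hedged illustration rather than the program's own logic. The names find_duplicates and delete_duplicates are hypothetical, each field is modeled as a (tag, data) pair whose data is assumed to be already normalized, and the sketch lists only the later occurrences of repeated data, whereas the program's report may also show the first occurrence.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (tag, normalized data) -- a simplified model of a field


def find_duplicates(fields: List[Field], same_tag_only: bool = False) -> List[int]:
    """Return the positions of fields that repeat an earlier field.

    same_tag_only mirrors 'Find duplicate data only in matching tags':
    when True, data counts as a duplicate only if the tag is also the same.
    """
    seen = set()
    dupes = []
    for i, (tag, data) in enumerate(fields):
        key = (tag, data) if same_tag_only else data
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes


def delete_duplicates(fields: List[Field]) -> List[Field]:
    """Keep the first occurrence of each (tag, data) pair and drop the rest.

    Deletion is restricted to duplicates within the same tag, so the
    same-tag comparison is always used here.
    """
    dupes = set(find_duplicates(fields, same_tag_only=True))
    return [f for i, f in enumerate(fields) if i not in dupes]


if __name__ == "__main__":
    record = [("650", "COMPACT DISCS"), ("655", "COMPACT DISCS"),
              ("650", "COMPACT DISCS"), ("650", "COMPACT DISCS")]
    print(find_duplicates(record))
    print(find_duplicates(record, same_tag_only=True))
    print(delete_duplicates(record))
```

In the sample record, the 655 counts as a duplicate only when tags are not required to match, and deletion keeps the first 650 and the 655 while dropping the second and third copies of the 650.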

NOTES

This type of task can be saved to the saved reviews file.

Two types of Text Output are available for this task: 'Custom' and 'Full record'. For best results, set the Output Format to 'Custom record', and enter at least one tag (e.g., a control number) in the 'Tags to Output' box. 'Full record' can also be used here, but that might make it difficult to quickly see which tags are being duplicated.

Some of the normalization options have dependencies. For example, if a subfield is specified for the Tag being searched, then indicators are always ignored. In the same situation, the 'Ignore subfield codes' option has no effect, because the program itself ignores it.

By default, an 'XXX' search will find all data that is repeated in a record, regardless of the tag. For example, if your library code is present in an 049 field, and again in a 9XX field, these fields will be displayed as dupes.

An 'XXX' search will usually turn up some unexpected results (it's interesting to see how much data is repeated in some MARC records). For example:

-Control number fields are often repeated (001 and 010; 035 and 9XX)
-Call number fields are often repeated (050 and 090; 082 and 092)
-Authors in 100 often appear in 600 and 700 fields
-Title fields (22X-24X) often duplicate after normalization is applied
-Edition (250) statements are sometimes repeated in XXX notes
-Note (500) fields are often repeated as 650 fields (e.g., 'Compact discs')
-Subjects (6XX) commonly repeat verbatim with only a change in indicators
-Untraced series (490) headings are almost always repeated in 830
-Title (24X) and Series (4XX) headings are often repeated in 7XX fields

The list goes on.