MARC Global: Split Long Tags

“Split long tags” breaks long fields into smaller fields.

Some OPACs may truncate a display field at a certain number of characters, and some systems may return an error when trying to load a record with a very long field. This MARC Global job will identify these long fields and split them into shorter fields.

In the top part of the form, enter a pattern that specifies the length of tags to match.

Split long tags--Options

Break at

The 'Break At' option is the requested maximum length of the tags created by the 'split' processing. This may be the same as the maximum length specified in your pattern, or it may be shorter, but it cannot be longer.

Field link and sequence number

The 'Use Field link' option uses MARC subfield $8, Field Link and sequence number, to order the tags. This is checked by default. In our testing, we found the resulting data easily became jumbled up without this information.
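To illustrate why the sequence information matters, here is a minimal Python sketch of re-ordering split fields by their $8 value. It assumes the general MARC 21 layout for $8 (a linking number, an optional ".sequence number", and an optional "\field link type"). The exact values MARC Global writes are not documented here, and the names link_key and rejoin are hypothetical.

```python
def link_key(value):
    """Turn an $8 value such as '1.2' or '1.2\\c' into a sortable tuple.

    Assumed layout: <linking number>.<sequence number>\\<field link type>.
    """
    number = value.split("\\", 1)[0]        # drop the optional field link type
    link, _, seq = number.partition(".")    # sequence number is optional
    return (int(link), int(seq or 0))


def rejoin(fields):
    """Rejoin split fields given as (subfield_8_value, text) pairs."""
    ordered = sorted(fields, key=lambda field: link_key(field[0]))
    return "".join(text for _, text in ordered)


# The two pieces arrive out of order but are restored by their $8 values.
jumbled = [("1.2", "cccc dddd"), ("1.1", "aaaa bbbb ")]
print(rejoin(jumbled))
```

Without the $8 values there would be nothing in the pieces themselves to say which one comes first.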

Try to break

The '(Try to) break on a delimiter' option modifies the default behavior, described above. When this is checked, whenever the program finds a 'break at' position, it continues searching (backward through the string) until it finds a MARC subfield delimiter. If the new position falls within a pre-defined range (currently set to 20% of whatever the 'Break At' option is set to), the next tag created will begin with the subfield identified; otherwise the original break point (the first blank space) will be used. This option is only useful for fields containing many subfields, like Enhanced 505s. If it is more important that the result fields be of equal length, then do not select this option.

Add/retain blank

The 'Add/retain blank space' option tells the program whether to leave a blank space at the end of each new tag it creates, except the last. Depending on how your system reconstructs these fields for display, a blank space might be needed when the fields are re-joined.
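The way these options interact can be sketched in Python. This is only an illustration of the behavior described on this page, not the program's actual code: the function name split_field, its parameters, and the use of "$" as a stand-in for the real subfield delimiter (hex 1F) are all assumptions. The examples use tiny 'break at' values for readability, although the real minimum is 100 bytes.

```python
SUBFIELD_DELIM = "$"  # stand-in for the real MARC subfield delimiter (0x1F)


def split_field(text, break_at, try_delimiter=False, keep_blank=True, window=0.20):
    """Split text into pieces no longer than break_at characters."""
    pieces = []
    while len(text) > break_at:
        # Default break: the last blank space within the first break_at characters.
        space = text.rfind(" ", 0, break_at)
        if space <= 0:
            break  # no blank space, so the remainder cannot be split
        cut = space + 1  # the next piece starts after the blank
        if try_delimiter:
            # Keep searching backward for a subfield delimiter; accept it only
            # if it lies within the allowed range (20% of break_at by default).
            delim = text.rfind(SUBFIELD_DELIM, 0, space)
            if delim > 0 and break_at - delim <= window * break_at:
                cut = delim  # the next piece begins with the subfield
        piece = text[:cut]
        if not keep_blank:
            piece = piece.rstrip(" ")
        pieces.append(piece)
        text = text[cut:]
    pieces.append(text)  # the last piece is whatever is left over
    return pieces


field = "$a one two three $t four five"
print(split_field(field, 20))                      # breaks at a blank space
print(split_field(field, 20, try_delimiter=True))  # breaks before "$t"
```

In this sketch, a field with no blank spaces comes back as a single unsplit piece, and every piece except possibly the last ends with a blank unless keep_blank is turned off.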

Notes and caveats

First, please keep in mind that this is a machine process, and the split fields produced may not be 'pretty' in some cases. It is generally not possible to get fields that exactly match the 'break at' length using this routine; but all result fields, except for the last, should be approximately the 'break at' length, while never exceeding it.

This task was designed for MARC fields that contain “words” (like the 505 or 520); thus, fields that do not contain blank spaces cannot be split using this option.

The minimum break at position is 100 bytes.

To preserve a copy of the original tag before it is split, run the 'Copy a Tag' task before running the 'Split long tags' task. (It might be useful to first run MARC Analysis on the file to get a list of all unused tags, and to check whether copying these long tags might overflow the MARC record length boundary.)

If the tag length pattern specified on the top part of this form matches a field that has been linked (using Linkage subfield $6), that field will be excluded from the split operation. Subfield $6 implies that there is another field in the record which would also need to be split in an exactly parallel manner (which might be impossible to programmatically determine if the linked field was in a different script).

This option may not work as expected with long fields that have already been split, since there is no way to tell if, for example, two 505s in an existing record are the result of a previous split by a vendor process (none of the example records we have seen take measures to indicate the sequencing of split tags). Some manual re-ordering may be necessary in this case, especially if the previously split tags are not in order to begin with.

You may wonder if it makes sense to set the 'break at' position to a value smaller than the length specified in the pattern. For example, if you know that your OPAC display chokes on fields longer than 4000 bytes, then breaking the tag at that point could conceivably generate a tag that is only a couple of bytes long (if the original tag was, say, 4008 bytes long). But breaking these tags at, for example, 3800 bytes, would mean that the shortest length of a split tag would be about 201 bytes, which should create a readable portion of text. Using this option wisely may require a bit of trial and error: find out if you have any tags with lengths that are near the 'break at' position, and perhaps divert them to a separate file and use a separate pass of this job to split them.
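The arithmetic behind this advice can be checked with a short Python sketch. It is an idealized estimate that assumes every piece except the last is exactly 'break at' bytes long; real pieces end at a blank space and are therefore slightly shorter, which makes the last piece slightly longer. The function name last_piece_length is hypothetical.

```python
def last_piece_length(field_length, break_at):
    """Estimate the length of the final piece, assuming exact-length breaks."""
    remainder = field_length % break_at
    return remainder or break_at  # an exact multiple leaves a full-length last piece


print(last_piece_length(4008, 4000))  # 8: too short to be readable
print(last_piece_length(4008, 3800))  # 208
print(last_piece_length(4001, 3800))  # 201: the shortest case for fields over 4000 bytes
```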

245/splitlongtags.txt · Last modified: 2021/12/29 16:21 (external edit)
CC Attribution-Share Alike 4.0 International