MARC-8 coding of foreign script and diacritics

In versions before 2.32, MARC Report did not check 880 fields in records that use the MARC-8 character encoding.

Beginning in version 2.32, if leader position 09 is set to '#' (a blank space, meaning the record is coded in MARC-8), MARC Report checks every character in the 880 fields (as well as in any other field in the record that contains diacritics) against the specifications on the LC website:
MARC-8 Code Tables
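The check above hinges on a single leader byte. The sketch below shows how Leader/09 (character coding scheme) can be read. The function name `coding_scheme` is hypothetical and is not part of MARC Report. In MARC 21, a blank at position 09 means MARC-8 and `a` means UCS/Unicode. MARC documentation writes the blank as '#'.

```python
def coding_scheme(leader: str) -> str:
    """Read Leader/09 (character coding scheme) of a MARC 21 record."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is 24 characters long")
    code = leader[9]  # position 09, counting from 00
    if code == " ":   # written as '#' in MARC documentation
        return "MARC-8"
    if code == "a":
        return "UCS/Unicode"
    raise ValueError(f"undefined Leader/09 value {code!r}")
```

For example, `coding_scheme("00714cam  2200205 a 4500")` returns `"MARC-8"`. Changing position 09 to `a` returns `"UCS/Unicode"`.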

The procedure used to check these international MARC-8 characters is:

  1. Convert the subfield into UTF-8
  2. Validate the UTF-8 version of the subfield
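The two steps can be illustrated with a deliberately small Python sketch. This is not MARC Report's implementation. It handles only printable ASCII plus a few MARC-8 (ANSEL) combining diacritics, and the names `COMBINING`, `Marc8Error`, `marc8_to_utf8` and `validate_utf8` are hypothetical. A real converter implements the full LC code tables, including escape sequences that switch to non-Latin scripts. This sketch rejects those sequences as "invalid" only because they fall outside its subset. One point it does show faithfully is that MARC-8 places a combining diacritic *before* its base letter, while Unicode places it *after*, so the conversion must reorder them.

```python
# Hypothetical subset of the MARC-8 Extended Latin (ANSEL) combining
# diacritics, mapped to their Unicode combining characters.
COMBINING = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # umlaut (diaeresis)
    0xF0: "\u0327",  # cedilla
}


class Marc8Error(ValueError):
    """Invalid MARC-8 data, with the byte offset where it was detected."""

    def __init__(self, message: str, position: int):
        super().__init__(f"{message} at byte {position}")
        self.position = position


def marc8_to_utf8(subfield: bytes) -> bytes:
    """Step 1: convert a MARC-8 subfield to UTF-8 (ASCII plus COMBINING only)."""
    out = []
    pending = []  # diacritics waiting for their base letter
    for pos, byte in enumerate(subfield):
        if byte in COMBINING:
            # MARC-8 places combining diacritics BEFORE the base letter...
            pending.append(COMBINING[byte])
        elif 0x20 <= byte <= 0x7E:
            # ...while Unicode places them AFTER it, so reorder here.
            out.append(chr(byte))
            out.extend(pending)
            pending.clear()
        else:
            # Escape sequences (0x1B), other extended characters and control
            # bytes are outside this sketch's subset, so report them.
            raise Marc8Error("Invalid MARC-8 sequence", pos)
    if pending:
        raise Marc8Error("Diacritic with no base character", len(subfield))
    return "".join(out).encode("utf-8")


def validate_utf8(data: bytes) -> str:
    """Step 2: confirm the converted bytes are well-formed UTF-8."""
    return data.decode("utf-8")  # raises UnicodeDecodeError if not
```

For example, `marc8_to_utf8(b"Jos\xe2e")` returns `b"Jose\xcc\x81"` ("José" with a combining acute accent). A stray DEL byte in `b"ab\x7fc"` raises `Marc8Error` with `position == 2`, which is analogous to the approximate position reported in the note.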

If an error is found, a brief message such as 'Tag: Invalid MARC-8 sequence' is displayed, and the note gives the approximate position of the first invalid character in the field. Some invalid characters may not be visible. In that case, save the record to a separate file (press <F6>), then load that file in your favorite hex editor.
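If a hex editor is not at hand, a few lines of Python can show the bytes around the reported position. The helper `hex_context` is hypothetical. The sketch assumes you already know the byte offset of interest within the saved file (for example, read with `open(path, "rb").read()`). Note that the position in MARC Report's note is approximate and is given relative to the field, not the whole file.

```python
def hex_context(data: bytes, position: int, radius: int = 8) -> str:
    """Show the bytes around `position` as hex, marking that byte with [..]."""
    start = max(0, position - radius)
    end = min(len(data), position + radius + 1)
    cells = []
    for i in range(start, end):
        cell = f"{data[i]:02X}"
        cells.append(f"[{cell}]" if i == position else cell)
    # Prefix with the offset of the first byte shown, as hex editors do.
    return f"{start:08X}: " + " ".join(cells)
```

For instance, `hex_context(b"ab\x1bcd", 2, 1)` returns `"00000001: 62 [1B] 63"`, which exposes an otherwise invisible escape byte.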

Viewing international script in MARC Report

To view international script in MARC Report, press <F5> in an Edit session. This action will convert the current MARC record into an XML document, then attempt to load the XML document into a mini-browser.
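As a rough illustration of what "convert the current MARC record into an XML document" involves, the sketch below builds a MARCXML record with Python's standard `xml.etree.ElementTree`. MARC Report's actual converter is not documented here. The MARCXML namespace and the `record`/`leader`/`datafield`/`subfield` element names come from the LC MARCXML schema, while `q` and `record_to_marcxml` are hypothetical names. Control fields are omitted for brevity, and the subfield values are assumed to have already been converted from MARC-8 to Unicode.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)  # serialize as marc:record, etc.


def q(name: str) -> str:
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def record_to_marcxml(leader: str, datafields) -> bytes:
    """Build a MARCXML <record> from a leader and a list of
    (tag, ind1, ind2, [(code, value), ...]) tuples whose values are
    already Unicode strings (i.e. already converted from MARC-8)."""
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for tag, ind1, ind2, subfields in datafields:
        field = ET.SubElement(record, q("datafield"),
                              {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields:
            ET.SubElement(field, q("subfield"), {"code": code}).text = value
    # UTF-8 output with an XML declaration, ready for a browser to load.
    return ET.tostring(record, encoding="utf-8", xml_declaration=True)
```

For example, an 880 field such as `("880", "1", "0", [("6", "245-01"), ("a", "Москва")])` is serialized with its Cyrillic text intact as UTF-8.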

Pressing <F5> also performs a more thorough validation of the character coding in MARC-8 records, because it requires converting the entire record to UTF-8.

In the past, when <F5> was pressed to view the record as XML, the program silently ignored any encoding errors and declined to display the document. This was because an XML-compliant browser would fail to load the XML version of the record if it contained any invalid UTF-8 data.

Beginning in version 2.32, pressing <F5> first test-loads the record into an in-memory XML document. This allows MARC Report to display more detailed error messages when there is a problem parsing the UTF-8 character stream.

232/diacriticsmarc8.txt · Last modified: 2021/12/29 16:21 (external edit)
CC Attribution-Share Alike 4.0 International