Searching for special characters in MARC Review

There is now an easier way to find all records with special characters, such as diacritics, using MARC Review.

Start MARC Review, go to the Pattern form, and enter the following pattern:

If you run this review, it will match all records that have characters outside the normal ASCII range.

This review is possible because MARC Review now supports regular expression ranges that include non-printing characters. The example above uses hexadecimal notation, {x7F} and {xFF}; the decimal equivalents, {127} and {255}, work as well.
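For readers who want to check the same idea outside MARC Review, here is a minimal sketch in Python. It is not MARC Review syntax: it assumes the record is available as raw bytes, and the function name has_special_characters is hypothetical. It only illustrates that the hex and decimal forms name the same byte values and that a range from 0x7F to 0xFF catches anything above printable ASCII.

```python
import re

# Hex and decimal notation refer to the same byte values.
assert 0x7F == 127 and 0xFF == 255

# Byte range equivalent to {x7F}-{xFF}; Python's re syntax, not MARC Review's.
NON_ASCII = re.compile(rb"[\x7f-\xff]")


def has_special_characters(record: bytes) -> bool:
    """Return True if the raw record contains any byte from 0x7F to 0xFF.

    Hypothetical helper: `record` is assumed to be the raw bytes of one record.
    """
    return NON_ASCII.search(record) is not None
```

A record containing only plain ASCII letters and digits would return False; one containing a diacritic encoded in UTF-8 or MARC-8 would return True.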

At present (233 beta), this type of review is only possible for a whole-record search (i.e., a pattern where TAG='XXX').

If your records use MARC-8 encoding, you could refine this review to find all invalid characters:

TAG=XXX 
DATA= [{x01}-{x1A}{x1C}{x80}-{x87}{x8A}-{x8C}{x8F}{xAF}{xBB}{xBE}{xBF}{xC9}-{xDF}{xFC}{xFD}{xFF}] 
REGULAR EXPRESSION=True 

The character set specified above is simply the complement of the MARC-8 characters listed as valid by LC: http://lcweb2.loc.gov/diglib/codetables/45.html
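The same character set can be transcribed into a Python byte pattern to see which bytes it flags and where. This is only a sketch under stated assumptions: it uses Python's re module rather than MARC Review's engine, the record is assumed to be raw bytes, and the name find_invalid_marc8 is hypothetical. The byte values are copied from the DATA line above.

```python
import re

# Transcription of the DATA character set above into Python's byte-regex syntax.
INVALID_MARC8 = re.compile(
    rb"[\x01-\x1a\x1c\x80-\x87\x8a-\x8c\x8f\xaf\xbb\xbe\xbf\xc9-\xdf\xfc\xfd\xff]"
)


def find_invalid_marc8(record: bytes) -> list:
    """Return (offset, byte value) pairs for every flagged byte.

    Hypothetical helper: `record` is assumed to be the raw bytes of one record.
    """
    return [(m.start(), m.group()[0]) for m in INVALID_MARC8.finditer(record)]
```

Bytes that the set leaves out, such as the escape character 0x1B or the field and record delimiters 0x1D to 0x1F, are not reported.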

233/marc_review_diacritics.txt · Last modified: 2021/12/29 16:21 (external edit)
CC Attribution-Share Alike 4.0 International