Customizing the Diacritics in MARC Report

This page describes diacritics customization in version 246 and later, including the changes to the Options made in version 248. For earlier versions of the program, see this page.

In MARC Report, the diacritics menu is invoked by pressing <F3> when editing. The characters that are displayed when F3 is pressed may be customized using a plain text file.

In versions before 246, a file named keycodes.txt was distributed with the program and loaded from the installation folder when the session started. The code points in this file consist of all MARC-8 diacritics that do not use escape sequences, together with their corresponding UTF-8 code points (see: MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media).

Beginning in version 246, all of the codes from keycodes.txt are embedded in the program itself (the file is no longer needed).

Customization of diacritics now applies exclusively to UTF-8 code points.

How diacritics work in MARC Report

In MARC Report, when the hot-key to show the diacritic menu is pressed (F3), the program displays a list of diacritic characters. When the user selects a character from the list and presses <Enter>, the selected diacritic is pasted into the record at the current cursor position.

The contents of the diacritic list are based on the value of Leader/09:

  • If Leader/09 is blank (MARC-8), only diacritics that have MARC-8 codes are included.
  • If Leader/09 is 'a' (UCS/Unicode), only Unicode diacritics are included.

In the default setup, the lists are exactly the same.

However, you can create a custom list of UTF-8 diacritics and tell the program to use this list instead of the default. (The default list is always used for MARC-8 records.)
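The selection rule described above can be sketched in Python. This only illustrates the documented behavior and is not MARC Report's code. The function name `diacritic_list` and its parameters are hypothetical. Treating any Leader/09 value other than 'a' as MARC-8 is an assumption.

```python
from typing import List, Optional


def diacritic_list(
    leader: str,
    default_marc8: List[str],
    default_utf8: List[str],
    custom_utf8: Optional[List[str]] = None,
) -> List[str]:
    """Pick the F3 list for a record (hypothetical sketch of the documented rule)."""
    # Leader/09 is the character at zero-based position 9 of the 24-character leader.
    if leader[9] == "a":
        # Unicode record: a custom UTF-8 list, if one is configured, replaces the default.
        return custom_utf8 if custom_utf8 else default_utf8
    # Assumption: anything other than 'a' (normally blank) is treated as MARC-8,
    # and MARC-8 records always get the default list.
    return default_marc8
```

With a leader such as `00000nam a2200000 a 4500` (position 9 is 'a'), a configured custom list wins. A blank position 9 always yields the MARC-8 default.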

Customizing the diacritics

To create your own file of UTF-8 diacritics, add the code points for the diacritics (in the format described below) to a plain text file. For best results, save (or move) the file to your MARC Report Options folder (see note 1), although it can be located anywhere.

The format of a custom diacritics file is:

Column  Data                     Length (characters)
1       MARC-8 placeholder       2
2       UTF-8 code (hex)         6
3       Character name or label  Variable
4       UTF-16 code (hex)        4

Each column should be separated from the next by a single Tab character. Do not enter a tab after the last column.
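When rows are written by hand, it is easy to slip in an extra tab. A helper like the hypothetical one below (not part of MARC Report) joins the columns with single tabs and always puts '00' in column 1. It leaves off the optional fourth column when no UTF-16 code is given.

```python
from typing import Optional


def format_row(utf8_hex: str, label: str, utf16_hex: Optional[str] = None) -> str:
    """Build one tab-separated line of a custom diacritics file."""
    if "\t" in label:
        raise ValueError("the label must not contain a tab")
    fields = ["00", utf8_hex.upper(), label]  # column 1 is always the '00' placeholder
    if utf16_hex is not None:
        fields.append(utf16_hex.upper())      # column 4 is optional
    return "\t".join(fields)                  # single tabs between columns, none at the end


print(format_row("CEB100", "Greek Small Letter Alpha", "03B1"))
```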

Notes about the format

Column 1 (MARC-8 placeholder) must contain '00'.

UTF-8 codes are represented as hex strings of either four or six digits (that is, two-byte or three-byte UTF-8 sequences). Four-digit codes must be padded to six digits by appending '00'.
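The padding rule can be expressed as a short function that computes the column 2 value from a character. This is a hypothetical helper, not part of MARC Report. It rejects characters whose UTF-8 form is not two or three bytes, since the format only describes four- and six-digit codes.

```python
def utf8_column(ch: str) -> str:
    """Return the six-hex-digit UTF-8 column value for one character."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    hex_code = ch.encode("utf-8").hex().upper()  # e.g. U+03B1 -> 'CEB1'
    if len(hex_code) == 4:
        hex_code += "00"                         # pad a two-byte sequence to six digits
    if len(hex_code) != 6:
        raise ValueError(f"U+{ord(ch):04X} is not a two- or three-byte UTF-8 sequence")
    return hex_code
```

For the Greek alpha this gives `CEB100`. For U+2117 (sound recording copyright), which is already three bytes, it gives `E28497` with no padding.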

The fourth column is needed only if you are using the Unicode diacritic tool. If you do not plan to use this, you need only three columns.

For example, to add the UTF-8 code for the Greek letter alpha (lowercase), enter:

00    CEB100    Greek Small Letter Alpha    03B1

The label (whatever is in the third column, here 'Greek Small Letter Alpha') will then be displayed in the diacritics list. If the label is selected, the code 'CEB1' (the UTF-8 bytes for α) will be pasted into the record at the cursor position.
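The sketch below works in the other direction. It decodes a column 2 value back into the character it represents, which is what ends up in the record. The function name is hypothetical. Dropping a trailing '00' relies on a property of UTF-8, not on anything documented about MARC Report's internals: the bytes of a multi-byte UTF-8 sequence are never 00, so a trailing 00 can only be padding.

```python
def pasted_char(utf8_col: str) -> str:
    """Decode a column 2 value into the character it encodes."""
    raw = bytes.fromhex(utf8_col)
    # A multi-byte UTF-8 sequence never contains a 0x00 byte, so a trailing 00 is padding.
    if raw.endswith(b"\x00"):
        raw = raw[:-1]
    return raw.decode("utf-8")
```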

You can use a spreadsheet program to build your list and then save it as tab-delimited text. Be sure to check the result in a text editor afterwards. This type of software sometimes adds punctuation (such as quotation marks) to the output, misinterprets text as formulas, or drops leading zeros from values such as '00'.
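Before you load a file exported from a spreadsheet, a quick scan for typical export damage can save some debugging. The function name and checks below are my own heuristics and are not part of MARC Report. They look for:

  • added quotation marks
  • trailing tabs or spaces
  • a placeholder that was turned into the number 0
  • formula-like fields
  • four-digit UTF-16 values (such as 0301) that lost a leading zero

```python
from typing import List


def spreadsheet_problems(line: str) -> List[str]:
    """Flag common spreadsheet-export damage in one row (heuristic sketch)."""
    problems = []
    fields = line.split("\t")
    if '"' in line:
        problems.append("quotation marks added")
    if line.endswith(("\t", " ")):
        problems.append("trailing tab or space")
    if fields[0] != "00":
        problems.append("column 1 is not '00' (converted to a number?)")
    if any(f.startswith("=") for f in fields):
        problems.append("field looks like a formula")
    # A numeric-looking UTF-16 value such as 0301 can lose its leading zero.
    if len(fields) > 3 and fields[3] and len(fields[3]) != 4:
        problems.append("UTF-16 column is not four digits")
    return problems
```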

Here's an example taken from the first few rows of the file named 'diacritics-common-western.txt', which is now distributed with the program:

M-8 (note 2)  UTF-8   Label                             UTF-16
00            C2A900  Copyright Sign                    00A9
00            C3A900  Latin Small Letter E With Acute   00E9
00            CC8100  Combining Acute Accent            0301
00            C3B300  Latin Small Letter O With Acute   00F3
00            CC8400  Combining Macron                  0304
00            C3AD00  Latin Small Letter I With Acute   00ED
00            C3A100  Latin Small Letter A With Acute   00E1
00            CC8800  Combining Diaeresis               0308
00            E28497  Sound Recording Copyright         2117
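Columns 2 and 4 describe the same character, so you can check a file like this for typos by deriving one from the other. The sketch below does this for the sample rows. The function name is hypothetical. It assumes the fourth column holds a code point in the Basic Multilingual Plane, as every row here does.

```python
from typing import List

# The sample rows above, with their tab separators restored.
SAMPLE = "\n".join([
    "00\tC2A900\tCopyright Sign\t00A9",
    "00\tC3A900\tLatin Small Letter E With Acute\t00E9",
    "00\tCC8100\tCombining Acute Accent\t0301",
    "00\tC3B300\tLatin Small Letter O With Acute\t00F3",
    "00\tCC8400\tCombining Macron\t0304",
    "00\tC3AD00\tLatin Small Letter I With Acute\t00ED",
    "00\tC3A100\tLatin Small Letter A With Acute\t00E1",
    "00\tCC8800\tCombining Diaeresis\t0308",
    "00\tE28497\tSound Recording Copyright\t2117",
])


def mismatched_labels(text: str) -> List[str]:
    """List labels whose UTF-8 column does not match their UTF-16 column."""
    bad = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4 or fields[0] != "00":
            continue  # skip header lines and rows without a UTF-16 column
        _, utf8, label, utf16 = fields
        # Re-encode the code point as UTF-8 and pad to six hex digits.
        expected = chr(int(utf16, 16)).encode("utf-8").hex().upper().ljust(6, "0")
        if expected != utf8.upper():
            bad.append(label)
    return bad


print(mismatched_labels(SAMPLE))  # []
```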

It's not difficult to find all of the information needed to produce a file like this on the web. Some good sources are Wikipedia (search for 'list of Unicode characters'), the Unicode Consortium (go to the section called 'code charts'), and 'The Digital Rosetta Stone' (go to the section named 'unicode').
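As an alternative to looking up each value, Python's standard `unicodedata` module can supply the character names. The sketch below builds one four-column row per character.

  • The function name is hypothetical.
  • Title-casing the official Unicode name to produce the label is my own choice.
  • The input is restricted to U+0080 through U+FFFF, the range whose UTF-8 form is two or three bytes. This assumes that range is what the six-digit column accepts.

```python
import unicodedata


def make_row(ch: str) -> str:
    """Build a tab-separated diacritics-file row for one character."""
    if len(ch) != 1 or not 0x80 <= ord(ch) <= 0xFFFF:
        raise ValueError("expected one character between U+0080 and U+FFFF")
    utf8 = ch.encode("utf-8").hex().upper().ljust(6, "0")  # pad four digits to six
    label = unicodedata.name(ch).title()                    # e.g. 'Greek Small Letter Alpha'
    utf16 = f"{ord(ch):04X}"                                # BMP code point, four hex digits
    return "\t".join(["00", utf8, label, utf16])


for ch in "\u00a9\u00e9\u0301\u03b1":
    print(make_row(ch))
```

The output is plain tab-delimited text, so it avoids the spreadsheet pitfalls mentioned earlier. Review the generated labels, because title-cased Unicode names are not always the wording you want to see in the F3 list.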

Options for diacritics

MARC Report 2.48 adds a new section for diacritics to the Record Display page of the Options.

The first option tells the program to use the visual diacritic tool instead of the default text/label diacritic tool. This option was on the left side of the Record Display tab in previous versions of the program.

Next comes the option to use a custom diacritics file: click the filing-cabinet button on the right, then navigate to your file and select it. This capability was present in earlier versions, but it required giving the custom diacritics file a specific name and placing it in a specific folder.

Next is the 'Test' button, which lets you try out the Unicode tool so that you can see how it works before deciding to use it. It also lets you quickly test a custom diacritics file and make sure everything in it works.

Finally, you may choose the sort order for the diacritics. There are four options, one for each column in the file, with the default being the diacritic name or label. This option applies to both the default and any custom file that is specified.

If you have taken pains to build your own diacritics file and arrange it in the most efficient order, be sure to select the 'Do not sort!' option in this list.
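The sketch below illustrates how the sort-order setting affects the list. It is hypothetical, because MARC Report's comparison rules are not documented here. The case-insensitive ordering and the use of `None` for 'Do not sort!' are assumptions. Column indexes are zero-based, so 2 is the label column (the default).

```python
from typing import List, Optional


def sort_rows(rows: List[List[str]], column: Optional[int] = 2) -> List[List[str]]:
    """Order parsed rows by a zero-based column; column=None keeps file order."""
    if column is None:
        return list(rows)  # the 'Do not sort!' case: keep your own arrangement
    # Case-insensitive sort; rows missing the column (e.g. no UTF-16 value) sort first.
    return sorted(rows, key=lambda r: r[column].casefold() if column < len(r) else "")
```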

Notes

  • If the program does not find your custom diacritics file when it starts, it will fall back to the default. You can then go back into the Options and fix the problem.
  • Any row that does not contain valid values in all three required columns is discarded at runtime. For example, the following row will be discarded because the third (label) column is required:
00    E281B6 
  • Remember to pad a four-digit UTF-8 code with two zeroes, if applicable, to reach the required length of six hex digits. For example, the following row will be discarded:

      00	C2B9	Superscript digit one

    whereas this row will be accepted:

      00	C2B900	Superscript digit one
  • Add as many rows as you need to your custom diacritics file.
  • Arrange, or sort, your custom diacritics file in whatever order you find most efficient.
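The discard rules in these notes can be summarized as a small filter. This sketches the documented behavior only and is not the program's parser. The function name is hypothetical. The checks are my reading of the rules above:

  • a '00' placeholder
  • exactly six hex digits in column 2
  • a non-blank label
  • three or four tab-separated columns

The fourth column is not validated.

```python
import re
from typing import List

SIX_HEX = re.compile(r"[0-9A-Fa-f]{6}")


def load_rows(text: str) -> List[List[str]]:
    """Return the rows that pass the documented checks; silently drop the rest."""
    kept = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) not in (3, 4):
            continue  # missing label column, or too many tabs
        placeholder, utf8, label = fields[:3]
        if placeholder != "00" or not SIX_HEX.fullmatch(utf8) or not label.strip():
            continue  # bad placeholder, unpadded or non-hex code, or empty label
        kept.append(fields)
    return kept
```

Run on the examples from these notes, only the padded `C2B900` row survives. A header line such as the one in the sample file is also dropped, because its first column is not '00'.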

Complete text of the default keycodes.txt file

Note that regardless of the Leader/09 value, the diacritic for a given character always looks the same in the F3 list, even though its code point differs. For example, in MARC-8 the code for the copyright sign is xC3, whereas in UTF-8 it is xC2 xA9. In either case, the list displays ©.
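You can check the copyright example in Python. Python has no built-in MARC-8 codec, so the table below is a hypothetical one-entry excerpt. It holds only the mapping stated above (MARC-8 xC3 to ©). A real converter would need the full MARC-8 character set tables.

```python
# Hypothetical one-entry excerpt of a MARC-8 to Unicode table (only xC3 = copyright sign).
MARC8_TO_UNICODE = {0xC3: "\u00a9"}


def display_char(code: bytes, leader09: str) -> str:
    """Return the character shown in the F3 list for a stored code."""
    if leader09 == "a":
        return code.decode("utf-8")                     # Unicode record: bytes are UTF-8
    return "".join(MARC8_TO_UNICODE[b] for b in code)   # MARC-8 record: table lookup
```

Both `display_char(b"\xc3", " ")` and `display_char(b"\xc2\xa9", "a")` return the same character, which is why the F3 list looks identical for both record types.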

1)
To locate your Options folder, start the program and select Tech support info under the Help menu.
2)
MARC-8
help/customizing_diacritics.txt · Last modified: 2021/12/29 16:21 (external edit)
CC Attribution-Share Alike 4.0 International