Customizing the Diacritics in MARC Report
This page describes diacritics customization in version 246 and later, including the changes to the options made in 248. For previous versions of the program, see this page.
In MARC Report, the diacritics menu is invoked by pressing <F3> when editing. The characters that are displayed when F3 is pressed may be customized using a plain text file.
In versions before 246, a file named keycodes.txt was distributed with the program and loaded from the installation folder when the session started. This file contained all MARC-8 diacritics that do not use escape sequences, together with their corresponding UTF-8 code points (see: MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media).
Beginning in version 246, all of the codes from keycodes.txt are embedded in the program itself (the file is no longer needed).
Customization of diacritics now applies exclusively to UTF-8 code points.
How diacritics work in MARC Report
In MARC Report, when the hot-key to show the diacritic menu is pressed (F3), the program displays a list of diacritic characters. When the user selects a character from the list and presses <Enter>, the selected diacritic is pasted into the record at the current cursor position.
The contents of the diacritic list are based on the value of Leader/09:
- If Leader/09 is blank, only diacritics that have MARC-8 codes are included
- If Leader/09 is 'a', only diacritics that are Unicode are included
In the default setup, the lists are exactly the same.
However, you can create a custom list of UTF-8 diacritics, and tell the program to use this list instead of the default. (The default list will always be used for MARC-8 records.)
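The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the program's actual code; the function name `choose_diacritic_list` and its parameters are hypothetical.

```python
def choose_diacritic_list(leader, default_list, custom_list=None):
    """Return the list that the F3 menu would show for a record.

    Hypothetical helper: `leader` is the 24-character MARC leader,
    `default_list` is the built-in list, and `custom_list` is the list
    loaded from a custom diacritics file (or None if none is configured).
    """
    # Leader/09 is the character at zero-based position 9 of the leader.
    encoding_flag = leader[9]
    if encoding_flag == "a" and custom_list is not None:
        # Unicode record with a custom file configured: use the custom list.
        return custom_list
    # MARC-8 records (blank Leader/09) always get the default list.
    return default_list
```

For example, a leader such as `00000nam a2200000 a 4500` has 'a' in position 9, so a configured custom list would be used for that record.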
Customizing the diacritics
To create your own file of UTF-8 diacritics, add the code points for the diacritics (as described below) to a text file. For best results, the file should be saved (or moved) to your MARC Report Options folder (but it can be located anywhere).
The format of a custom diacritics file is:
|Column||Contents||Length (characters)|
|Column 1||MARC-8 placeholder||2|
|Column 2||UTF-8 Code (hex)||6|
|Column 3||Character name or label||Variable|
|Column 4||UTF-16 code (hex)||4|
Each column should be separated from the next by a single Tab character. Do not enter a tab after the last column.
Notes about the format
Column 1 (MARC-8 placeholder) must contain '00'.
UTF-8 codes are represented as hex sequences of 4 or 6 digits (that is, two or three bytes). Four-digit sequences must be padded to six digits by appending '00'.
The fourth column is needed only if you are using the Unicode diacritic tool. If you do not plan to use this, you need only three columns.
For example, to add the UTF-8 code for the Greek letter alpha (lowercase), enter:
00 CEB100 Greek Small Letter Alpha 03B1
The label, that is, whatever is in the third column (here 'Greek Small Letter Alpha'), will then display in the diacritics list. If it is selected, the character with the UTF-8 code 'CEB1' will be pasted into the record at the cursor position.
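If you would rather not look up each code by hand, a row in this format can be generated from the character itself. The following Python sketch uses only the standard library; the function name `make_row` is hypothetical, and taking the label from the Unicode character name is an assumption made for convenience (any label may be used).

```python
import unicodedata


def make_row(char, label=None):
    """Build one tab-delimited row for a custom diacritics file."""
    # Column 2: UTF-8 code in hex, padded with '00' to six digits.
    utf8_hex = char.encode("utf-8").hex().upper()
    if len(utf8_hex) == 4:
        utf8_hex += "00"
    elif len(utf8_hex) != 6:
        raise ValueError("only 2-byte and 3-byte UTF-8 codes fit the format")
    # Column 4: UTF-16 code in hex, which must be four digits.
    utf16_hex = char.encode("utf-16-be").hex().upper()
    if len(utf16_hex) != 4:
        raise ValueError("UTF-16 code does not fit in four hex digits")
    # Column 3: fall back to the Unicode character name as the label.
    if label is None:
        label = unicodedata.name(char).title()
    # Column 1 is always the MARC-8 placeholder '00'; no trailing tab.
    return "\t".join(["00", utf8_hex, label, utf16_hex])


print(make_row("\u03b1"))  # Greek small letter alpha
```

Writing one such row per line to a text file saved as UTF-8 or plain ASCII (the rows themselves contain only ASCII characters) gives a file in the layout described above.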
You can use a spreadsheet program to make your list, and then save it as tab-delimited text. Just be sure to check it in a text editor afterwards, as sometimes this type of software adds punctuation to the output, or misinterprets text as formulas, etc.
Here's an example taken from the first few rows of the file named 'diacritics-common-western.txt', which is now distributed with the program:
|00||C3A900||Latin Small Letter E With Acute||00E9|
|00||CC8100||Combining Acute Accent||0301|
|00||C3B300||Latin Small Letter O With Acute||00F3|
|00||C3AD00||Latin Small Letter I With Acute||00ED|
|00||C3A100||Latin Small Letter A With Acute||00E1|
|00||E28497||Sound Recording Copyright||2117|
It's not difficult to find all of the information needed to produce a file like this on the web; some good sites are: Wikipedia (search for 'list of Unicode characters'), Unicode consortium (go to the section called 'code charts'), and 'The digital Rosetta stone' (go to the section named 'unicode').
Options for diacritics
MARC Report 2.48 adds a new section for diacritics in the Record Display page of the Options:
The first option tells the program to use the visual diacritic tool instead of the default text/label diacritic tool. This option was on the left side of the Record Display tab in previous versions of the program.
Next comes the option to tell the program to use a custom diacritics file: click the filing cabinet button on the right and navigate to your file and select it. This capability was present in earlier versions but it required giving the custom diacritic file a specific name and placing it in a specific folder.
Next is the button that allows you to 'Test' the Unicode tool. This is here so that you can see how it works before deciding to use it. It also lets you quickly test a custom diacritic file and make sure everything in it is working.
Finally, you may choose the sort order for the diacritics. There are four options, one for each column in the file, with the default being the diacritic name or label. This option applies to both the default and any custom file that is specified.
If you take pains to make your own file of diacritics, and organize it in the most efficient order, be sure to select the Do not sort! option in this list.
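To preview how a given sort order would arrange your file before choosing it in the Options, the effect can be approximated outside the program. This Python sketch is an assumption about the behaviour (a plain string sort on one column); the program's own collation may differ, and the function name `sort_rows` is hypothetical.

```python
def sort_rows(rows, column=2):
    """Sort four-column rows by a zero-based column index.

    column=2 is the name or label (the default described above);
    column=None keeps the file order, like the 'Do not sort!' option.
    """
    if column is None:
        return list(rows)
    return sorted(rows, key=lambda row: row[column])


rows = [
    ("00", "C3A900", "Latin Small Letter E With Acute", "00E9"),
    ("00", "CC8100", "Combining Acute Accent", "0301"),
    ("00", "C3B300", "Latin Small Letter O With Acute", "00F3"),
]
for row in sort_rows(rows):
    print("\t".join(row))
```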
- If the program does not find your custom diacritics file when it starts, it will fall back to the default. You can then go back into the Options and fix the problem.
- Any row that does not contain valid values in the first three columns is discarded at runtime. For instance, a row with no 'description' (label) column is discarded, because that column is required.
- Remember to add two zeroes to the UTF-8 code, if applicable, to pad it out to the required length of 6 hex digits.
For example, the following row will be discarded:
00 C2B9 Superscript digit one
Whereas this row will be accepted:
00 C2B900 Superscript digit one
- Add as many rows as you need to your custom diacritics file.
- Arrange, or sort, the custom diacritics file in whatever order you will find most efficient.
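Because invalid rows are dropped silently, it can help to check a file before pointing the program at it. The Python sketch below applies the rules stated on this page (placeholder '00', a six-digit hex UTF-8 code, a non-empty label); the additional check that the code decodes as UTF-8 is an assumption, and the program's own validation may be stricter or looser. The function name `parse_row` is hypothetical.

```python
import re

HEX6 = re.compile(r"[0-9A-Fa-f]{6}")


def parse_row(line):
    """Return (character, label) for a usable row, or None if it fails a check."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 3:
        return None  # the first three columns are required
    placeholder, utf8_hex, label = cols[0], cols[1], cols[2]
    if placeholder != "00":
        return None  # column 1 must contain '00'
    if not HEX6.fullmatch(utf8_hex):
        return None  # e.g. 'C2B9' without its '00' padding
    if not label.strip():
        return None  # a label is required
    raw = bytes.fromhex(utf8_hex)
    if raw.endswith(b"\x00"):
        raw = raw[:-1]  # drop the padding byte of a two-byte code
    try:
        return raw.decode("utf-8"), label
    except UnicodeDecodeError:
        return None  # assumption: codes that are not valid UTF-8 are rejected


print(parse_row("00\tC2B9\tSuperscript digit one"))    # None
print(parse_row("00\tC2B900\tSuperscript digit one"))  # ('¹', 'Superscript digit one')
```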
Note that regardless of the Leader/09 value, the diacritic for a given character will always look the same in the 'F3' list, even though its code is different. For example, in MARC-8, the code for copyright is xC3, whereas in UTF-8, the code for copyright is xC2 xA9. But in either case, the list will display ©.
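The UTF-8 half of this example is easy to confirm with Python's standard library. The MARC-8 value below is simply the one quoted above, since Python has no built-in MARC-8 codec; the function name `copyright_codes` is hypothetical.

```python
def copyright_codes():
    """Return the two encodings of the copyright sign discussed above."""
    char = "\u00a9"  # the copyright sign
    return {
        "marc8": bytes([0xC3]),        # value quoted in the text, not computed
        "utf8": char.encode("utf-8"),  # computed by Python's UTF-8 codec
    }


codes = copyright_codes()
print(codes["utf8"].hex().upper())  # C2A9
```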