Customizing the Diacritics in MARC Report

This page describes diacritics customization in versions before 246. For versions 246 and later, please refer to the corresponding page for those versions.

In MARC Report, the items that display in the diacritics menu (invoked by pressing <F3> when editing) are controlled by plain text files distributed with the program.

Diacritics are defined in two support files:

keycodes.txt, which contains all codes defined in the EXTENDED LATIN range, and
keycodes_utf8.txt, which is used to customize UTF-8 codes (see below).


keycodes.txt

The rows in this file are sorted by the diacritic's caption, with a repeated entry for the new copyright and phonograph copyright symbols appearing at the very top.

Here is a brief excerpt from the current version of keycodes.txt:

C3    C2A900    Copyright sign 
C2    E28497    Phonogram recording copyright

E2    CC8100    Acute accent
E6    CC8600    Breve
B9    C2A300    British pound sign 
...

The format of this file is:

  • Column 1: MARC-8 Code (hex); Length=2
  • Column 2: UTF-8 Code (hex); Length=6
  • Column 3: Description; Length=variable
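As a rough illustration of this three-column layout, the following Python sketch splits one row into its columns. It is not part of MARC Report: the names KeycodeRow and parse_keycode_row are hypothetical, and the sketch assumes that columns are separated by one or more whitespace characters (tabs or spaces).

```python
import re
from typing import NamedTuple, Optional


class KeycodeRow(NamedTuple):
    marc8_hex: str    # column 1: two hex digits
    utf8_hex: str     # column 2: six hex digits
    description: str  # column 3: free text, variable length


# Assumption: columns are separated by runs of whitespace (tabs or spaces).
ROW_PATTERN = re.compile(r"([0-9A-Fa-f]{2})\s+([0-9A-Fa-f]{6})\s+(\S.*)")


def parse_keycode_row(line: str) -> Optional[KeycodeRow]:
    """Return the three columns of a row, or None if the row does not fit the format."""
    match = ROW_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    marc8_hex, utf8_hex, description = match.groups()
    return KeycodeRow(marc8_hex.upper(), utf8_hex.upper(), description)


print(parse_keycode_row("E2    CC8100    Acute accent"))
```

The description is taken as everything after the second column, so captions that contain spaces (such as "British pound sign") stay intact.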


In MARC Report, when the hot-key to show the diacritic menu is pressed, the program displays the description columns from this file in a menu. When the user makes a selection from this menu, the resulting action depends on the value of Leader/09:

  • If Leader/09 is blank, the associated MARC-8 code (from column 1) is pasted into the record
  • If Leader/09 is 'a', the associated UTF-8 code (from column 2) is pasted into the record
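The selection rule above can be sketched in Python as follows. This is only an illustration of the documented behavior, not the program's own code; the function name select_code is hypothetical, and treating any other Leader/09 value as an error is an assumption, since this page only describes the blank and 'a' cases.

```python
def select_code(leader: str, marc8_hex: str, utf8_hex: str) -> str:
    """Pick the column whose code would be pasted, based on Leader/09."""
    if len(leader) < 10:
        raise ValueError("leader is too short to contain position 09")
    encoding_flag = leader[9]  # Leader/09 is the tenth character (0-based index 9)
    if encoding_flag == " ":
        return marc8_hex       # blank: MARC-8 record, use column 1
    if encoding_flag == "a":
        return utf8_hex        # 'a': UTF-8 record, use column 2
    # Assumption: other values are not described on this page.
    raise ValueError(f"unexpected Leader/09 value: {encoding_flag!r}")


# Acute accent row from keycodes.txt: E2 / CC8100
print(select_code("00000nam  2200000 a 4500", "E2", "CC8100"))  # MARC-8 record
print(select_code("00000nam a2200000 a 4500", "E2", "CC8100"))  # UTF-8 record
```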

Customizing the keycodes_utf8 file

A second diacritics file named keycodes_utf8.txt is also distributed with the program to allow for greater user customization of Unicode diacritics. Unlike the default file described above, this version contains every possible code, in ASCII order:

A0
A1	C58100	Polish L, uppercase
A2	C39800	Scandinavian O, uppercase
A3	C49000	D with crossbar, uppercase
A4	C39E00	Icelandic thorn, uppercase
A5	C38600	Digraph AE, uppercase
...

Codes that are not defined in the EXTENDED LATIN range (e.g., “A0” in the above example) are represented by a placeholder. This means that you may re-use that row for any diacritic that you want to add to the menu.

Use the keycodes_utf8.txt file only to customize UTF-8 codes. Unlike in keycodes.txt, the first column of this file is not used to construct a MARC-8 entry; instead, the program uses it to index the row in the corresponding UTF-8 menu.

The format of keycodes_utf8.txt is:

  • Column 1: ASCII Code (hex); Length=2
  • Column 2: UTF-8 Code (hex); Length=6
  • Column 3: Description; Length=variable

Another key difference is that the keycodes_utf8.txt file is never read from the installation folder; to use this file, the user must copy it from the installation folder to their My Documents\MarcReport\Options folder, and perform all customizing in the latter location.

Here is the order of precedence the program uses when processing the diacritics files:

1. If My Documents\MarcReport\Options\keycodes.txt exists, load the diacritics (both MARC-8 and UTF-8) from this file, else use the default, typically C:\Program Files\TMQ\MARC Report\keycodes.txt.

2. If My Documents\MarcReport\Options\keycodes_utf8.txt exists, overwrite the UTF-8 diacritics generated in step 1 with the contents of this file (the MARC-8 diacritics are not modified).
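The two steps above can be sketched in Python as shown below. This is a simplified model under stated assumptions, not the program's actual loader: the names read_rows and load_menus are hypothetical, the files are assumed to be readable as UTF-8 text, and step 2 is modeled as replacing the whole UTF-8 menu (the page does not say whether the overwrite is row by row).

```python
from pathlib import Path
from typing import List, Tuple

Row = Tuple[str, str, str]
Menu = List[Tuple[str, str]]  # (description, hex code) pairs


def read_rows(path: Path) -> List[Row]:
    """Read rows that have all three columns; placeholder rows are skipped."""
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) == 3:
            rows.append((parts[0], parts[1], parts[2]))
    return rows


def load_menus(options_dir: Path, install_dir: Path) -> Tuple[Menu, Menu]:
    """Return (MARC-8 menu, UTF-8 menu) following the documented precedence."""
    # Step 1: the user's keycodes.txt wins over the installed default.
    user_base = options_dir / "keycodes.txt"
    base = user_base if user_base.exists() else install_dir / "keycodes.txt"
    base_rows = read_rows(base)
    marc8_menu = [(desc, marc8) for marc8, _utf8, desc in base_rows]
    utf8_menu = [(desc, utf8) for _marc8, utf8, desc in base_rows]

    # Step 2: keycodes_utf8.txt is only read from the Options folder and
    # replaces the UTF-8 menu; the MARC-8 menu is left untouched.
    user_utf8 = options_dir / "keycodes_utf8.txt"
    if user_utf8.exists():
        utf8_menu = [(desc, utf8) for _index, utf8, desc in read_rows(user_utf8)]
    return marc8_menu, utf8_menu
```

Here options_dir stands for My Documents\MarcReport\Options and install_dir for the installation folder; both are passed in as parameters so that no particular path is hard-coded.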


To customize a row in the keycodes_utf8.txt file, the user should enter:

  1. a tab after the ASCII code,
  2. a UTF-8 code, expressed in hex,
  3. a tab after the UTF-8 code,
  4. a description of the diacritic.
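For anyone who prefers to script this edit rather than type it, here is a minimal Python sketch of the same four steps. The function name fill_row is hypothetical, and the sketch assumes the file has already been read into a list of lines without trailing newlines.

```python
from typing import List


def fill_row(lines: List[str], index_code: str, utf8_hex: str, description: str) -> List[str]:
    """Return a copy of lines in which the row for index_code carries the new entry."""
    updated = []
    replaced = False
    for line in lines:
        first_column = line.split("\t")[0].strip()
        if not replaced and first_column.upper() == index_code.upper():
            # ASCII code, tab, UTF-8 code in hex, tab, description
            updated.append("\t".join([first_column, utf8_hex, description]))
            replaced = True
        else:
            updated.append(line)
    if not replaced:
        raise KeyError(f"no row found for code {index_code}")
    return updated


print(fill_row(["7F", "80", "81"], "80", "CEB100", "Greek alpha, lowercase"))
```

Note that this sketch overwrites whatever the matching row contained, so it should only be pointed at a placeholder row or at a default entry you intend to replace.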

For example, to add a UTF-8 entry for the lowercase Greek alpha character (which is not included in the default list of diacritics), we could append to the line that reads:

80

so that it looks like this:

80    CEB100    Greek alpha, lowercase
	

The description 'Greek alpha, lowercase' will then display in the MARC Report diacritics menu for UTF-8 records, and if selected, the code 'CEB100' will be entered in the record at the cursor position.
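If you do not know the hex value for a character, it can be derived with a few lines of Python. The helper below is a hypothetical convenience, not something shipped with MARC Report. It encodes a single character as UTF-8 and pads the result to six hex digits; it only accepts characters that encode to two or three bytes, because those are the only cases shown on this page.

```python
def utf8_keycode(char: str) -> str:
    """Return the six-hex-digit column 2 value for a single character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    encoded = char.encode("utf-8")
    # Assumption: only 2-byte (padded with 00) and 3-byte codes are handled,
    # matching the examples in the keycodes files.
    if len(encoded) not in (2, 3):
        raise ValueError("character does not encode to two or three UTF-8 bytes")
    return encoded.hex().upper().ljust(6, "0")


print(utf8_keycode("\u03b1"))  # Greek alpha, lowercase: CEB100
print(utf8_keycode("\u2117"))  # Phonogram recording copyright: E28497
```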

----

Notes

  • Any row that does not contain valid values in all three columns is discarded at runtime. For example, the following row is discarded because the description column is required, even though a valid UTF-8 code has been entered:
86    E281B6 
  • It is often necessary to append two zeroes to the UTF-8 code to pad it out to the required length of 6 hex digits. For example:
83	C2B900	Superscript digit one
  • Do not define the null character (x00) as an entry in the UTF-8 keycodes file.
  • If you need more rows for your Unicode customization, replace the default entries with your own. A heavily customized keycodes example is distributed with the program; see the file named keycodes_utf8-scientific example.txt in the installation folder.
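Before restarting the program, it can help to check a customized file against these notes. The Python sketch below reports why a row would probably be ignored. It is an approximation: the function name discard_reason is hypothetical, the exact validity rules applied by MARC Report are not documented here, and the sketch assumes tab-separated columns with "valid" meaning two hex digits, six hex digits, and a non-empty description.

```python
import re
from typing import Optional

HEX2 = re.compile(r"[0-9A-Fa-f]{2}")
HEX6 = re.compile(r"[0-9A-Fa-f]{6}")


def discard_reason(line: str) -> Optional[str]:
    """Return a reason the row would likely be discarded, or None if it looks usable."""
    fields = [field.strip() for field in line.rstrip("\r\n").split("\t")]
    fields += [""] * (3 - len(fields))  # pad missing columns with empty strings
    index_code, utf8_hex, description = fields[:3]

    if not HEX2.fullmatch(index_code):
        return "column 1 is not a two-digit hex code"
    if not utf8_hex and not description:
        return "placeholder row (no UTF-8 code or description)"
    if not HEX6.fullmatch(utf8_hex):
        return "column 2 is not a six-digit hex code"
    if bytes.fromhex(utf8_hex).rstrip(b"\x00") == b"":
        return "the null character must not be defined"
    if not description:
        return "description is missing"
    return None


for sample in ["86\tE281B6", "83\tC2B9\tSuperscript digit one", "83\tC2B900\tSuperscript digit one"]:
    print(repr(sample), "->", discard_reason(sample))
```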


Complete text of MARC-8 keycodes file

Complete text of UTF-8 keycodes file

239/customizing_diacritics_pre246.txt · Last modified: 2021/12/29 16:21 (external edit)
CC Attribution-Share Alike 4.0 International