Unicode in RIMMF3

Beginning with update 141206 (mid-December 2014), all non-ASCII data in RIMMF3 (whether created or imported) is '\u' encoded.

For example:

\u00E9

where '00E9' is a four-digit hexadecimal number giving the UTF-16 code unit of the character (here 'é', U+00E9).

This character encoding is Unicode-compatible.
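The encoding can be sketched in a few lines of Python. This is an illustration only, not RIMMF's own code. The function names `u_encode` and `u_decode` are hypothetical. The uppercase hex digits, and the surrogate pairs for characters outside the Basic Multilingual Plane, are assumptions based on the example above.

```python
import re

def u_encode(text):
    """Replace each non-ASCII character with '\\uXXXX' escapes.

    Assumption: one escape per UTF-16 code unit, uppercase hex digits.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII passes through unchanged
            continue
        units = ch.encode("utf-16-be")  # 2 bytes, or 4 for a surrogate pair
        for i in range(0, len(units), 2):
            out.append("\\u%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)

_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

def u_decode(text):
    """Turn '\\uXXXX' escapes back into characters."""
    raw = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Re-join any surrogate pairs produced by the substitution.
    return raw.encode("utf-16-be", "surrogatepass").decode("utf-16-be")

print(u_encode("café"))        # caf\u00E9
print(u_decode("caf\\u00E9"))  # café
```

Because the encoded form contains only ASCII characters, it survives any tool that handles plain text. The sketch does not handle data that already contains a literal backslash followed by 'u'.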

Beginning with update 150801, the RIMMF application itself supports the display of Unicode characters. There is no change to the way these characters are stored, however; they are still '\u' encoded.


Here are a few screenshots to illustrate.

1. RIMMF3 display of diacritics, between update 141206 and 150801:

2. RIMMF3 display of diacritics, beginning with update 150801:

3. RDF text (snippet) for both #1 and #2 (beginning with update 141206):


Non-Unicode RIMMF

Diacritics in data generated in RIMMF before 141206 are not Unicode-compatible.

We tried to add a character encoding conversion utility to RIMMF3 at the same time we added the \u-encoding support, but this utility succeeds only with the most basic diacritics.

How to handle encoding problems

In the current RIMMF3 application (beginning with update 150801), loading older data containing diacritics that are not \u-encoded may generate a character-encoding exception when the program starts (see footnote 1).
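One way to anticipate this is to look for records that still contain raw non-ASCII bytes before starting RIMMF. The sketch below assumes that correctly \u-encoded records are plain ASCII (an inference from the description above, not a documented guarantee) and that records are '.txt' files in a single data folder. The function names are hypothetical.

```python
from pathlib import Path

def has_raw_non_ascii(data):
    """True if the byte string contains any byte outside the ASCII range."""
    return any(b >= 0x80 for b in data)

def suspect_records(data_folder):
    """List record files in data_folder that contain raw non-ASCII bytes."""
    return sorted(
        path for path in Path(data_folder).glob("*.txt")
        if has_raw_non_ascii(path.read_bytes())
    )
```

Calling `suspect_records` on your data folder returns the files worth backing up or checking by hand. It only reads the files and does not change them.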

When this happens, the original default behavior (before the option added in update 150812) is to remove the record. RIMMF does this by moving the record that generated the error from the data folder into the subdirectory named '__history'.

RIMMF also logs the error in 'RIMMF3.log' (found in your 'RIMMF3' folder):

08/11/15 8:15:10 PM
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000036.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000099.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000182.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000183.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000015.txt
73 records indexed for EI; 5 errors during indexing.
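To collect the affected file names from a long log, the error lines can be filtered with a short script. This is a sketch that assumes the log lines have exactly the form shown above; `failed_records` is a hypothetical helper, not part of RIMMF.

```python
import re

ERROR_LINE = re.compile(r"^EI Indexing Error: Exception trapped processing (.+)$")

def failed_records(log_text):
    """Return the file paths named in 'EI Indexing Error' lines, in order."""
    paths = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            paths.append(match.group(1))
    return paths

# Sample copied from the log excerpt above (raw string keeps the backslashes).
sample = r"""08/11/15 8:15:10 PM
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000036.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000099.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000182.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000183.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000015.txt
73 records indexed for EI; 5 errors during indexing.
"""

for path in failed_records(sample):
    print(path)
```

The timestamp and summary lines do not match the pattern, so only the five record paths are printed.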

Unfortunately, removing the record in this way breaks any links present in the record.

To work around this problem, update 150812 added an option with a different default behavior.

The new option is located on the 'Data options' form, which is accessed from the main menu:

The new option is named:

During EI creation, try to automatically fix character encoding errors

and it is enabled by default. When a character encoding exception is found during start-up, RIMMF tries to fix the encoding problem and keep the record instead of removing it from the data folder.

In the EI, these encoding problems will display like this:

To fix the problem, open the record and replace the 'diamond' with the correct diacritic.
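The 'diamond' is most likely the Unicode replacement character, U+FFFD, which many decoders substitute for bytes they cannot interpret. That identification is an assumption, since it is not stated here. The sketch below is not RIMMF's repair logic. It only shows how an undecodable byte becomes U+FFFD in Python, and how to list where such characters occur in a record's text so they can be corrected by hand. `find_diamonds` is a hypothetical helper.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def find_diamonds(text):
    """Return (line_number, column) pairs, 1-based, for every U+FFFD in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == REPLACEMENT:
                hits.append((line_no, col))
    return hits

# A Latin-1 'é' (byte 0xE9) is not valid UTF-8 here, so lenient decoding
# substitutes U+FFFD for it.
legacy_bytes = b"Caf\xe9 Society\nplain line\n"
decoded = legacy_bytes.decode("utf-8", errors="replace")
print(find_diamonds(decoded))  # [(1, 4)]
```

The original byte is lost once it has been replaced, so the correct diacritic has to come from another source, such as the original record or the resource being described.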

For complete information about diacritics in RIMMF, please see the Diacritics and Unicode article.

1) This happens because, at this time, every record is parsed when the EI is created.
details/unicode.txt · Last modified: 2023/06/07 20:39 by 127.0.0.1