ctryplace.txt

'ctryplace.txt' is the name of a table used by MARC Report to crosscheck the country code in the 008 (008/15-17) against place names in the 260 or 264 $a.

It is important to note that this table catches only incorrect 008 codes, and only for some of the more well-known places of publication.

Customizing the table

The ctryplace.txt table contains two columns, separated by a tab:

  1. Country code (to be matched against a record's 008/15-17)
  2. Placename (to be matched against a record's 260/264 $a)

Placenames are normalized by the program; country codes are not.

The normalization procedure, applied to both the placenames in the table, and the placenames in each separate subfield $a, is:

  1. remove opening square bracket
  2. truncate the string at the first occurrence of '(', ',', ';', '['
  3. shift the string to uppercase
  4. replace any character not in 'A'..'Z' with a blank
  5. remove opening and closing blanks
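
The five steps above can be sketched in Python as follows. This is only an illustration, not MARC Report's own code: the function name normalize_placename is hypothetical, and step 1 is read here as "drop a single leading '['", which is an assumption.

```python
import re

def normalize_placename(name):
    """Apply the five normalization steps to one placename (illustrative only)."""
    # 1. Remove an opening square bracket (assumed: only a leading one).
    if name.startswith("["):
        name = name[1:]
    # 2. Truncate at the first occurrence of '(', ',', ';' or '['.
    cut = re.search(r"[(,;\[]", name)
    if cut:
        name = name[:cut.start()]
    # 3. Shift to uppercase.
    name = name.upper()
    # 4. Replace any character outside 'A'..'Z' with a blank.
    name = re.sub(r"[^A-Z]", " ", name)
    # 5. Remove leading and trailing blanks.
    return name.strip()

for raw in ["[New York]", "Oxford, Miss.", "St. Louis"]:
    print(repr(raw), "->", repr(normalize_placename(raw)))
```

Under this reading, '[New York]' becomes 'NEW YORK' and 'Oxford, Miss.' becomes 'OXFORD'. Interior blanks are not collapsed, so 'St. Louis' becomes 'ST  LOUIS' with two blanks; a table entry only matches if it normalizes to the same string.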

When the program runs, if the record is not a serial, any placenames found in 260/264 $a are looked up in this table; if a placename is found, and its corresponding country code does not match the country code in the 008, an error is reported.

Note that by default, a check against this table returns 'no error'; thus, no error is reported if a placename is not found in the table.

To prevent an error from being reported for a given country code, add the country code, one tab, and the word 'Any' (without quotes) to the table:

vp         Any
xxu        Any

In the default table distributed with the program, 'vp' (Various places) is entered as given above, whereas 'xxu' is entered with the placename 'United States'. Thus, if the country code in a non-serial record is 'xxu' and the place of publication/production is 'New York', the error message

260: $a <> ctry code

will be produced.
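
The lookup described above, including the 'Any' entries, might look roughly like the sketch below. Everything here is an assumption made for illustration: the names load_table and check_record are hypothetical, the sample rows for 'nyu' and 'enk' are not taken from the distributed table, and dropping the trailing blank of the 008 value before comparing is just one way the program could handle two-character codes.

```python
import re

MESSAGE = "260: $a <> ctry code"

def normalize_placename(name):
    # Same normalization as described earlier (leading '[' assumed for step 1).
    if name.startswith("["):
        name = name[1:]
    cut = re.search(r"[(,;\[]", name)
    if cut:
        name = name[:cut.start()]
    return re.sub(r"[^A-Z]", " ", name.upper()).strip()

def load_table(text):
    """Split tab-separated rows into a placename lookup and a set of exempt codes."""
    places, exempt = {}, set()
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed rows
        code, place = line.split("\t", 1)
        if place.strip() == "Any":
            exempt.add(code)  # never report an error for this country code
        else:
            places[normalize_placename(place)] = code
    return places, exempt

def check_record(code_008, places_a, places, exempt, is_serial=False):
    """Return one message per $a placename whose table code differs from the 008."""
    if is_serial:
        return []
    code = code_008.rstrip()  # assumption: trailing blank of 'vp ' is ignored
    if code in exempt:
        return []
    errors = []
    for raw in places_a:
        expected = places.get(normalize_placename(raw))
        # A placename missing from the table yields no error (the default).
        if expected is not None and expected != code:
            errors.append(MESSAGE)
    return errors

# Hypothetical sample table; only the 'vp' and 'xxu' rows come from the text above.
SAMPLE = "vp\tAny\nxxu\tUnited States\nnyu\tNew York\nenk\tLondon\n"
places, exempt = load_table(SAMPLE)
print(check_record("xxu", ["New York :"], places, exempt))
```

With this sample table, a non-serial record coded 'xxu' and published in New York produces the message, while the same record coded 'nyu', any record coded 'vp', and any record whose placename is absent from the table produce nothing.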


Notes

It is not necessary to add the blank space in two-character country codes.

This list should be customized for the locale that the program is being used in (see also the next comment).

Expanding this list is problematic; if you do so, be careful to check for duplicate placenames.
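
One way to check an expanded table for duplicate placenames is sketched below. It is a hypothetical helper, not part of MARC Report: the function name find_conflicts and the sample rows are made up, and the normalization follows the procedure described earlier (with step 1 read as removing a leading '[').

```python
import re
from collections import defaultdict

def normalize_placename(name):
    # Same normalization as described earlier (leading '[' assumed for step 1).
    if name.startswith("["):
        name = name[1:]
    cut = re.search(r"[(,;\[]", name)
    if cut:
        name = name[:cut.start()]
    return re.sub(r"[^A-Z]", " ", name.upper()).strip()

def find_conflicts(table_text):
    """Map each normalized placename to its codes, keeping only names with 2+ codes."""
    codes_by_place = defaultdict(set)
    for line in table_text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed rows
        code, place = line.split("\t", 1)
        if place.strip() == "Any":
            continue  # 'Any' rows are exemptions, not placenames
        codes_by_place[normalize_placename(place)].add(code)
    return {p: sorted(c) for p, c in codes_by_place.items() if len(c) > 1}

# Made-up rows: the qualifiers after ',' and '(' are lost in normalization.
SAMPLE = "enk\tOxford\nmsu\tOxford, Miss.\nohu\tOxford (Ohio)\nnyu\tNew York\n"
for place, codes in find_conflicts(SAMPLE).items():
    print(place, codes)
```

Because placenames are truncated at the first ',' or '(', the three Oxford rows in this sample all normalize to 'OXFORD', so adding a qualifier to a table entry does not by itself keep the entries apart.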


Anecdotal note

During work on version 242, we generated a very large list of placenames using about 10 million records that we had on hand. We boiled it down 1) to the 1,000 most frequently occurring placenames, and then tested it in MARC Report. The result, a large increase in the number of incorrect '260: $a <> ctry code' messages, was a bit unexpected.

The root of this problem was duplicate placenames. For example, in the US, every state seems to have a city named after an early president; in addition, there are a lot of placenames imported from outside the country, like Oxford. For sure, the hub of publishing activity for this placename lies in The University 2); but there were 10 or more placename occurrences in our sample for Oxford, Mass.; Oxford, Miss.; Oxford, N.Y.; and Oxford, Oh. These US places did not come close to making it into our top 1,000, and thus, every time one of those places was present in a record, the error message unjustly appeared.

From this we conclude that, to avoid false hits (given the way this table works in MARC Report), the list needs to be made as all-encompassing as possible, since there is so much duplication among placenames. We also concluded that each placename needs to be checked by a human to be sure the country code is correct, as there are a very large number of incorrect country codes in our records.

1) an exercise in itself, given the lack of standardization in the entry of the 260/264 $a
2) and home of Endeavour Morse!
help/ctryplace.txt · Last modified: 2014/01/12 16:09 (external edit)