Why MARCXML?

There are two, perhaps three, reasons why MARCXML is important to us now.

First, MARCXML is primarily a communications format 1). With MARCXML, all of your bibliographic data can be moved into the XML world quickly and without loss. From there, it is a relatively painless task to convert the MARCXML data to a more useful XML format, such as MODS or Dublin Core. All that the secondary conversion requires is a stylesheet and software that supports XSLT, both of which are available on the web.
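
To give a feel for how mechanical that secondary conversion is, here is a minimal sketch in Python. It is not XSLT: it uses the standard library's ElementTree as a stand-in. The four-entry crosswalk, the sample record, the `metadata` wrapper element, and the function name `marcxml_to_dc` are all invented for illustration. A real conversion would rely on a complete stylesheet rather than this toy mapping.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"   # MARCXML "slim" schema namespace
DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core elements namespace

# Hypothetical, deliberately tiny crosswalk: (tag, subfield code) -> DC element.
CROSSWALK = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}

# Made-up sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="b">Example Press,</subfield>
    <subfield code="c">1977.</subfield>
  </datafield>
</record>"""


def marcxml_to_dc(marcxml: str) -> ET.Element:
    """Copy the subfields listed in CROSSWALK into Dublin Core elements."""
    record = ET.fromstring(marcxml)
    out = ET.Element("metadata")  # assumed wrapper element, not part of any standard
    for field in record.iter("{%s}datafield" % MARC_NS):
        for sub in field.iter("{%s}subfield" % MARC_NS):
            name = CROSSWALK.get((field.get("tag"), sub.get("code")))
            if name is not None:
                element = ET.SubElement(out, "{%s}%s" % (DC_NS, name))
                element.text = (sub.text or "").strip()
    return out


if __name__ == "__main__":
    ET.register_namespace("dc", DC_NS)
    print(ET.tostring(marcxml_to_dc(SAMPLE), encoding="unicode"))
```

The sketch leaves the MARC punctuation (the trailing comma and period) untouched; cleaning that up is exactly the kind of detail a proper stylesheet handles.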

Second, MARCXML, by virtue of being XML, supports Unicode without requiring any intervention (i.e., special software). Technically, this argument is not as strong as it once was, because UTF-8 encoding is now well supported in MARC and by most library systems. Also, UTF-8 does not in itself solve all problems of character encoding, and may simply exchange one set of problems for another 2). However, in the larger picture of internationalization and the resulting global sharing of resource descriptions, the fundamental support for Unicode in XML has to be viewed as a major advantage 3).
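
As a small illustration of "no special software", the sketch below reads an 880 field containing Arabic script with nothing but Python's standard XML parser, then writes it back out as UTF-8 and reads it again. The helper names `subfields` and `round_trip` are made up for this example, and the field is shown without a namespace declaration for brevity.

```python
import xml.etree.ElementTree as ET

# An 880 field in MARCXML; the Arabic text is ordinary character data.
FIELD_880 = """<datafield tag="880" ind1=" " ind2=" ">
<subfield code="6">260-03/(3/r</subfield>
<subfield code="a">ملتان، باكستان :</subfield>
<subfield code="b">المكتبة الفاروقية،</subfield>
<subfield code="c">1977.</subfield>
</datafield>"""


def subfields(xml_text):
    """Return {subfield code: text} for one datafield element."""
    field = ET.fromstring(xml_text)
    return {sf.get("code"): sf.text for sf in field.findall("subfield")}


def round_trip(xml_text):
    """Serialize the field to UTF-8 bytes and parse those bytes again."""
    field = ET.fromstring(xml_text)
    utf8_bytes = ET.tostring(field, encoding="utf-8")
    reparsed = ET.fromstring(utf8_bytes)
    return {sf.get("code"): sf.text for sf in reparsed.findall("subfield")}


if __name__ == "__main__":
    before = subfields(FIELD_880)
    after = round_trip(FIELD_880)
    print(before["a"])
    print("unchanged after round trip:", before == after)
```

No escape sequences or character-set tables are involved: the parser hands back plain Unicode strings, and the serializer writes them out as UTF-8 bytes.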

Finally, using MARCXML as a communications format will enable us to get around some of the persistent limitations of MARC itself. What happens in MARC when a record needs to exceed 99,999 bytes, or a field needs to be longer than 9,999 bytes? The result depends entirely on the vendor's software; there are no good solutions, only kludges and many software bugs. These problems could be solved by changing MARC, using options the standard itself already supports in some places and extending it in others 4). But doing so would break all MARC software.
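
The two limits come straight from the fixed-width numbers in the MARC record structure: the record length occupies five digits in the leader, and each 12-character directory entry allots four digits to the field length. The following sketch shows where the ceiling is hit. The function names are hypothetical, and this is an illustration of the arithmetic, not a MARC writer.

```python
LEADER_LENGTH = 24            # fixed-length leader
DIRECTORY_ENTRY_LENGTH = 12   # tag (3) + field length (4) + start position (5)
MAX_FIELD_LENGTH = 9_999      # largest value that fits in four digits
MAX_RECORD_LENGTH = 99_999    # largest value that fits in five digits


def directory_entry(tag: str, field_length: int, start: int) -> str:
    """Build one directory entry, refusing values that do not fit."""
    if len(tag) != 3:
        raise ValueError("tag must be exactly three characters")
    if not 0 <= field_length <= MAX_FIELD_LENGTH:
        raise ValueError("field length %d does not fit in four digits" % field_length)
    if not 0 <= start <= MAX_RECORD_LENGTH:
        raise ValueError("start position %d does not fit in five digits" % start)
    return "%s%04d%05d" % (tag, field_length, start)


def record_length(field_lengths) -> int:
    """Total record size; each field length already includes its terminator."""
    count = len(field_lengths)
    return (LEADER_LENGTH
            + DIRECTORY_ENTRY_LENGTH * count
            + 1                      # terminator that closes the directory
            + sum(field_lengths)
            + 1)                     # record terminator


def fits_in_marc(field_lengths) -> bool:
    """True if every field and the record as a whole stay within the limits."""
    return (all(n <= MAX_FIELD_LENGTH for n in field_lengths)
            and record_length(field_lengths) <= MAX_RECORD_LENGTH)


if __name__ == "__main__":
    print(directory_entry("245", 42, 0))
    print(fits_in_marc([9_999] * 9))    # nine maximal fields still fit
    print(fits_in_marc([9_999] * 10))   # ten push the record past five digits
    print(fits_in_marc([10_000]))       # one long note is already too much
```

MARCXML has no counterpart to these counters, because element boundaries are marked by tags rather than by byte offsets.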

1)
which is exactly what MARC itself is; and if you find MARCXML difficult to read, please remember that raw MARC records are similarly difficult, if not impossible, to read without special software
2)
apart from quirky display problems related to fonts, as Windows developers we are also pestered by the Microsoft penchant for 'automatically' translating UTF-8 streams into the codepage of the local machine, and by the amount of code necessary to 'program around' this
3)
[Long note follows.]
Consider the MARC-8 escape sequences used to represent foreign script in the 880 fields of LC records:
880   $6260-03/(3/r$a(3edJGf, HGcSJGf :(B$b(3GdecJHI GdaGQhbjI,(B$c1977(3.(B
In MARCXML, the same data, based on the UTF-8 encoding standard, will look like this, in the raw–
<datafield tag="880" ind1=" " ind2=" ">
<subfield code="6">260-03/(3/r</subfield>
<subfield code="a">ملتان، باكستان :</subfield>
<subfield code="b">المكتبة الفاروقية،</subfield>
<subfield code="c">1977.</subfield>
</datafield>
–and like this in the typical browser:
<datafield tag="880" ind1=" " ind2=" ">
<subfield code="6">260-03/(3/r</subfield> 
<subfield code="a">ملتان، باكستان :</subfield> 
<subfield code="b">المكتبة الفاروقية،</subfield> 
<subfield code="c">1977.</subfield> 
</datafield>
4)
alphabetic tags, 5-byte field lengths, a larger number of indicators, longer subfield delimiters, etc.
help/why_marcxml.txt · Last modified: 2021/12/29 16:21 (external edit)
CC Attribution-Share Alike 4.0 International