This shows you the differences between two versions of the page.

233:xml_utilities [2013/04/27 09:09] (current)
Line 1: Line 1:
 +__Marc/XML Utilities changes in 233__
 +Prior to version 233, the __Marc to XML__ utility could not process a MARC file larger than 2.1 GB. This limitation has been removed.
 +The performance of the __XML to Marc__ utility has been greatly improved. 
 +A new option in the MarcToXml utility writes each record in a MARC file to its own MarcXml file (a necessary step if you want to publish your MARC data using OAI).
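To make the idea of "one MarcXml file per record" concrete, here is a minimal Python sketch. It is not how MarcToXml itself works: it assumes the records are already UTF-8 encoded (leader/09 = 'a') and does no MARC-8 conversion, and the function names and the `record-000001.xml` naming pattern are invented for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

MARCXML_NS = "http://www.loc.gov/MARC21/slim"   # MARCXML "slim" namespace
FIELD_END, SUBFIELD = b"\x1e", b"\x1f"          # ISO 2709 delimiters


def iter_records(data: bytes):
    """Yield each raw record, using the 5-digit record length in leader/00-04."""
    pos = 0
    while pos + 24 <= len(data):
        length = int(data[pos:pos + 5])
        if length < 24:          # not a plausible record; stop rather than loop
            break
        yield data[pos:pos + length]
        pos += length


def record_to_marcxml(raw: bytes) -> ET.Element:
    """Build a <record> element from one raw record (assumed to be UTF-8)."""
    def q(name):                 # qualify an element name with the namespace
        return f"{{{MARCXML_NS}}}{name}"

    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])            # base address of data, leader/12-16
    directory = raw[24:base - 1]         # 12-byte entries, minus the terminator
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        field = raw[base + start:base + start + length - 1]  # drop terminator
        if tag.startswith("00"):         # control fields have no indicators
            ET.SubElement(record, q("controlfield"), {"tag": tag}).text = \
                field.decode("utf-8")
            continue
        attrs = {"tag": tag, "ind1": chr(field[0]), "ind2": chr(field[1])}
        datafield = ET.SubElement(record, q("datafield"), attrs)
        for chunk in field[2:].split(SUBFIELD)[1:]:
            if chunk:
                sub = ET.SubElement(datafield, q("subfield"),
                                    {"code": chr(chunk[0])})
                sub.text = chunk[1:].decode("utf-8")
    return record


def write_one_file_per_record(data: bytes, out_dir: Path) -> int:
    """Write record-000001.xml, record-000002.xml, ... and return the count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    ET.register_namespace("marc", MARCXML_NS)
    count = 0
    for count, raw in enumerate(iter_records(data), start=1):
        path = out_dir / f"record-{count:06d}.xml"
        ET.ElementTree(record_to_marcxml(raw)).write(
            str(path), encoding="utf-8", xml_declaration=True)
    return count
```

The sketch reads the whole file into memory for brevity; a file of several gigabytes would need a streaming reader instead.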
 +The MarcToXml utility now writes any records that it cannot convert to a file called 'hybrids-nnnn.mrc' (where 'nnnn' represents a datestamp). These records contain encodings that are not valid in MARC-8 (yet may be valid in UTF-8, hence the term 'hybrid'). In some cases, these records can be repaired simply by changing leader/09 to 'a' and re-validating with the 'INVALID CHARS' cataloging check set in MARC Report. The errant codes may also reflect a local practice (for example, a non-compliant character used for a currency symbol in the 020), in which case they can be corrected with MARC Global. Beyond these cases, repairing the offending characters may not be practical, especially where MARC-8 escape sequences are involved; it is then best to try to obtain a new record for the resource from LC.
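As a rough illustration of what a 'hybrid' record looks like at the byte level, the following Python sketch flags records that are declared MARC-8 (leader/09 blank) but whose data happens to decode cleanly as UTF-8. This is only a heuristic and is not the test MarcToXml applies; the function name and the category labels are made up for this example.

```python
def classify_record(raw: bytes) -> str:
    """Guess how a raw MARC record is encoded, from leader/09 and its bytes."""
    declared_utf8 = raw[9:10] == b"a"    # leader/09: 'a' = Unicode, blank = MARC-8
    body = raw[24:]                      # everything after the 24-byte leader
    if all(b < 0x80 for b in body):
        return "ascii-only"              # no high bytes, so nothing to disagree about
    try:
        body.decode("utf-8")
        decodes = True
    except UnicodeDecodeError:
        decodes = False
    if declared_utf8:
        return "utf8" if decodes else "declared-utf8-but-invalid"
    # Declared MARC-8 with high bytes that form valid UTF-8 sequences:
    # a candidate for the leader/09 repair described above.
    return "possible-hybrid" if decodes else "marc8"
```

A record reported as "possible-hybrid" is only a candidate for repair; it should still be re-validated after leader/09 is changed, since a MARC-8 byte sequence can occasionally be valid UTF-8 by coincidence.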
 +A few minor problems in both conversion utilities have been fixed in 233:
 +  * sluggish screen response/updates
 +  * the MARC Mapping being used for CJK code x21x23x28 was incorrect
 +  * the program did not recognize qualified tags (e.g. '<marc:leader>')
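The qualified-tag problem in the last item can be illustrated with a short Python sketch. It shows one general way to read MARCXML so that '<marc:leader>' and an unprefixed '<leader>' in the default namespace are treated alike, by matching on the namespace URI rather than the prefix. It is not the fix made in the utility, and `find_leaders` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"   # MARCXML "slim" namespace


def find_leaders(xml_text: str) -> list:
    """Return the text of every leader element, whatever prefix the file uses."""
    root = ET.fromstring(xml_text)
    # ElementTree expands prefixes to "{namespace-uri}localname", so this
    # matches <marc:leader>, <leader xmlns="...">, or any other prefix.
    return [el.text for el in root.iter(f"{{{MARCXML_NS}}}leader")]
```

Passing the same collection once with a `marc:` prefix and once with a default namespace declaration should return the same list of leaders.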
 +__Converting MARC-8 records to UTF-8 encoding__
 +Using the XML conversion utilities in MARC Report, it is possible to change MARC-8 records into Unicode (UTF-8) records by running the two utilities back-to-back:
 +  - Run the MARC to XML utility on your file of MARC-8 records
 +  - Run the XML to MARC utility on the XML file created by step #1
 +If you are going to use this technique on a large file (millions of records), we recommend that you first split the MARC source file into two files: records that contain non-ASCII characters (diacritics, etc.) and records that do not ([[http://www.marcofquality.com/w/doku.php?id=233:marc_review_diacritics|see this article for how to do this in MARC Review]]). Records without non-ASCII characters can be converted to UTF-8 simply by changing the leader/09 code from ' ' to 'a', which is easily done in MARC Global. Then run the two steps listed above on the file that contains the records with diacritics. On a large file, this approach may save several hours of processing time.
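The split-and-flip approach can be sketched in Python as follows. This is a minimal illustration, not a replacement for MARC Review or MARC Global. It rests on one added assumption: a record is treated as "plain" only if it contains no byte above 0x7F and no ESC byte (0x1B), because MARC-8 escape sequences can switch to other character sets using 7-bit bytes. The function names are hypothetical.

```python
from typing import BinaryIO, Iterator, Tuple

ESC = 0x1B   # starts a MARC-8 escape sequence (assumption: such records need real conversion)


def iter_records(stream: BinaryIO) -> Iterator[bytes]:
    """Read records one at a time, using the 5-digit length in leader/00-04."""
    while True:
        head = stream.read(5)
        if len(head) < 5:
            return
        yield head + stream.read(int(head) - 5)


def is_plain_ascii(raw: bytes) -> bool:
    """True if the record has no high bytes and no escape sequences."""
    return all(b < 0x80 and b != ESC for b in raw)


def flip_leader09(raw: bytes) -> bytes:
    """Return the record with leader/09 set to 'a' (Unicode)."""
    return raw[:9] + b"a" + raw[10:]


def split_for_conversion(src: BinaryIO, plain_out: BinaryIO,
                         other_out: BinaryIO) -> Tuple[int, int]:
    """Write flipped plain records to plain_out and the rest, unchanged, to other_out."""
    plain = other = 0
    for raw in iter_records(src):
        if is_plain_ascii(raw):
            plain_out.write(flip_leader09(raw))
            plain += 1
        else:
            other_out.write(raw)
            other += 1
    return plain, other
```

Opening the three files in binary mode (`"rb"` and `"wb"`) and passing them in keeps memory use flat regardless of file size, since only one record is held at a time. The second output file is the one to run through the two utilities.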
233/xml_utilities.txt · Last modified: 2013/04/27 09:09 (external edit)