phelp:helpxmlutils [2017/02/04 12:41] (current)
 +MARC21-TO-XML
 +
 +This utility will convert a file of MARC records to 'MarcXml'. For information about MarcXml, see: http://www.loc.gov/standards/marcxml
 +
 +Before converting to Xml, we recommend running the Verify Utility on the file with all of the repair options selected. Xml is much less tolerant of invalid characters than MARC, and an Xml parser will generally fail on things like a MARC subfield delimiter in an indicator position. Such errors are not common, but they occur often enough in a large file to warrant this caveat.
 +
 +When converting MARC to Xml there are a couple of problems that the converter will look for. First, 'hybrid' records--records with both MARC-8 escape sequences and a leader/09 set to 'a'--often cause problems in the conversion. We've decided that it's not feasible to try to process these records. Hybrids will be dumped to a file called 'hybrids-YYMMDD.mrc' during MARC-to-XML conversion and a corresponding alert will be posted. Ideally, these hybrid records should be replaced with Unicode versions from LC or OCLC.
 +
 +Second, the MARC-to-XML utility 'roundtrips' all conversions (that is, once the MARC record is converted to Xml, it is then checked to see if it will convert back to MARC), so that if a record is output to an XML file you can be reasonably certain that the encoding will not trip any 'invalid character' errors when loading the file in a browser like IE. If the roundtrip fails, the record is dumped to the 'hybrid' error file described above, and an alert is posted (separate counts are maintained for 'hybrids' and 'roundtrip fails').
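The 'hybrid' condition described above can be illustrated with a short Python sketch. This is not the converter's actual code: the function name `is_hybrid` is hypothetical, and the check assumes that any ESC byte (0x1B) after the leader marks a MARC-8 escape sequence, which is a simplification.

```python
ESC = 0x1B  # MARC-8 escape sequences begin with the ESC byte


def is_hybrid(record: bytes) -> bool:
    """Return True if leader/09 is 'a' but the record still contains an ESC byte.

    Assumption: `record` is one raw MARC21 record (24-byte leader first).
    """
    if len(record) < 24:
        raise ValueError("a MARC21 record needs a 24-byte leader")
    declares_unicode = record[9:10] == b"a"   # leader/09 set to 'a'
    has_marc8_escape = ESC in record[24:]     # ESC somewhere after the leader
    return declares_unicode and has_marc8_escape
```

A record flagged this way would be a candidate for the hybrids file rather than for conversion.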
 +
 +Options
 +
 +If you want to convert MARC records into something other than MarcXml, a stylesheet (XSL) must be supplied. All stylesheets available on the LC page cited above that support MarcXml are distributed and supported by MARC Report. It should theoretically be possible to customize one of these supported stylesheets and still have it work. The key requirement is that the stylesheet must 'transform' MarcXml to the desired destination Xml.
 +
 +Open results in default Xml viewer
 +
 +If checked, when the conversion is finished, the program will open the resulting Xml file in whatever program is defined in your Windows shell for viewing Xml. If you plan to use this option on large Xml files, you should take care that your default viewer is not Internet Explorer (in our experience, it is incapable of opening a large Xml file without hanging or crashing the system).
 +
 +Notes
 +
 +MarcXml has advantages and disadvantages. The main advantage that we can see is its native support for Unicode. When a MARC file is converted to Xml, all MARC-8 escape sequences and diacritics must be converted to Unicode (UTF-8); the foreign script will then be rendered quite nicely in a browser such as Internet Explorer. This rendering of foreign script can be demonstrated quite simply: open a file in MARC Report, find a record that contains foreign script or diacritics, then press F5. The program will convert the record from MARC21 to MarcXml, and display it in a browser window. MARC Report's 'F5' and the Marc21-to-Xml utility use the same conversion routine.
 +
 +On the other hand, there is quite a lot of overhead in MarcXml (a characteristic of all XML): an XML file will use roughly four to five times as much storage as a Marc21 file, and, in our opinion, the verbose markup does not make MarcXml very pleasant to look at.
 +
 +XML-TO-MARC21
 +
 +This utility will convert a file of MarcXml records to Unicode MARC21. It assumes that the MarcXml file is encoded in UTF-8, and will maintain this encoding in the conversion to MARC21: the leader/09 will be set to 'a'. The program will check the header of the Xml file, and if it does not contain 'encoding=UTF-8', an appropriate warning message will be displayed.
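As an illustration of the header check described above, here is a minimal Python sketch that inspects the Xml declaration at the start of a file. The function names `declared_encoding` and `declares_utf8` are hypothetical, and the regular expression is a simplification that only looks at the leading bytes; it is not how the utility itself is implemented.

```python
import re

# Matches the encoding pseudo-attribute inside a leading <?xml ... ?> declaration.
_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE
)


def declared_encoding(head: bytes):
    """Return the encoding named in the Xml declaration, or None if absent."""
    if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        head = head[3:]
    match = _DECL.match(head)
    return match.group(1).decode("ascii", "replace") if match else None


def declares_utf8(head: bytes) -> bool:
    """True only when the declaration explicitly names UTF-8."""
    encoding = declared_encoding(head)
    return encoding is not None and encoding.lower() in ("utf-8", "utf8")
```

In practice `head` would be the first few hundred bytes of the file, read in binary mode; a `False` result corresponds to the case where the warning message is displayed.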
 +
 +WARNING: This converter does not support the conversion of MarcXml (in UTF-8) into MARC-8.
 +
 +When converting XML-to-MARC, a record that contains invalid characters (which we define as anything that causes the MARCXML document parser to choke) will be dumped to a separate text file. The filenames for these error files will be of the format: XmlErrnnnnnn.txt, where 'nnnnnn' represents the record's sequence number in the source file. A report of these errors, if any, will be displayed at the end of the conversion. The error threshold is currently set to 100; this means that if a file being converted has more than 100 of these errors, the conversion will stop. The purpose of this limit is to prevent the program from filling the disk with thousands of files when encountering a hopeless file.
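The error-file naming and the threshold described above might be sketched as follows in Python. Everything here is illustrative: `error_filename` and `dump_errors` are hypothetical names, and zero-padding the sequence number to six digits is an assumption based on the 'nnnnnn' pattern.

```python
from pathlib import Path

MAX_ERRORS = 100  # threshold stated in the text


def error_filename(sequence_number: int) -> str:
    """Build an XmlErrnnnnnn.txt name (six-digit zero padding is assumed)."""
    return f"XmlErr{sequence_number:06d}.txt"


def dump_errors(bad_records, out_dir, max_errors=MAX_ERRORS):
    """Write one text file per bad record; stop once the threshold is exceeded.

    bad_records: iterable of (sequence_number, text) pairs.
    Returns (list_of_written_paths, stopped_early).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for sequence_number, text in bad_records:
        if len(written) >= max_errors:
            return written, True  # more than max_errors errors: give up
        path = out / error_filename(sequence_number)
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written, False
```

The early return models the point at which the conversion stops, so a hopeless file produces at most `max_errors` error files.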
 +
 +Options
 +
 +If you want to convert non-MarcXml records into MARC, a stylesheet (XSL) that converts the other Xml into MarcXml must be supplied. All stylesheets available on the LC page cited above that support MarcXml are distributed and supported by MARC Report. The key requirement is that the supplied stylesheet must 'transform' the other Xml to MarcXml.
 +
 +GENERAL
 +
 +It is possible to use MARC Report to convert a file of MARC-8 records to Unicode records using the two utilities above, one after another. First, convert the MARC file to MarcXml--this step will 'upgrade' any MARC-8 escape sequences and diacritics to the UTF-8 encoding, and toss out any errors into the 'hybrids-YYMMDD.mrc' file. Second, convert the MarcXml results from the first conversion back to MARC21. This conversion will preserve the UTF-8 encoding and set the Leader/09 to 'a'.
 +
 +If you intend to run this procedure on a very large file, it might be best to first split the file in two, putting all records with diacritics, etc., in one file, and all the plain ASCII records in another file. Plain ASCII records can be 'converted' from MARC-8 to UTF-8 simply by setting the leader/09 code to 'a'. This step can be done a lot faster in MARC Global than by exporting the records to MARCXML and then importing them back to MARC.
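The split described above can be sketched in Python as a rough illustration of the idea, not of how MARC Global does it. The function names are hypothetical. The sketch assumes records can be separated on the MARC record terminator (0x1D) and treats a record as plain ASCII when every byte is below 0x80 and no ESC byte is present.

```python
RECORD_TERMINATOR = b"\x1d"  # MARC21 end-of-record byte
ESC = 0x1B                   # start of a MARC-8 escape sequence


def split_records(data: bytes):
    """Split a MARC file into records (simplification: split on 0x1D)."""
    return [chunk + RECORD_TERMINATOR
            for chunk in data.split(RECORD_TERMINATOR) if chunk]


def is_plain_ascii(record: bytes) -> bool:
    """True if the record has no high-bit bytes and no escape sequences."""
    return ESC not in record and all(byte < 0x80 for byte in record)


def mark_as_unicode(record: bytes) -> bytes:
    """Set leader/09 to 'a'; the record length is unchanged."""
    return record[:9] + b"a" + record[10:]


def partition(data: bytes):
    """Return (plain ASCII records flagged as UTF-8, all other records)."""
    ascii_records, other_records = [], []
    for record in split_records(data):
        if is_plain_ascii(record):
            ascii_records.append(mark_as_unicode(record))
        else:
            other_records.append(record)
    return b"".join(ascii_records), b"".join(other_records)
```

Only the second group of records would then need the MARC-to-XML and XML-to-MARC roundtrip.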
 +
 +For the latest information and documentation on this subject, please refer to the following link:
 +http://www.marcofquality.com/w/doku.php?id=help:marc_xml_documentation
  
phelp/helpxmlutils.txt · Last modified: 2017/02/04 12:41 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported