Concatenate Files and MarcXml

The Concatenate Files utility joins files together. It was designed to join MARC files, but will also join text files, or anything else really (though some files are not meant to be concatenated, and concatenating files of different types will usually produce bizarre results).

MarcXml presents a bit of a challenge for concatenation, since each MarcXml file begins with some headers: an Xml declaration, usually followed by an opening collection tag. Here is an example from an LC-produced file:

<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
xmlns="http://www.loc.gov/MARC21/slim">

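The problem is that an Xml document may contain only one Xml declaration (at the very start) and only one root element. If several files, each with these headers, are concatenated as they stand, the repeated headers end up in the middle of the output and the result is a broken Xml file.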
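The failure is easy to reproduce. The following is a minimal Python sketch, not part of the utility: the two sample documents are made up for illustration, and the standard-library parser simply stands in for any program that reads Xml.

```python
import xml.etree.ElementTree as ET

# Two tiny, made-up MarcXml documents, each with its own declaration
# and its own <collection> root element.
FILE_A = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<collection xmlns="http://www.loc.gov/MARC21/slim">\n'
    '<record><leader>00000nam a2200000 a 4500</leader></record>\n'
    '</collection>\n'
)
FILE_B = FILE_A  # a second file carrying the same headers


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a single Xml document."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


# A raw join leaves a second declaration and a second root element
# in the middle of the output.
naive = FILE_A + FILE_B

print(is_well_formed(FILE_A))  # True
print(is_well_formed(naive))   # False
```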

Beginning with version 2.32, Concatenate Files will support Xml file concatenation, with the following provisions.

  • First, our processing of Xml assumes that the files are carrying MARC data (i.e., 'MarcXml' files). Concatenation may work for non-MARC data, but it has not been tested.
  • Second, the Xml header (if any) that is found in the first file to be concatenated will become the Xml header for the results file.
  • Finally, if any of the MarcXml files wrap their contents in '<collection>' tags, the opening collection tag in the first file and the closing collection tag in the last file will be preserved, and all other collection tags will be removed. The program tries to check for the situation where the first file contains an opening collection tag and the last file lacks a closing collection tag, but that is the extent of the error checking performed.
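The provisions above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the program's actual code: the function name `concat_marcxml` and the regular expressions are assumptions, and the sketch works on strings held in memory rather than on files.

```python
import re
from typing import List

# Assumed patterns: an Xml declaration, and opening/closing collection tags
# (optionally with a namespace prefix, e.g. <marc:collection>).
XML_DECL = re.compile(r"<\?xml\b.*?\?>\s*", re.DOTALL)
OPEN_COLL = re.compile(r"<(?:\w+:)?collection\b[^>]*>\s*")
CLOSE_COLL = re.compile(r"</(?:\w+:)?collection\s*>\s*")


def concat_marcxml(docs: List[str]) -> str:
    """Join MarcXml documents, keeping the first header and the outer collection tags."""
    if not docs:
        return ""
    last = len(docs) - 1
    # The single consistency check described above: the first file opens a
    # collection, but the last file never closes one.
    if OPEN_COLL.search(docs[0]) and not CLOSE_COLL.search(docs[last]):
        raise ValueError("first file opens <collection> but last file does not close it")
    parts = []
    for i, doc in enumerate(docs):
        if i > 0:
            # Only the first file keeps its Xml declaration and opening tag.
            doc = XML_DECL.sub("", doc)
            doc = OPEN_COLL.sub("", doc)
        if i < last:
            # Only the last file keeps its closing tag.
            doc = CLOSE_COLL.sub("", doc)
        parts.append(doc)
    return "".join(parts)


# Usage with two made-up one-record files.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<collection xmlns="http://www.loc.gov/MARC21/slim">\n'
    '<record/>\n'
    '</collection>\n'
)
merged = concat_marcxml([sample, sample])
print(merged)
```

As in the utility, nothing here parses the Xml; the tags are matched as text, so unusual markup (comments containing collection tags, for instance) would not be handled.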

XML concatenation does not validate the XML structure in any way, apart from the brief checks listed above. It is more or less a raw concatenation.
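Since the utility does not validate its output, it may be worth checking the results file with a separate tool. One possible check, sketched here in Python under the assumption that the records sit in the standard MARC21 slim namespace (or in no namespace), is to stream through the file and count the records; the function name `count_records` is made up for this example, and a file that is not well-formed raises a parse error.

```python
import io
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def count_records(source) -> int:
    """Stream-parse a MarcXml file (path or binary file object) and count its records.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in (f"{{{MARC_NS}}}record", "record"):
            count += 1
            elem.clear()  # drop the record's children to keep memory use low
    return count


# Usage with an in-memory stand-in for a results file.
results = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<collection xmlns="http://www.loc.gov/MARC21/slim">\n'
    b'<record/>\n'
    b'<record/>\n'
    b'</collection>\n'
)
n = count_records(io.BytesIO(results))
print(n)  # 2
```

Comparing this count with the sum of the record counts of the input files is a quick way to confirm that nothing was lost. Note that this only tests well-formedness; it does not validate against the MARC21 slim schema.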

If you are concatenating XML files from different vendors/sources, you should run some sample concatenations first before proceeding with a major project. Keep in mind that if you concatenate two XML files from different sources, the program will keep the collection tags from the first file and discard those from the second file. In some cases, then, it may be important which XML file appears first in the list of files to be concatenated (you can change this easily by dragging and dropping in the list).

232/marccat_and_xml.txt · Last modified: 2013/04/27 09:09 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported