Concatenate Files utility and XML

With MARC Report version 233, it is possible to use the Concatenate Files utility to join Xml files.

To enable the built-in Xml processing in this utility, all selected source files must have an .xml file extension. (You should also make sure that the result file has an .xml extension.)

If this condition is met, the following prompt will be shown:

The default response is 'Yes'. This will make sure that the result file begins with 1)

<?xml version="1.0" encoding="UTF-8"?>

and that the remainder of the result file is enclosed by the following element:

<marcreportConcatenation>
...
</marcreportConcatenation>

This element must be added because, without it, the results would be invalid Xml 2). For example, say I have two MARCXML files that I want to join into one. Here is the top of each one 3):

<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim"> 
...
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim"> 
...

If the two files above are simply joined, the results will be invalid: only one top-level element is allowed in an Xml document. The same error occurs even if we use the special Xml header handling described above, which leaves a single header:

<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim"> ...
<record xmlns="http://www.loc.gov/MARC21/slim"> ...

Xml documents 4) must contain one element that is the parent of all other elements.
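To see the problem from a parser's point of view, here is a minimal Python sketch. It is only an illustration and not part of MARC Report; the helper name `is_well_formed` and the empty sample records are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'
RECORD = '<record xmlns="http://www.loc.gov/MARC21/slim"></record>\n'

# One file on its own: a header followed by a single top-level element.
single = HEADER + RECORD

# Two files simply joined: the second header lands in the middle of the text.
simply_joined = HEADER + RECORD + HEADER + RECORD

# One header kept, but still two top-level elements.
one_header = HEADER + RECORD + RECORD


def is_well_formed(text):
    """Return True if the text parses as a single Xml document."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


for name, text in [("single", single),
                   ("simply_joined", simply_joined),
                   ("one_header", one_header)]:
    print(name, is_well_formed(text))
```

Only the single file parses. Both joined variants are rejected by the parser, which is why a common parent element is needed.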

So, to comply with this, the Concatenate utility will create an Xml file structured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<marcreportConcatenation>
<record xmlns="http://www.loc.gov/MARC21/slim"> ...
<record xmlns="http://www.loc.gov/MARC21/slim"> ...
</marcreportConcatenation>

This technique should work even if the Xml files being joined together are not MARCXML.
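The structure above can be reproduced outside the program with a short Python sketch. This is not the utility's actual implementation; the function name `join_xml` and its `wrapper` parameter are hypothetical, and like the utility it does no validation of the input.

```python
import re

# Matches a leading Xml header such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECLARATION = re.compile(r"^\s*<\?xml\s[^>]*\?>\s*")


def join_xml(documents, wrapper="marcreportConcatenation"):
    """Join Xml document strings under one wrapper element (no validation)."""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', f"<{wrapper}>"]
    for text in documents:
        # Only one header is allowed, at the very top, so drop each
        # document's own header (and a byte order mark, if present).
        body = XML_DECLARATION.sub("", text.lstrip("\ufeff"), count=1)
        parts.append(body.strip())
    parts.append(f"</{wrapper}>")
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    doc = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<record xmlns="http://www.loc.gov/MARC21/slim"></record>\n')
    print(join_xml([doc, doc]))
```

Because the sketch works on text only and never parses the records, it is indifferent to what kind of Xml the documents contain, which matches the point made above about non-MARCXML files.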

Some other caveats about concatenating Xml files

As with any concatenation operation, a bit of care is needed; the maxim 'garbage in, garbage out' certainly applies here. This utility is designed for speed, and does not validate the Xml in the file(s), apart from what has been stated above.

  1. The program assumes the UTF-8 encoding is used in each file. This should usually be the case but it does not always have to be so.
  2. It's possible to join different types of Xml records into a single file using the above. 5) This is valid in Xml, but if any of the files being concatenated lack namespace attributes, the results might be unusable.
  3. As with any other concatenation task that is not joining MARC files, the results reported by the program are the number of bytes, and not the number of records.
  4. If you try to join non-Xml files to Xml files, the program will pop up a warning message; however, the warning can be overridden.
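
Since the program reports bytes rather than records, and mixed record types depend on namespaces to stay distinguishable, it can be useful to inspect a result file afterwards. The following Python sketch is one possible check, not a feature of MARC Report; the function name `count_children_by_namespace` is hypothetical. It counts the elements directly under the wrapper, grouped by namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_children_by_namespace(xml_text):
    """Count the top-level children of the root element, per namespace."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    counts = Counter()
    for child in root:
        tag = child.tag
        # ElementTree writes namespaced tags as "{namespace}localname".
        if tag.startswith("{"):
            namespace = tag[1:].split("}", 1)[0]
        else:
            namespace = "(no namespace)"
        counts[namespace] += 1
    return counts


if __name__ == "__main__":
    sample = (
        '<marcreportConcatenation>'
        '<record xmlns="http://www.loc.gov/MARC21/slim"/>'
        '<mods xmlns="http://www.loc.gov/mods/v3"/>'
        '<record/>'
        '</marcreportConcatenation>'
    )
    for namespace, total in count_children_by_namespace(sample).items():
        print(namespace, total)
```

Any elements counted under "(no namespace)" came from files that lacked namespace attributes, which is the case caveat 2 warns about.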
1) technically, this header is not required for a version 1.0 Xml file, but it's best practice to have one
2) ideally this should be an element that the user can specify, but for now it is hard-coded into the program
3) it does not matter for the purpose of this illustration whether the records begin with <record> or <collection>
4) even if we concatenate a million Xml files together, the result is still considered an 'Xml document'
5) for example, this utility can create a valid Xml file that includes MARCXML, MODS, and OAI_DC records
233/concat_xml.txt · Last modified: 2013/04/27 09:09 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported