233:concat_xml [2013/04/27 13:09]
233:concat_xml [2021/12/29 16:21] (current)
 +**Concatenate Files utility and XML** 
 +
 +With MARC Report version 233, it is possible to use the Concatenate Files utility to join Xml files. 
 +
 +To enable the built-in Xml processing in this utility, all selected source files must have an .xml file extension. (The result file should have an .xml extension as well.)
 +
 +If this condition is met, the following prompt will be shown: \\
 +{{:233:allxmlprompt.jpg|}}
 +
 +The default response is 'Yes'. This ensures that the result file begins with the following header((technically, this header is not required for a version 1.0 Xml file, but it's best practice to have one)):\\
 +
 +  <?xml version="1.0" encoding="UTF-8"?>
 +
 +and that the remainder of the result file is enclosed by the following element:
 +
 +  <marcreportConcatenation>
 +  ...
 +  </marcreportConcatenation>
 +
 +This element must be added because, without it, the results will be invalid Xml((ideally this should be an element that the user can specify, but for now it is hard-coded into the program)). For example, say I have two MARCXML files that I want to join into one. Here is the top bit of each one((it does not matter for the purpose of this illustration whether the records begin with <record> or <collection>)):
 +
 +  <?xml version="1.0" encoding="UTF-8"?>
 +  <record xmlns="http://www.loc.gov/MARC21/slim"> 
 +  ...
 +  <?xml version="1.0" encoding="UTF-8"?>
 +  <record xmlns="http://www.loc.gov/MARC21/slim"> 
 +  ...
 +
 +If the two files above are simply joined, the result will be invalid--only one top-level element is allowed in an Xml document. Even if we use the special Xml header handling described above--
 +
 +  <?xml version="1.0" encoding="UTF-8"?>
 +  <record xmlns="http://www.loc.gov/MARC21/slim"> ...
 +  <record xmlns="http://www.loc.gov/MARC21/slim"> ...
 +
 +--the same error occurs. Xml documents ((even if we concatenate a million Xml files together, the result is still considered one 'Xml document')) must contain one element that is the parent of all other elements. 
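
The single-root rule can be checked with any Xml parser. The following is a minimal sketch using Python's standard library parser, not the parser used by MARC Report; the sample strings and the helper name `is_well_formed` are made up for illustration.

```python
import xml.etree.ElementTree as ET

HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'
RECORD = '<record xmlns="http://www.loc.gov/MARC21/slim"></record>\n'


def is_well_formed(text):
    """Return True if the text parses as a single Xml document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two files joined byte for byte: a second header and a second root element.
naive = HEADER + RECORD + HEADER + RECORD

# Repeated header removed, but there are still two top-level elements.
two_roots = HEADER + RECORD + RECORD

# One parent element enclosing both records.
wrapped = (
    HEADER
    + "<marcreportConcatenation>\n"
    + RECORD
    + RECORD
    + "</marcreportConcatenation>\n"
)

for name, text in [("naive", naive), ("two_roots", two_roots), ("wrapped", wrapped)]:
    print(name, is_well_formed(text))
```

Only the `wrapped` string parses; the other two are rejected by the parser.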
 +
 +So, to comply with this, the Concatenate utility will create an Xml file structured as follows:
 +
 +  <?xml version="1.0" encoding="UTF-8"?>
 +  <marcreportConcatenation>
 +  <record xmlns="http://www.loc.gov/MARC21/slim"> ...
 +  <record xmlns="http://www.loc.gov/MARC21/slim"> ...
 +  </marcreportConcatenation>
 +
 +This technique should work even if the Xml files being joined together are not MARCXML.
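
The structure above can be reproduced outside the program for testing. This is only a sketch of the general technique, written under the assumption that each input is already a well-formed UTF-8 string; it is not the code used by the Concatenate utility, and the function name `concat_xml` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches an Xml declaration at the very start of a document.
XML_DECL = re.compile(r"^\s*<\?xml\s[^>]*\?>")
WRAPPER = "marcreportConcatenation"


def concat_xml(documents):
    """Join Xml document strings under a single parent element."""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', f"<{WRAPPER}>"]
    for doc in documents:
        doc = doc.lstrip("\ufeff")  # drop a byte-order mark, if any
        # Remove the per-file header so that only one remains at the top.
        parts.append(XML_DECL.sub("", doc, count=1).strip())
    parts.append(f"</{WRAPPER}>")
    return "\n".join(parts) + "\n"


record = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<record xmlns="http://www.loc.gov/MARC21/slim"></record>\n'
)
joined = concat_xml([record, record])

# The joined text now parses, with both records under one parent.
root = ET.fromstring(joined)
print(root.tag, len(root))
```

The sketch does not handle a DOCTYPE declaration in the source files; any such declaration would have to be removed as well.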
 +
 +__Some other caveats about concatenating Xml files__
 +
 +As with any concatenation operation, a bit of care is needed; the maxim 'garbage in, garbage out' certainly applies here. This utility is designed for speed, and does not validate the Xml in the file(s), apart from what has been stated above. 
 +
 +  - The program assumes that each file uses the UTF-8 encoding. This is usually the case, but it is not guaranteed.
 +  - It's possible to join different types of Xml records into a single file using the above. ((for example, this utility can create a valid Xml file that includes MARCXML, MODS, and OAI_DC records)) This is valid in Xml, but if any of the files being concatenated lack namespace attributes, the results might be unusable.
 +  - As with any other concatenation task that is not joining MARC files, the program reports the number of bytes, not the number of records.
 +  - If you try to join non-Xml files to Xml files, the program will pop up a warning message; however, the warning can be overridden.
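
Since the utility does not validate its input and reports bytes rather than records, some of the caveats above can be checked separately. The sketch below is one possible approach in Python and is not part of MARC Report; the function names `preflight` and `count_top_level` are hypothetical, and the case-insensitive extension test is an assumption about how the extension is compared.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def preflight(paths):
    """Return a list of problems found in the files before joining them."""
    problems = []
    for path in map(Path, paths):
        # Assumption: the extension comparison ignores case.
        if path.suffix.lower() != ".xml":
            problems.append(f"{path.name}: not an .xml file")
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{path.name}: not valid UTF-8")
    return problems


def count_top_level(xml_text):
    """Count the direct children of the root element, grouped by tag."""
    counts = {}
    for child in ET.fromstring(xml_text):
        counts[child.tag] = counts.get(child.tag, 0) + 1
    return counts
```

Running `count_top_level` on the text of a result file gives one entry per namespace-qualified tag found directly under the wrapper element. A source file that began with <collection> counts as a single child, no matter how many records it contains.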
  
CC Attribution-Share Alike 4.0 International