 +CONCATENATE FILES
 +
 +This utility allows you to take two or more MARC files and join them together, or 'concatenate' them, creating a new file that consists of the sum of the files that you started with. You can also use this utility to append one file to another.
 +
 +
 +BASIC CONCATENATION STEPS
 +
 +There are three basic steps to concatenating files. First, choose the files that you want to concatenate, or join together. Second, choose the name and location for your results. And third, click 'Go' to start the concatenation program.
 +
 +The buttons on the left side of the form make it easy for you to select the files you want to work with. Click the 'Source Files' button to select the files that you want to concatenate. You can click this button once for each file you want to add, or select multiple files at the same time by holding down the <Ctrl> key or <Shift> key. Once you have selected the files you want to join, click the 'Results' button to enter the name you want to use for the results file. By default, the Results are saved in the same directory as the last Source file selected, but you can navigate to any directory and save your results there.
 +
 +Note that every file that is added to the Source Files list has a checkbox next to it. If you uncheck a file, it will not be added to the results when you start the concatenation job. Click the 'Clear List' button to empty the Source Files list and start from scratch.
 +
 +When you have your filenames set up, click 'Go'. The program will copy all the checked files in the Source Files list to the results file. The progress bar and status bar at the bottom of the form will provide you with feedback on the program's status. When the job is complete, the Results file will contain all of the records in all of the Source files, in the same order in which the Source files were selected.
 +
 +
 +USING A PATTERN TO SELECT SOURCE FILES
 +
 +There is an option called 'Patterns' under the Source files button. Select this option to switch into 'Pattern mode'. Enter the file pattern you want to match in the box below; you must enter the full path as well as the filename pattern. For example, 'd:\marc\*.mrc' matches every file in the folder 'd:\marc' with a file extension of '.mrc'.
 +
 +Once the patterns have been entered, click the 'Check Patterns' button on the right. This will tell you how many files your patterns matched. If there is no match, check the pattern for typing errors and try again.
 + 
 +If you want to select files from multiple folders, add each folder as a new pattern, one pattern per line. Remember that each pattern must begin with a path. Do not simply enter '*.mrc'; it will not work (unless there are .mrc files in the program directory). If you have long paths in your patterns, it might be a good idea to open up Windows Explorer, navigate to your files, then copy and paste the path from the Explorer address bar into this form.
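
As an illustration only, the pattern matching described above resembles ordinary wildcard matching. The sketch below uses Python's standard glob module to expand a list of patterns, one per line, and report how many files matched. The function name expand_patterns is hypothetical, and the utility's own matching rules may differ in detail.

```python
import glob

def expand_patterns(patterns):
    """Return the files matched by a list of path patterns, one pattern per entry."""
    matched = []
    for pattern in patterns:
        pattern = pattern.strip()
        if not pattern:
            continue  # ignore blank lines
        # sorted() gives a predictable order within each pattern
        matched.extend(sorted(glob.glob(pattern)))
    return matched

if __name__ == "__main__":
    # Each pattern includes the full path, as the form requires.
    files = expand_patterns([r"d:\marc\*.mrc"])
    print(f"{len(files)} file(s) matched")
```

A count of zero here corresponds to the 'no match' case: the path or the wildcard is probably mistyped.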
 +
 +
 +MARC RECORD COUNTS
 +
 +This utility performs a raw copy of each file without any MARC verification or error checking. As a result, the copy is many times faster than a verified copy would be; however, the record count that it reports may not always be accurate.
 +
 +When the program is done, it is recommended that you run the Count utility to verify the final record count of the file created by concatenation. 
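
For readers who want a rough independent check: in the MARC 21 exchange format (ISO 2709), every record ends with a record terminator byte, hex 1D. The sketch below simply counts those bytes. It is not the Count utility and does no validation, so it can miscount a damaged file; the function name count_marc_records is hypothetical.

```python
RECORD_TERMINATOR = b"\x1d"  # end-of-record byte in ISO 2709 / MARC 21

def count_marc_records(path, chunk_size=1 << 20):
    """Count record terminator bytes in a file, reading it in chunks."""
    count = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            count += chunk.count(RECORD_TERMINATOR)
    return count
```

If the source files are intact, the count for the results file should equal the sum of the counts for the source files.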
 +
 +
 +APPENDING FILES
 +
 +If you want to append one file to another, select the file you want to Append using the Source File button, then select the file that you want to append it to using the Results button. In the Confirmation dialog that follows, select 'Yes'.
 +
 +
 +COPYING FILES
 +
 +We recommend that you use Windows to Copy files. However, if you want to copy one file to another using this utility, select the file you want to Copy using the Source File button, then select the file that you want to copy it to using the Results button. In the Confirmation dialog that follows, select 'No'.
 +
 +
 +DUPLICATE FILES
 +
 +Duplicate files are not included by default. For example, if you select a Source File more than once, it will be added to the Source Files list, but its checkbox will not be selected. If you want these files to be added more than once, you will have to manually check them.
 +
 +Although it is not an error, the program will alert you if the Results file is present in the Source File list. The purpose of the alert is to prompt you to double-check your options, since the Results file will be destroyed (and be replaced by the sum of the Source Files) when the concatenation is complete.
 +
 +
 +XML DATA 
 +
 +If 'Concatenate files' is used to join XML files, the following processing is available:
 +
 +1) The XML declaration (typically '<?xml version="1.0" encoding="UTF-8" ?>') will be removed from each file in the list of files to be concatenated
 +
 +2) The XML declaration will then be added to the beginning of the concatenation results, followed by a new root element that is guaranteed to be unique:
 +
 + <?xml version="1.0" encoding="UTF-8" ?> 
 + <collection_110321112601>
 + [contents of xml file 1 ...
 + [contents of xml file 2 ...
 + [contents of xml file 3 ...
 + </collection_110321112601>
 +
 +The purpose of adding this root element is to ensure that the resulting file is 'well-formed'.
 +
 +Apart from the check for the XML declaration implied above, Concatenate Files does not validate the XML structure in any way. If you use this utility to concatenate XML files from different sources, the results may not be 'valid' (in the XML sense of the term).
 +
 +You may either accept or decline this special processing. If declined, the files will be concatenated exactly as they are.
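
The two processing steps above can be sketched as follows. This is a minimal illustration, not the program's code: the names XML_DECL and join_xml are hypothetical, and building the root element name from a timestamp is an assumption (the sample name above resembles one, but this page does not say how the name is generated).

```python
import re
from datetime import datetime

# Matches an XML declaration at the very start of a document
# (after an optional byte order mark and whitespace).
XML_DECL = re.compile(r"^\ufeff?\s*<\?xml[^>]*\?>\s*")

def join_xml(documents, root=None):
    """Strip each document's XML declaration and wrap the rest in one root element."""
    if root is None:
        # Assumption: a timestamp suffix is one way to get a unique name.
        root = "collection_" + datetime.now().strftime("%y%m%d%H%M%S")
    # Step 1: remove the declaration from every input document.
    bodies = [XML_DECL.sub("", doc, count=1) for doc in documents]
    # Step 2: write one declaration, then the new root around all contents.
    parts = ['<?xml version="1.0" encoding="UTF-8" ?>', f"<{root}>", *bodies, f"</{root}>"]
    return "\n".join(parts)
```

As in the utility, nothing here checks that the inputs are themselves well-formed; the wrapper only guarantees a single declaration and a single root element.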
 +
 +
 +NON-MARC DATA
 +
 +Although it was not designed for this purpose, and the instructions above do not mention it, this utility can also be used to concatenate non-MARC files. For example, you could use this utility to concatenate a list of text files.  
 +
 +In this mode, the program counts each byte as a 'record'; therefore, when the program completes, the statistics that are reported refer to the number of bytes concatenated, not the number of 'records'.
 +
 +Please be aware that some types of files just are not meant to be concatenated. 
 +
 +
 +INTERLEAVE
 +
 +This option allows you to concatenate the MARC records from different files together in an interleaved order. 
 +
 +For example, if you have three source files named A, B, and C, and each file contains three records, the resulting MARC file in an 'interleaving' concatenation would contain nine records in this order:
 +
 +A-B-C-A-B-C-A-B-C
 +
 +whereas, in a normal concatenation, the results file would contain the nine MARC records in this order:
 +
 +A-A-A-B-B-B-C-C-C
 +
 +If the source files contain unequal numbers of records, the program maintains the interleaving order, skipping each file once all of its records are exhausted. For example, given the three files A, B, C, where file A contains 3 records, file B contains 2 records, and file C contains 4 records, the resulting MARC file in an 'interleaving' concatenation would contain nine records in this order:
 +
 +A-B-C-A-B-C-A-C-C
 +
 +whereas, in a normal concatenation, the results file would contain the nine MARC records in this order:
 +
 +A-A-A-B-B-C-C-C-C
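
The interleaving order described above is a round-robin: one record from each file in turn, skipping files that have run out. A minimal sketch follows, with each file represented as a plain list of records; the function name interleave is hypothetical and reading records from MARC files is left out.

```python
from itertools import zip_longest

def interleave(*files):
    """Take one record from each file in turn until every file is exhausted."""
    missing = object()  # sentinel marking a file that has run out of records
    result = []
    for one_round in zip_longest(*files, fillvalue=missing):
        result.extend(rec for rec in one_round if rec is not missing)
    return result

a = ["A"] * 3
b = ["B"] * 2
c = ["C"] * 4
print("-".join(interleave(a, b, c)))  # A-B-C-A-B-C-A-C-C
```

Passing the lists in a different order changes the result, which is why the order of the source files matters.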
 +
 +The ordering of the source files in an interleave operation might be important, depending on what you are going to do with the results. Use drag and drop to put the files in the order that you want.
 +
 +Interleave requires at least two source files. Interleave is only available if all of the source files contain MARC records.
 +
 +As this option creates an unusual results file, you will always receive an 'Are you sure' pop-up when the Interleave option is checked.
 +
 +
 +EXAMPLE
 +
 +Here is a sample scenario for using the Concatenate Files utility. Say that you have a large file and are looking for a specific problem (like CIP records). You could run a Quick Review to identify all records where 300 $a = 'p. cm.'. To make editing easier, you use the MARC output option and select 'Split', which writes the results to two different files (one file containing only the CIP records, the other containing all of the other records that were not CIP). Upgrading the CIP records is now fairly straightforward, since you have conveniently collected them into a single file. But when you are done, you want to add the upgraded records back to the other records so that you once again have all of the records in a single file.
 +
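 +This is where Concatenate Files comes in. Select the file of upgraded CIP records and the file of remaining records as your Source Files, choose a Results file, and click 'Go'.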
 +
 +
 +REPEAT
 +
 +The purpose of this option is to make it easy to create a MARC file that contains the same record 'x' number of times.
 +
 +To do this, find the record you want to "multiply" and save it to a MARC File (press <F6> in MARC Report). Make any edits that are necessary to this record and save them.
 +
 +Next, open the Concatenate Files utility, press Select, and select the file containing the record you want to multiply.
 +
 +Next, select "Repeat", and then enter the number of copies of the record that you want to be created. For example, if you enter 10, the selected file is copied to the "Result File" 10 times, thus creating a new file with 10 records, each a copy of the record that you started with.
 +
 +Finally, press "Result file" and set the filename for the results of the copy. This must be a new filename. 
 +
 +WARNING: if you select an existing filename for the results, the contents will be overwritten with the selected file ('x' times). It is not possible in this mode to concatenate the selected file to the end of an existing file (as in unix, for example).
 +
 +Press "Run" to copy the file 'x' times.
 +
 +When using the "Repeat" mode of this utility, only one source file is supported. If you enter more than one source file, an error message is displayed.
 +
 +NB: the selected source file is copied to the results file 'x' times, regardless of how many records are in the source. In the description above, we assume the source file contains only one record. However, although this utility does a good job of determining how many records are present in a MARC file, 'Concatenate' copies files, not records. Therefore, whether the source file contains a single record or a million records, the result is the same: the source file will be copied to the results file 'x' times.
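
The point that files, not records, are copied can be seen in a short sketch. This is an illustration under assumptions, not the program's code; the function name repeat_file is hypothetical.

```python
def repeat_file(source, result, times):
    """Write `times` copies of the whole source file to the result file."""
    with open(source, "rb") as f:
        data = f.read()  # the entire file, however many records it holds
    # "wb" overwrites an existing result file, matching the WARNING above.
    with open(result, "wb") as out:
        for _ in range(times):
            out.write(data)
```

With a one-record source and times set to 10, the result holds 10 records; with a three-record source it holds 30.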
 +
 +
 +UNIX CAT
 +
 +This program tries to behave in the same manner as the unix 'cat' utility; there is, however, at least one notable difference. In MARC Cat, the files in the Source file list are added in turn to a temporary file, and when this concatenation is finished, the temporary file is renamed to the Results file. This means that copying a file to itself has no effect, and does not blow the file away as in unix (cat file > file). Appending a file to itself does just that in MARC Cat, as opposed to the infinite concatenation that occurs in unix (cat file >> file). Finally, if the Results file is present in the Source file list, it will still be added to the Results in MARC Cat; whereas in unix, the Results file is either cleared at the outset (cat file1 file2 > file1), or the command goes into an infinite loop (cat file1 file2 > file2).
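
The temporary-file approach described above can be sketched as follows. This is a minimal illustration of the technique, not the program's actual code; the function name concatenate is hypothetical.

```python
import os
import shutil
import tempfile

def concatenate(sources, result):
    """Copy sources, in order, to a temporary file, then rename it to result."""
    folder = os.path.dirname(os.path.abspath(result))
    fd, tmp_path = tempfile.mkstemp(dir=folder)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for path in sources:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, tmp)  # raw byte copy, no MARC checks
        # The result file is replaced only after every source has been read,
        # so it can safely appear in the source list.
        os.replace(tmp_path, result)
    except BaseException:
        os.unlink(tmp_path)  # do not leave a partial temporary file behind
        raise
```

Because the sources are read in full before the rename, concatenate(["a.mrc"], "a.mrc") leaves the file unchanged, and concatenate(["a.mrc", "a.mrc"], "a.mrc") doubles it rather than looping.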
 +
 +NOTE: There are a great many varieties of unix, and some of the description above may not apply in each case.
  
phelp/helpconcatenatefiles.txt · Last modified: 2021/12/29 16:21 (external edit)
CC Attribution-Share Alike 4.0 International