EDITING VERY LARGE FILES WITH MARC REPORT
In MARC Report we define a very large file as one that contains more than 100 MB (100,000,000 bytes). When working with very large files of MARC records in MARC Report, you should understand the following options.
'How to Name the Edit Results'. If this option is set to 'Always rename to name of source' (the default), then the first thing MARC Report does when editing a file--once the file indexing completes--is to copy the source file to a backup folder. This operation can take some time, depending on your system and the size of the file, and it can also use significant disk space. For example, suppose you edit a 300 MB MARC file (a library database of several hundred thousand records) and then close MARC Report to take a break. When you re-open the file in MARC Report, a new Edit Session begins, and the 300 MB file is backed up again to a new session folder; the new backup does not overwrite the backup from the previous session. There are now three copies of the file, using about 900 MB (close to 1 GB) of disk space: the file you started with (which we call the Source), the backup from the first session, and the backup for the new session.
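The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the reasoning, assuming that each Edit Session adds one full-size backup and that nothing has been cleaned up yet; the function name is hypothetical and is not part of MARC Report.

```python
def disk_usage_mb(source_mb: int, sessions: int) -> int:
    """Estimate disk space used: the Source plus one backup per Edit Session.

    Assumption: every session backs up the whole source file and no
    earlier backup has been deleted by the Clean-Up option.
    """
    return source_mb * (1 + sessions)


# A 300 MB file opened in two Edit Sessions: Source + 2 backups.
print(disk_usage_mb(300, 2))  # 900
```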
'How to Save Records'. If this option is set to 'Save changed and unchanged to the same file' (the default), then all records in the file are saved whether they are changed or not. This means that if you are editing a file of 300,000 records and just make two or three changes, all 300,000 records will still be saved to the Edit Results. Again, this uses up disk space and takes time.
'Clean Up Edit Work Files'. This option was designed to automatically delete these backups at specified time intervals so that you can recover the disk space used during previous Edit Sessions. By default, the Clean-Up option is set to 'Weekly'--i.e. MARC Report will preserve (at most) one calendar week of backup sessions. If you are working with a very large file for a few days with the default options, the 'Weekly' setting is not going to be of much help, because every session's backup from those days is still kept. On the other hand, if you set the Clean-Up option to 'Daily', any backup files will be deleted the next day that the program is started, which effectively negates the purpose of having a backup.
The default settings for the options above are the optimum settings for the average MARC editing situation. You may also need to use these settings when editing very large files, depending on your objectives. If possible, however, we recommend the following settings when working with very large files: 1) How to Save Records--Changed records only. 2) How to Name the Edit Results--Prompt for a filename. 3) Clean Up Edit Work Files--Weekly (no change).
THE JUMP INDEX AND VERY LARGE FILES
When running an Edit Session, the program first scans the file to create a jump index (a table of the starting offset for each record in the file). If you press the 'Cancel' button during this scan, the Edit Session is cancelled, because the jump index is required for editing.
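As an illustration of what a jump index contains, here is a minimal Python sketch. It is not MARC Report's own code; it assumes well-formed MARC (ISO 2709) records, in which the first five bytes of each record hold the record length as ASCII digits. The function name and the two dummy records are hypothetical.

```python
import io
from typing import BinaryIO, List


def build_jump_index(stream: BinaryIO) -> List[int]:
    """Return the starting byte offset of every record in a MARC file.

    Assumption: each record begins with a 5-digit ASCII record length
    (leader positions 00-04), as in well-formed ISO 2709 data.
    """
    offsets = []
    pos = 0
    while True:
        stream.seek(pos)
        head = stream.read(5)
        if len(head) < 5:
            break  # end of file
        length = int(head.decode("ascii"))  # raises ValueError if not digits
        if length < 24:
            raise ValueError(f"implausible record length at offset {pos}")
        offsets.append(pos)
        pos += length  # jump straight to the next record
    return offsets


# Two dummy records of 30 and 42 bytes (not real MARC content).
rec1 = b"00030" + b"x" * 24 + b"\x1d"
rec2 = b"00042" + b"y" * 36 + b"\x1d"
print(build_jump_index(io.BytesIO(rec1 + rec2)))  # [0, 30]
```

With such a table, an editor can seek directly to any record, such as the 250,000th, without re-reading the records before it.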
ABOUT THE EDIT SESSION IDENTIFIER
When a file is edited in MARC Report, an 'Edit Session' identifier is created.
The format of this Edit Session Identifier is:
'S' + [mmdd] + '_' + [nn].
[mmdd] is the month and day that the Edit Session began. [nn] is a serial number; it is set to '01' at the beginning of each day, and increments each time an Edit Session is opened.
For example: S1107_02 (Edit Session #2 from November 7). This identifier appears in the lower left-hand corner of the editor.
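The format can be reproduced with a short Python sketch. This only illustrates the pattern described above; the function name is hypothetical, and how the program behaves beyond 99 sessions in a day is not stated here.

```python
from datetime import date


def session_id(day: date, serial: int) -> str:
    """Build an identifier of the form 'S' + mmdd + '_' + nn."""
    return f"S{day.month:02d}{day.day:02d}_{serial:02d}"


# Second Edit Session opened on November 7 (the year is arbitrary).
print(session_id(date(2023, 11, 7), 2))  # S1107_02
```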
The Edit Session Identifier is used to create a folder on your hard disk in the program's work directory. The Windows location of this work folder is CSIDL_LOCAL_APPDATA + 'MarcReport'.
A typical value for this folder would be: C:\Documents and Settings\[user]\Local Settings\Application Data\MarcReport
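If you want to locate the work folder from a script, a sketch along the following lines may help. It assumes that the LOCALAPPDATA environment variable points to the same place as CSIDL_LOCAL_APPDATA, which is typical on newer versions of Windows but may not be defined on older ones; the function name and the user name 'jane' are hypothetical.

```python
import os
from pathlib import PureWindowsPath
from typing import Optional


def work_folder(local_appdata: Optional[str] = None) -> PureWindowsPath:
    """Return the MARC Report work folder under the local application data folder.

    Assumption: LOCALAPPDATA mirrors CSIDL_LOCAL_APPDATA. Pass the folder
    explicitly if the variable is not set on your system.
    """
    base = local_appdata or os.environ.get("LOCALAPPDATA")
    if not base:
        raise RuntimeError("LOCALAPPDATA is not set; pass the folder explicitly")
    return PureWindowsPath(base) / "MarcReport"


print(work_folder(r"C:\Documents and Settings\jane\Local Settings\Application Data"))
```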
The program uses this folder to save work during an edit session. It contains some (or all) of the following files:
- an index of the record offsets in the source file
- a title listing of the source file
- a backup copy of the source file (if your Edit options are set to 'Always rename to name of source')
- a copy of every record that is edited
- a copy of the Edit Results, before they are moved to the location specified in your Edit options (if your Edit options are not set to 'Always rename to name of source')
- a file containing any records that were deleted (if the 'Save deleted records' option is set in your Edit options)