Changes to MARC Analysis in 235

Two new statistics were added to the top section of the report:

  1. A counter for the longest field in the file
  2. A counter for the most repeated subfield in the file

Along with these new overall numbers, we've added an RSN (see note 1) to the report in a few places so that it is easier to track back to the specific records generating these statistics.

The following example shows the two new statistics (at the bottom), along with the RSN information:

File size: 23558898 bytes

MARC record count: 13348

Average record length: 1764

Mean Average record length: 1973 (11 records with this length)

Shortest record length: 649  (in record number 1082)

Longest record length: 8391  (in record number 8487)

Longest field in file: 2001  (in tag 505 of record number 4999)

Most repeated subfield in file: 505 $t  (105 times in record number 5043)
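
For readers who want to cross-check the two new statistics outside of MARC Analysis, the following is a minimal sketch, not the program's actual implementation. It assumes the file has already been split into raw ISO 2709 records, that field length is taken from the directory (which includes the field terminator), and that subfield repeats are counted per tag across one whole record. The names `iter_fields`, `analyse`, and `build_record` are hypothetical, and the RSN here is simply the 1-based position of the record in the list.

```python
from collections import Counter

FT, SD, RT = b"\x1e", b"\x1f", b"\x1d"  # field terminator, subfield delimiter, record terminator


def iter_fields(record: bytes):
    """Yield (tag, field_bytes) for each directory entry of one ISO 2709 record."""
    base = int(record[12:17].decode("ascii"))  # leader 12-16: base address of data
    directory = record[24:base - 1]            # the directory ends with a field terminator
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[0:3], int(entry[3:7]), int(entry[7:12])
        yield tag, record[base + start:base + start + length]


def analyse(records):
    """Return the longest field and the most repeated subfield, each with its RSN."""
    longest = (0, None, None)               # (length, tag, rsn)
    most_repeated = (0, None, None, None)   # (count, tag, subfield code, rsn)
    for rsn, record in enumerate(records, start=1):
        repeats = Counter()
        for tag, data in iter_fields(record):
            if len(data) > longest[0]:
                longest = (len(data), tag, rsn)
            if tag.startswith("00"):        # control fields carry no subfields
                continue
            for chunk in data.split(SD)[1:]:
                if chunk:
                    repeats[(tag, chunk[:1].decode("ascii", "replace"))] += 1
        for (tag, code), count in repeats.items():
            if count > most_repeated[0]:
                most_repeated = (count, tag, code, rsn)
    return {"longest_field": longest, "most_repeated": most_repeated}


def build_record(fields):
    """Demo helper: assemble a minimal ISO 2709 record from (tag, body) pairs."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FT
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = 24 + len(directory) + 1
    total = base + len(data) + 1
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FT + data + RT


records = [
    build_record([("245", b"10" + SD + b"aShort title")]),
    build_record([
        ("245", b"10" + SD + b"aAnother title"),
        ("505", b"00" + SD + b"tPart one" + SD + b"tPart two" + SD + b"tPart three"),
    ]),
]
stats = analyse(records)
print(stats["longest_field"])   # (35, '505', 2)
print(stats["most_repeated"])   # (3, '505', 't', 2)
```

Because the comparisons are strict, ties are resolved in favour of the first record encountered; MARC Analysis may break ties differently.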

Changes to the format

We made a small change to the formatting of the section of the report that lists the Indicators and the Subfields for each Tag.

Prior to version 235, the Tag area would look something like this:

--------------------------------------------------------------------
Tag    Records   TotOccs   MaxOccs   AvgSize   Longest  Shortest
041       1086      1086         1        15        83         8
--------------------------------------------------------------------
Indicator1   Indicator2   Subfields    Occ  1       2       3      4+
#:       2   #:    1086   a:     340      149      50      22       6
0:     938                b:      27       13       3       0       2
1:     146                d:     706      294     109      28      25
                          e:     276      136      34      13       8
                          f:       1        1       0       0       0
                          g:     792      596      23      39       8
                          h:     278      185      21       7       7

In version 235, it will now look like this:

--------------------------------------------------------------------
Tag    Records   TotOccs   MaxOccs   AvgSize   Longest  Shortest
041       1086      1086         1        15        83         8
--------------------------------------------------------------------
Indicator1   Indicator2   
#:       2   #:    1086
0:     938             
1:     146             

Subfields    Occ  1       2       3       4       5       6+
a:     340      149      50      22       5       1       0
b:      27       13       3       0       2       0       0
d:     706      294     109      28      18       5       2
e:     276      136      34      13       7       1       0
f:       1        1       0       0       0       0       0
g:     792      596      23      39       7       1       0
h:     278      185      21       7       5       2       0

The new format is not as compact as the old one, but it is less cluttered, and it leaves room for more individual Subfield occurrence columns (1 through 6+ rather than 1 through 4+) while still fitting the single-page format that has always been our guideline.
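
The figures in the examples above are consistent with reading the Subfields table as a histogram: 'Occ' is the total number of occurrences of the subfield, and each numbered column counts the fields in which the subfield appears that many times, with the last column collecting everything at or above its number. The short sketch below illustrates that reading only; it is an inference from the sample numbers, not a description of the program's internals, and `occurrence_row` is a hypothetical name.

```python
def occurrence_row(per_field_counts, last_bucket=6):
    """Return (total occurrences, buckets) for one subfield.

    buckets[i] counts the fields holding exactly i + 1 occurrences,
    except the final bucket, which counts fields with last_bucket or more.
    """
    counts = [n for n in per_field_counts if n > 0]
    buckets = [0] * last_bucket
    for n in counts:
        buckets[min(n, last_bucket) - 1] += 1
    return sum(counts), buckets


# Synthetic per-field counts chosen to reproduce the 041 $a row shown above:
# 149 fields with one $a, 50 with two, 22 with three, 5 with four, 1 with five.
counts_a = [1] * 149 + [2] * 50 + [3] * 22 + [4] * 5 + [5] * 1

print(occurrence_row(counts_a, last_bucket=6))  # (340, [149, 50, 22, 5, 1, 0])
print(occurrence_row(counts_a, last_bucket=4))  # (340, [149, 50, 22, 6])
```

The same data yields the old row when the last bucket is 4+ and the new row when it is 6+, which is why the 4+ column in the old layout equals the sum of the 4, 5 and 6+ columns in the new one.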

If you would prefer to keep the old format, start MARC Analysis, click the Options button, select the Output Options page, and check the box labelled 'Display Indicators and Subfields in the same section'.

The default for this option is False (i.e., the new format is used).

1) RSN: record sequence number
235/ma_changes.txt · Last modified: 2021/12/29 16:21 (external edit)
CC Attribution-Share Alike 4.0 International