MARC Review and MARC Global: 'Embedded booleans'

Many years ago, we 'invented' a syntax that would make it easy to pack multiple reviews into a single statement. See below for the manual entry for this syntax.

Although we will try to keep this syntax working, we want to point out that in most cases PCRE regular expressions (the new way of doing things) may be the better choice.

For example, consider this 'embedded boolean':

TAG=082 SUBF=a DATA=^71||^72||^73||^74||^306||^646 REGEX=True

Stringing together the starting numbers like this removes the need to make a separate pattern to match each number, and is a great improvement for the user. At runtime, though, the program splits these embedded patterns apart and runs the review just as if you had specified six different patterns:

TAG=082 SUBF=a DATA=^71 REGEX=True
TAG=082 SUBF=a DATA=^72 REGEX=True
TAG=082 SUBF=a DATA=^73 REGEX=True
TAG=082 SUBF=a DATA=^74 REGEX=True
TAG=082 SUBF=a DATA=^306 REGEX=True
TAG=082 SUBF=a DATA=^646 REGEX=True
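
The runtime split described above can be sketched in a few lines of Python. This is only an illustration, not the program's actual code: the function names are hypothetical, and the `||`-joined string is an assumed form of the embedded boolean, inferred from the six split patterns listed above.

```python
import re

def split_embedded_or(data):
    """Split an '||'-joined DATA pattern into its individual patterns."""
    return [part for part in data.split("||") if part]

def review_matches(data, subfield_value):
    """Run each split pattern in turn: one regex search per pattern."""
    return any(re.search(p, subfield_value) for p in split_embedded_or(data))

# Assumed form of the embedded boolean (hypothetical reconstruction).
embedded = "^71||^72||^73||^74||^306||^646"

print(split_embedded_or(embedded))
print(review_matches(embedded, "306.4"))
```

In the worst case (no match), every 082 $a costs six separate searches, which is the performance hit in question.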

So there is a performance hit with this way of doing things. A more efficient way to write this review would be to combine the first four patterns into one:

TAG=082 SUBF=a DATA=^7[1234] REGEX=True

The program then only has to run three reviews on each 082, instead of six.

But now we have PCRE (which we did not have 20 years ago). And we can rewrite the above as simply:

PCRE uses parens to group sub-patterns (see note 1), and a single pipe to separate them from one another. To the software this appears as a single review, and as it isn't manipulated by our software at runtime, it's much faster.
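
As a rough check of the grouping-and-pipe idea, the Python sketch below compares the six separate patterns with a single alternation. Python's `re` module is not PCRE, but grouping and alternation behave the same way for a pattern this simple. Both combined patterns are assumed forms written for this illustration, and the sample values are made up.

```python
import re

# The six patterns the program would otherwise run one by one.
separate = ["^71", "^72", "^73", "^74", "^306", "^646"]

# Assumed single-pattern equivalents: parens group, a single pipe separates.
combined = re.compile(r"^(71|72|73|74|306|646)")
compact = re.compile(r"^(7[1234]|306|646)")  # same idea, with a character class

def any_separate(value):
    """True if any of the six separate patterns matches."""
    return any(re.search(p, value) for p in separate)

# Made-up 082 $a values, some matching and some not.
samples = ["741.5", "306.4", "646.7", "813.54", "170", "3060", "75"]
for value in samples:
    # All three approaches should agree on every sample.
    assert any_separate(value) == bool(combined.search(value)) == bool(compact.search(value))
```

The result is the same in each case; the difference is that the combined form is a single search per subfield.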

It takes a bit of getting used to, but it should be worth it.


EMBEDDED PATTERNS

It is possible, and sometimes necessary, to specify multiple patterns in a single 
'DATA' pattern. This is done by stringing the patterns together, separating each
one with one of the boolean symbols listed below. 

The following boolean symbols are supported within the DATA box:

	&& = and	
	|| = or 	
	!! = not	

You can use the following English equivalents for the above interchangeably, as long as
they are enclosed in angle brackets (they are not case-sensitive):

	<and>	= &&	
	<or>	= ||	
	<not>	= !!	

An example of each of these three boolean expressions follows.

'And' example: 040 $d DLC<and>OCoLC		
	True if both 'DLC' and 'OCoLC' are present in $d in the same 040

'Or' example: 035 $a OCoLC||TMQ		
	True if either 'OCoLC' or 'TMQ' is present in 035 $a

'Not' example: 040 $d OCoLC<not>DLC		
	True if there is a $d 'OCoLC' and not a $d 'DLC' in the same 040

These patterns can be combined with the standard Match Rules 'AND' and 'NOT' 
(e.g. 'NOT 035 $a OCoLC||TMQ'). The Match Rule is applied AFTER the data is evaluated.
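
The three examples above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the program's implementation: the function names are hypothetical, matching is plain substring matching (no regex), a field is modelled as the list of its matching subfield values, and a pattern is assumed to use only one kind of boolean symbol.

```python
import re

# English forms mapped onto the symbol forms (not case-sensitive).
_ENGLISH = {"<and>": "&&", "<or>": "||", "<not>": "!!"}

def normalize(pattern):
    """Rewrite <and>, <or>, <not> as &&, ||, !!."""
    return re.sub(r"<(and|or|not)>",
                  lambda m: _ENGLISH[m.group(0).lower()],
                  pattern, flags=re.IGNORECASE)

def evaluate(pattern, values):
    """values: contents of the matching subfields within ONE field."""
    pattern = normalize(pattern)

    def present(term):
        return any(term in v for v in values)

    if "&&" in pattern:
        return all(present(t) for t in pattern.split("&&"))
    if "||" in pattern:
        return any(present(t) for t in pattern.split("||"))
    if "!!" in pattern:
        wanted, *excluded = pattern.split("!!")
        return present(wanted) and not any(present(t) for t in excluded)
    return present(pattern)

# 'And' example: both values appear in $d of the same 040.
print(evaluate("DLC<and>OCoLC", ["DLC", "OCoLC"]))
# A 'NOT' Match Rule is applied AFTER the data is evaluated.
print(not evaluate("OCoLC||TMQ", ["(TMQ)123"]))
```

Applying the Match Rule as a final negation of the evaluated result mirrors the order stated above.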

NOTE: If you use a regular expression with an embedded boolean, it must be repeated for each 
argument. For example: 949 $a = '^PB<or>^PER' (The '^' is repeated before both PB and PER).
1) This alone is reason enough to use PCRE; the old MARC Review syntax had no support for subpatterns at all.
240/embobug.txt · Last modified: 2013/04/27 09:09 (external edit)