240:embobug [2013/04/27 09:09] (current)
 +====== MARC Review and MARC Global: 'Embedded booleans' ======
 +
 +Many years ago, we 'invented' a syntax that would make it easy to pack multiple reviews 
 +into a single statement. See below for the manual entry for this syntax.
 +
 +Although we are going to try to keep this way of doing things functional, we want to point out that in most cases using PCRE regular expressions //(the new way of doing things)// might be the better choice.
 +
 +For example, consider this 'embedded boolean': 
 +
 +{{:240:embo1.png|}}
 +
 +Stringing together the starting numbers like this removes the need to make a separate pattern to match each number, and is a great improvement for the user. At runtime, though, the program splits these embedded patterns apart and runs the review just as if you had specified six different patterns:
 +
 +<code>
 +TAG=082 SUBF=a DATA=^71 REGEX=True
 +TAG=082 SUBF=a DATA=^72 REGEX=True
 +TAG=082 SUBF=a DATA=^73 REGEX=True
 +TAG=082 SUBF=a DATA=^74 REGEX=True
 +TAG=082 SUBF=a DATA=^306 REGEX=True
 +TAG=082 SUBF=a DATA=^646 REGEX=True
 +</code>
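The runtime expansion described above can be sketched roughly as follows. This is only an illustration in Python, not MARC Review's actual code: the helper names `expand_embedded_or` and `review_matches` are hypothetical, and the embedded pattern string is an assumption about what the screenshot shows (the six prefixes joined with `||`).

```python
import re

def expand_embedded_or(data):
    """Split an embedded 'or' pattern into separate patterns (hypothetical helper)."""
    # Treat the English form <or> (not case-sensitive) the same as the symbol form.
    normalized = re.sub(r"<or>", "||", data, flags=re.IGNORECASE)
    return [part for part in normalized.split("||") if part]

def review_matches(data, subfield_value):
    """True if any expanded pattern matches; each pattern costs one regex search."""
    return any(re.search(p, subfield_value) for p in expand_embedded_or(data))

# Assumed content of the embedded boolean in the screenshot above.
embedded = "^71||^72||^73||^74||^306||^646"

patterns = expand_embedded_or(embedded)
print(len(patterns))                       # number of separate reviews that get run
print(review_matches(embedded, "306.4"))   # an 082 $a that starts with 306
```

Each entry in `patterns` stands for one separate review, which is where the extra work at runtime comes from.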
 +
 +So there is a performance hit with this way of doing things. A more efficient way to write this review would be to combine the first four patterns into one:
 +<code>
 +TAG=082 SUBF=a DATA=^7[1234] REGEX=True
 +</code>
 +With that change, the program only has to run three reviews on each 082, instead of six.
 +
 +But now we have PCRE (which we did not have 20 years ago). And we can rewrite the above as simply:
 +
 +{{:240:pcre-version.png|}}
 +
 +PCRE uses parentheses to group **sub-patterns**((this alone is reason enough to use PCRE; the old MARC Review syntax had no support for sub-patterns at all)), and a single pipe to separate them from one another. To the software this appears as a single review, and because it isn't split apart by our software at runtime, it's much faster. 
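As a rough check of the grouping idea, the sketch below compares the six separate patterns with one grouped pattern. The combined pattern `^(7[1-4]|306|646)` is an assumption (the screenshot may be written differently), and Python's `re` module stands in for PCRE here; for this particular pattern the syntax is the same in both.

```python
import re

# The six patterns the embedded boolean expands to.
SEPARATE = [r"^71", r"^72", r"^73", r"^74", r"^306", r"^646"]

# One grouped pattern: parentheses group the alternatives, a single pipe separates them.
COMBINED = re.compile(r"^(7[1-4]|306|646)")

def matches_separate(value):
    """Run up to six searches, one per pattern."""
    return any(re.search(p, value) for p in SEPARATE)

def matches_combined(value):
    """Run a single search with the grouped pattern."""
    return COMBINED.search(value) is not None

samples = ["711.4", "745.5", "306.09", "646.7", "750", "813.54", "1306"]
for value in samples:
    # Both approaches should agree on every sample value.
    assert matches_separate(value) == matches_combined(value)
    print(value, matches_combined(value))
```

The grouped version gives the same answers with one pattern instead of six.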
 +
 +It takes a bit of getting used to, but it should be worth it. 
 +
 +----
 +<code>EMBEDDED PATTERNS
 +
 +It is possible, and sometimes necessary, to specify multiple patterns in a single 
 +'DATA' pattern. This is done by stringing the patterns together, separating each
 +one with one of the boolean symbols listed below. 
 +
 +The following boolean symbols are supported within the DATA box:
 +
 + && = and
 + || = or
 + !! = not
 +
 +You can use the following English equivalents for the above interchangeably, as long as
 +they are enclosed in angle brackets (they are not case-sensitive):
 +
 + <and> = &&
 + <or> = ||
 + <not> = !!
 +
 +An example of each of these three boolean expressions follows.
 +
 +'And' example: 040 $d DLC<and>OCoLC
 + True if both 'DLC' and 'OCoLC' are present in $d in the same 040
 +
 +'Or' example: 035 $a OCoLC||TMQ
 + True if either 'OCoLC' or 'TMQ' is present in 035 $a
 +
 +'Not' example: 040 $d OCoLC<not>DLC
 + True if there is a $d 'OCoLC' and not a $d 'DLC' in the same 040
 +
 +These patterns can be combined with the standard Match Rules 'AND' and 'NOT' 
 +(e.g. 'NOT 035 $a OCoLC||TMQ'). The Match Rule is applied AFTER the data is evaluated.
 +
 +NOTE: If you use a regular expression with an embedded boolean, it must be repeated for each 
 +argument. For example: 949 $a = '^PB<or>^PER' (The '^' is repeated before both PB and PER).
 +</code>
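The semantics in the manual entry above can be illustrated with a small sketch. Everything here is hypothetical and is not how MARC Review is implemented: the function names are made up, a field is modeled as a plain list of the values of one subfield code, matching is plain-text (non-regex), and the sketch assumes only one kind of operator per expression, since the manual entry does not describe how mixed operators are evaluated.

```python
import re

# English forms and their symbol equivalents (not case-sensitive, per the manual entry).
ENGLISH = {"<and>": "&&", "<or>": "||", "<not>": "!!"}

def normalize(data):
    """Rewrite <and>/<or>/<not> to &&, ||, !!."""
    for word, symbol in ENGLISH.items():
        data = re.sub(re.escape(word), symbol, data, flags=re.IGNORECASE)
    return data

def present(term, values):
    """True if the plain-text term occurs in any of the subfield values."""
    return any(term in value for value in values)

def evaluate(data, values):
    """Evaluate an embedded boolean against one field's subfield values.

    Assumption: a single operator type per expression.
    """
    data = normalize(data)
    if "&&" in data:
        return all(present(t, values) for t in data.split("&&"))
    if "||" in data:
        return any(present(t, values) for t in data.split("||"))
    if "!!" in data:
        wanted, *unwanted = data.split("!!")
        return present(wanted, values) and not any(present(t, values) for t in unwanted)
    return present(data, values)

print(evaluate("DLC<and>OCoLC", ["DLC", "OCoLC"]))   # both in the same 040
print(evaluate("OCoLC||TMQ", ["(TMQ)12345"]))        # either one is enough
print(evaluate("OCoLC<not>DLC", ["DLC", "OCoLC"]))   # DLC is present, so this fails
# A 'NOT' Match Rule is applied after the data is evaluated:
print(not evaluate("OCoLC||TMQ", ["(DLC)123"]))
```

Because the expression is split on the boolean symbols before any matching happens, each piece is matched on its own, which fits the note that a regular expression such as '^' has to be repeated for every argument.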
 +
  
240/embobug.txt · Last modified: 2013/04/27 09:09 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported