X260B Crosscheck §

The purpose of this crosscheck is to prevent records representing different productions, publications, etc. of an item from matching.  

X260B often fails because of variations in the form of the publisher name given in the 260 subfield $b. For this reason, X260B is a good target for PLP's synonym rules. PLP also performs an unusual amount of processing on this crosscheck to reduce the number of XCFails it generates.


Pre-processing §



Data extraction §

  • Publisher names are extracted from MARC 260 subfield $b; all occurrences of subfield $b are used 
  • If the extracted data contains a semicolon, the string is broken at each ';', creating additional publisher strings 
  • It is typical (after normalization; see the section that follows) for each record to generate several publisher strings 
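
The extraction steps above can be sketched in Python as follows. This is only an illustration, not PLP's code: the in-memory record shape (a list of tag/subfield tuples) and the function name `extract_publisher_strings` are hypothetical, and trimming whitespace and dropping empty pieces after the split are assumptions.

```python
from typing import Iterable, List, Tuple

# Hypothetical in-memory shape for a MARC field: (tag, [(subfield_code, value), ...]).
Field = Tuple[str, List[Tuple[str, str]]]

def extract_publisher_strings(fields: Iterable[Field]) -> List[str]:
    """Collect every 260 $b value and split each one at semicolons."""
    publishers: List[str] = []
    for tag, subfields in fields:
        if tag != "260":
            continue
        for code, value in subfields:
            if code != "b":
                continue
            # Each ';' starts a new publisher string.
            for piece in value.split(";"):
                piece = piece.strip()  # assumption: surrounding blanks are dropped
                if piece:              # assumption: empty pieces are ignored
                    publishers.append(piece)
    return publishers

record = [
    ("245", [("a", "Example title")]),
    ("260", [("a", "New York :"), ("b", "A. Knopf ; Random House,"), ("c", "1990.")]),
]
print(extract_publisher_strings(record))  # ['A. Knopf', 'Random House,']
```

Every occurrence of $b contributes, so a record with two $b subfields and one semicolon would yield three raw publisher strings before normalization.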


Normalization §

  • Any data enclosed within square brackets is deleted 
  • The common strings listed in X260B Stopwords are deleted 
  • If the resulting string is shorter than 3 characters or longer than 64, processing stops and an empty publisher is returned 
  • The following common phrases are then deleted: ' & COMPANY INC', ' & CO INC', ' & COMPANY', ' AND COMPANY INC', ' AND CO INC', ' AND COMPANY', ' AND CO', ' INC', ' COMPANY', ' CO', ' & CO' 
  • If the result contains any blank spaces, an attempt is made to extract the most meaningful term from the string and add the result as a separate 'publisher'. For example: 
      if we have:       this step will create an additional item for:      
      A KNOPF           KNOPF
      J P GETTY         GETTY
      JOHN P GETTY      GETTY 
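
The first four normalization steps might look roughly like the sketch below. Several points are assumptions rather than documented PLP behavior: the input is taken to be upper-cased already, the stopword list is passed in as a parameter because the real X260B Stopwords list is not reproduced here, stopwords are matched as whole words, the company phrases are treated as trailing phrases, and longer phrases are tried first. The name `normalize_publisher` is hypothetical.

```python
import re
from typing import Iterable

# Phrases quoted in the list above; the leading space is part of each phrase.
COMPANY_PHRASES = [
    " & COMPANY INC", " & CO INC", " & COMPANY", " AND COMPANY INC",
    " AND CO INC", " AND COMPANY", " AND CO", " INC", " COMPANY", " CO", " & CO",
]

def normalize_publisher(raw: str, stopwords: Iterable[str] = ()) -> str:
    """Return the normalized publisher, or '' when the length check fails."""
    # 1. Delete anything enclosed in square brackets.
    text = re.sub(r"\[[^\]]*\]", " ", raw)
    # 2. Delete stopwords (caller-supplied; the real list is not shown here).
    stop = set(stopwords)
    text = " ".join(w for w in text.split() if w not in stop)
    # 3. Shorter than 3 or longer than 64 characters: return an empty publisher.
    if len(text) < 3 or len(text) > 64:
        return ""
    # 4. Delete one trailing company phrase (assumption: suffix match, longest first).
    for phrase in sorted(COMPANY_PHRASES, key=len, reverse=True):
        if text.endswith(phrase):
            text = text[: -len(phrase)]
            break
    return text.strip()

print(normalize_publisher("A KNOPF & CO INC"))            # A KNOPF
print(normalize_publisher("RANDOM HOUSE [DISTRIBUTOR]"))  # RANDOM HOUSE
print(normalize_publisher("[S N]"))                       # prints an empty line
```

Trying longer phrases first avoids leaving a stray '&' behind when ' CO' would otherwise match before ' & CO'; whether PLP orders the phrases this way is not stated.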

During this special search for 'most meaningful' terms, the following processing occurs: 

  • the words 'AND', 'FOR', 'OF' are deleted (the three most common words in the 260 $b) 
  • of the remaining words, only the first three are retained 
  • a second list of stopwords is applied to the third word 
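
A minimal sketch of the 'most meaningful' term search is shown below. The text does not say how the final term is chosen from the retained words, so this sketch assumes the last retained word is used, which agrees with the three examples above. The second stopword list is not given either, so it is a parameter here, and dropping the third word when it appears in that list is also an assumption. The name `most_meaningful_term` is hypothetical.

```python
from typing import Iterable, Optional

# The three most common words in the 260 $b, per the list above.
COMMON_WORDS = {"AND", "FOR", "OF"}

def most_meaningful_term(publisher: str,
                         third_word_stopwords: Iterable[str] = ()) -> Optional[str]:
    """Return an extra publisher term, or None if the string has no blank."""
    if " " not in publisher:
        return None
    # Delete 'AND', 'FOR', 'OF', then keep only the first three words.
    words = [w for w in publisher.split() if w not in COMMON_WORDS][:3]
    # Apply the second stopword list to the third word only.
    if len(words) == 3 and words[2] in set(third_word_stopwords):
        words = words[:2]
    # Assumption: the last retained word is the most meaningful one.
    return words[-1] if words else None

for name in ("A KNOPF", "J P GETTY", "JOHN P GETTY"):
    print(name, "->", most_meaningful_term(name))
```

The returned term is added alongside the full normalized string rather than replacing it, so 'A KNOPF' contributes both 'A KNOPF' and 'KNOPF' as publisher strings.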


Processing rules §

  • If the publisher string for either record is blank, the crosscheck passes 
  • If any of the publisher strings from one record match any publisher string in the other, the crosscheck passes 
  • Otherwise, the crosscheck fails 
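
The three rules above amount to a set-intersection test, sketched below. Two assumptions are made: 'blank' is read as a record having no non-empty publisher strings after normalization, and 'match' is read as exact string equality (PLP's synonym rules are not modeled). The name `x260b_passes` is hypothetical.

```python
from typing import Iterable

def x260b_passes(publishers_a: Iterable[str], publishers_b: Iterable[str]) -> bool:
    """Return True if the crosscheck passes for two records' publisher strings."""
    a = {p for p in publishers_a if p}
    b = {p for p in publishers_b if p}
    # Rule 1: a blank publisher on either side passes.
    if not a or not b:
        return True
    # Rule 2: any shared publisher string passes; Rule 3: otherwise fail.
    return not a.isdisjoint(b)

print(x260b_passes(["A KNOPF", "KNOPF"], ["ALFRED A KNOPF", "KNOPF"]))  # True
print(x260b_passes(["KNOPF"], ["RANDOM HOUSE", "HOUSE"]))              # False
print(x260b_passes([], ["KNOPF"]))                                     # True
```

Because each record usually carries several publisher strings, including the extracted 'most meaningful' terms, a single shared term such as 'KNOPF' is enough for the check to pass.
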
plp/crosschecks/x260b.txt · Last modified: 2013/04/27 09:09 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported