PLP Standard normalization 

PLP normalization occurs at the subfield level. Therefore, except as noted elsewhere, tags and indicators are ignored. 

The MARC subfield delimiter and the subfield code that follows it are replaced by a single blank space. 
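
As an illustration of the delimiter step, here is a minimal Python sketch. It is not taken from the PLP code: the function name is hypothetical, and it assumes the delimiter is the ISO 2709 unit separator (hex 1F) followed by a one-character subfield code.

```python
import re

# Assumption: the subfield delimiter is the ISO 2709 unit separator (0x1F)
# and the subfield code is the single character that follows it.
SUBFIELD_DELIMITER = "\x1f"
_DELIMITER_AND_CODE = re.compile(re.escape(SUBFIELD_DELIMITER) + r".")


def replace_subfield_delimiters(field_data: str) -> str:
    """Replace each delimiter-plus-code pair with one blank space."""
    return _DELIMITER_AND_CODE.sub(" ", field_data)


if __name__ == "__main__":
    raw = "\x1faHamlet /\x1fcShakespeare"
    print(repr(replace_subfield_delimiters(raw)))
```

The leading blank this leaves at the start of the field is removed later, when leading and trailing blanks are deleted.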

If the corresponding crosscheck option is selected, MARC-8 encoded data is converted to UTF-8. 

Subscripts, superscripts, and a number of diacritics are normalized as described in Appendix A of this document. 

All remaining data is shifted to uppercase. 

The following characters are deleted: single quote, apostrophe, right and left brackets, vertical bar. 

All remaining punctuation characters, except for the pound sign, ampersand, and plus sign, are replaced with a blank space. 

Double blank spaces are replaced with single blank spaces. 

Leading and trailing blanks are deleted. 
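
The case, deletion, punctuation, and blank-space steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the PLP implementation: the function and constant names are hypothetical, "punctuation" is taken to mean ASCII punctuation, the pound sign is taken to be `#`, the apostrophe is taken to cover both U+0027 and U+2019, and runs of more than two blanks are collapsed to one.

```python
import re
import string

# Assumption: "single quote" and "apostrophe" cover U+0027 and U+2019.
DELETED_CHARS = "'\u2019[]|"
# Assumption: the pound sign is '#'.
KEPT_PUNCTUATION = "#&+"
# Assumption: "punctuation" means the ASCII punctuation set.
_REPLACED = "".join(
    c for c in string.punctuation if c not in DELETED_CHARS + KEPT_PUNCTUATION
)
# Map replaced punctuation to blanks and deleted characters to nothing.
_TABLE = str.maketrans(_REPLACED, " " * len(_REPLACED), DELETED_CHARS)


def normalize_text(text: str) -> str:
    """Uppercase, delete/replace characters, collapse and trim blanks."""
    text = text.upper()
    text = text.translate(_TABLE)
    # Assumption: any run of two or more blanks becomes a single blank.
    text = re.sub(r" {2,}", " ", text)
    return text.strip(" ")


if __name__ == "__main__":
    print(normalize_text("  O'Brien, [Flann] -- At Swim & Two+Birds #1!  "))
```

Deleting the quote and bracket characters before replacing the other punctuation matters: `O'Brien` becomes `OBRIEN` rather than `O BRIEN`.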

If the field being normalized is not a title, and it begins with 'THE ', 'AN ', or 'A ', the article and following blank space are deleted. 
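
A minimal Python sketch of the leading-article rule follows. The function name and the `is_title` flag are hypothetical; how PLP decides that a field is a title is not described here. The sketch assumes its input has already been uppercased and trimmed, and that only one leading article is removed.

```python
LEADING_ARTICLES = ("THE ", "AN ", "A ")


def strip_leading_article(normalized: str, is_title: bool) -> str:
    """Drop one leading article (and its blank) from non-title data."""
    if is_title:
        # Titles are left untouched by this rule.
        return normalized
    for article in LEADING_ARTICLES:
        if normalized.startswith(article):
            return normalized[len(article):]
    return normalized


if __name__ == "__main__":
    print(strip_leading_article("THE ROYAL SOCIETY", is_title=False))
    print(strip_leading_article("THE ROYAL SOCIETY", is_title=True))
```

Because each article is matched together with its trailing blank, a value such as `ANDERSON` is not mistaken for one that begins with `AN`.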

plp/normalization.txt · Last modified: 2013/04/27 09:09 (external edit)