...

There are many ways you might find duplicates, but the primary method is to use the Tableau duplicate reports. A duplicate report uses the 001, 020, and 020a to find duplicate values. If you are ever uncertain about a pair of possible duplicates, contact Lloyd at lloyd@marmot.org.

How to use a shared view in the Marmot Duplicates Tableau reports

...

Compare the Mat Type for the records. Then compare their 001 fields.

...

All of the 001 (Bib Utility) fields match. If the 001 matches, this is almost certainly a duplicate we can remove, unless something is very wrong with the record.

...

Sometimes the 001 fields may not match, but there is a match with the 019.

...

When this happens, it means that OCLC (or SkyRiver, if it is a SKY number) has combined records in its system after finding a duplicate. These records are duplicates and should be deduped. The record with the matching number in the 019 is the newer record; the one with the matching number in the 001 is the one that OCLC or SkyRiver deleted. Occasionally, you will find a match where the matching number is in the 019 of both records. In that case, use the date in the 005 to pick the newer record, or get a completely new record from the bib utility to replace them both.
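The 001/019 rule above can be written out as a small check. This is only a sketch: the record dictionaries, the function name, and the result labels are hypothetical, and it assumes the numbers have already been stripped of any prefixes so they can be compared directly.

```python
def classify_control_numbers(rec_a, rec_b):
    """Describe how the 001 and 019 values of two records relate.

    Each record is a hypothetical dict such as {"001": "100", "019": ["50"]}.
    """
    a001, b001 = rec_a["001"], rec_b["001"]
    a019 = set(rec_a.get("019", []))
    b019 = set(rec_b.get("019", []))

    if a001 == b001:
        return "same 001"
    if b001 in a019:
        # rec_b's number was merged into rec_a, so rec_a is the newer record.
        return "keep a"
    if a001 in b019:
        # rec_a's number was merged into rec_b, so rec_b is the newer record.
        return "keep b"
    if a019 & b019:
        # Same outdated number on both records: decide with the 005 date,
        # or download a fresh record from the bib utility.
        return "shared 019"
    return "no match"
```

A result of "no match" does not mean the records are different titles. It only means the control numbers do not settle the question, so you would go on to compare the other fields.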

...

You would also compare the 100 (Author), 245 (Title) and the 250 (Edition) fields to see if they match. In this example, all those fields match.

...

You would also compare the 260 or 264 (Publisher) and 300 (Description/Pagination or Discs) fields. In this example, all those fields match.

...

001      Bib Utility Number
019      Outdated Bib Utility Number (indexed as Bib Utility)
020      ISBN
024      UPC (ISBN field group tag, UPC index)
035      Other Bib Utility Numbers (not indexed)
100      Author
245      Title
250      Edition Statement
260/264  Publication Place |b Publisher |c Date
300      Pagination – Pages or Discs |c Size |e Accompanying material
4xx      Series
5xx      Notes
505      Table of Contents
520      Summary
6xx      Subject Headings
7xx      Additional Authors, Illustrators, Editors
856      URL

Important OCLC Fields

003      OCoLC – 001 is an OCLC number
         SKY – 001 is a SkyRiver number
         DLC – 001 is a Library of Congress control number
         Other – 001 could be anything

If the 001 is a number with no prefix, check whether it is an OCLC number. If it is, please add OCoLC in the 003. If it is not an OCLC number, either delete it or add a prefix.

035      System Control Number
         Members can put in control numbers from other systems

040      Cataloging Source
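The 003 guidance above can be sketched as a lookup. The function name and the returned wording are hypothetical. A bare number is only flagged for checking, because the code cannot tell by itself whether it really is an OCLC number.

```python
# Meanings of the 003 codes listed above.
SOURCE_BY_003 = {
    "OCoLC": "OCLC number",
    "SKY": "SkyRiver number",
    "DLC": "Library of Congress control number",
}


def describe_001(value_001, value_003=None):
    """Say what kind of number the 001 holds, based on the 003."""
    if value_003 in SOURCE_BY_003:
        return SOURCE_BY_003[value_003]
    if value_001.strip().isdigit():
        # No recognized 003 and no prefix: a person needs to look this up.
        return "bare number: check whether it is an OCLC number"
    return "unknown source"
```

For example, `describe_001("12345", "OCoLC")` reports an OCLC number, while `describe_001("12345")` flags the record for a manual check.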

...

You would also want to compare the 024 (UPC) and 028 (Publisher or Distributor Number) fields.

...

You would also compare the 264 (Publisher) fields.

...

You also want to compare the 300 – 347 fields (Description).

...

Another comparison is the 505 field (Note). The 505 field should be transcribed from the item. If the two people who created these two records were really looking at the same manifestation, the 505 fields should not contradict each other. They may be more or less complete and still be a match, as long as they don’t contain contradictory information. If the note from one record is better, you can always move the information into the other record if you decide to keep that one for other reasons.

...

Next, you would look at the bib utility number in the 001 field. These two are different, so we are going to look further into the record.

...

Another match point could be the 028 field, which is the publisher's number.

You would also want to compare the 100 (Author) and 245 (Title) fields.

...

Compare the 505 field (Note). This is transcribed from the Table of Contents. The information is the same for both entries.

...

Compare the 300 field (Description). If both 300 fields had the number of pages and the size (22 cm), this would be a really good indicator that these are duplicate records. If the size is listed, a difference of more than 2 cm would mean they are not duplicate records. Likewise, a difference of more than 5 pages would mean they are not duplicate records. However, since this information is missing, we are still not sure whether this is a duplicate record.
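The page and size thresholds above can be written as a quick check. This is a minimal sketch: the function name and result labels are hypothetical, and it assumes the page count and size have already been pulled out of the 300 field as numbers, with None standing for a missing value.

```python
def compare_300(pages_a, pages_b, size_a_cm, size_b_cm):
    """Apply the 2 cm and 5 page rules to two records' 300 data.

    Returns "not duplicate", "consistent", or "unknown".
    None means the value is missing from that record.
    """
    if size_a_cm is not None and size_b_cm is not None:
        if abs(size_a_cm - size_b_cm) > 2:
            return "not duplicate"
    if pages_a is not None and pages_b is not None:
        if abs(pages_a - pages_b) > 5:
            return "not duplicate"
    if None in (pages_a, pages_b, size_a_cm, size_b_cm):
        # Nothing contradicts, but something is missing, so we cannot tell.
        return "unknown"
    return "consistent"
```

A result of "consistent" is a good indicator but not proof. You would still compare the other fields before deduping.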

...

You could also look at the 520 field (Note). This is the summary that the library has written. The information could come from the book cover or the publisher. In this example, only one record had the 520 field.

You would also compare the 260 or 264 field (Publisher). In this case, the location and year are different. This is not necessarily definitive. Books sometimes list several locations on the title page, so different catalogers may have selected different locations from the same book. There can also be several dates on a book, so this can be confusing as well.

...

This could also be confirmed by looking at the 008 field (MARC tag). It should match what is listed in the 260 field. We are not going to call these two items duplicates.

...

You would want to check the 300 field (Description) of the record. For a Mat Type of EBOOKS, check that the 300 confirms it is an online resource. For a Mat Type of BOOK/SERIAL, check that the 300 confirms it is a physical book. If the 300 descriptions were the same, then the problem would be with the Mat Type. In this case, the paper ISBN is simply on both records. We are not going to call these two items duplicates.

...

Even though the first record should not be deduped, the person looking at these records should compare the 005 fields (updated date in OCLC). The 005 shows that the first record was updated in 2016, while the third record was updated in 2012. You could use this information to realize there might be a newer version of the record to download, and replace your record with the newer one. At this point, we would close the window for the first record and compare the other records.
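Comparing 005 dates can be sketched as follows. The MARC 005 field is written as yyyymmddhhmmss.f, so the first 14 characters can be read as a date and time. The record dictionaries and function names here are hypothetical.

```python
from datetime import datetime


def parse_005(value):
    """Read the date and time from a 005 value such as '20160412093015.0'."""
    # Only the first 14 characters (yyyymmddhhmmss) are needed.
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S")


def newer_record(rec_a, rec_b):
    """Return whichever record has the later 005 (rec_a if they are equal)."""
    if parse_005(rec_a["005"]) >= parse_005(rec_b["005"]):
        return rec_a
    return rec_b
```

For example, given one record with a 005 starting 2016 and another starting 2012, `newer_record` returns the 2016 record.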

...

You would check the 300 field (Description). You can see that one of the records is a book, and the other record is an eBook. However, the Mat Types for both were the same (Book/Serial). This probably just means that both ISBNs (book and eBook) are on the record. An owning library could change the Mat Type to match the 300 field. We are not going to call these two items duplicates.
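A rough way to spot a Mat Type that disagrees with the 300 field is sketched below. It assumes, for illustration only, that an eBook's 300 contains the words "online resource" and that anything else is a physical book. Real records are more varied, so treat a mismatch as a prompt to look at the record, not as a decision. The function names are hypothetical.

```python
def format_from_300(description):
    """Guess the format from the 300 text (simplified assumption)."""
    if "online resource" in description.lower():
        return "ebook"
    return "print"


def mat_type_agrees(mat_type, description):
    """True if the Mat Type is the one expected for the 300 description."""
    expected = {"ebook": "EBOOKS", "print": "BOOK/SERIAL"}
    return mat_type.upper() == expected[format_from_300(description)]
```

In the example above, a record with Mat Type Book/Serial and a 300 that describes an online resource would return False, which is the case where an owning library could change the Mat Type.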

...

You could also compare the 028 fields (Publisher or Distributor Number). These records have the same 028 fields, so this is a match. Subfield a is the publisher’s number. Subfield b identifies whose number it is.

...

Another match could be the 035 field (MARC tag). This is where you can put in control numbers from other systems. Note that the OCLC number from record 2 appears here in the SkyRiver record. The SR record was originally this OCLC record.
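The 035 cross-check above can be sketched like this. It assumes the 035 values carry OCLC numbers in the common "(OCoLC)" form, and that the OCLC 001 may have a letter prefix and leading zeros that need to be stripped before comparing. The function names are hypothetical.

```python
import re


def normalize_oclc(value):
    """Strip a leading letter prefix and leading zeros, e.g. 'ocm00012345' -> '12345'."""
    return re.sub(r"^\D*0*(?=\d)", "", value.strip())


def oclc_numbers_in_035(values):
    """Collect the OCLC numbers from 035 values written as '(OCoLC)12345'."""
    found = set()
    for value in values:
        match = re.match(r"\(OCoLC\)\D*0*(\d+)", value.strip())
        if match:
            found.add(match.group(1))
    return found


def appears_in_035(oclc_001, other_035_values):
    """True if one record's OCLC 001 is listed in the other record's 035."""
    return normalize_oclc(oclc_001) in oclc_numbers_in_035(other_035_values)
```

If `appears_in_035` is true for an OCLC record and a SkyRiver record, that supports the idea that the SkyRiver record started out as that OCLC record.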

...

Check the 019 fields (Control Number Cross-Reference) to see if there are any merged bib utility numbers. Both OCLC and SkyRiver do this within their own systems. This information does not help with making a decision.

...

You can compare the 024 fields (UPC). This information is found on the item, so a match suggests these are duplicate records, but we can still check more fields.

...

You would compare the 260 fields (Publisher). The publishers listed are different, maybe because each cataloger saw something different on the box or credits. One could be the distributor, while the other is a production company. It is very difficult to tell with videos.

...

In this example, the 710 fields (Author/Illustrator) are basically the same. This usually means they are the same movie. However, we need to know if they are the same DVD. These do seem like the same records, but the publishers listed are different. This is where you may need to look at the physical item, or talk to someone from the other library or libraries, to find out if it is the same bib record. Without more information, we are not going to call these two items duplicates because of the publisher difference. However, if you could see that the item listed both of these companies and that they could be easily confused, you might conclude that they are duplicates.

...