Marmot Catalog Deduping Training


Marmot Duplicates Utility Process

There are many ways you might find duplicates, but the primary method is to use the Tableau duplicate reports. A duplicate report uses the 001, 020, and 020a to find duplicate values. If you are ever uncertain about a pair of possible duplicates, contact Lloyd at lloyd@marmot.org.
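As a rough illustration of what "finding duplicate values" means, the sketch below groups bib records by an index value and keeps only the values that appear on more than one bib. This is not how the Tableau report is actually built. The record layout (a dict with a `bib` number and a list of `values`) and the sample numbers are assumptions made up for the example.

```python
from collections import defaultdict

def find_duplicate_values(records):
    """Return {index_value: [bib numbers]} for values found on two or more bibs."""
    bibs_by_value = defaultdict(set)
    for record in records:
        for value in record["values"]:
            bibs_by_value[value.strip()].add(record["bib"])
    return {
        value: sorted(bibs)
        for value, bibs in bibs_by_value.items()
        if len(bibs) > 1
    }

# Hypothetical sample data: two bibs share the same index value.
records = [
    {"bib": "b1000001", "values": ["123456789"]},
    {"bib": "b1000002", "values": ["123456789", "987654321"]},
    {"bib": "b1000003", "values": ["555555555"]},
]
duplicates = find_duplicate_values(records)
print(duplicates)
```

A shared value only flags a possible duplicate. The records still need to be compared in Sierra as described in the rest of this document.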

How to use a shared view in the Marmot Duplicates tableau reports

  • Go to the Duplicates utility you want to work on for this process

  • They can be found here:

  • The Report is password protected and will not display until you are signed in. Click on the Sign in to Tableau Server button.

  • The sign-in screen will appear. Sign in with this specific username and password for this process.

    • Username: dedupe

    • Password: marmdedupe

Everyone should sign in with this same login for deduping. Even if you have a different Tableau login, use this one for working on deduping.

001 Report

  • The 001 report finds duplicates in the BIB UTIL index including both 001 and 019.

  • You can narrow this report by Agency, Accounting Unit, Pickup Location, Holds Present, and Number of Holds. This allows you to find the duplicates relevant to your library.

  • The report may show the settings for the last library that saved the view. You will want to change the settings for your library.

  • At the bottom of the page is a toolbar. Below are the definitions for the links on the toolbar.

    • Undo – This button allows you to cancel the previous action.

    • Redo – This button allows you to re-perform the previous action.

    • Revert – This option will revert the report to the last saved version of the view. It will remove any changes made during this session.

    • Refresh – This can be ignored because this data source is not live, so it can’t be refreshed this way.

    • Pause – This button would pause a data source refresh.

    • View: DefaultDeduping – This is the shared default view that is used for the deduping process.

    • Alert – This button is used to send a change report alert to one email account.

    • Share – This has an embed code and a link for the report.

    • Download – This button allows you to download the report as an Image, Data, Crosstab (CSV/Excel File) or PDF.

    • Full Screen – This button opens the report in a full screen.

 

  • When you find a duplicate you want to look up in Sierra, click on the number. That will bring up a small box where you can copy the number. Double click on the number to select it, then right click to bring up the menu where you can copy it.

  • Copy the index number (001 in this case) and search for the record in Sierra. From there you can dedupe in Sierra. That process is discussed below.

Once the record has been deduped in Sierra, or you have confirmed that the record is not a duplicate, hide the entry by excluding it from the report. Excluding the entry removes it, so the next person who uses the report will not try to dedupe something that is no longer a duplicate or should not be deduplicated. This only works when everyone uses the same login, which is why everyone should log in with “dedupe.” The entry is removed for all users on the dedupe login, so the other libraries that were included in the duplicate group will no longer see it. Click on the record that you would like to exclude.

Click Exclude, and the line will be excluded.

Once you have excluded entries, you want to save the changes to the view. If you don’t save the changes, then the records you excluded will reappear for you and other users. You can save after every exclusion.

Go to the bottom of the page and click on the View: DefaultDeduping link.

Click the Save button to save all the entries that you excluded.


OCLC guidelines

OCLC provides a very useful document that describes how they determine if records are duplicates. We follow the same rules so we can use their documentation. Even if you do not have OCLC, you can still use the OCLC Support & Training site. This is the document: https://www.oclc.org/bibformats/en/input.html. It is framed in terms of when you should create a separate record rather than when you should eliminate one, but it is the same issue, just in reverse.

When to Dedupe Bibliographic Records in Sierra

001 Duplicate Entries from Create Lists (Book/Serial Comparison)

In the Cataloging Function, search for the item. Open all the records with that number.

Tile the windows vertically by opening the Window menu and choosing Tile Vertically. This tiles all the records you have open, even if they are in the background. You can tile up to five windows. However, they are very hard to read if you tile more than three.

Compare the Mat Type for the records. Then compare their 001 fields.

All of the 001 (Bib Utility) fields match. If the 001 matches, this is almost certainly a duplicate we can remove, unless something is very wrong with the record.

Sometimes the 001 fields may not match, but there is a match with the 019.

When this happens, it means that OCLC, or SkyRiver if it is a SKY number, has combined records in its system because it found a duplicate. These records are duplicates and should be deduped. The record with the matching number in the 019 will be the newer record. The one with the matching number in the 001 is the one that OCLC or SkyRiver deleted. Occasionally, you will find a match where the matching number is in the 019 of both records. In that case, use the date in the 005 to pick the newer record, or get a completely new record from the bib utility to replace them both.
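The 001/019 rules above can be sketched as a small function. This is only an illustration of the reasoning, not a Sierra feature. Each record is assumed to be a dict with a "001" string and a "019" list of strings, and the numbers in the usage example are made up.

```python
def classify_match(rec_a, rec_b):
    """Describe how two records match on bib utility numbers (001 and 019)."""
    if rec_a["001"] == rec_b["001"]:
        return "001 match"
    a_in_b = rec_a["001"] in rec_b["019"]  # first record's 001 was merged into the second
    b_in_a = rec_b["001"] in rec_a["019"]  # second record's 001 was merged into the first
    if a_in_b and not b_in_a:
        return "019 match: second record is newer"
    if b_in_a and not a_in_b:
        return "019 match: first record is newer"
    shared_019 = set(rec_a["019"]) & set(rec_b["019"])
    if shared_019 or (a_in_b and b_in_a):
        return "019 match on both records: compare the 005 dates"
    return "no bib utility match"

# Hypothetical records: the second record lists the first one's number in its 019.
old_record = {"001": "111", "019": []}
new_record = {"001": "222", "019": ["111"]}
print(classify_match(old_record, new_record))
```

The record holding the matching number in its 019 is reported as the newer one, following the rule described above.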

The 005 field (updated date in OCLC) will often tell which record is newer. In this example, both records have the same 005 fields. Normally, if the 001 fields are the same, the record with the most recent 005 field would be the one that is chosen to keep.

No 001 match

If you don’t have a 001 you would check the 024 (UPC) or 020 (ISBN/ISSN) fields.

Here we don’t have an 024, but all of the 020 (ISBN/ISSN) fields match.

You would also compare the 100 (Author), 245 (Title) and the 250 (Edition) fields to see if they match. In this example, all those fields match.

You would also compare the 260 or 264 (Publisher) and 300 (Description/Pagination or Discs). In this example, all those fields match.

The 250 and 300 are particularly important to check. Sometimes publishers will reuse ISBNs in reprints or even new editions. So you can get duplicate numbers but different versions that need different records.

You can also compare all the notes and subject headings. In this example they all match.

Since all the fields match, these are exact duplicates and the extra record should be removed.
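The field-by-field comparison above can be sketched as a helper that lists the tags where two records differ. The record layout (a dict mapping a MARC tag to a list of field strings), the tag list, and the sample values are assumptions for this example. The normalization is deliberately crude and only ignores case, extra spaces, and trailing punctuation.

```python
# Tags walked through in the comparison above (260 and 264 are listed separately).
COMPARE_TAGS = ["020", "024", "100", "245", "250", "260", "264", "300"]

def normalize(value):
    """Lowercase, collapse spaces, and drop trailing punctuation."""
    return " ".join(value.lower().split()).rstrip(" .,;:/")

def differing_tags(rec_a, rec_b, tags=COMPARE_TAGS):
    """Return the tags whose values differ between the two records."""
    diffs = []
    for tag in tags:
        values_a = sorted(normalize(v) for v in rec_a.get(tag, []))
        values_b = sorted(normalize(v) for v in rec_b.get(tag, []))
        if values_a != values_b:
            diffs.append(tag)
    return diffs

# Hypothetical records: same ISBN and title, but different editions.
rec_a = {"020": ["9780071234567"], "245": ["Deduping basics /"],
         "250": ["2nd ed."], "300": ["345 p. ; 22 cm"]}
rec_b = {"020": ["9780071234567"], "245": ["deduping  basics"],
         "250": ["3rd ed."], "300": ["345 p. ; 22 cm"]}
print(differing_tags(rec_a, rec_b))
```

An empty list does not prove the records are duplicates, and a reported difference (for example a 260 on one record and a 264 on the other) still needs to be reviewed by eye.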

Deciding which duplicate bib to keep

First look at the 005. That is a good indicator of when that version of the record was created. Usually the newer record will be better. You can also check the Sierra Create Date field. Again the newer record is often better. Otherwise, pick the record that has more information such as extra notes or subject headings in one record. If they are mostly the same, then keep the one with the most Marmot holdings.
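As a sketch only, the order of checks above can be written as a set of tie-breakers: newest 005 first, then the record with more fields, then the one with more holdings. Treating these as strict tie-breakers is a simplification, since the real decision is a judgment call, and the Sierra Create Date is left out. The dict keys (`005`, `field_count`, `holdings`) and the sample values are assumptions for the example. The 005 is assumed to be in the standard MARC form yyyymmddhhmmss.f, so plain string comparison orders the dates correctly.

```python
def choose_record_to_keep(records):
    """Pick the record with the newest 005, then most fields, then most holdings."""
    return max(
        records,
        key=lambda r: (r["005"], r["field_count"], r["holdings"]),
    )

# Hypothetical records: the second was updated more recently.
older = {"bib": "b1", "005": "20120305101500.0", "field_count": 30, "holdings": 4}
newer = {"bib": "b2", "005": "20160718093000.0", "field_count": 25, "holdings": 1}
print(choose_record_to_keep([older, newer])["bib"])
```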

Transferring Attached Records

When you transfer to a new record, do not copy 995 fields from the old record to the new one.

Once you decide which bib is going to be kept, you should look for any 856 links or 020 or 024 (Standard number) fields that can be moved from the bib that is going to be deleted to the bib you are keeping. Since you would often combine records for hardback and paperback books, it is common to have both sets of numbers in one record. However, it is a good idea to identify what version each number comes from in a subfield |q.

The easiest way to move fields to the new bib record is to put them in the LOCAL INFO field group. Any fields in this field group will move over to the bib when you move attached records and delete the empty bib. 590 and 690 fields should already be in LOCAL INFO, so they should go over. However, you would need to change the field group for any 856, 020, or 024 fields that you want to move to the other record. In that case, change the field group back to y or i after the move, so they get into the right index in the new record.

To move the attached records, open both bib records and click on Edit and Transfer attached from the record that is going to be transferred.

This will bring up the Find Bib and/or the bib number (e.g. b35944122). It knows this is the other bib, because the other bib window is open. If the window was not open, you could use the Find Bib to search for the bib number, but it is easier just to have them both open to begin with.

Once you click on the bib number, you will get the following pop-up box. The radio button will automatically be chosen for the 1st choice, to RETAIN source bib. You want the 2nd choice. Select Transfer all attached records, DELETE source bib and click OK. Another message box will appear letting you know how many items have been transferred.

If you were to retain the source bib, it would get cleaned up with other orphan bib records, but as long as the duplicate is in the system, there is a chance that another copy could be loaded which might create a third duplicate. So it is best to remove it now.

You can click on Summary to see all the items that are now attached to the remaining bib.

Under Summary you can check that your items are now attached as well.

If you go back to the bib record to look at Locations, you will not see any new locations until the next day. The information goes through an indexing process overnight.

On occasion you might have to transfer holds when transferring a record. Make sure to click on the 3rd radio button, Transfer all attached records, and holds, DELETE source bib.


Marc Tags to compare for Deduping

001      Bib Utility Number
019      Outdated Bib Utility Number (indexed as Bib Utility)
020      ISBN
024      UPC (ISBN field group tag, UPC index)
035      Other Bib Utility Numbers (not indexed)
100      Author
245      Title
250      Edition Statement
260/264  Publication Place |b Publisher |c Date
300      Pagination – Pages or Discs |c Size |e Accompanying material
4xx      Series
5xx      Notes
505      Table of Contents
520      Summary
6xx      Subject Headings
7xx      Additional Authors, Illustration, Editors
856      URL

Important OCLC Fields

003      OCoLC – 001 is an OCLC number
         SKY – 001 is a SkyRiver number
         DLC – 001 is a Library of Congress control number
         Other – 001 could be anything

If the 001 is a number with no prefix, try to check whether it is an OCLC number. If it is, please add OCoLC in the 003. If it is not an OCLC number, either delete it or add a prefix.

035      System Control Number. Members can put in control numbers from other systems.
040      Cataloging Source


Reasons Why Bibliographic Records Should not be Deduped

UPC Duplicate Entries (DVD Comparison Example)

Look for the record using the UPC code. This will show the indexed field and the item(s).

Open up all the records associated with the duplicate number.

Double click to open all the records. This will give you the bib records to view.

Click on Window and Tile Vertically. You can tile up to five windows.

This will tile the windows next to each other for comparison.

Some of the fields that you would check for comparison would be the MAT TYPE and the BIB UTIL#. OCLC (OCoLC – MARC 003) is the preferred record. Always select the OCLC record if it is an option.

You would also want to compare the 024 (UPC) and 028 (Publisher or Distributor Number) fields.

You should compare the 245 fields for Titles.

You would also compare the 264 fields (Publisher) information.

You also want to compare the 300 – 347 fields (Description).

Note: With videos, you have to check whether each record is DVD or Blu-ray. You cannot combine the two formats on one record. Sometimes the information is in the 245, 347, or 538 field (Note). It is also good to check the video Regions. Regions are encoded to play in certain countries. Different regions require different equipment to play. This is why they should not be combined on the same record. This information may be recorded in different fields. That does not mean the items are not the same thing. If the information is the same, the records are a match.
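The DVD versus Blu-ray check can be sketched as a text search across the 245, 347, and 538 fields, since the format may be recorded in any of them. The record layout (a dict mapping a tag to a list of field strings) and the sample text are assumptions for the example. Real records word the format in many ways, so treat this as a first pass rather than a decision.

```python
import re

def video_formats(record):
    """Return the formats ("DVD", "Blu-ray") mentioned in the 245, 347, or 538."""
    text = " ".join(
        value for tag in ("245", "347", "538") for value in record.get(tag, [])
    ).lower()
    formats = set()
    if re.search(r"blu-?\s?ray", text):
        formats.add("Blu-ray")
    if re.search(r"\bdvd\b", text):
        formats.add("DVD")
    return formats

def formats_conflict(rec_a, rec_b):
    """True only when both records name a format and they have none in common."""
    formats_a, formats_b = video_formats(rec_a), video_formats(rec_b)
    return bool(formats_a) and bool(formats_b) and formats_a.isdisjoint(formats_b)

# Hypothetical records with the format recorded in different fields.
dvd = {"538": ["DVD, region 1, widescreen."]}
bluray = {"347": ["video file |b Blu-ray"]}
print(formats_conflict(dvd, bluray))
```

A record that names no format at all is not treated as a conflict, because missing information does not mean the items differ.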

Another comparison is the 505 field (Note). The 505 field should be transcribed from the item. If the two people who created these two records were really looking at the same manifestation, the 505 fields should not contradict each other. They may be more or less complete and still be a match as long as they don’t have contradictory information. If the note from one record is better, you can always move the information into the other record if you decide to keep that one for other reasons.

It is also good to compare the information listed in all of the 500 - 546 fields (Note). Look for information that will identify if these records have similarities or differences. There are three differences in these two records (widescreen, double-sided discs and closed-captioned). At this point, you might want to check OCLC (if applicable), or talk to someone from the other libraries to find out if their disc is also double-sided, for example. Some of these differences are ambiguous. The one on the left doesn’t say it’s NOT double-sided, and one or both of these catalogers may not have been clear on the subtle difference between subtitles and closed-captioning. That’s why it might be a good idea to check with the other owning library to see what they really have. As it stands, we are not going to call these two items duplicates without more information. With more information they could be determined to be duplicates.


ISBN/ISN Duplicate Entries (Book Comparison Example)

Here is a search for the record using the ISN under the Catalog Function.

Open all the records associated with the ISBN/ISN. In this example, two of the hits were the same record, so there were only two windows to open.

Click on Window and Tile Vertically. You can tile up to five windows.

This will tile the windows next to each other for comparison.

You would first check the Mat Type.

Next, you would look at the bib utility number or the 001 field. These two are different, so we are going to look further into the record.

Next, you would look for an 019 field. The 019 contains the 001 numbers of duplicate records that OCLC has merged. If the 001 numbers are different, do not dedupe unless you find a matching number in an 019.

Another match point could be the 010 field, which is the Library of Congress control number.

Another match point could be the 028 field, which is the publisher's number.

You would also want to compare 100 fields (Author), and the 245 field (Title).

Compare the 505 field (Note). This is transcribed from the Table of Contents. The information is the same for both entries.

Compare the 300 field (Description). If both 300 fields had the number of pages and the size (22 cm), this would be a really good indicator that these are duplicate records. If the size is listed, a difference of more than 2 cm would mean they are not duplicate records. Likewise, a difference of more than 5 pages would mean they are not duplicate records. However, since this information is missing, we are still not sure whether this is a duplicate record.
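The page and size thresholds above (more than 5 pages or more than 2 cm apart) can be sketched as follows. The parsing is an assumption: it only looks for a number followed by "p" or "pages" and a number followed by "cm", which will not cover every way a 300 field can be written. The sample descriptions are made up.

```python
import re

def parse_300(description):
    """Extract (pages, size_cm) from a 300 field; either may be None if absent."""
    pages = re.search(r"(\d+)\s*(?:p\b|pages\b)", description)
    size = re.search(r"(\d+)\s*cm\b", description)
    return (
        int(pages.group(1)) if pages else None,
        int(size.group(1)) if size else None,
    )

def compare_300(desc_a, desc_b, max_page_diff=5, max_size_diff_cm=2):
    """Return "different", "consistent", or "undetermined" for two 300 fields."""
    pages_a, size_a = parse_300(desc_a)
    pages_b, size_b = parse_300(desc_b)
    if None not in (pages_a, pages_b) and abs(pages_a - pages_b) > max_page_diff:
        return "different"
    if None not in (size_a, size_b) and abs(size_a - size_b) > max_size_diff_cm:
        return "different"
    if None in (pages_a, pages_b, size_a, size_b):
        return "undetermined"  # something is missing, as in the example above
    return "consistent"

print(compare_300("xii, 345 p. ; 22 cm", "345 p. ; 23 cm"))
```

When pages or size are missing from either record, the result is "undetermined", which matches the situation in this example where the 300 fields alone cannot settle the question.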

You could also look at the 520 field (Note). This is the summary that the library has written. The information could come from the book cover or the publisher. In this example, only one record had the 520 field.

You would also compare the 260 or 264 field (Publisher). In this case, the location and year are different. This is not necessarily definitive. Books sometimes list several locations on the title page, so different catalogers may have selected different locations from the same book. There can also be several dates on a book, so this can be confusing as well.

In this example, the publisher is listed as McGraw-Hill with two different locations, Maidenhead (a town in England) and New York. In this case, it would be best to look at the book to see if both locations are listed. If the item only had Maidenhead, this would be a different publication. If the item had both locations listed, this could be the same publication and different catalogers chose to list different locations.

We can also look into the publication date. The New York publisher information has a c2007, which means copyright 2007. The Maidenhead publisher has 2006 (without the letter c), which means it is the publication date not a copyright date. These records would need to be separate, because something is not going to be copyrighted in 2007 and published in 2006 (although the reverse is common). This may have been published in England first, and a year later in New York, so these look like separate publications.
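The date reasoning above can be sketched as a small check: a leading "c" marks a copyright date, a bare year is a publication date, and a copyright year later than the other record's publication year points to separate publications. The function names are made up for this example, and the parsing assumes it is given only the date subfield of the 260 or 264.

```python
import re

def parse_date_subfield(value):
    """Return (kind, year) from a date such as "c2007." or "2006.", else None."""
    match = re.search(r"(c|©)?\s*(\d{4})", value)
    if match is None:
        return None
    kind = "copyright" if match.group(1) else "publication"
    return kind, int(match.group(2))

def copyright_after_publication(date_a, date_b):
    """True when one date is a copyright year later than the other's publication year."""
    parsed = [parse_date_subfield(d) for d in (date_a, date_b)]
    if None in parsed:
        return False
    years = dict(parsed)  # e.g. {"copyright": 2007, "publication": 2006}
    if len(years) < 2:
        return False  # both dates are the same kind, so this rule does not apply
    return years["copyright"] > years["publication"]

# The New York record has c2007 and the Maidenhead record has 2006.
print(copyright_after_publication("c2007.", "2006."))
```

The reverse case (published after the copyright year) returns False, because that combination is common and does not by itself indicate separate publications.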

This could also be confirmed by looking at the 008 field (MARC tag). It should match what is listed in the 260 field. We are not going to call these two items duplicates.

If you have access to OCLC, you could look up the 001 fields. In this case, OCLC did not combine these records. This is another good indication that these records should not be deduped.


ISBN/ISN Duplicate Entries (Book & EBook Comparison)

Search for a record by ISN in the Catalog Function.

Open all the records associated with the ISBN/ISN.

Click on Window and Tile Vertically. You can tile up to five windows, but more than three are hard to read.

This will tile the windows next to each other for comparison.

The first thing you might notice from these records is that the Mat Types are different.

You would want to check the 300 field (Description) of each record. For the record with the Mat Type of EBOOKS, confirm that the 300 describes an online resource. For the record with the Mat Type of BOOK/SERIAL, confirm that it describes a physical book. If the 300 descriptions were the same, then the problem would be with the Mat Type. In this case, the paper ISBN is simply on both records. We are not going to call these two items duplicates.


ISBN/ISN Duplicate Entries (Unique Bib Utility)

Search for a record by ISN in the Catalog Function.

Open all the records associated with the ISBN/ISN.

Click on Window and Tile Vertically. You can tile up to five windows.

You will want to check the Mat Types first. They all match.

You would compare all the 001 fields (Bib Utility). You will notice that two of the Bib Utility numbers match. However, one of them begins with three letters. Marmot allows their members to intentionally create a duplicate record for their own internal processes. If you want to own the bib, you would put your initials, so no one else will touch it. The library that has created the duplicate could dedupe it.

Even though the first record should not be deduped, the person looking at these records should compare the 005 fields (updated date in OCLC). The updated information shows that the first record was updated in 2016, while the third record was updated in 2012. You could use this information to realize there might be a newer version of the record to download, so you might replace your record with a newer one. At this point, we would close the window for the first record and compare the other records.

You would check the 300 field (Description). You can see that one of the records is a book, and the other record is an EBook. However, the Mat Types for both were the same (Book/Serial). This probably just means that both ISBNs (book and eBook) are on the record. An owning library could change the Mat Type to match the 300 field. We are not going to call these two items duplicates.


ISBN/ISN Duplicate Entries (SkyRiver vs OCLC)

Search for a record by ISN in the Catalog Function.

Open all the records associated with the ISBN/ISN.

Click on Window and Tile Vertically. You can tile up to five windows.

You would compare the Mat Type of the records. Both are listed as Visual Media.

The 001 (Bib Utility) and the 003 (Marc tag) show that the first record is a SkyRiver record, and the second record is an OCLC record.

You could also compare the 028 fields (Publisher or Distributor Number). These records have the same 028 fields, so this is a match. Subfield a is the publisher’s number. Subfield b identifies whose number it is.
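To illustrate the subfield a and subfield b comparison, the sketch below splits a field written with the pipe notation used in this document (for example "|b Publisher") into its subfields. It assumes that any text before the first pipe is subfield a and that only the first occurrence of each subfield matters. The function names and the sample 028 values are made up.

```python
def parse_subfields(field_text):
    """Split text such as "88392 |b Universal" into {"a": "88392", "b": "Universal"}."""
    head, *rest = field_text.split("|")
    subfields = {}
    if head.strip():
        subfields["a"] = head.strip()  # text before the first pipe is treated as subfield a
    for chunk in rest:
        if chunk:
            # The first character is the subfield code; keep the first occurrence only.
            subfields.setdefault(chunk[0], chunk[1:].strip())
    return subfields

def same_028(field_a, field_b):
    """True when subfields a and b match, ignoring case."""
    parsed_a, parsed_b = parse_subfields(field_a), parse_subfields(field_b)
    return all(
        parsed_a.get(code, "").lower() == parsed_b.get(code, "").lower()
        for code in ("a", "b")
    )

# Hypothetical 028 fields written two different ways.
print(same_028("88392 |b Universal", "|a88392|bUNIVERSAL"))
```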

Another match could be the 035 field (Marc tag). This is where you can put in control numbers from other systems. Note, the OCLC number from record 2 appears here in the SkyRiver record. The SR record was originally this OCLC record.

Check the 019 fields (Control Number Cross-Reference) to see if there are any merged bib utility numbers. Both OCLC and SkyRiver do this within their own systems, so this information does not help with making a decision here.

You can compare the 024 fields (UPC). This information is found on the item, so this would seem like these are duplicate records, but we can still check more fields.

You would compare the 260 fields (Publisher). The publishers listed are different, maybe because each cataloger saw something different on the box or credits. One could be the distributor, while the other is a production company. It is very difficult to tell with videos.

In this example, the 710 fields (Author/Illustrator) are basically the same. This usually means they are the same movie. However, we need to know if they are the same DVD. These do seem like the same records, but the publishers listed are different. This is where you may need to look at the physical item, or talk to someone from the other library or libraries to find out if it is the same bib record. We are not going to call these two items duplicates because of the publisher difference without more information. However, if you could see that the item listed both of these companies and could be easily confused you might conclude that they are duplicates.

If we did decide that they are duplicates, it would be best to create a second 001 field, and move the SkyRiver number to the OCLC record.


Macros for de-duping

This system will tile several records for easy comparison. The apostrophes create a pause in the macro. More apostrophes will create a longer pause. You may need a longer or shorter pause depending on how fast your connection to the server is. There is an asterisk after every 99 apostrophes. So each set is 100 characters.
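If you need to lengthen or shorten the pause, typing hundreds of characters by hand is error-prone. The sketch below builds the padding in sets of 99 pause characters followed by an asterisk, as described above, and wraps it in the key sequences from De-duping Macro #1. The function names are made up, and `pause_char` is passed in rather than fixed, so use whichever pause character your macro uses.

```python
def build_pause(pause_char, sets):
    """Return `sets` blocks of 99 pause characters, each followed by an asterisk."""
    return (pause_char * 99 + "*") * sets

def build_open_macro(pause_char, sets):
    """Wrap the pause in the key sequences copied from De-duping Macro #1."""
    return "%CTRL+SHIFT+b%" + build_pause(pause_char, sets) + "%DOWN%%ENTER%"

# Example: a short pause of 3 sets (300 characters) for a fast connection.
macro_text = build_open_macro(",", 3)
print(len(build_pause(",", 3)))
```

Because every set is exactly 100 characters, counting the asterisks tells you how long the pause is.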

First, run the search. Click the first macro to open the results. Click it again to open the first record. Click it again to open the next record. Macro #2 will tile both of them. If there is a third record, press #1 again, before pressing #2. Then #2 will tile all three records. You could use this to tile 4 or 5 records, but when you tile so many records, each window gets very small and hard to work with. Sierra will not tile more than 5 records under any circumstances.

These macros may stop working with any upgrade to Sierra.

-De-duping Macro #1

%CTRL+SHIFT+b%,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*%DOWN%%ENTER%

-De-duping Macro #2

%ALT+i%%ENTER%
