Almost every organization and business uses Excel spreadsheets to store their data about employees, clients, and projects. Excel offers versatile tools and features for data storage, manipulation, and analysis. You’ll have a competitive edge over others when you strengthen your skills with courses in Excel and Microsoft Office.
As the organization expands, the data will expand as well. There’s a chance of data being duplicated, making it hard to search through to get the information you want.
You can easily find and eliminate duplicates manually when the spreadsheet is small, but when it has a lot of data and is shared with many users, that’s a much bigger challenge. Excel provides many features to find, highlight, and remove duplicates.
In this guide, you will learn how to highlight and find duplicates. You will also learn how to remove duplicates in Excel using functions and the ribbon features.
How to Find Duplicates in Excel
Let’s take the example of a spreadsheet containing sales data where the data repeats at the end. Here, the data in rows 3 and 4 is repeated in rows 10 and 11, which can lead to wrong sales figures. Duplicates like these are easy to overlook when you calculate the total sales for each month or for the first quarter.
A simple way to find duplicate cells is to highlight them with conditional formatting. Select the cells you want to check, then choose Home → Conditional Formatting → Highlight Cells Rules → Duplicate Values.
A dialogue box appears where you choose whether to highlight the cells that are duplicated or the cells that are unique. Then choose the formatting style from the drop-down menu.
Choose any of the formatting styles you want, then click OK. When you have completed this, you will find rows 3, 4, 10, and 11 are highlighted with changes in the cell’s colour and the text colour.
If you wish to remove the formatting you have applied to the cells, select Home → Conditional Formatting → Clear Rules → Clear Rules from Selected Cells. The cells revert to their previous appearance, and the duplicate data itself remains in the table.
How To Remove Duplicates in Excel
Conditional formatting helps you identify duplicates in the table, but it does not eliminate them. To remove them, follow these steps. Select the table or the range in the spreadsheet where you wish to remove duplicates. Then, select the Data tab, and in the Data Tools group, select Remove Duplicates.
A dialogue box appears listing the columns in the table. Tick the columns that Excel should compare when checking for duplicates. The Select All and Unselect All buttons let you tick or clear every column at once.
After selecting the columns, click OK. Excel removes the duplicated rows from the table, keeping the first occurrence of each, and displays a message reporting how many duplicate values were removed and how many unique values remain.
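To make the effect of the column selection concrete, here is a minimal Python sketch of the same idea: keep the first row for each distinct combination of values in the chosen columns. The `remove_duplicates` helper and the sales rows are hypothetical and only illustrate the logic; they are not how Excel implements the feature.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key-column values."""
    seen = set()
    kept = []
    for row in rows:
        # Only the selected columns decide whether a row counts as a repeat.
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sales rows: (month, region, amount)
sales = [
    ("Jan", "North", 1200),
    ("Jan", "South", 950),
    ("Feb", "North", 1100),
    ("Jan", "North", 1200),  # repeat of the first row
    ("Jan", "South", 950),   # repeat of the second row
]

# Comparing every column removes only rows that match in full.
all_columns = remove_duplicates(sales, key_columns=[0, 1, 2])

# Comparing only the month column keeps one row per month.
month_only = remove_duplicates(sales, key_columns=[0])

print(len(all_columns))  # 3
print(month_only)        # [('Jan', 'North', 1200), ('Feb', 'North', 1100)]
```

Selecting fewer columns removes more rows, so it is worth checking the selection before clicking OK.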
How to Count Unique Values in Excel
Excel does not provide a single predefined function to count unique or distinct values. However, there are several methods to separate the unique values and return their count.
Count Unique Values Using Functions
There are functions to count unique values in Excel, using the combination of IF, SUM, FREQUENCY, COUNTIF, and LEN.
- IF – checks whether a condition is met and returns one value if a condition is true and another if the condition is false.
- SUM – adds all the numbers in the range.
- FREQUENCY – calculates how often values occur within a range of values, then returns a vertical array of numbers.
Let’s look at how this formula works. Execution starts from the innermost function, FREQUENCY. FREQUENCY() works on numeric values only and ignores text and blank cells. In the above table, it finds how often each Emp_ID occurs. For the first occurrence of a number, the function returns the number of times that number occurs in the range. For each subsequent occurrence of the same number, it returns zero.
The formula =SUM(IF(FREQUENCY(A2:A7,A2:A7)>0,1)) then turns every non-zero count into 1 and adds them up, which gives the number of unique values.
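The evaluation order can be traced with a short Python sketch. The `frequency` function below is a simplified stand-in for how FREQUENCY behaves when the same range is passed as both data and bins; it is an illustration, not Excel's implementation. The Emp_ID values are hypothetical apart from the repeated 8422.

```python
def frequency(data, bins):
    """Simplified stand-in for Excel's FREQUENCY(data_array, bins_array).

    Each distinct bin value counts the data points that fall in
    (next smaller bin, this bin]. A bin value that repeats gets its count
    at its first position and 0 afterwards. One extra slot at the end
    counts data above the largest bin.
    """
    result = [0] * (len(bins) + 1)

    # Remember where each distinct bin value first appears.
    first_position = {}
    for i, b in enumerate(bins):
        first_position.setdefault(b, i)

    lower = float("-inf")
    for b in sorted(first_position):
        result[first_position[b]] = sum(1 for x in data if lower < x <= b)
        lower = b
    result[-1] = sum(1 for x in data if x > lower)
    return result


# Hypothetical Emp_ID column (A2:A7); 8422 appears twice.
emp_ids = [8422, 8501, 8610, 8422, 8733, 8840]

counts = frequency(emp_ids, emp_ids)

# SUM(IF(counts > 0, 1)): every non-zero count contributes exactly 1.
unique_count = sum(1 for c in counts if c > 0)

print(counts)        # [2, 1, 1, 0, 1, 1, 0]
print(unique_count)  # 5
```

The second 8422 produces a zero, so it is not counted again.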
A second approach uses the formula =SUM(1/COUNTIF(A2:A7,A2:A7)). COUNTIF() is evaluated first, as it is the innermost function, followed by the “1/” division and then SUM(). COUNTIF() takes two parameters, and here the same range (A2:A7) is supplied for both. This makes the function count the number of times each value appears in the range. There are six values in the range A2:A7, so the intermediate step resolves to an array of six counts: =SUM(1/{2;1;1;2;1;1}).
Here, the Emp ID 8422 occurs twice, so the array contains the value 2 in both places where it occurs. All other values occur once, so the value 1 appears in their places. The formula then proceeds as follows: 1/2 + 1/1 + 1/1 + 1/2 + 1/1 + 1/1 = 0.5 + 1 + 1 + 0.5 + 1 + 1 = 5. This gives the value 5, which is the number of unique values in the table.
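The reciprocal trick used with SUM and COUNTIF can be checked with a few lines of Python. This is only a sketch of the arithmetic: the Emp ID values other than 8422 are hypothetical, and exact fractions are used so that rounding cannot affect the result.

```python
from fractions import Fraction

# Hypothetical Emp ID column (A2:A7); 8422 appears twice.
emp_ids = [8422, 8501, 8610, 8422, 8733, 8840]

# COUNTIF(A2:A7, A2:A7): how many times each cell's value appears in the range.
occurrences = [emp_ids.count(value) for value in emp_ids]

# 1/COUNTIF(...): a value that appears n times contributes 1/n, n times over,
# so each distinct value adds up to exactly 1.
reciprocals = [Fraction(1, n) for n in occurrences]

# SUM(...): the total is the number of distinct values.
unique_count = sum(reciprocals)

print(occurrences)   # [2, 1, 1, 2, 1, 1]
print(unique_count)  # 5
```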
Advanced Filtering Options
Advanced filtering options in Excel quickly filter a data set based on the selection or any other criteria, such as a text or a number.
Count the Number of Unique Values by Using Filter Option
Sometimes, you need the unique values in the table and their count, rather than finding the duplicates and removing them. Use the Advanced Filter option to extract the unique values from a column and paste them into a new location. Then, use the ROWS function to count the number of values in the new location.
Select the range of cells along with the column heading. Select Data → Sort & Filter → Advanced. The Advanced Filter dialogue box appears. Select the option Copy to Another Location. The List range field shows the cell reference of the selected range, from its first cell to its last.
In the Copy to field, mention the cell reference where the unique values are to be pasted. Do not forget to select the Unique records only checkbox, then click OK.
The unique values are copied to the new location that you have mentioned. Place your cursor below the last value and enter the ROWS function. This function will return the count of values in the range mentioned.
For instance, a copy of the table with unique values is created in cells D2:D9. The formula =ROWS(D2:D9) counts the values.
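The two-stage approach, extracting the unique values first and counting them second, looks like this as a Python sketch. The product names are hypothetical sample data, and the code only illustrates the logic of the filter followed by ROWS.

```python
# Hypothetical column of values, with the header row already left out.
products = ["Monitor", "Keyboard", "Mouse", "Monitor", "Printer", "Mouse"]

# "Unique records only": keep the first occurrence of each value,
# in the original order (dict keys preserve insertion order).
unique_products = list(dict.fromkeys(products))

# ROWS over the copied range: count how many values were extracted.
row_count = len(unique_products)

print(unique_products)  # ['Monitor', 'Keyboard', 'Mouse', 'Printer']
print(row_count)        # 4
```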
Alternatively, you can count the values returned by the advanced filter with the Quick Analysis option instead of a formula. When you select all the rows, the Quick Analysis icon appears. Click it to see the available options, then click Totals and choose the option that returns the count.
Formulas to Remove Duplicates in Excel
It is easy to remove duplicates using the option in the Excel ribbon by selecting Remove Duplicates in the Data Tools group in the Data tab.
Remove Duplicates Using UNIQUE()
The ribbon option works on the data as it is at that moment, so when you insert or delete a value in the table, you need to run Remove Duplicates again. Instead, use the function =UNIQUE(), which returns the table without duplicate values.
The formula takes a range of values and returns the unique values alone. You can use this function on two columns to return the unique values in both columns.
You can insert a new value or delete a value from the table without refreshing anything. The result of the formula updates automatically.
Remove Duplicates Using IF() and COUNTIF()
There is no direct formula to remove the duplicates, but you can identify the duplicate values and delete them from the table or save them separately. The steps involved are to find the duplicate values using a formula, segregate the values, and then remove them from the list.
To find the duplicate values, use the formula =IF(COUNTIF($A$2:$A$7,A2)>1,"Duplicate","Unique"). This formula will display the duplicated value as Duplicate and the unique value as Unique.
Use the Filter option in the Home tab to filter the duplicate values. You get a list that you can delete or save elsewhere.
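The label-then-filter steps can be sketched in Python as follows. The `label_duplicates` helper and the sample IDs are hypothetical. Text is compared with `casefold()` because COUNTIF ignores letter case, whereas Python's default comparison does not.

```python
from collections import Counter


def label_duplicates(values):
    """Label each value 'Duplicate' if it occurs more than once, else 'Unique'."""

    def normalise(value):
        # Mimic COUNTIF's case-insensitive matching for text values.
        return value.casefold() if isinstance(value, str) else value

    counts = Counter(normalise(v) for v in values)
    return ["Duplicate" if counts[normalise(v)] > 1 else "Unique" for v in values]


# Hypothetical ID column (A2:A7); 8422 appears twice.
emp_ids = [8422, 8501, 8610, 8422, 8733, 8840]

labels = label_duplicates(emp_ids)

# Filtering on the helper column: keep only the rows labelled Duplicate.
duplicates_only = [v for v, label in zip(emp_ids, labels) if label == "Duplicate"]

print(labels)
# ['Duplicate', 'Unique', 'Unique', 'Duplicate', 'Unique', 'Unique']
print(duplicates_only)  # [8422, 8422]
```

Note that every occurrence is labelled Duplicate, including the first one, so deleting all filtered rows would remove the value entirely.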
How To Remove Duplicates in Excel with Power Query
Power Query is an advanced Excel tool used for ETL (Extract, Transform, and Load). Using Power Query, you can obtain data from different sources and apply transformations. It provides a data cleansing mechanism to reshape the data according to your needs.
Follow these steps to remove duplicates using Power Query.
Step 1: Create a table by highlighting the rows and selecting Insert → Table. Alternatively, press CTRL+T to create a table. Check the option My table has headers.
Step 2: Go to the Data tab, and in the Get & Transform Data group, select the option From Table/Range.
Step 3: Power Query Editor opens, where you can perform the necessary transformations.
Step 4: Click on Home → Remove Rows → Remove Duplicates.
Step 5: In the Query Settings pane, the step ‘Removed Duplicates’ appears in the list of applied steps.
Step 6: Once this step appears, click Home → Close & Load.
Step 7: You get the table with duplicates removed in a new sheet.
1. How to count duplicates in Excel?
The easiest way to count duplicates in Excel is to use COUNTIF(). This function counts the number of cells within the specified range that meet the criteria. For example, the formula =COUNTIF(A2:A16,"Monitor") counts how many cells in A2:A16 contain Monitor.
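As a rough illustration of the same count outside Excel, the Python sketch below tallies how often each value occurs. The product list is hypothetical sample data.

```python
from collections import Counter

# Hypothetical product column.
products = ["Monitor", "Keyboard", "Monitor", "Mouse", "Printer", "Monitor", "Mouse"]

# Tally every value once, then look up the one of interest,
# much like COUNTIF with "Monitor" as the criteria.
counts = Counter(products)
monitor_count = counts["Monitor"]

# Values that occur more than once, with their counts.
repeated = {item: n for item, n in counts.items() if n > 1}

print(monitor_count)  # 3
print(repeated)       # {'Monitor': 3, 'Mouse': 2}
```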
2. What is a Power Query?
Power Query is a business intelligence tool to import data from other sources such as SQL, Oracle, and Microsoft Azure to clean, transform, and load the data.
Duplicate values in a table can cause a lot of trouble and take a lot of time to cleanse. This article gives a detailed explanation of how to identify, filter, and remove duplicate values. Check out Excel formulas – list and guide to master new skills in Excel. You can also check out our courses in Excel and Microsoft Office Applications to learn more about Excel functions and formulas. You can enroll in all these courses on our website and earn micro-credentials once you complete them.