Pandas offers some great functions for processing a dataset. A data file can contain duplicate rows, and dropping them is important because they add noise to any analysis. Sometimes many rows are duplicated. This article covers the steps to drop duplicates from a Pandas DataFrame. The example focuses on data that is duplicated at the complete record level, meaning every column of the row repeats.
Emma has customer data loaded in a Pandas DataFrame. She finds that the data has many duplicate records; for example, ID 4 and ID 7 each appear more than once.
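To follow along without Emma's file, the sketch below builds a small stand-in for Customer_data. The column names (ID, Name, City) and all values are hypothetical, invented only so that IDs 4 and 7 repeat as exact copies of a row; duplicated() is then used to flag the repeats.

```python
import pandas as pd

# Hypothetical customer data: columns and values are invented for illustration.
# IDs 4 and 7 each appear twice as exact copies of the whole row.
Customer_data = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 4, 5, 6, 7, 7, 8],
        "Name": ["Asha", "Ben", "Chen", "Dev", "Dev", "Eva", "Farid", "Gia", "Gia", "Hugo"],
        "City": ["Pune", "Oslo", "Lima", "Rome", "Rome", "Kyiv", "Cairo", "Nice", "Nice", "Bern"],
    }
)

# duplicated() marks a row True when it repeats an earlier row in every column
print(Customer_data[Customer_data.duplicated()])
```

Only the second copies of IDs 4 and 7 are flagged, because duplicated() treats the first occurrence as the original by default.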
Key steps to remove duplicate rows:
- Step 1: Check the shape of the DataFrame to find the number of rows.
Customer_data.shape
- Step 2: Use the drop_duplicates function with the parameter keep='first'. This keeps the first row of each set of duplicate records and drops the rest. If you want to keep the last record instead, pass keep='last'. Press Ctrl+Enter or click the Run button at the top to run the cell.
Customer_data_2 = Customer_data.drop_duplicates(keep='first')
- Step 3: To check whether the new data still has duplicate records, look at the first rows with head. You can also check the shape of the DataFrame again; the row count should be lower than in Step 1.
Customer_data_2.head(10)
Customer_data_2.shape
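The three steps can be sketched end to end as a single script. The sample data is hypothetical (invented so that IDs 4 and 7 repeat), and print() is used because a notebook cell only displays its last expression. The sketch also runs keep='last' so the two options can be compared by their surviving index labels.

```python
import pandas as pd

# Hypothetical sample data: IDs 4 and 7 are exact duplicate rows.
Customer_data = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 4, 5, 6, 7, 7, 8],
        "City": ["Pune", "Oslo", "Lima", "Rome", "Rome", "Kyiv", "Cairo", "Nice", "Nice", "Bern"],
    }
)

# Step 1: rows and columns before cleaning
print(Customer_data.shape)  # (10, 2)

# Step 2: keep the first copy of each duplicated row and drop the rest
Customer_data_2 = Customer_data.drop_duplicates(keep="first")

# For comparison, keep="last" retains the final copy instead
Customer_data_last = Customer_data.drop_duplicates(keep="last")

# Step 3: inspect the cleaned data and its new shape
print(Customer_data_2.head(10))
print(Customer_data_2.shape)  # (8, 2)

# The original index labels show which copy survived
print(list(Customer_data_2.index))     # [0, 1, 2, 3, 5, 6, 7, 9]
print(list(Customer_data_last.index))  # [0, 1, 2, 4, 5, 6, 8, 9]
```

Both options leave the same rows in terms of values; only the original positions differ, which matters if the index carries meaning.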
As we can see, IDs 4 and 7, which had duplicate rows earlier, now appear only once.
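Scanning head(10) works for a small table, but a count is more reliable on larger data. A minimal sketch, again on hypothetical sample data, that counts fully duplicated rows before and after cleaning:

```python
import pandas as pd

# Hypothetical sample data: IDs 4 and 7 are exact duplicate rows.
Customer_data = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 4, 5, 6, 7, 7, 8],
        "City": ["Pune", "Oslo", "Lima", "Rome", "Rome", "Kyiv", "Cairo", "Nice", "Nice", "Bern"],
    }
)
Customer_data_2 = Customer_data.drop_duplicates(keep="first")

# Number of rows that repeat an earlier row across all columns
print(Customer_data.duplicated().sum())    # 2
print(Customer_data_2.duplicated().sum())  # 0

# Each ID now appears exactly once
print(Customer_data_2["ID"].value_counts().max())  # 1
```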
Note that because we are checking for duplicates across the complete row, we do not use the first parameter of the drop_duplicates function, i.e. subset, which restricts the comparison to selected columns.
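To see the difference subset makes, the sketch below uses hypothetical data in which ID 4 appears twice with different City values. A full-row comparison keeps both ID 4 rows because they are not identical, while subset=["ID"] keeps only the first row per ID.

```python
import pandas as pd

# Hypothetical data: the ID 4 rows differ in City; the ID 7 rows are identical.
Customer_data = pd.DataFrame(
    {
        "ID": [1, 4, 4, 7, 7],
        "City": ["Pune", "Rome", "Milan", "Nice", "Nice"],
    }
)

full_row = Customer_data.drop_duplicates()           # compares every column
by_id = Customer_data.drop_duplicates(subset=["ID"])  # compares only ID

print(len(full_row))  # 4: only the identical ID 7 rows are collapsed
print(len(by_id))     # 3: one row per ID, keeping the first occurrence
```

Using subset can silently discard rows that differ in other columns, so it suits cases where a key column alone defines a duplicate.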
Thus, Emma is able to create a single DataFrame without duplicates in Python, as per her requirement.