PySpark makes it possible to process large data sets and to run complex queries on them. Machine learning and statistical algorithms are also straightforward to deploy with PySpark. Before running any algorithm, though, the data needs to be cleaned. Sometimes a data set contains too many columns, and dropping the unnecessary ones becomes important. This article explains how to drop columns in PySpark in a few easy steps.
Amy has employee data available. She wants to drop a few columns, because she needs only basic details such as name and gender in her dataset.
Below are the key steps to drop the unnecessary columns:
- Step 1: Import all the necessary modules and set up the SparkContext and SQLContext.
import pandas as pd
import findspark
findspark.init()
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)
- Step 2: Use the PySpark DataFrame.drop function to drop columns. Here we want to drop only one column, so we pass its name in the parentheses. The syntax for this example is:
Emp_Data2=Emp_Data.drop("Employee_ID")
- Step 3: Check the data after dropping the column. Use the show() command to see the top rows of the PySpark DataFrame (the first 20 by default).
Emp_Data2.show()
- Example 2: To drop more than one column, pass the column names one after another, separated by commas, in the parentheses.
Emp_Data2=Emp_Data.drop("Employee_ID","Skills")