Data quality does not always meet expectations, and many datasets need some cleaning before they can be used. Column names are a common problem: they may contain unwanted characters such as spaces, or a name may not describe what the column actually stores. In these cases the columns have to be renamed. This article walks through the process of renaming columns in PySpark step by step.
Amy has a customer data file for her company. She finds that column names such as Customer ID, First Name and Last Name contain spaces. To make the columns easier to reference in code, she decides to replace each space with an underscore (_).
- Step 1: Import the necessary modules, then create the SparkContext and SQLContext.
    import findspark
    findspark.init()  # locate the local Spark installation

    import pyspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext("local", "App Name")
    sql = SQLContext(sc)
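On Spark 2.0 and later, the same setup is more commonly done through SparkSession, which wraps both contexts. The following is a minimal sketch under that assumption, not the setup used in this article. The helper names create_session and load_customer_data and the file name customer_data.csv are hypothetical, and the file is assumed to be a CSV with a header row.

```python
def create_session(app_name="App Name", master="local"):
    """Create or reuse a SparkSession (requires pyspark to be installed)."""
    # Imported lazily so that this module loads even without pyspark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(master)
        .appName(app_name)
        .getOrCreate()
    )


def load_customer_data(spark, path="customer_data.csv"):
    """Read the customer file into a DataFrame.

    Assumes a CSV file whose first row holds the column names;
    the default path is a placeholder, not a file from the article.
    """
    return spark.read.csv(path, header=True, inferSchema=True)
```

With these helpers, Customer_Data = load_customer_data(create_session()) would produce a DataFrame ready for renaming.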
- Step 2: Use the withColumnRenamed function to change the column names. It takes two arguments: the existing name first and the new name second. It returns a new DataFrame and leaves the original unchanged. For example, to rename "Customer ID" to "Customer_ID", the arguments are ("Customer ID", "Customer_ID").
    # Customer_Data is assumed to be a DataFrame already loaded from the customer file.
    Customer_Data1 = (
        Customer_Data
        .withColumnRenamed("Customer ID", "Customer_ID")
        .withColumnRenamed("First Name", "First_Name")
        .withColumnRenamed("Last Name", "Last_Name")
    )
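When many columns need the same fix, chaining one withColumnRenamed call per column becomes tedious. As a rough sketch, the renaming can instead be applied to every affected column in a loop. The helper names below are hypothetical, and the code assumes only that the DataFrame exposes columns and withColumnRenamed, as a PySpark DataFrame does.

```python
from functools import reduce


def to_underscores(name):
    """Return the column name with every space replaced by an underscore."""
    return name.replace(" ", "_")


def rename_columns_with_spaces(df):
    """Rename each column whose name contains a space.

    `df` is expected to be a PySpark DataFrame; columns without
    spaces are left untouched.
    """
    names_to_fix = [name for name in df.columns if " " in name]
    # withColumnRenamed returns a new DataFrame, so fold over the names.
    return reduce(
        lambda current, old: current.withColumnRenamed(old, to_underscores(old)),
        names_to_fix,
        df,
    )
```

For Amy's file this would be called as Customer_Data1 = rename_columns_with_spaces(Customer_Data).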
- Step 3: Check the new DataFrame to confirm that the columns were renamed correctly. Use the show() method to display the first rows of the PySpark DataFrame (20 by default).
    Customer_Data1.show()
Amy will be able to work on the newly available data now.
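Besides inspecting the output of show() by eye, the column names can also be checked in code. This is a small sketch, assuming the goal is simply that no column name contains a space. The function names columns_with_spaces and assert_no_spaces are made up for this example.

```python
def columns_with_spaces(df):
    """Return the names of columns that still contain a space."""
    return [name for name in df.columns if " " in name]


def assert_no_spaces(df):
    """Raise ValueError if any column name still contains a space."""
    leftover = columns_with_spaces(df)
    if leftover:
        raise ValueError(f"Columns still containing spaces: {leftover}")
```

Calling assert_no_spaces(Customer_Data1) after the renaming step would fail loudly if a column had been missed.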