How to lowercase in PySpark

Keeping text in a consistent format is important. Data coming out of PySpark usually feeds reports and insights, and text that is not in a proper format requires additional cleaning in later stages. A field may contain upper-case or mixed-case text. The objective here is to create a column with all letters in lower case. PySpark provides the lower function in pyspark.sql.functions for this; it works on DataFrame columns, unlike Python's str.lower(), which works on a single string. In this article we will learn how to lowercase a column in PySpark with the help of an example.

Emma has customer data for her company, including an Email field. The values in this field are all upper case, and she wants to create a lower-case version of the field. For example, for “[email protected]” the new email should look like “[email protected]”.

PySpark Lower Case Example

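  • Step 1: Import the necessary modules and start a local Spark context.
# pandas is imported for convenience; it is not used in the steps below
import pandas as pd
# findspark locates the local Spark installation and adds it to sys.path
import findspark
findspark.init()
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
# Start a local Spark context and wrap it in a SQLContext for DataFrame work
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)
# Column functions such as lower() live in pyspark.sql.functions
import pyspark.sql.functions as func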
  • Step 2: Use lower from pyspark.sql.functions to convert the text to lower case. Pass the column to convert to lower, and call withColumn on the DataFrame to store the result under a new column name. Here is the syntax to lowercase the ‘Email’ column, assuming Customer_Data is a DataFrame that has already been loaded.
Customer_Data = Customer_Data.withColumn("Email_Updated", func.lower(func.col("Email")))
  • Step 3: Display the first rows of the final DataFrame to check that the new column looks as expected.
Customer_Data.show(15)

Thus, Emma is able to create the lower-case column in her DataFrame in PySpark, as she required.
