How to uppercase in PySpark

Keeping text in the right format is always important. Data coming out of PySpark eventually feeds the insights that get presented, so text that is not in a consistent format will need additional cleaning in later stages. Fields often arrive in mixed case. The objective here is to create a column with all letters in upper case. PySpark provides the upper function in pyspark.sql.functions for this (it is a column function, not Python's str.upper() method). In this article we will learn how to uppercase a column in PySpark with the help of an example.

Emma has customer data for her company. The data has a Gender field stored in proper case. She wants to create an all-uppercase field from it. For example, for “Male” the new Gender column should read “MALE”.

PySpark Capitalize All Letters

  • Step 1: Import all the necessary modules.
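# findspark locates the local Spark installation so that pyspark can be imported
import findspark
findspark.init()
import pandas as pd
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
# Column functions such as upper() and col() live in pyspark.sql.functions
import pyspark.sql.functions as func
# Start a local Spark context and the SQL context used to work with Dataframes
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)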
  • Step 2: Use the upper function from pyspark.sql.functions to convert text to upper case. Pass it the column to convert, here func.col("Gender"), and call withColumn on the Dataframe to store the result in a new column. Here is the syntax to upcase the ‘Gender’ column.
Customer_Data = Customer_Data.withColumn("Gender_Updated",func.upper(func.col("Gender")))
  • Step 3: Display the first 15 rows of the final Dataframe to check that the new column holds the expected values.
Customer_Data.show(15)
Thus, Emma is able to create the uppercase column in her Dataframe as required.
