How to use left function in Pyspark

Text fields usually need a good amount of cleaning before data analysis can start. PySpark has many functions that make working with text columns easier. When you need to extract letters from the left of a text value, the substring function in PySpark is helpful. In this article we will learn how to get the behavior of a left function in PySpark with the help of an example.

Emma has customer data for her company. The DataFrame has a Gender column, and she wants to extract the first letter from it. For example, for “Male” she wants to create a new gender column with the value “M”.

Left Function in Pyspark Dataframe

  • Step 1: Import all the necessary modules.
import findspark
findspark.init()  # locate the local Spark installation
from pyspark.sql import SparkSession
from pyspark.sql.functions import substring

# SparkSession is the entry point that replaces the older SparkContext + SQLContext pair
spark = SparkSession.builder.master("local").appName("App Name").getOrCreate()
  • Step 2: Use the substring function to extract the left value. Pass the field name first, then the start position and the number of letters to keep. Positions are 1-based, so substring('Gender', 1, 1) extracts 1 letter from the left; to extract 4 letters from the left, use substring('Gender', 1, 4). For the current example, the syntax is:
Customer_Data = Customer_Data.withColumn('Gender Updated', substring('Gender', 1,1))
  • Step 3: Display the output to check the new column in the final DataFrame.
Customer_Data.show(15)

Thus, Emma is able to extract letters from the left as required in PySpark. This kind of extraction comes up in many scenarios; this example covers one such use case.
