Text fields often require a good amount of cleaning before data analysis can start. PySpark has many functions that make working with text columns easier. When you need to extract letters from the left of a text value, the substring() function in PySpark is helpful. In this article, we will learn how to reproduce a left function in PySpark with the help of an example.
Emma has customer data available for her company. Her DataFrame contains a Gender column, and she wants to extract the first letter from it. For example, for “Male” she wants a new column, Gender Updated, with the value “M”.
Left Function in PySpark DataFrame
- Step 1: Import all the necessary modules and start a Spark session.
import findspark
findspark.init()  # helps Python locate a standalone Spark installation
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, substring
# SparkSession (Spark 2.0+) replaces the older SparkContext + SQLContext pair
spark = SparkSession.builder.master("local").appName("App Name").getOrCreate()
- Step 2: Use the substring() function to extract the left value. Pass the column name, the starting position and the number of characters. Positions start at 1, so substring('Gender', 1, 1) extracts 1 letter from the left; to extract 4 letters from the left, use substring('Gender', 1, 4). For the current example, the syntax is:
Customer_Data = Customer_Data.withColumn('Gender Updated', substring('Gender', 1,1))
- Step 3: Check the output to verify the values in the final DataFrame.
Customer_Data.show(15)
Thus, Emma is able to extract letters from the left in PySpark as required. This kind of extraction is needed in many scenarios and use cases; this example covers one of them.