
Example 1: Find Total Number of Distinct Stores in Each Department
import findspark
findspark.init()

from pyspark import SparkContext
from pyspark.sql import SQLContext
import pyspark.sql.functions as func

sc = SparkContext("local", "App Name")
sql = SQLContext(sc)

# df1 is assumed to be a DataFrame already loaded with
# (at least) the columns Department, Geography and StoreID.
df1.groupby('Department') \
   .agg(func.expr('count(distinct StoreID)').alias('Distinct_Stores')) \
   .show()
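For comparison, the same per-department distinct count can be done in plain pandas with nunique(). This is only a sketch: the sample rows below are made up for illustration, and only the column names (Department, Geography, StoreID) come from the example above.

```python
import pandas as pd

# Illustrative sample data; column names match the PySpark example above.
df1_pd = pd.DataFrame({
    'Department': ['Grocery', 'Grocery', 'Grocery', 'Toys', 'Toys'],
    'Geography':  ['East', 'East', 'West', 'East', 'West'],
    'StoreID':    [101, 101, 102, 101, 103],
})

# nunique() is pandas' counterpart of count(distinct StoreID) per group.
distinct_stores = (
    df1_pd.groupby('Department')['StoreID']
          .nunique()
          .reset_index(name='Distinct_Stores')
)
print(distinct_stores)
```

Here Grocery covers stores 101 and 102 and Toys covers 101 and 103, so both departments report 2 distinct stores even though the raw row counts differ.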

Example 2: Calculate Total Number of Distinct Stores for Each Geography
df1.groupby('Geography') \
   .agg(func.expr('count(distinct StoreID)').alias('Distinct_Stores')) \
   .show()
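Conceptually, count(distinct StoreID) per geography just collects the unique store IDs seen under each key and counts them. A minimal pure-Python sketch of that idea, using made-up (Geography, StoreID) rows:

```python
from collections import defaultdict

# Made-up (Geography, StoreID) rows for illustration.
rows = [
    ('East', 101), ('East', 101), ('East', 102),
    ('West', 102), ('West', 103),
]

# Collect the set of distinct store IDs per geography, then count
# each set -- the essence of count(distinct ...) within a group.
stores_by_geo = defaultdict(set)
for geography, store_id in rows:
    stores_by_geo[geography].add(store_id)

distinct_counts = {geo: len(ids) for geo, ids in stores_by_geo.items()}
print(distinct_counts)  # East has 2 distinct stores, West has 2
```

Note that duplicate rows (East, 101) collapse into the set, which is exactly why the distinct count differs from a plain row count.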

Thus, John is able to calculate the values he needs in PySpark. This kind of extraction comes up in many scenarios and use cases; this example illustrates one of them.
