How to calculate minimum value by group in PySpark

Aggregating fields is one of the basic necessities of data analysis and data science. PySpark provides easy ways to aggregate data and calculate metrics. The minimum value for each group can be found as part of a group-by operation, using the min() aggregate function. This article explains, with the help of an example, how to calculate the minimum value by group in PySpark.

John has store sales data available for analysis. The data has five columns: Geography (country of the store), Department (industry category of the store), StoreID (unique ID of each store), Time Period (month of sales), and Revenue (total sales for the month). John wants to calculate the minimum revenue for each store. As there are four months of data for each store, the result for each store is the smallest of its four monthly revenue values.
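To make the expected result concrete, here is a minimal plain-Python sketch of what "minimum by group" means, before any Spark is involved. The store IDs, months, and revenue figures are hypothetical and are not John's actual data.

```python
# Hypothetical (StoreID, month, revenue) records: two stores, four months each.
sales = [
    ("S1", "Jan", 120.0), ("S1", "Feb", 95.0), ("S1", "Mar", 130.0), ("S1", "Apr", 110.0),
    ("S2", "Jan", 80.0), ("S2", "Feb", 60.0), ("S2", "Mar", 75.0), ("S2", "Apr", 90.0),
]

# Keep the smallest revenue seen so far for each store.
min_revenue = {}
for store_id, _month, revenue in sales:
    if store_id not in min_revenue or revenue < min_revenue[store_id]:
        min_revenue[store_id] = revenue

print(min_revenue)  # {'S1': 95.0, 'S2': 60.0}
```

Each store contributes four values and ends up with exactly one, which is the same shape of result the PySpark group-by produces.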


Example 1: Find the minimum sales for each store in PySpark

  • Step 1: Import the necessary modules and start a local Spark context.
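import pandas as pd  # imported by the original example; not used in the snippets shown here
import findspark
findspark.init()  # locate the local Spark installation so that pyspark can be imported
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Start a local Spark context and wrap it in a SQL context for DataFrame work.
# SQLContext is deprecated since Spark 3.0; SparkSession is the current entry point.
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)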
  • Step 2: Use the min() function together with the groupBy operation. This assumes the sales data has already been loaded into a DataFrame named df1. Because we want one result per store, “StoreID” is the groupBy column. The Revenue field contains the monthly sales of each store, so “Revenue” is the column passed to the min aggregation. For the current example, the syntax is:
df1.groupBy("StoreID").agg({'Revenue':'min'}).show()

Example 2: Calculate the minimum value for each department

  • Here we want the minimum revenue within each department, so the groupBy column is “Department”:
df1.groupBy("Department").agg({'Revenue':'min'}).show()

Thus, John is able to calculate the minimum revenue for each store and each department in PySpark. The same groupBy and min pattern applies whenever the smallest value within each group is needed; this example covers one such use case.

