Pandas groupby difference between first and last

Aggregation of data is necessary to summarize and analyze the results. Groupby function in Pandas helps in grouping the data and further aggregation. Summarization can be done for counting rows, getting sum, maximum value, minimum value etc. Challenge comes in complex aggregation like finding the difference between first and last row while grouping by. This article explains step by step process to calculate difference between similar groups and then producing the results.

John has customer transaction data available with him. He has information like, customer ID , Name and transaction dates. John wants to figure out number of days gap between first transaction and last transaction of a customer. For example for Jeannie he wants to know the gap between her first and last purchase.

Pandas groupby difference between first and last

Below are the key steps to follow:

  • Step 1: Import necessary modules in PythonΒΆ
import pandas as pd
  • Step 2: Sort data by customer ID and transaction date. This will help in keeping the records for each customer ID by transaction date level. First observation for each customer will be the first transaction date, and last observation will be last transaction date.
Transaction_data2=Transaction_data.sort_values(['Customer ID','Transaction Date'])\
                                     .reset_index(drop=True)
  • Step 3: Identify the first and last value per groupby of customer ID. Here aggregation is done to find the first and last value of each group. So the output will have two columns, one with first transaction date and other with last transaction date.
Grouped_data=Transaction_data2.groupby(by='Customer ID')['Transaction Date']\
                                 .agg(['first','last'])

Convert character date to datetime format:

 Grouped_data[["first", "last"]] = Grouped_data[["first", "last"]].apply(pd.to_datetime)
Pandas groupby difference between first and last
  • Step 4: Calculate days gap by subtracting first date from last date
Grouped_data['Days_gap']=Grouped_data['last']-Grouped_data['first']
  • Step 5: Check output quality
Grouped_data.head()
Pandas groupby difference between first and last

Thus, John is able to create summaries from the Dataframe as per his requirement in Python.

To get top certifications in Python and build your resume visit here. Also, you can read books listed here to build strong knowledge around Python.

Visit us below for video tutorial:

 Looking to practice more with this example? Drop us a note, we will email you the Code file: 

    πŸ“¬ Stay Ahead in Data Science & AI – Subscribe to Newsletter!

    • 🎯 Interview Series: Curated questions and answers for freshers and experienced candidates.
    • πŸ“Š Data Science for All: Simplified articles on key concepts, accessible to all levels.
    • πŸ€– Generative AI for All: Easy explanations on Generative AI trends transforming industries.

    πŸ’‘ Why Subscribe? Gain expert insights, stay ahead of trends, and prepare with confidence for your next interview.

    πŸ‘‰ Subscribe here:

    Related Posts