How to append 2 Dataframes in Pyspark

PySpark has a union function that stacks one DataFrame below another. Appending is useful for building a single DataFrame out of several source files. Because union matches columns by position rather than by name, both DataFrames should have the same columns, in the same order and with the same data types. This keeps the append free of errors. This article discusses how to append two DataFrames in PySpark.
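One way to check this requirement before appending is to compare the (column name, type) pairs that a PySpark DataFrame exposes through its dtypes attribute. The sketch below is only an illustration: the helper name schema_mismatches and the sample schemas are hypothetical, and it works on plain Python lists so it can be tried without a Spark installation.

```python
def schema_mismatches(left_dtypes, right_dtypes):
    """Return a list of readable differences between two schemas.

    Each argument is a list of (column_name, type_string) tuples, the
    shape returned by a PySpark DataFrame's `dtypes` attribute.
    """
    problems = []
    if len(left_dtypes) != len(right_dtypes):
        problems.append(
            f"column count differs: {len(left_dtypes)} vs {len(right_dtypes)}"
        )
    # union() matches columns by position, so compare position by position
    for pos, (left, right) in enumerate(zip(left_dtypes, right_dtypes)):
        if left != right:
            problems.append(f"position {pos}: {left} vs {right}")
    return problems


# Hypothetical schemas standing in for june_df.dtypes and july_df.dtypes
june = [("trx_id", "bigint"), ("amount", "double")]
july = [("trx_id", "bigint"), ("amount", "string")]

for problem in schema_mismatches(june, july):
    print(problem)
```

With real data, the two arguments would be the dtypes attributes of the two DataFrames; an empty result suggests the append can go ahead.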

Amy has two transaction DataFrames: one holds the June data and the other holds the July data. Both files have the same six columns. She wants to create a single DataFrame out of the two.

Steps to append 2 Dataframes in Pyspark

Below are the key steps to follow.

  • Step 1: Import the necessary modules and set up the SparkContext and SQLContext.
import pandas as pd
import findspark
findspark.init()  # locate the local Spark installation before importing pyspark
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext("local", "App Name")  # run Spark locally
sql = SQLContext(sc)
  • Step 2: Use the union function to append the two DataFrames. The DataFrame written in parentheses is added at the bottom of the table, while the one the function is called on stays on top.
Trx_Data_2Months_Pyspark=Trx_Data_Jun20_Pyspark.union(Trx_Data_Jul20_Pyspark)
  • Step 3: Check that the final data has 200 rows, since each base DataFrame has 100 rows. Use the show() command to display the top rows of a PySpark DataFrame.
Trx_Data_2Months_Pyspark.show(10)
Print the shape of the DataFrame, i.e. the number of rows and the number of columns.
print((Trx_Data_2Months_Pyspark.count(), len(Trx_Data_2Months_Pyspark.columns)))

Hence, Amy is able to append both the transaction files together.
