Sometimes a DataFrame has no header row, so its columns carry no meaningful names. PySpark has a union function that stacks one DataFrame below another. Because union matches columns by position rather than by name, a few basic checks matter before appending header-less data: both DataFrames should have the same columns in the same order, and corresponding columns should ideally have the same format. This helps the append run without errors. Appending is a common way to build a single dataset from several source files. This article walks through easy steps to append PySpark DataFrames without column names.
Amy has June and July transaction DataFrames with no header information. Each of her two files has six columns, and the columns match in format and in the kind of values they hold. She wants to create a single DataFrame from the two.
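The basic checks described above can be sketched in plain Python before any append is attempted. The helper below is a hypothetical illustration, not part of PySpark: it compares, position by position, two lists of (column name, type) pairs, which is the shape a PySpark DataFrame exposes through its dtypes attribute. The sample schemas are invented for the example and are not Amy's actual data.

```python
def find_append_problems(first_dtypes, second_dtypes):
    """Return a list of problems that would make a positional append unsafe.

    Both arguments are lists of (column_name, type_name) pairs, e.g. the
    value of the `dtypes` attribute of a PySpark DataFrame. Column names
    are ignored on purpose, because union() matches columns by position.
    An empty list means no problem was found.
    """
    problems = []
    if len(first_dtypes) != len(second_dtypes):
        problems.append(
            f"column count differs: {len(first_dtypes)} vs {len(second_dtypes)}"
        )
        return problems  # a positional comparison makes no sense past this point

    pairs = zip(first_dtypes, second_dtypes)
    for position, ((_, first_type), (_, second_type)) in enumerate(pairs):
        if first_type != second_type:
            problems.append(f"position {position}: {first_type} vs {second_type}")
    return problems


# Hypothetical six-column schemas (header-less CSV reads name columns _c0, _c1, ...).
june_dtypes = [
    ("_c0", "int"), ("_c1", "string"), ("_c2", "string"),
    ("_c3", "double"), ("_c4", "int"), ("_c5", "string"),
]
july_dtypes = list(june_dtypes)

# Same layout, except that the fourth column was read as text.
july_mismatched = list(june_dtypes)
july_mismatched[3] = ("_c3", "string")

print(find_append_problems(june_dtypes, july_dtypes))
print(find_append_problems(june_dtypes, july_mismatched))
```

With real DataFrames, the call would look like find_append_problems(Trx_Data_Jun20_Pyspark.dtypes, Trx_Data_Jul20_Pyspark.dtypes). Note that a type match does not prove the columns mean the same thing, so a quick look at a few rows is still worthwhile.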
Below are the key steps to follow.
- Step 1: Import the necessary modules and set up the SparkContext and SQLContext.
    import pandas as pd
    import findspark
    findspark.init()  # locate the local Spark installation
    import pyspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext("local", "App Name")
    sql = SQLContext(sc)
- Step 2: Use the union function to append the two DataFrames. The DataFrame passed in parentheses is added at the bottom of the result, while the DataFrame that union is called on stays on top.
    Trx_Data_2Months_Pyspark = Trx_Data_Jun20_Pyspark.union(Trx_Data_Jul20_Pyspark)
- Step 3: Check that the final DataFrame has 200 rows, since each base DataFrame has 100 rows. Use the show() command to see the top rows of a PySpark DataFrame.
    Trx_Data_2Months_Pyspark.show(10)
    # Print the shape of the DataFrame, i.e. number of rows and number of columns
    print((Trx_Data_2Months_Pyspark.count(), len(Trx_Data_2Months_Pyspark.columns)))
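Steps 2 and 3 can be combined into one small helper. This is a minimal sketch under stated assumptions, not the article's original code: append_and_verify is a hypothetical name, and the function relies only on the union(), count() and columns members that PySpark DataFrames provide. Because union keeps duplicate rows (it behaves like SQL UNION ALL), the combined row count should equal the sum of the two inputs.

```python
def append_and_verify(top, bottom):
    """Stack `bottom` under `top` with union() and sanity-check the shape.

    Both arguments are expected to be PySpark DataFrames. Only their
    union(), count() and columns members are used. The name of this
    helper is hypothetical; it is not a PySpark function.
    """
    # union() matches columns by position, so the counts must agree.
    if len(top.columns) != len(bottom.columns):
        raise ValueError(
            f"column count differs: {len(top.columns)} vs {len(bottom.columns)}"
        )

    combined = top.union(bottom)

    # union() does not drop duplicates, so no rows should go missing.
    expected_rows = top.count() + bottom.count()
    actual_rows = combined.count()
    if actual_rows != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, found {actual_rows}")

    return combined
```

For Amy's data the call would be Trx_Data_2Months_Pyspark = append_and_verify(Trx_Data_Jun20_Pyspark, Trx_Data_Jul20_Pyspark). Each count() triggers a Spark job, so on large data the row check may be worth running only once during development.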
Thus Amy is able to append the two DataFrames into a single DataFrame. The result has 200 rows, because each of the two source DataFrames contains 100 rows.