Large datasets are usually stored in plain-text formats such as CSV. Excel (XLSX) files remain an important storage format, though, because they preserve formatting and other workbook features along with the data. Importing an Excel file into PySpark can be tricky at times, so this step-by-step guide shows how to read one. XLSX files are usually not very large, since they typically hold a limited amount of data; the format is better suited to report files than to bulk data storage.
Emma has an Employee dataset available in XLSX format. The workbook contains two sheets, and she has to import “Data_Sheet2” from it.
Below are the key steps for Emma to follow to import the Excel file into PySpark:
- Step 1: Import the necessary modules, such as pandas, and set up a SparkContext and SQLContext as shown below.
import pandas as pd
import findspark
findspark.init()
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)
- Step 2: Read the Excel file into a pandas DataFrame, passing the name of the sheet to load. The syntax is shown below:
df2 = pd.read_excel(r"C:\Website\LearnEasySteps\Python\Excel_File_Data.xlsx", sheet_name="Data_Sheet2")
- Step 3: Convert the pandas DataFrame to a PySpark DataFrame.
df2 = sql.createDataFrame(df2)
- Step 4: Check a few rows to make sure everything looks right. Use the show() method to display the top rows of the PySpark DataFrame (20 rows by default).
df2.show()
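The four steps can also be tried end to end without Emma's actual file. The sketch below is only an illustration under a few assumptions: the sample rows, the column names (emp_id, name), the first sheet's name (Data_Sheet1) and the helper functions write_sample_workbook, read_sheet and preview_in_spark are all made up for this example, and SparkSession is swapped in for the SparkContext and SQLContext pair used above, since it has been the usual entry point since Spark 2.0. Reading and writing .xlsx files with pandas also needs an Excel engine such as openpyxl to be installed.

```python
import os
import tempfile

import pandas as pd  # .xlsx support also needs an engine such as openpyxl


def write_sample_workbook(path):
    """Create a small two-sheet workbook (made-up data) standing in for Emma's file."""
    sheet1 = pd.DataFrame({"emp_id": [1, 2], "name": ["Asha", "Ben"]})
    sheet2 = pd.DataFrame({"emp_id": [3, 4, 5], "name": ["Chen", "Dina", "Eli"]})
    with pd.ExcelWriter(path) as writer:
        sheet1.to_excel(writer, sheet_name="Data_Sheet1", index=False)
        sheet2.to_excel(writer, sheet_name="Data_Sheet2", index=False)


def read_sheet(path, sheet_name):
    """Step 2: load one named sheet into a pandas DataFrame."""
    return pd.read_excel(path, sheet_name=sheet_name)


def preview_in_spark(pandas_df, app_name="App Name"):
    """Steps 3 and 4: convert to a Spark DataFrame and show the top rows."""
    # Imported here so the pandas-only helpers work without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local").appName(app_name).getOrCreate()
    spark_df = spark.createDataFrame(pandas_df)
    spark_df.show()
    return spark_df


# Build the sample workbook in a temporary folder.
workbook_path = os.path.join(tempfile.mkdtemp(), "Excel_File_Data.xlsx")
write_sample_workbook(workbook_path)

# List the sheets first, which helps confirm the exact sheet name to import.
with pd.ExcelFile(workbook_path) as workbook:
    sheet_names = workbook.sheet_names

# Read only the second sheet, as in Step 2.
pandas_df = read_sheet(workbook_path, "Data_Sheet2")
print(sheet_names)
print(pandas_df)
```

Calling preview_in_spark(pandas_df) afterwards performs the conversion and prints the rows; it needs pyspark and a Java runtime on the machine, which is why it is kept in its own function here.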