How to read Excel file in Pyspark (XLSX file)

Bigger data files are generally stored in text formats such as CSV. However, the Excel XLSX format remains an important storage format, because it can preserve cell formatting and other features along with the data. Importing an Excel file into Pyspark can sometimes be tricky, so we are sharing a step-by-step guide on how to read an Excel file in Pyspark. XLSX files are usually not very large, as they generally hold a limited amount of data. The format is better suited to storing report files than to serving as a bulk data storage format.

Emma has an Employee dataset available in XLSX format. The file contains two sheets, and she has to import “Data_Sheet2” from it.
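Before reading, it can help to confirm which sheets the workbook actually contains. The sketch below is a minimal example, assuming pandas with the openpyxl package installed (pandas needs it to read .xlsx files). The helper names list_sheets and make_sample_workbook are hypothetical, and the small in-memory workbook only stands in for Emma's real file.

```python
import io

import pandas as pd


def list_sheets(source):
    """Return the sheet names of an XLSX workbook (a path or a file-like object)."""
    with pd.ExcelFile(source) as workbook:
        return workbook.sheet_names


def make_sample_workbook():
    """Build a hypothetical two-sheet workbook in memory, standing in for Emma's file."""
    buffer = io.BytesIO()
    with pd.ExcelWriter(buffer) as writer:
        pd.DataFrame({"Name": ["Emma"], "Dept": ["HR"]}).to_excel(
            writer, sheet_name="Data_Sheet1", index=False
        )
        pd.DataFrame({"Name": ["Raj", "Mia"], "Salary": [5000, 6000]}).to_excel(
            writer, sheet_name="Data_Sheet2", index=False
        )
    buffer.seek(0)  # rewind so readers start at the beginning of the file
    return buffer


# Check the available sheets, then read only the one that is needed.
print(list_sheets(make_sample_workbook()))  # ['Data_Sheet1', 'Data_Sheet2']
sheet2 = pd.read_excel(make_sample_workbook(), sheet_name="Data_Sheet2")
print(sheet2)
```

With a real file, the same calls take the file path instead of the in-memory buffer.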

Read Excel in Pyspark

Below are the key steps for Emma to follow to import the Excel file in Pyspark:

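  • Step 1: Import the necessary modules, such as Pandas and findspark, and set up SparkContext and SQLContext as shown below.
import pandas as pd
import findspark
findspark.init()  # locate the local Spark installation so pyspark can be imported

from pyspark import SparkContext
from pyspark.sql import SQLContext  # still works, but newer Spark versions recommend SparkSession

sc = SparkContext("local", "App Name")
sql = SQLContext(sc)
  • Step 2: Read the required sheet into a Pandas DataFrame with read_excel(), passing the sheet name through the sheet_name parameter. Pandas needs the openpyxl package to read .xlsx files.
# Raw string (r"...") so the backslashes in the Windows path are not treated as escape sequences
df2 = pd.read_excel(r"C:\Website\LearnEasySteps\Python\Excel_File_Data.xlsx", sheet_name="Data_Sheet2")
  • Step 3: Convert the Pandas DataFrame into a Pyspark DataFrame with createDataFrame().
df2 = sql.createDataFrame(df2)
  • Step 4: Check a few rows to make sure everything looks correct. The show() command prints the top rows of a Pyspark DataFrame.
df2.show()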
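Empty cells in the sheet arrive from pandas as NaN. Depending on the Spark version and settings (for example whether Arrow is enabled), a text column that mixes strings with NaN can make createDataFrame() fail during schema inference, and numeric NaN may be kept as NaN instead of becoming null. A minimal sketch, assuming an active SparkSession named spark and the hypothetical helpers to_spark_rows and excel_sheet_to_spark, converts every missing value to None before handing the rows to Spark:

```python
import pandas as pd


def to_spark_rows(pdf):
    """Turn a pandas DataFrame into (rows, column_names) for spark.createDataFrame().

    Missing values (NaN, NaT, None) become None so that Spark treats them as nulls.
    For NumPy-backed numeric columns, itertuples() yields plain Python int/float
    values rather than NumPy scalar types.
    """
    rows = [
        tuple(None if pd.isna(value) else value for value in record)
        for record in pdf.itertuples(index=False, name=None)
    ]
    columns = [str(column) for column in pdf.columns]
    return rows, columns


def excel_sheet_to_spark(spark, path, sheet_name):
    """Read one sheet of an XLSX file with pandas and return a Spark DataFrame.

    `spark` is assumed to be an active pyspark.sql.SparkSession. Spark infers each
    column's type from its values, so a column whose cells are all empty cannot be
    typed this way and would need an explicit schema instead.
    """
    pdf = pd.read_excel(path, sheet_name=sheet_name)
    rows, columns = to_spark_rows(pdf)
    return spark.createDataFrame(rows, schema=columns)


# Example usage (assumes pyspark is installed and the file exists):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.master("local").appName("App Name").getOrCreate()
# df2 = excel_sheet_to_spark(spark, r"C:\Website\LearnEasySteps\Python\Excel_File_Data.xlsx", "Data_Sheet2")
# df2.show()
```

Building plain tuples and passing the column names as the schema is one way to control how missing values reach Spark. Like the original steps, it reads the whole sheet on the driver, which suits the small report-style files this guide targets.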
