Steps to read a CSV file without a header in PySpark

PySpark can read a CSV file directly into a PySpark DataFrame. When the CSV file has no header row, it needs to be read with some care: if the file is read with the header option enabled, the first row of data is treated as the DataFrame's column names. This article walks through the steps to read a CSV file without a header in PySpark.

Emma has transaction data for her customers in CSV format. The file has 6 columns but no header row. Emma wants to import the file into PySpark.


Below are the key steps to follow.

  • Step 1: Import the necessary modules and create the SparkContext and SQLContext.
import findspark
findspark.init()  # locate the local Spark installation
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext("local", "App Name")  # run Spark locally
sql = SQLContext(sc)
  • Step 2: Use the read.csv function to import the CSV file. Set the header option to False, which tells the function that the CSV file has no header row.
# Raw string (r"...") so the backslashes in the Windows path are kept literally
Trans_Data = sql.read.csv(r"C:\Website\LearnEasySteps\Python\Customer_Yearly_Spend_Data.csv",
                          header=False)
  • Step 3: Check the imported data by running the command below. The show() command displays the top rows of the PySpark DataFrame (20 by default).
Trans_Data.show()
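With header=False, Spark names the columns _c0, _c1, and so on, so a common next step is to give them meaningful names. The sketch below shows one possible way to do that and is not part of Emma's original steps. It assumes a Spark version where SparkSession is available and uses it instead of SQLContext. The column names and the helper names default_names and read_headerless_csv are hypothetical, since the article only says that the file has 6 columns.

```python
# Hypothetical column names: the article only states that the file has 6 columns.
COLUMN_NAMES = ["customer_id", "year", "month", "category", "quantity", "amount"]


def default_names(n):
    """Return the names Spark gives n columns when header=False: _c0, _c1, ..."""
    return [f"_c{i}" for i in range(n)]


def read_headerless_csv(path, column_names):
    """Read a CSV file that has no header row and rename its columns by position."""
    # Imported inside the function so default_names works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local").appName("App Name").getOrCreate()

    # header=False: the first row is data, so columns arrive as _c0, _c1, ...
    df = spark.read.csv(path, header=False)

    # Fail early with a clear message if the name list does not match the file.
    if len(column_names) != len(df.columns):
        raise ValueError(
            f"expected {len(df.columns)} column names, got {len(column_names)}"
        )

    # toDF replaces the default names with the supplied ones, in order.
    return df.toDF(*column_names)
```

Calling read_headerless_csv with the path to Emma's file and COLUMN_NAMES, followed by show(), would display the same rows as Step 3 but with the chosen names in place of _c0 to _c5.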

With these steps, Emma is able to read her CSV file without a header in PySpark.

