Comma-separated values (CSV) files remain one of the main formats for storing data. They can hold a handful of rows as well as large datasets. Most analysis starts with reading data into the coding environment, and every language has its own set of instructions for reading a CSV file. This article covers the steps to read a CSV file in PySpark.
John has a CSV file storing customer information for his company and wants to analyze it in PySpark. To do so, he first has to read the file into the PySpark environment.
Below are the key steps to read a CSV file in PySpark:
- Step 1: Import the necessary modules and set up the SparkContext and the SQLContext, as shown in the code below.
import findspark
findspark.init()
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)
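On Spark 2.0 and later, SparkSession is the unified entry point and exposes the same read.csv reader, so the SparkContext and SQLContext pair from Step 1 can be replaced by a single object. The following is a minimal sketch of that alternative, not the setup used in this article. The helper name build_session is hypothetical, and the import sits inside the function only so that the snippet can be loaded without starting Spark.

```python
def build_session(app_name="App Name", master="local"):
    """Return a SparkSession, the Spark 2.0+ entry point.

    build_session is a hypothetical helper name used for illustration.
    """
    # Imported here so that Spark (and the JVM) is only needed when the
    # function is actually called.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(master)      # "local" runs Spark on this machine, as in Step 1
        .appName(app_name)
        .getOrCreate()       # reuses an already running session if there is one
    )
```

With `spark = build_session()`, a call to `spark.read.csv(...)` plays the same role as `sql.read.csv(...)`.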
- Step 2: Use the read.csv function available on the SQLContext to read the CSV file, as shown in the code below. Make sure to pass header=True so that the first row of the CSV file is used as the header of the PySpark DataFrame.
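# The r prefix makes this a raw string, so the backslashes in the Windows path are not treated as escape characters
Customer_Data = sql.read.csv(r"C:\Website\LearnEasySteps\Python\Customer_Yearly_Spend_Data.csv", header=True)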
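With header=True alone, every column is loaded as a string. The sketch below adds inferSchema=True, which makes Spark take an extra pass over the file and guess the column types, and it rewrites the Windows path with forward slashes so that no backslash escapes are involved. The helper names to_spark_path and read_customer_csv are hypothetical, and whether a forward-slash drive path is accepted as-is may depend on the local Spark and Hadoop setup.

```python
from pathlib import PureWindowsPath


def to_spark_path(windows_path):
    """Rewrite a Windows path with forward slashes (hypothetical helper)."""
    return PureWindowsPath(windows_path).as_posix()


def read_customer_csv(reader, windows_path):
    """Read a CSV file with a header row and inferred column types.

    `reader` is assumed to be a DataFrameReader such as sql.read.
    """
    return reader.csv(
        to_spark_path(windows_path),
        header=True,       # first row supplies the column names
        inferSchema=True,  # extra pass over the data to guess column types
    )
```

For example, `read_customer_csv(sql.read, r"C:\Website\LearnEasySteps\Python\Customer_Yearly_Spend_Data.csv")` would return a DataFrame whose numeric columns are no longer plain strings.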
- Step 3: Check that the file was read properly. Use the show() method to see the top rows of the PySpark DataFrame.
Customer_Data.show()
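By default, show() prints only the first 20 rows and truncates long values, so it confirms that the file loaded but says little about column types or the number of rows. A few more checks, sketched below, can help. inspect_dataframe is a hypothetical helper; it relies only on the standard DataFrame methods show, printSchema, and count.

```python
def inspect_dataframe(df, n=5):
    """Print a short preview and the schema, then return the row count.

    inspect_dataframe is a hypothetical helper; df is assumed to be a
    PySpark DataFrame such as Customer_Data.
    """
    df.show(n, truncate=False)  # first n rows, without shortening long values
    df.printSchema()            # column names and types (string unless a schema was inferred or supplied)
    return df.count()           # triggers a full pass over the file
```

Calling `inspect_dataframe(Customer_Data)` prints the preview and schema and returns the number of data rows, which can be compared against the row count expected from the source file.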
The file has been successfully read into PySpark. John can now start working on the analysis.