Python and PySpark are both popular for data processing: pandas is the standard Python library for in-memory tabular data, and PySpark is the Python API for Apache Spark. When working with a pandas DataFrame, it is sometimes necessary to convert it into a PySpark DataFrame so that further processing can be done in the Spark environment. This article walks through the conversion step by step.
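- Step 1: Import the necessary modules, such as pandas and pyspark. Not every module is needed in every setup; it depends on whether you are running PySpark locally or on a Spark server. findspark, for example, is only needed when PySpark is not already on the Python path, such as with a standalone local Spark installation.
import pandas as pd
import findspark  # only needed when PySpark is not already importable
findspark.init()  # locates the Spark installation (SPARK_HOME if set) and adds PySpark to sys.path
import pyspark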
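If you are unsure whether findspark is needed in your environment, one option is to check for PySpark first and only fall back to findspark when PySpark is missing. A minimal sketch; the helper name `ensure_pyspark_importable` and its return strings are hypothetical and not part of either library:

```python
import importlib.util


def ensure_pyspark_importable() -> str:
    """Hypothetical helper: make `import pyspark` work and report how.

    findspark is only tried when pyspark is not already importable.
    """
    if importlib.util.find_spec("pyspark") is not None:
        # pyspark is already on the Python path (e.g. installed with pip).
        return "pyspark already importable"
    if importlib.util.find_spec("findspark") is not None:
        import findspark

        # Locates the Spark installation (SPARK_HOME if set) and adds
        # PySpark to sys.path; raises an error if Spark cannot be found.
        findspark.init()
        return "pyspark located via findspark"
    raise ImportError("Neither pyspark nor findspark is installed")
```

Call `ensure_pyspark_importable()` before `import pyspark`. With a pip-installed PySpark, the first branch returns immediately and findspark is never imported.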
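- Step 2: Create a SparkContext and an SQLContext with the lines below.
from pyspark import SparkContext
from pyspark.sql import SQLContext
# Run Spark locally with a single worker thread; "App Name" is the application name shown in the Spark UI
sc = SparkContext("local", "App Name")
# SQLContext wraps the SparkContext and provides createDataFrame()
sql = SQLContext(sc)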
- Step 3: Use the createDataFrame function to convert the pandas DataFrame into a Spark DataFrame:
# Customer_data_Pandasdf is an existing pandas DataFrame; after this line the same name refers to the Spark DataFrame
Customer_data_Pandasdf = sql.createDataFrame(Customer_data_Pandasdf)
- Step 4: Check that the conversion looks right by inspecting the data. The show() method prints the top rows of a PySpark DataFrame (20 by default).
Customer_data_Pandasdf.show()
Thus, with the createDataFrame function, a pandas DataFrame can easily be converted into a PySpark DataFrame. This is useful because PySpark can run complex queries on large datasets by leveraging Hadoop and other big data infrastructure. It is also often beneficial to use PySpark and pandas together, and this conversion is what connects the two.