How to Install PySpark on Windows

PySpark is becoming popular among data scientists. It has many use cases, such as processing large datasets and running machine learning algorithms. For anyone learning PySpark, having it installed on a local laptop is important. This article walks through the process of installing PySpark on a Windows laptop step by step. Because the installation is fairly long, we have broken it down into four major steps:

  1. Java Installation
  2. Anaconda (Python) Installation
  3. Installation of Pyspark
  4. Updating System Environment Variables

We will discuss these one by one.

Java Installation in Windows

  • Step 1: First, download the JDK installer from the Oracle website.
  • Step 2: Double-click the downloaded file, or right-click it and run it, to start the installation.
  • Step 3: Read through the license agreement and click Next.
  • Step 4: Change the installation location if needed; otherwise click Next to proceed.
  • Step 5: Once the Java JDK is installed successfully, click Close.
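As a quick sanity check after the installer finishes, the sketch below asks Python whether a `java` executable can be found and, if so, prints its version banner. The function name `java_version_banner` is our own, not part of any library. Until the Path variable is updated in the last section, the script may report that Java was not found even though the installation succeeded.

```python
import shutil
import subprocess


def java_version_banner():
    """Return the text printed by `java -version`, or None if java is not on PATH."""
    java = shutil.which("java")  # searches the directories listed in PATH
    if java is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, timeout=30
    )
    return (result.stderr or result.stdout).strip()


banner = java_version_banner()
if banner is None:
    print("java was not found on PATH")
else:
    print(banner)
```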

Anaconda (Python) Installation in Windows

The step-by-step process for installing Anaconda (Python) on Windows is explained here. Follow it to install Anaconda.

Installation of Pyspark in Windows

  • Step 1: To install PySpark, visit the link and select a recent version. We recommend downloading version 3.2.
  • Step 2: The next step is simple: extract the downloaded file and keep it in a folder. Make sure the folder path does not contain any spaces. For example, we are installing PySpark in “C:\Spark”, which has no spaces in the path.
  • Step 3: Visit this location and download the Windows utility (winutils) file that matches your PySpark version. Since we downloaded PySpark 3.2, we will download the matching winutils file.
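The two rules in these steps (no spaces in the install path, and the winutils file kept in a folder named bin) can be sketched as small helper functions. This is only an illustration: the helper names are ours, and `C:\Hadoop` is a hypothetical folder chosen for the example, not a location the steps above require.

```python
from pathlib import PureWindowsPath


def has_spaces(folder):
    """True if the Windows path contains a space anywhere."""
    return " " in str(PureWindowsPath(folder))


def expected_winutils(hadoop_folder):
    """Where winutils.exe is expected: the bin folder under hadoop_folder."""
    return PureWindowsPath(hadoop_folder) / "bin" / "winutils.exe"


print(has_spaces(r"C:\Spark"))                # False
print(has_spaces(r"C:\Program Files\Spark"))  # True
print(expected_winutils(r"C:\Hadoop"))        # hypothetical folder
```

`PureWindowsPath` only manipulates path strings, so the helpers do not touch the disk and can be tried before anything is extracted.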

Updating System Environment Variables

There are 3 environment variables that we will create:

  • JAVA_HOME
  • SPARK_HOME
  • HADOOP_HOME

Below are the steps to create the three variables:

  • Step 1: First, search for “System Environment Variables” in the search bar to open the system environment settings editor.
  • Step 2: Click Environment Variables to open the Environment Variables dialog box.
  • Step 3: Click New to open the dialog box for creating a variable.
  • Step 4: Enter “JAVA_HOME” as the variable name and the folder where the JDK was installed as the variable value, then click OK.
  • Step 5: Create another new variable named SPARK_HOME. Its value is the folder where PySpark was extracted after downloading.
  • Step 6: Add another new variable named HADOOP_HOME. Its value is the folder that contains the bin folder holding the winutils file, which was downloaded in the third step of the PySpark installation.
  • Step 7: The next step is to update the Path variable. Select Path under System variables and click Edit.
  • Step 8: Click New in the dialog box that opens.
  • Step 9: Finally, add the following three new entries:
    • %JAVA_HOME%\bin
    • %SPARK_HOME%\bin
    • %HADOOP_HOME%\bin
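To confirm that the variables and Path entries took effect, a check along the following lines can be run from a newly opened Command Prompt. It is a minimal sketch, not an official tool: `check_spark_env` is a helper name of our own, and it assumes Path entries are separated by semicolons, as they are on Windows.

```python
import ntpath
import os

REQUIRED = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def check_spark_env(env):
    """Return a list of problems found in an environment mapping."""
    problems = []
    # Windows treats variable names and paths case-insensitively.
    upper = {key.upper(): value for key, value in env.items()}
    path_entries = {
        ntpath.normcase(ntpath.normpath(entry))
        for entry in upper.get("PATH", "").split(";")
        if entry
    }
    for name in REQUIRED:
        home = upper.get(name)
        if not home:
            problems.append(f"{name} is not set")
            continue
        bin_dir = ntpath.normcase(ntpath.normpath(ntpath.join(home, "bin")))
        if bin_dir not in path_entries:
            problems.append(f"{name}\\bin is not on Path")
    return problems


for line in check_spark_env(dict(os.environ)) or ["all three variables look fine"]:
    print(line)
```

The function takes the environment as a plain dictionary, so it can also be tried against made-up values before changing any real settings.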

This completes all the key steps required to install PySpark on Windows. PySpark is now available on your machine.

To check whether PySpark is installed properly, open a new Command Prompt, type pyspark, and press Enter. After a short while, the PySpark shell should start and display the Spark welcome banner.
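If the command is not recognized, it can help to see what `pyspark` actually resolves to. The sketch below uses Python's standard `shutil.which` for that; the wrapper name `find_pyspark_launcher` is ours, and the hint it prints assumes the Path entries described in the previous section.

```python
import shutil


def find_pyspark_launcher(command="pyspark"):
    """Return the full path that `command` resolves to, or None if not found."""
    # On Windows, shutil.which also tries the extensions in PATHEXT (such as .cmd).
    return shutil.which(command)


launcher = find_pyspark_launcher()
print(launcher or "pyspark was not found on Path; recheck the %SPARK_HOME%\\bin entry")
```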

To summarize, the overall process is a bit lengthy, but following it step by step will get PySpark running on your local system.
