As the world of data grows, corporations are maintaining ever more detailed datasets, and the number of columns keeps increasing. It can become very difficult to work with data that has many columns, so there is a need to keep only the columns that matter for a given analysis.
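A minimal sketch of narrowing a wide dataset in Pandas; the DataFrame and column names (customer, sales, internal_id) are illustrative placeholders:

```python
import pandas as pd

# Hypothetical dataset with more columns than the analysis needs.
df = pd.DataFrame({
    "customer": ["A", "B", "C"],
    "sales": [100, 250, 175],
    "region": ["East", "West", "East"],
    "internal_id": [11, 12, 13],
})

# Keep only the relevant columns.
subset = df[["customer", "sales"]]

# Or drop the columns that are not needed.
trimmed = df.drop(columns=["internal_id"])

print(subset)
print(trimmed)
```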
Bigger data files are generally stored in text or CSV format, but the Excel (XLSX) file remains an important storage format, as it can preserve formatting and other features along with the data. Importing an Excel file into the coding environment is therefore a common first step.
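A minimal sketch of loading an Excel workbook with Pandas; the file name and sheet name are placeholders, and an Excel engine such as openpyxl is assumed to be installed:

```python
import pandas as pd

# Read one sheet of a workbook into a DataFrame.
df = pd.read_excel("data.xlsx", sheet_name="Sheet1")

print(df.head())
```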
Comma Separated Values (CSV) remains one of the main formats for storing data, whether the file holds a handful of rows or a very large dataset. Most analyses start with reading data into the coding environment, and reading a CSV file is usually that first step.
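A minimal sketch of reading a CSV file with Pandas; "sales.csv" is a placeholder file name:

```python
import pandas as pd

df = pd.read_csv("sales.csv")

print(df.shape)   # number of rows and columns
print(df.head())  # first few rows
```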
Python and PySpark are two key languages popular for data processing. When working with a Pandas DataFrame, it sometimes becomes necessary to convert it into a PySpark DataFrame so that further processing can be done in the PySpark environment. This article covers that conversion.
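A minimal sketch of the conversion, assuming PySpark is installed and using a small illustrative Pandas DataFrame:

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pandas-to-spark").getOrCreate()

# A small Pandas DataFrame as a stand-in for real data.
pdf = pd.DataFrame({"customer": ["A", "B"], "sales": [100, 250]})

# createDataFrame accepts a Pandas DataFrame and returns a PySpark DataFrame.
sdf = spark.createDataFrame(pdf)
sdf.show()
```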
Pandas helps in processing data to a great extent. Many a time a user may need to drop rows based on a specific column value. For example, there can be many rows for a specific customer name, and the need would be to drop all the rows matching that value.
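A minimal sketch of dropping rows by a column value in Pandas; the customer names and column names are placeholders:

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["Alice", "Bob", "Alice", "Carol"],
    "sales": [100, 250, 175, 90],
})

# Drop every row where the customer is "Alice" (keep the rest).
filtered = df[df["customer"] != "Alice"]

print(filtered)
```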
Pandas offers some great functions for processing a dataset. A data file can contain duplicates at the row level, and dropping them becomes very important, as duplicate rows create noise in any analysis. Sometimes the duplicates span every column, and sometimes only a subset of columns.
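A minimal sketch of both cases using drop_duplicates; the data is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["A", "A", "B", "B"],
    "sales": [100, 100, 250, 300],
})

# Drop rows that are duplicated across all columns.
dedup_all = df.drop_duplicates()

# Drop rows that repeat in a chosen subset of columns,
# keeping the first occurrence.
dedup_subset = df.drop_duplicates(subset=["customer"], keep="first")

print(dedup_all)
print(dedup_subset)
```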
Python scripts saved in Jupyter notebooks use the ipynb format. This is an interactive file, with charts, data, and images all captured along with the code. Due to its interactive nature, the ipynb format is gaining popularity, and Python code is now often written and shared this way.
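A small sketch that peeks inside a notebook file; it assumes a placeholder path "notebook.ipynb" and relies on the fact that an ipynb file is plain JSON:

```python
import json

# An .ipynb file is JSON; open it like any text file.
with open("notebook.ipynb", "r", encoding="utf-8") as f:
    nb = json.load(f)

# Each cell records its type (code or markdown) along with its source,
# and code cells also keep captured outputs such as charts and tables.
for cell in nb["cells"]:
    print(cell["cell_type"], "".join(cell["source"])[:40])
```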
Jupyter Notebook is one of the most popular IDEs, and it has some great features; Python learners generally prefer it over other IDEs. As Jupyter Notebook opens directly in a desktop browser, it is very easy to work with, and Chrome, being one of the most widely used browsers, is a natural choice for running it.
PySpark is becoming popular among data scientists, with many use cases such as processing large datasets and running machine learning algorithms. Of course, for any PySpark learning enthusiast, having the language installed on a local laptop becomes essential.
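A minimal sketch of checking a local installation; it assumes PySpark was installed with pip and that a Java runtime is available, as Spark requires one:

```python
# Assumes PySpark is installed locally, e.g. with: pip install pyspark
from pyspark.sql import SparkSession

# Start a local Spark session using all available cores.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("local-test")
    .getOrCreate()
)

print(spark.version)  # quick check that the installation works
spark.stop()
```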
For data analysis, group by remains one of the key operations to master, as the data preparation and exploration stages require multiple levels of aggregation. This article covers how to group data in Python; aggregation can be done at a single column level or across multiple columns.
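A minimal sketch of both levels of aggregation in Pandas; the columns and values are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "customer": ["A", "B", "C", "C"],
    "sales": [100, 250, 175, 90],
})

# Aggregate at a single column level.
by_region = df.groupby("region")["sales"].sum()

# Aggregate across multiple columns, with several functions at once.
by_region_customer = df.groupby(["region", "customer"]).agg(
    total_sales=("sales", "sum"),
    avg_sales=("sales", "mean"),
)

print(by_region)
print(by_region_customer)
```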