Aggregating data is necessary to summarize and analyze results. The groupby function in Pandas groups the data so that each group can then be aggregated. Summaries can include counting rows or computing the sum, maximum value, minimum value, and so on. The challenge comes in

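As a rough illustration of the grouping and aggregation described above, the sketch below groups a small, made-up sales table by region and computes a count, sum, maximum and minimum per group in a single `agg` call. The column names `region` and `sales` and their values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sales data, for illustration only
df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "sales": [100, 200, 150, 50, 250],
})

# Group rows by region, then compute several summaries of "sales" per group
summary = df.groupby("region")["sales"].agg(["count", "sum", "max", "min"])
print(summary)
```

Passing a list of function names to `agg` returns one column per summary, with the group keys as the index.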

Sorting a dataframe is a very common data processing step. To find the best-performing observation, we can sort the dataset by a specific column. Similarly, sorting can help identify the worst-performing observation. Sorting can help to have

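A minimal sketch of sorting to find the best and worst performers, assuming a hypothetical table of player scores (the `player` and `score` columns are made up for illustration). `sort_values` with `ascending=False` puts the highest score first; the default ascending order puts the lowest first.

```python
import pandas as pd

# Hypothetical scores, for illustration only
df = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "score": [72, 95, 60, 88],
})

# Highest score first: the top row is the best-performing observation
best_first = df.sort_values(by="score", ascending=False)

# Lowest score first: the top row is the worst-performing observation
worst_first = df.sort_values(by="score", ascending=True)

print(best_first.iloc[0]["player"])
print(worst_first.iloc[0]["player"])
```

When only the top or bottom few rows are needed, `DataFrame.nlargest` and `DataFrame.nsmallest` are alternatives to a full sort.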

PySpark enables processing of big data sets while also supporting complex queries. Machine learning and statistical algorithms are easy to deploy with PySpark. Before running an algorithm, cleaning of data is


Data processing in Pandas can involve various intermediate stages, and there may be a need to drop certain rows from the data file. Dropping rows in Pandas is comparatively easy when done at the index level. This article explains Pandas

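The index-level row dropping mentioned above can be sketched roughly as follows, using a small hypothetical inventory table with string index labels (`r1` to `r4`, made up for illustration). `DataFrame.drop` removes rows by index label; dropping by position or by a condition works by first translating those into index labels.

```python
import pandas as pd

# Hypothetical inventory data with labelled index, for illustration only
df = pd.DataFrame(
    {"item": ["apple", "banana", "cherry", "date"], "qty": [5, 0, 3, 0]},
    index=["r1", "r2", "r3", "r4"],
)

# Drop rows directly by their index labels
by_label = df.drop(index=["r2", "r4"])

# Drop rows by position: look up the labels at those positions first
by_position = df.drop(index=df.index[[0]])

# Drop rows matching a condition: collect the index of the matching rows
by_condition = df.drop(index=df[df["qty"] == 0].index)

print(by_label)
print(by_position)
print(by_condition)
```

`drop` returns a new dataframe and leaves `df` unchanged unless `inplace=True` is passed.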