10 Steps to Data Wrangling for Data Analysis using Pandas

Step 1: Import Pandas

import pandas as pd

Step 2: Create a DataFrame using pandas

df = pd.read_csv('sample_data.csv')

Step 3: Count null values

df.isnull().sum() # gives you count of null values in each column of the dataframe
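To make the output of this step concrete, here is a minimal sketch on a made-up DataFrame. The column names and values are hypothetical stand-ins, not the contents of sample_data.csv. It also shows isnull().mean(), which gives the fraction of nulls per column and is often easier to compare than raw counts.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for sample_data.csv
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "sex": ["male", "female", None, "female"],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

null_counts = toy.isnull().sum()   # number of nulls in each column
null_share = toy.isnull().mean()   # fraction of nulls in each column

print(null_counts)
print(null_share)
```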

Step 4: Plot the null values on a heatmap

import seaborn as sns
sns.heatmap(df.isnull(), yticklabels=False, cmap='viridis') # each null cell shows up in a contrasting color

Step 5: Drop any column that has too many null values

df.drop('Column name', axis=1, inplace=True)
axis=1 is required to delete a column rather than a row.
inplace=True modifies the DataFrame itself instead of returning a new one.
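"Too many" is a judgment call. One way to make it explicit is to pick a cutoff on the fraction of nulls per column. The sketch below assumes a 50% cutoff and hypothetical column names; both are illustrative choices, not something prescribed by this post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'cabin' is mostly empty, 'age' has one gap
toy = pd.DataFrame({
    "cabin": [np.nan, "C85", np.nan, np.nan],
    "age": [22.0, np.nan, 35.0, 54.0],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

threshold = 0.5  # assumed cutoff: drop columns that are more than 50% null

null_share = toy.isnull().mean()
too_sparse = null_share[null_share > threshold].index.tolist()

# drop(columns=...) is equivalent to drop(..., axis=1)
trimmed = toy.drop(columns=too_sparse)

print(too_sparse)
print(trimmed.columns.tolist())
```

Here trimmed is a new DataFrame, so toy stays unchanged, which makes it easy to compare before and after.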

Step 6: Drop all rows having null values

df.dropna(inplace=True)

Step 7: Recheck dataframe

df.isnull().sum() # should return 0 for all columns
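Steps 6 and 7 can be tried together on a small example. This is a sketch with made-up data; it uses the non-inplace form of dropna so the original row count is still available for comparison, and it shows the subset parameter as an option when only some columns matter.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one null in each of two different rows
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "sex": ["male", "female", None],
})

cleaned = toy.dropna()                      # drop rows with a null in any column
age_only = toy.dropna(subset=["age"])       # drop rows only where 'age' is null

remaining = cleaned.isnull().sum()          # recheck: every count should be 0
rows_before, rows_after = len(toy), len(cleaned)

print(remaining)
print(rows_before, "->", rows_after)
```

Dropping rows can discard a lot of data, so comparing the row counts before and after is a useful sanity check.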

Step 8: Change categorical string data into columns with binary values

For example, the sex/gender of a person can be changed to 0s and 1s and will still make sense.
This can be achieved with pandas' get_dummies function.
sex = pd.get_dummies(df['sex'], drop_first=True)
Here we use drop_first=True because the function would otherwise return two columns, 'female' and 'male'. Each is the opposite of the other, so we don't need both.
Do this for all categorical columns.
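The function pandas provides for this is pd.get_dummies. A minimal sketch on made-up data shows what drop_first=True leaves behind. The dtype=int argument is an addition here so the result holds 0s and 1s; without it, recent pandas versions return True/False.

```python
import pandas as pd

# Hypothetical toy column
toy = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

# Categories are ordered alphabetically, so 'female' is the first column
# and drop_first=True removes it, leaving only 'male'
sex = pd.get_dummies(toy["sex"], drop_first=True, dtype=int)

print(sex)
```

A value of 1 in the 'male' column means male and 0 means female, so no information is lost by dropping the other column.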

Step 9: Combine all DataFrames

df = pd.concat([df, sex], axis=1) # pass the DataFrames as a list; axis=1 joins them side by side as columns
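pd.concat expects a list of objects to join, and axis=1 places them side by side as columns, aligned on the index. A small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"sex": ["male", "female"], "fare": [7.25, 71.28]})
sex = pd.get_dummies(toy["sex"], drop_first=True, dtype=int)

# Join the original columns and the dummy column side by side
combined = pd.concat([toy, sex], axis=1)

print(combined.columns.tolist())
print(combined.shape)
```

Because the dummy column was built from the same DataFrame, both pieces share the same index and the rows line up one to one.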

Step 10: Remove all non-numeric and non-binary columns

df.drop(['sex'], axis=1, inplace=True)
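When there are several such columns, listing each one by hand gets tedious. As an alternative sketch (not the approach used above), select_dtypes can keep only the numeric and boolean columns in one call. The data and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data after the dummy column has been added
toy = pd.DataFrame({
    "sex": ["male", "female"],
    "name": ["A", "B"],
    "fare": [7.25, 71.28],
    "male": [1, 0],
})

# Keep numeric and boolean columns; text columns are left out
numeric_only = toy.select_dtypes(include=["number", "bool"])

print(numeric_only.columns.tolist())
```

This drops every text column at once, so it is worth checking first that each one has either been encoded or is genuinely not needed.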
