
10 Steps to Data Wrangling for Data Analysis using Pandas


Step 1: Import Pandas

import pandas as pd

Step 2: Create a DataFrame using pandas

df = pd.read_csv('sample_data.csv')

Step 3: Count null values

df.isnull().sum() # gives you count of null values in each column of the dataframe
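To see what this count looks like, here is a minimal sketch on a small made-up DataFrame. The name toy, the column names, and the values are hypothetical and are not taken from sample_data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; both np.nan and None are treated as null.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "cabin": [None, None, "C85", None],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

# isnull() gives a True/False mask; sum() counts the True values per column.
null_counts = toy.isnull().sum()
print(null_counts)
```

Here age has 2 nulls, cabin has 3, and fare has none.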

Step 4: Plot on heatmap

import seaborn as sns
sns.heatmap(df.isnull(), yticklabels=False, cmap='viridis') # each null cell shows up in a contrasting color

Step 5: Drop columns that have too many null values

df.drop('Column name', axis=1, inplace=True)
axis=1 tells pandas to drop a column; the default, axis=0, would look for a row with that label instead.
inplace=True modifies df directly instead of returning a new DataFrame.
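"Way too many" is a judgment call. One possible way to make it explicit is to compute the fraction of nulls per column and compare it against a cutoff. The sketch below assumes a cutoff of 0.5 and uses hypothetical toy data; both are illustrative choices, not part of the original steps.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0],
    "cabin": [None, None, "C85", None],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

# The mean of a boolean mask is the fraction of True values, i.e. nulls.
null_fraction = toy.isnull().mean()

# Assumed cutoff: treat a column as too sparse if more than half is null.
threshold = 0.5
too_sparse = null_fraction[null_fraction > threshold].index.tolist()

# drop(columns=...) is equivalent to drop(..., axis=1).
trimmed = toy.drop(columns=too_sparse)
print(too_sparse)
print(trimmed.columns.tolist())
```

With this data only cabin (75% null) crosses the cutoff, so age and fare remain.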

Step 6: Drop all rows that have null values

df.dropna(inplace=True)
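The order of Steps 5 and 6 matters: dropna() removes a row if any of its columns is null, so a mostly empty column can wipe out most of the data. A small sketch of that effect, using hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "cabin" is mostly null, "age" has one null.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0],
    "cabin": [None, None, "C85", None],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

# Dropping rows straight away keeps only rows with no null in any column.
rows_if_dropna_first = len(toy.dropna())

# Dropping the sparse column first leaves far more complete rows.
rows_after_dropping_column = len(toy.drop(columns=["cabin"]).dropna())

print(rows_if_dropna_first, rows_after_dropping_column)
```

Dropping rows first leaves 1 of 4 rows; dropping the sparse column first leaves 3.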

Step 7: Recheck dataframe

df.isnull().sum() # should return 0 for all columns

Step 8: Change categorical string data into columns with binary values

For example, the sex/gender of a person can be encoded as 0s and 1s and still make sense.
This can be done with pandas' get_dummies function.
sex = pd.get_dummies(df['sex'], drop_first=True)
We pass drop_first=True because the function would otherwise return two columns, 'female' and 'male'. Each is the opposite of the other, so we don't need both; drop_first=True drops the first one ('female') and keeps 'male'.
Do this for all categorical columns.
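When there are several categorical columns, get_dummies can also take the whole DataFrame plus a columns list. This is an alternative to encoding one column at a time, not the approach used in the steps here. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with two categorical columns.
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", "Q", "S"],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

# Each listed column is replaced by its dummy columns, prefixed with the
# original column name; drop_first=True drops the first category of each.
encoded = pd.get_dummies(toy, columns=["sex", "embarked"], drop_first=True)
print(encoded.columns.tolist())
```

In this form the original string columns are replaced in the result, so no separate concatenation or drop is needed for them.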

Step 9: Combine all DataFrames

df = pd.concat([df, sex], axis=1) # axis=1 places the dummy columns side by side with the existing ones

Step 10: Remove all non-numeric and non-binary columns

df.drop(['sex'], axis=1, inplace=True)
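If there are many leftover text columns, listing them by hand gets tedious. One possible shortcut is to let select_dtypes find them. The sketch below uses hypothetical toy data and assumes that anything which is neither numeric nor boolean should be dropped.

```python
import pandas as pd

# Hypothetical toy data: two text columns, one numeric, one boolean dummy.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "sex": ["male", "female", "male"],
    "age": [22, 38, 26],
    "male": [True, False, True],
})

# Columns that are neither numeric nor boolean are the ones to remove.
non_numeric = toy.select_dtypes(exclude=["number", "bool"]).columns.tolist()

cleaned = toy.drop(columns=non_numeric)
print(non_numeric)
print(cleaned.columns.tolist())
```

Here name and sex are dropped, while age and the dummy column male are kept.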
