
10 Steps to Data Wrangling for Data Analysis using Pandas

Step 1: Import Pandas

import pandas as pd

Step 2: Create a DataFrame using pandas

df = pd.read_csv('sample_data.csv')

Step 3: Count null values

df.isnull().sum()  # count of null values in each column of the DataFrame
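
To make the count concrete, here is a minimal sketch on a small made-up DataFrame. The column names and values are hypothetical stand-ins for whatever sample_data.csv contains. It also shows isnull().mean(), which gives the fraction of nulls per column and can be easier to compare across columns than raw counts.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from sample_data.csv
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0],
    "cabin": [np.nan, np.nan, np.nan, "C85"],
    "sex": ["male", "female", "female", "male"],
})

null_counts = toy.isnull().sum()   # number of nulls in each column
null_share = toy.isnull().mean()   # fraction of nulls in each column

print(null_counts)
print(null_share)
```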

Step 4: Plot null values on a heatmap

import seaborn as sns
sns.heatmap(df.isnull(), yticklabels=False, cmap='viridis')

Step 5: Drop columns that have too many null values

df.drop('Column name', axis=1, inplace=True)
axis=1 tells pandas to drop a column rather than a row.
inplace=True modifies the DataFrame itself instead of returning a new one.
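
What counts as "too many" nulls is a judgment call. One possible approach, sketched below, is to pick a cutoff and drop every column whose share of nulls exceeds it. The 0.5 threshold and the toy column names are assumptions for illustration, not recommendations from this post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from sample_data.csv
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0],
    "cabin": [np.nan, np.nan, np.nan, "C85"],
    "sex": ["male", "female", "female", "male"],
})

threshold = 0.5  # assumed cutoff: drop columns that are more than 50% null

null_share = toy.isnull().mean()
too_sparse = null_share[null_share > threshold].index.tolist()

# drop(columns=...) is equivalent to drop(..., axis=1)
trimmed = toy.drop(columns=too_sparse)
print(too_sparse)
print(trimmed.columns.tolist())
```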

Step 6: Drop all rows having null values

df.dropna(inplace=True)

Step 7: Recheck dataframe

df.isnull().sum() # should return 0 for all columns
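
Dropping rows can remove more data than expected, so it may be worth checking how many rows were lost. The sketch below uses hypothetical toy data and the non-inplace form of dropna so that the original DataFrame stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from sample_data.csv
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "fare": [7.25, 71.28, np.nan],
})

rows_before = len(toy)
clean = toy.dropna()  # returns a new DataFrame; toy is left unchanged

rows_dropped = rows_before - len(clean)
remaining_nulls = int(clean.isnull().sum().sum())  # total nulls across all columns

print(f"dropped {rows_dropped} of {rows_before} rows; {remaining_nulls} nulls remain")
```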

Step 8: Convert categorical string data into columns with binary values

For example, a person's sex/gender can be encoded as 0s and 1s and will still make sense.
This can be done with pandas' get_dummies function.
sex = pd.get_dummies(df['sex'], drop_first=True)
We use drop_first=True because the function would otherwise return two columns, 'female' and 'male'. Each is the opposite of the other, so we don't need both.
Do this for all categorical columns.
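
When there are several categorical columns, one alternative to encoding them one at a time is to pass the whole DataFrame to get_dummies, which encodes every string or category column and leaves numeric columns untouched. The sketch below uses hypothetical toy columns, and dtype=int is an optional choice made here so the output shows 0/1 rather than True/False.

```python
import pandas as pd

# Hypothetical toy data; not taken from sample_data.csv
toy = pd.DataFrame({
    "age": [22.0, 35.0, 41.0],
    "sex": ["male", "female", "female"],
    "embarked": ["S", "C", "Q"],
})

# Encodes 'sex' and 'embarked'; 'age' is numeric and passes through as-is.
# drop_first=True drops the first category of each column ('female' and 'C').
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Called this way, get_dummies prefixes each new column with the source column name and replaces the original string columns in the result.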

Step 9: Combine all DataFrames

df = pd.concat([df, sex], axis=1)
pd.concat takes a list of DataFrames, and axis=1 joins them side by side as columns.

Step 10: Remove all non-numeric and non-binary columns

df.drop(['sex'], axis=1, inplace=True)
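
The last few steps can be sketched end to end on a small example: drop null rows, encode one column, join the encoded column back, and drop the original string column. The toy data is hypothetical, and dtype=int is an optional choice so the new column holds 0/1.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from sample_data.csv
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "sex": ["male", "female", "female"],
})

toy = toy.dropna()  # removes the row with the missing age

# Encode 'sex'; drop_first=True keeps only the 'male' column
sex = pd.get_dummies(toy["sex"], drop_first=True, dtype=int)

# axis=1 joins column-wise; rows are matched on the index labels
result = pd.concat([toy, sex], axis=1)

# Remove the original string column now that it is encoded
result = result.drop(["sex"], axis=1)

print(result)
```

Because concat with axis=1 aligns on index labels, the gap left in the index by dropna does not cause rows to be mismatched.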
