10 Steps to Data Wrangling for Data Analysis using Pandas
Step 1: Import Pandas
import pandas as pd
Step 2: Load the data into a DataFrame
df = pd.read_csv('sample_data.csv')
Step 3: Count null values
df.isnull().sum()  # number of null values in each column of the DataFrame
Step 4: Plot the null values on a heatmap
import seaborn as sns
# in a plain script (outside Jupyter), also import matplotlib.pyplot as plt and call plt.show()
sns.heatmap(df.isnull(), yticklabels=False, cmap='viridis')
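Since sample_data.csv isn't shown, here is a minimal sketch that feeds pd.read_csv a small, made-up CSV held in memory (via io.StringIO) and prints what Steps 3 and 4 work with. df.isnull() is a True/False frame of the same shape as df (the heatmap colors its True cells), and .sum() counts the True values per column because True counts as 1. The column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample_data.csv; empty fields are read as NaN.
csv_text = """name,age,sex,cabin
Alice,29,female,
Bob,,male,C85
Carol,41,female,
Dave,35,male,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Boolean mask: True wherever a value is missing (this is what the heatmap draws).
mask = df.isnull()
print(mask)

# True counts as 1, so summing each column gives its number of nulls.
null_counts = df.isnull().sum()
print(null_counts)
```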
Step 5: Drop a column that has too many null values
df.drop('Column name', axis=1, inplace=True)
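axis=1 tells pandas to drop a column rather than a row (axis=0, the default, drops rows).
inplace=True applies the change to df itself. Without it, drop() returns a new DataFrame and leaves df unchanged; df = df.drop('Column name', axis=1) is the equivalent without inplace.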
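"Too many" is a judgment call. One way to make it explicit, sketched below, is to compute each column's share of missing values with df.isnull().mean() and drop every column above a cutoff; the 0.5 threshold and the data are hypothetical. The sketch also shows the inplace point: without inplace=True, drop returns a new DataFrame and the original keeps the column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'cabin' is mostly missing, 'age' only slightly.
df = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0],
    "cabin": [np.nan, "C85", np.nan, np.nan],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

# Without inplace=True, drop() returns a new DataFrame; df itself is unchanged.
trimmed = df.drop("cabin", axis=1)
print(list(df.columns))       # still includes 'cabin'
print(list(trimmed.columns))

# Share of nulls per column (the mean of a boolean column is the share of True).
null_fraction = df.isnull().mean()

# Hypothetical cutoff: drop any column that is more than half empty.
threshold = 0.5
too_sparse = null_fraction[null_fraction > threshold].index
df.drop(too_sparse, axis=1, inplace=True)
print(list(df.columns))
```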
Step 6: Drop all rows having null values
df.dropna(inplace=True)
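By default dropna() removes every row with at least one missing value (how='any'), which can discard more data than intended. This minimal sketch on hypothetical data prints the row count before and after, and shows the subset= parameter, which limits the check to the listed columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0],
    "embarked": ["S", "C", np.nan, "S"],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

print(len(df))  # 4 rows before

# Only check 'age' for nulls; the row missing 'embarked' is kept.
age_only = df.dropna(subset=["age"])
print(len(age_only))

# Default: drop every row that has a null in any column.
df.dropna(inplace=True)
print(len(df))
print(df.index.tolist())  # surviving rows keep their original index labels
```

The surviving rows keep their original index labels, so the index has gaps; df.reset_index(drop=True) renumbers it if that matters later.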
Step 7: Recheck the DataFrame
df.isnull().sum() # should return 0 for all columns
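In a script, the recheck can be a single check instead of reading the per-column counts by eye. A minimal sketch on a hypothetical frame: summing twice collapses the per-column counts into one total, and an assert stops the script if any null is left.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [22.0, np.nan, 35.0], "fare": [7.25, 8.05, np.nan]})
df.dropna(inplace=True)

# First sum(): nulls per column; second sum(): total nulls in the whole frame.
total_nulls = int(df.isnull().sum().sum())
print(total_nulls)

# Fails loudly if any null survived the cleaning steps.
assert not df.isnull().values.any(), "DataFrame still contains null values"
```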
Step 8: Convert categorical string columns into binary (0/1) columns
For example, a person's sex/gender can be encoded as 0s and 1s and the data still makes sense. Pandas' get_dummies function does this.
sex = pd.get_dummies(df['sex'], drop_first=True, dtype=int)  # dtype=int gives 0/1 instead of True/False
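Here we use drop_first=True because get_dummies returns one column per category, in this case 'female' and 'male'. Each column is the opposite of the other, so one of them is enough: drop_first=True drops the first category ('female') and keeps 'male'.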
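A minimal sketch of what get_dummies produces for a hypothetical sex column, with and without drop_first=True. get_dummies orders the categories alphabetically, so drop_first=True removes 'female' and keeps 'male' (1 = male, 0 = female). dtype=int is passed because since pandas 2.0 the default output is True/False rather than 0/1.

```python
import pandas as pd

df = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

# One 0/1 column per category: 'female' and 'male'.
both = pd.get_dummies(df["sex"], dtype=int)
print(both)

# drop_first=True drops the first category ('female'); 'male' alone is enough.
sex = pd.get_dummies(df["sex"], drop_first=True, dtype=int)
print(sex)
```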
Do this for all categorical columns.
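With several categorical columns, get_dummies can also be called on the whole DataFrame with the columns= parameter. In this sketch with hypothetical sex and embarked columns, each listed column is replaced by prefixed dummy columns (sex_male, embarked_Q, embarked_S) and the originals are dropped, so one call covers the work of Steps 8 to 10.

```python
import pandas as pd

df = pd.DataFrame({
    "sex": ["male", "female", "male"],
    "embarked": ["S", "C", "Q"],
    "fare": [7.25, 71.28, 8.05],
})

# Encodes the listed columns and drops the originals; 'fare' is left as-is.
encoded = pd.get_dummies(df, columns=["sex", "embarked"], drop_first=True, dtype=int)
print(list(encoded.columns))
```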
Step 9: Concatenate all DataFrames
df = pd.concat([df, sex], axis=1)  # axis=1 adds the dummy columns side by side, aligned on the index
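pd.concat takes a list of objects, and axis=1 places them side by side, matching rows by index label. In this minimal sketch with hypothetical data, sex is built from df['sex'], so both share the same index and every row lines up.

```python
import pandas as pd

df = pd.DataFrame({"sex": ["male", "female", "male"], "fare": [7.25, 71.28, 8.05]})
sex = pd.get_dummies(df["sex"], drop_first=True, dtype=int)

# axis=1 joins columns; rows are aligned on the shared index (0, 1, 2).
df = pd.concat([df, sex], axis=1)
print(df)
```

Alignment is by label, not position. If rows were removed in Step 6, the index has gaps, but the dummies still line up as long as they were created from that same df.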
Step 10: Remove all non-numeric and non-binary columns
df.drop(['sex'], axis=1, inplace=True)
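After dropping 'sex', every column should be numeric. DataFrame.select_dtypes can confirm that, or drop any remaining non-numeric columns without naming them. A sketch on a hypothetical frame that has already been through Step 9. Note that select_dtypes does not treat True/False columns as 'number', which is another reason to request dtype=int from get_dummies.

```python
import pandas as pd

df = pd.DataFrame({
    "sex": ["male", "female", "male"],
    "fare": [7.25, 71.28, 8.05],
    "male": [1, 0, 1],
})

df.drop(["sex"], axis=1, inplace=True)

# Any columns left that are not numeric? An empty list means the frame is ready.
leftover = df.select_dtypes(exclude="number").columns.tolist()
print(leftover)

# Alternative: keep only the numeric columns in one step.
numeric_only = df.select_dtypes(include="number")
print(numeric_only.dtypes)
```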