Skip to main content

Missing Data and GroupBy Function in Pandas - Python

Here is how to play with missing data and use groupby function in Pandas.
Run the code here: https://repl.it/@VinitKhandelwal/pandas-groupby
import numpy as np
import pandas as pd

print("MISSING DATA")
d = {'A':[1,2,np.nan],'B':[4,np.nan,np.nan],'C':[7,8,9]}
print(d)
df = pd.DataFrame(d)
print(df)
print("DROP ROWS")
print(df.dropna())
print("DROP COLUMNS")
print(df.dropna(axis=1))
print("DROP ROWS WITH THRESHHOLD")
print(df.dropna(thresh=2))
print(df.fillna(value="Fill Value"))
print(df.fillna(value=df.mean()))
print("GROUP BY")
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'], 'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'], 'Sales':[200,120,340,124,243,350]}
print(data)
df = pd.DataFrame(data)
print(df)
by_comp = df.groupby('Company')
print(by_comp.mean())
print(by_comp.std())
print(by_comp.sum())
print(by_comp.sum().loc['GOOG'])
print(df.groupby('Company').sum().loc['GOOG'])
print(df.groupby('Company').count())
print(df.groupby('Company').max())
print(df.groupby('Company').min())
print(df.groupby('Company').describe())
print(df.groupby('Company').describe().transpose())
print(df.groupby('Company').describe().transpose()['FB'])

OUTPUT

MISSING DATA
{'A': [1, 2, nan], 'B': [4, nan, nan], 'C': [7, 8, 9]}
     A    B  C
0  1.0  4.0  7
1  2.0  NaN  8
2  NaN  NaN  9
DROP ROWS
     A    B  C
0  1.0  4.0  7
DROP COLUMNS
   C
0  7
1  8
2  9
DROP ROWS WITH THRESHHOLD
     A    B  C
0  1.0  4.0  7
1  2.0  NaN  8
            A           B  C
0           1           4  7
1           2  Fill Value  8
2  Fill Value  Fill Value  9
     A    B  C
0  1.0  4.0  7
1  2.0  4.0  8
2  1.5  4.0  9
GROUP BY
{'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'], 'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'], 'Sales': [200, 120, 340, 124, 243, 350]}
  Company   Person  Sales
0    GOOG      Sam    200
1    GOOG  Charlie    120
2    MSFT      Amy    340
3    MSFT  Vanessa    124
4      FB     Carl    243
5      FB    Sarah    350
         Sales
Company
FB       296.5
GOOG     160.0
MSFT     232.0
              Sales
Company
FB        75.660426
GOOG      56.568542
MSFT     152.735065
         Sales
Company
FB         593
GOOG       320
MSFT       464
Sales    320
Name: GOOG, dtype: int64
Sales    320
Name: GOOG, dtype: int64
         Person  Sales
Company
FB            2      2
GOOG          2      2
MSFT          2      2
          Person  Sales
Company
FB         Sarah    350
GOOG         Sam    200
MSFT     Vanessa    340
          Person  Sales
Company
FB          Carl    243
GOOG     Charlie    120
MSFT         Amy    124
        Sales
        count   mean         std    min     25%    50%     75%    max
Company
FB        2.0  296.5   75.660426  243.0  269.75  296.5  323.25  350.0
GOOG      2.0  160.0   56.568542  120.0  140.00  160.0  180.00  200.0
MSFT      2.0  232.0  152.735065  124.0  178.00  232.0  286.00  340.0
Company              FB        GOOG        MSFT
Sales count    2.000000    2.000000    2.000000
      mean   296.500000  160.000000  232.000000
      std     75.660426   56.568542  152.735065
      min    243.000000  120.000000  124.000000
      25%    269.750000  140.000000  178.000000
      50%    296.500000  160.000000  232.000000
      75%    323.250000  180.000000  286.000000
      max    350.000000  200.000000  340.000000
Sales  count      2.000000
       mean     296.500000
       std       75.660426
       min      243.000000
       25%      269.750000
       50%      296.500000
       75%      323.250000
       max      350.000000
Name: FB, dtype: float64

Comments

Popular posts from this blog

Python - List - Append, Count, Extend, Index, Insert, Pop, Remove, Reverse, Sort

🐍 Advance List List is widely used and it's functionalities are heavily useful. Append Adds one element at the end of the list. Syntax list1.append(value) Input l1 = [1, 2, 3] l1.append(4) l1 Output [1, 2, 3, 4] append can be used to add any datatype in a list. It can even add list inside list. Caution: Append does not return anything. It just appends the list. Count .count(value) counts the number of occurrences of an element in the list. Syntax list1.count(value) Input l1 = [1, 2, 3, 4, 3] l1.count(3) Output 2 It returns 0 if the value is not found in the list. Extend .count(value) counts the number of occurrences of an element in the list. Syntax list1.extend(list) Input l1 = [1, 2, 3] l1.extend([4, 5]) Output [1, 2, 3, 4, 5] If we use append, entire list will be added to the first list like one element. Extend, i nstead of considering a list as one element, it joins the two lists one after other. Append works in the following way. Input l1 = [1, 2, 3] l1.append([4, 5]) Output...

Difference between .exec() and .execPopulate() in Mongoose?

Here I answer what is the difference between .exec() and .execPopulate() in Mongoose? .exec() is used with a query while .execPopulate() is used with a document Syntax for .exec() is as follows: Model.query() . populate ( 'field' ) . exec () // returns promise . then ( function ( document ) { console . log ( document ); }); Syntax for .execPopulate() is as follows: fetchedDocument . populate ( 'field' ) . execPopulate () // returns promise . then ( function ( document ) { console . log ( document ); }); When working with individual document use .execPopulate(), for model query use .exec(). Both returns a promise. One can do without .exec() or .execPopulate() but then has to pass a callback in populate.

683 K Empty Slots

  Approach #1: Insert Into Sorted Structure [Accepted] Intuition Let's add flowers in the order they bloom. When each flower blooms, we check it's neighbors to see if they can satisfy the condition with the current flower. Algorithm We'll maintain  active , a sorted data structure containing every flower that has currently bloomed. When we add a flower to  active , we should check it's lower and higher neighbors. If some neighbor satisfies the condition, we know the condition occurred first on this day. Complexity Analysis Time Complexity (Java):  O(N \log N) O ( N lo g N ) , where  N N  is the length of  flowers . Every insertion and search is  O(\log N) O ( lo g N ) . Time Complexity (Python):  O(N^2) O ( N 2 ) . As above, except  list.insert  is  O(N) O ( N ) . Space Complexity:  O(N) O ( N ) , the size of  active . Approach #2: Min Queue [Accepted] Intuition For each contiguous block ("window") of  k  po...