Obtain null or missing values of a dataframe

Suppose the dataframe looks like the following, with 10 rows and 6 columns:


          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810

The isnull() function (isna() is an alias for it) returns a dataframe of the same shape, with True wherever a value is missing:

       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False
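
The boolean mask can also be summarized instead of read cell by cell, because True counts as 1 when summed. A minimal sketch, assuming a small hypothetical frame (the name `sample` and its values are illustrative and are not the dataframe shown above):

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 frame with a few missing entries (illustrative values only)
sample = pd.DataFrame({
    0: [0.5, np.nan, 1.2, -0.3],
    1: [np.nan, np.nan, 0.1, 0.9],
    2: [1.0, 2.0, 3.0, 4.0],
})

mask = sample.isnull()               # same shape as sample, True where a value is missing
nulls_per_column = mask.sum()        # number of nulls in each column
nulls_per_row = mask.sum(axis=1)     # number of nulls in each row
total_nulls = int(mask.sum().sum())  # number of nulls in the whole frame

print(nulls_per_column)
print(nulls_per_row)
print(total_nulls)
```

Counting first is often a cheap way to decide whether dropping or filling is the more sensible next step.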

The following command selects the rows that have any null values:

df[df.isnull().any(axis=1)]

The following command selects the columns that have any null values:

df[df.columns[df.isna().any()]]

The following command selects the rows that have a null value in a specific column, e.g., column 3:

df[df[3].isnull()]
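
The three selections can be tried end to end on a small frame. This is a sketch under stated assumptions: `sample` is a hypothetical frame with integer column labels, and the `.loc` line is an alternative spelling added here for comparison, not something the commands above require.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with integer column labels (illustrative values only)
sample = pd.DataFrame({
    0: [0.5, -0.8, 0.3],
    1: [0.9, np.nan, -0.9],
    2: [1.3, 0.1, 1.8],
    3: [np.nan, 0.9, -1.6],
})

# Rows that contain at least one null value
rows_with_null = sample[sample.isnull().any(axis=1)]

# Columns that contain at least one null value
cols_with_null = sample[sample.columns[sample.isna().any()]]

# Rows whose value in column 3 is null
rows_null_in_col3 = sample[sample[3].isnull()]

# Same column selection written with .loc and a boolean mask over the columns
cols_with_null_loc = sample.loc[:, sample.isna().any()]

print(rows_with_null)
print(cols_with_null)
print(rows_null_in_col3)
```

Here any(axis=1) reduces across the columns of each row, while any() with its default axis reduces down each column, which is why the first mask filters rows and the second filters columns.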

Drop null values

>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})

>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are missing.

>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep only the rows with at least 2 non-NA values.

>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Define in which columns to look for missing values.

>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep the DataFrame with valid entries in the same variable.

>>> df.dropna(inplace=True)
>>> df
     name        toy       born
1  Batman  Batmobile 1940-04-25

Fill missing values

Filling missing values using fillna(), replace() and interpolate()

To fill null values in a dataset, we use the fillna(), replace() and interpolate() functions, which replace NaN values with other values. fillna() and replace() substitute a value that you supply (fillna() can also copy a neighbouring value), whereas interpolate() estimates each missing value from the surrounding data using one of several interpolation techniques, rather than using a hard-coded value.

Code #1: Filling null values with a single value


# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named data so the built-in dict is not shadowed)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from dictionary
df = pd.DataFrame(data)

# filling missing values with 0; fillna() returns a new dataframe
# and leaves df itself unchanged
df.fillna(0)

Code #2: Filling null values with the previous ones


# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named data so the built-in dict is not shadowed)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from dictionary
df = pd.DataFrame(data)

# filling each missing value with the previous valid one in its column;
# ffill() replaces the older spelling fillna(method='pad'),
# which is deprecated in recent pandas versions
df.ffill()

Code #3: Filling null values with the next ones


# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named data so the built-in dict is not shadowed)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from dictionary
df = pd.DataFrame(data)

# filling each missing value with the next valid one in its column;
# bfill() replaces the older spelling fillna(method='bfill'),
# which is deprecated in recent pandas versions
df.bfill()
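
The replace() and interpolate() functions mentioned in the introduction to this part can be sketched on the same scores data. This is an illustration under stated assumptions, not a recommended recipe: the placeholder value -99 and the variable names `replaced` and `interpolated` are chosen here for the example only.

```python
import numpy as np
import pandas as pd

# Same scores data as in the examples above
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# replace() swaps one value for another; here every NaN becomes -99
# (-99 is an arbitrary placeholder chosen for this sketch)
replaced = df.replace(to_replace=np.nan, value=-99)

# interpolate() estimates each gap from its neighbours; linear
# interpolation, filling in the forward direction only
interpolated = df.interpolate(method='linear', limit_direction='forward')

print(replaced)
print(interpolated)
```

With these settings the gap in 'First Score' becomes 92.5, the midpoint of 90 and 95. The trailing gap in 'Second Score' is filled with the last valid value, 56, while the leading NaN in 'Third Score' stays missing because there is no earlier value to fill forward from.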