Pandas and NumPy are widely used libraries for data mining and data science, but people are sometimes confused by None and NaN, which both mark missing data yet are different kinds of values. Here we sort it out once and for all with some examples.
Main difference
The distinction between None and NaN in Pandas can be summarized as:
- None represents a missing entry, but its type is not numeric. So any column (a Pandas Series) that actually holds a None value definitely does not have a numeric type such as int or float; it is an object column.
- NaN, which stands for not-a-number, is on the other hand a numeric value, specifically a float. This means that NaN can be found in a numeric column of float type. With the default NumPy-backed dtypes, an integer column that receives a NaN is upcast to float64.
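The two points above can be checked directly. The following is a minimal sketch, assuming the default NumPy-backed dtypes (not the nullable extension dtypes); the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# NaN is an ordinary float; None is the sole instance of NoneType.
nan_type = type(np.nan)
none_type = type(None)
print(nan_type.__name__, none_type.__name__)  # float NoneType

# A Series built from Python ints gets an integer dtype...
ints = pd.Series([1, 2, 3])
# ...but the same data with one NaN is stored as floats.
with_nan = pd.Series([1, 2, np.nan])
print(ints.dtype, with_nan.dtype)
```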
Tests in action
In the following test, the None value is automatically converted to NaN because the other value in the series is numeric. This makes the series a numeric type, which is much easier to work with in many subsequent operations.
import pandas as pd
pd.Series([1, None])
0 1.0
1 NaN
dtype: float64
In the following test, the other value in the series is a string, so the None value stays None. This makes the whole series an object type.
import pandas as pd
pd.Series(['1', None])
0 1
1 None
dtype: object
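The automatic conversion in the first test can also be switched off. As a small sketch (the variable names are illustrative), passing dtype=object to the constructor keeps None as None even when the other value is numeric:

```python
import pandas as pd

inferred = pd.Series([1, None])              # dtype inferred: None becomes NaN
forced = pd.Series([1, None], dtype=object)  # dtype forced: None is kept

print(inferred.dtype)          # float64
print(forced.dtype)            # object
print(forced.iloc[1] is None)  # True
```

This is mostly useful for inspection; for numeric work the inferred float64 series is usually the more convenient one.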
None can lead to more arithmetic errors
Why did we claim that NaN makes many operations useful in data science much easier?
It simply raises fewer errors in arithmetic operations. For example, the following operation raises an error:
None + 1
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-3fd8740bf8ab> in <module>
----> 1 None + 1
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
whereas the same operation with NaN is fine: we just get another NaN, with no error.
import numpy as np
np.nan + 1
nan
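The same propagation applies to a whole Series. The sketch below uses made-up values and shows element-wise arithmetic next to a reduction, which skips NaN by default through the skipna parameter:

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, np.nan, 3.0])

shifted = values + 1               # element-wise: NaN stays NaN
total = values.sum()               # reductions skip NaN by default (skipna=True)
strict = values.sum(skipna=False)  # NaN as soon as any element is NaN

print(shifted.tolist())  # [2.0, nan, 4.0]
print(total)             # 4.0
print(strict)            # nan
```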
How to check for None and NaN
There are several ways to check whether a value is None or NaN.
First, using NumPy, the function np.isnan() checks whether a value is NaN, but it does not work with None: np.isnan(None) raises a TypeError.
np.isnan(np.nan)
True
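If np.isnan() has to be used on values that might be None, the TypeError can be caught. The helper below is a hypothetical illustration for scalar inputs only, not part of NumPy:

```python
import numpy as np

def is_nan_scalar(value):
    """Hypothetical helper: True only for a NaN scalar, False otherwise."""
    try:
        return bool(np.isnan(value))
    except TypeError:
        # np.isnan rejects non-numeric inputs such as None or strings
        return False

print(is_nan_scalar(np.nan))  # True
print(is_nan_scalar(None))    # False
print(is_nan_scalar(1.0))     # False
```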
In Pandas, there are two functions, isnull() and isna(), which do exactly the same thing: isnull() is just an alias of isna(). Both detect missing values, so both NaN and None return True.
pd.isnull(np.nan)
True
pd.isnull(None)
True
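The same checks work element-wise on a Series. This sketch uses an illustrative object-dtype series that holds both kinds of missing value at once:

```python
import numpy as np
import pandas as pd

# dtype=object keeps None and NaN side by side without conversion
mixed = pd.Series([1, np.nan, None, "text"], dtype=object)

mask = mixed.isna()
print(mask.tolist())                # [False, True, True, False]
print(mixed.isnull().equals(mask))  # True
print(mixed.dropna().tolist())      # [1, 'text']
```

Since both values are treated as missing, dropna() removes them in one step.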