We often need to plot histograms to visualize the distribution of a feature or variable. How can we quickly get a useful plot and get the work done? If what we care about is the frequency of each value, seaborn provides a convenient function, countplot(), which draws the plot without us having to count the data ourselves and then build the bar chart.
Check the following example:
Get the data and do a count plot:

    %matplotlib inline
    import seaborn as sns

    titanic = sns.load_dataset("titanic")
    titanic['class'] = titanic['class'].astype('str')
    display(titanic)
     survived  pclass     sex   age  sibsp  parch     fare  embarked   class    who  adult_male  deck  embark_town  alive  alone
  0         0       3    male  22.0      1      0   7.2500         S   Third    man        True   NaN  Southampton     no  False
  1         1       1  female  38.0      1      0  71.2833         C   First  woman       False     C    Cherbourg    yes  False
  2         1       3  female  26.0      0      0   7.9250         S   Third  woman       False   NaN  Southampton    yes   True
  3         1       1  female  35.0      1      0  53.1000         S   First  woman       False     C  Southampton    yes  False
  4         0       3    male  35.0      0      0   8.0500         S   Third    man        True   NaN  Southampton     no   True
...       ...     ...     ...   ...    ...    ...      ...       ...     ...    ...         ...   ...          ...    ...    ...
886         0       2    male  27.0      0      0  13.0000         S  Second    man        True   NaN  Southampton     no   True
887         1       1  female  19.0      0      0  30.0000         S   First  woman       False     B  Southampton    yes   True
888         0       3  female   NaN      1      2  23.4500         S   Third  woman       False   NaN  Southampton     no  False
889         1       1    male  26.0      0      0  30.0000         C   First    man        True     C    Cherbourg    yes   True
890         0       3    male  32.0      0      0   7.7500         Q   Third    man        True   NaN   Queenstown     no   True

891 rows × 15 columns
    sns.set_theme(style="darkgrid")
    ax = sns.countplot(x="embark_town", data=titanic)
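For comparison, here is roughly what countplot() saves us from doing by hand: count each value, then draw the bars. This is only a minimal sketch using pandas and matplotlib directly. The small `towns` DataFrame is made-up stand-in data (not the Titanic set), so the block runs without downloading anything.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in data for illustration only (not the real Titanic rows).
towns = pd.DataFrame(
    {"embark_town": ["Southampton", "Cherbourg", "Southampton",
                     "Queenstown", "Southampton", "Cherbourg"]}
)

# Step 1: count how often each value occurs (most frequent first).
counts = towns["embark_town"].value_counts()

# Step 2: draw one bar per distinct value.
fig, ax = plt.subplots()
ax.bar(counts.index.tolist(), counts.tolist())
ax.set_xlabel("embark_town")
ax.set_ylabel("count")
```

One difference to keep in mind: value_counts() sorts by frequency, while countplot(), as far as I can tell, keeps string values in order of first appearance unless an explicit order is given.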
What if the feature has too many distinct values to show them all in one plot? One option is to keep only the most frequent categories, here the top two:

    sub_index = titanic['class'].value_counts().index[:2]
    sub_data = titanic[titanic['class'].isin(sub_index)]
    sub_data = sub_data.reset_index(drop=True)
    ax = sns.countplot(x="class", data=sub_data)
The order argument controls the order of the bars; here the selected classes are drawn in reverse:

    ax = sns.countplot(x="class", data=sub_data, order=sub_index[::-1])
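The filtering steps above can be wrapped in a small helper so they are easy to reuse for any column. The function name `keep_top_categories` and the `demo` DataFrame below are hypothetical, introduced only for this sketch; the logic is the same value_counts / isin / reset_index sequence used above.

```python
import pandas as pd


def keep_top_categories(data, column, n):
    """Keep only the rows whose `column` value is among the n most frequent.

    Returns the filtered DataFrame and the index of the kept categories.
    """
    top = data[column].value_counts().index[:n]
    subset = data[data[column].isin(top)].reset_index(drop=True)
    return subset, top


# Made-up stand-in data for illustration only.
demo = pd.DataFrame(
    {"class": ["Third", "First", "Third", "Second", "Third", "First"]}
)
sub_data, sub_index = keep_top_categories(demo, "class", n=2)
print(list(sub_index))   # categories kept, most frequent first
print(len(sub_data))     # number of rows that remain
```

The two return values can then be passed on as before, for example sns.countplot(x="class", data=sub_data, order=sub_index).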
To show the value counts for two categorical variables at once, pass the second variable as hue:

    ax = sns.countplot(x="class", hue="who", data=titanic)
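If the actual numbers behind such a grouped plot are needed, one way to get them is a cross-tabulation. The sketch below uses pandas' crosstab on a made-up `people` DataFrame (stand-in data, not the Titanic rows); each cell corresponds to the height of one bar.

```python
import pandas as pd

# Made-up stand-in data for illustration only.
people = pd.DataFrame(
    {
        "class": ["Third", "First", "Third", "First", "Third", "Second"],
        "who": ["man", "woman", "woman", "woman", "man", "child"],
    }
)

# Rows are the x-axis categories, columns are the hue categories;
# each cell counts the rows that have that (class, who) combination.
table = pd.crosstab(people["class"], people["who"])
print(table)
```

Combinations that never occur show up as 0 in the table, whereas the plot simply leaves that slot empty.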