Seaborn Module - Python



 

What is seaborn?


"seaborn" is a powerful Python library for data visualization. It is built on top of matplotlib and works directly with pandas DataFrames, so it can produce informative statistical plots with little code. It can also combine common plots, such as histograms, scatter plots, and line plots, into composite figures that give us insight into a dataset.




Installing and Importing


Run the following command in a terminal (the CMD on Windows) to install the module:

pip install seaborn

Import the required modules:

import seaborn as sns
import matplotlib.pyplot as plt
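Some argument names in seaborn changed between versions, so it can help to check which versions are installed before following a tutorial. The sketch below prints them; major_minor is a hypothetical helper written for this example, not part of seaborn.

```python
import matplotlib
import seaborn as sns

# Print the installed versions; several seaborn arguments were renamed over time
# (for example, size became height in seaborn 0.9)
print("seaborn", sns.__version__)
print("matplotlib", matplotlib.__version__)


def major_minor(version: str) -> tuple:
    """Hypothetical helper: turn "0.13.2" into (0, 13) for simple comparisons."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


if major_minor(sns.__version__) < (0, 11):
    print("This seaborn is older than 0.11: histplot and displot are not available.")
```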




Before starting


Preparing data


List the example datasets that ship with seaborn (they are fetched from an online repository, so an internet connection is needed):

sns.set()
sns.get_dataset_names()
# ['anagrams','anscombe','attention','brain_networks','car_crashes',
# 'diamonds','dots','exercise','flights','fmri','gammas','geyser','iris',
# 'mpg','penguins','planets','taxis','tips','titanic']

Load one of the datasets to experiment with:

# load the example "iris" dataset (measurements of 150 iris flowers)
iris = sns.load_dataset("iris")

Change the theme

sns.set_theme(style="dark")


Investigate the dataset


Get info about the data:

load_dataset returns a pandas DataFrame, so all pandas methods can be applied to it. For example, info() prints a summary of the columns:

iris.info()  # print the column names, non-null counts, and data types

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


Take a look at the data

iris.head()  # return the first 5 rows 

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
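Because the dataset is a pandas DataFrame, ordinary pandas methods give quick summaries. This sketch types in only the five rows printed above (so it runs without downloading anything); on the full dataset the same calls summarize all 150 rows.

```python
import pandas as pd

# The five rows printed by iris.head(), typed in by hand so the sketch runs offline
head = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

print(head.describe())                 # count, mean, std, min, quartiles, max per column
print(head["species"].value_counts())  # number of rows in each category
print(head.groupby("species").mean())  # per-species mean of every numeric column
```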



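Global plot attributes

Several keyword arguments control how a plot looks. They are passed to the plotting function together with the data, but not every function accepts every argument, so check the documentation of the function you are using.

sns.pairplot(data=iris, attr1=val1, attr2=val2, ...)  # attr1, attr2, ...: placeholders for real argument names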


Marker

It changes the marker shape of the points (only for plots that draw points). With hue, give one marker per category:

markers=["o", "s", "D"]

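Height

It sets the height of each panel (facet) in inches, for figure-level functions such as pairplot, jointplot, lmplot, and FacetGrid. Older tutorials call this argument size; it was renamed to height in seaborn 0.9.

height=5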

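Aspect

It sets the aspect ratio of each panel, so that its width is aspect × height (used by pairplot, lmplot, and FacetGrid; jointplot always draws a square figure).

aspect=2  # width is twice the height
aspect=0.5  # width is half the height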

Color

It changes the color of the plot

color="Red"

Palette

It changes the color palette of the plot

palette="Oranges"

palette accepts a named color palette. Valid names include seaborn's own palettes, such as "deep", "muted", "pastel", "bright", "dark", and "colorblind", plus the colormap names below (the "_r" suffix reverses a colormap):

‘Accent’, ‘Accent_r’, ‘Blues’, ‘Blues_r’, ‘BrBG’, ‘BrBG_r’, ‘BuGn’, ‘BuGn_r’, ‘BuPu’, ‘BuPu_r’,
‘CMRmap’, ‘CMRmap_r’, ‘Dark2’, ‘Dark2_r’, ‘GnBu’, ‘GnBu_r’, ‘Greens’, ‘Greens_r’, ‘Greys’, ‘Greys_r’,
‘OrRd’, ‘OrRd_r’, ‘Oranges’, ‘Oranges_r’, ‘PRGn’, ‘PRGn_r’, ‘Paired’, ‘Paired_r’, ‘Pastel1’, ‘Pastel1_r’,
‘Pastel2’, ‘Pastel2_r’, ‘PiYG’, ‘PiYG_r’, ‘PuBu’, ‘PuBuGn’, ‘PuBuGn_r’, ‘PuBu_r’, ‘PuOr’, ‘PuOr_r’,
‘PuRd’, ‘PuRd_r’, ‘Purples’, ‘Purples_r’, ‘RdBu’, ‘RdBu_r’, ‘RdGy’, ‘RdGy_r’, ‘RdPu’, ‘RdPu_r’,
‘RdYlBu’, ‘RdYlBu_r’, ‘RdYlGn’, ‘RdYlGn_r’, ‘Reds’, ‘Reds_r’, ‘Set1’, ‘Set1_r’, ‘Set2’, ‘Set2_r’,
‘Set3’, ‘Set3_r’, ‘Spectral’, ‘Spectral_r’, ‘Wistia’, ‘Wistia_r’, ‘YlGn’, ‘YlGnBu’, ‘YlGnBu_r’, ‘YlGn_r’,
‘YlOrBr’, ‘YlOrBr_r’, ‘YlOrRd’, ‘YlOrRd_r’, ‘afmhot’, ‘afmhot_r’, ‘autumn’, ‘autumn_r’, ‘binary’, ‘binary_r’,
‘bone’, ‘bone_r’, ‘brg’, ‘brg_r’, ‘bwr’, ‘bwr_r’, ‘cividis’, ‘cividis_r’, ‘cool’, ‘cool_r’,
‘coolwarm’, ‘coolwarm_r’, ‘copper’, ‘copper_r’, ‘cubehelix’, ‘cubehelix_r’, ‘flag’, ‘flag_r’, ‘gist_earth’, ‘gist_earth_r’,
‘gist_gray’, ‘gist_gray_r’, ‘gist_heat’, ‘gist_heat_r’, ‘gist_ncar’, ‘gist_ncar_r’, ‘gist_rainbow’, ‘gist_rainbow_r’, ‘gist_stern’, ‘gist_stern_r’,
‘gist_yarg’, ‘gist_yarg_r’, ‘gnuplot’, ‘gnuplot2’, ‘gnuplot2_r’, ‘gnuplot_r’, ‘gray’, ‘gray_r’, ‘hot’, ‘hot_r’,
‘hsv’, ‘hsv_r’, ‘icefire’, ‘icefire_r’, ‘inferno’, ‘inferno_r’, ‘magma’, ‘magma_r’, ‘mako’, ‘mako_r’,
‘nipy_spectral’, ‘nipy_spectral_r’, ‘ocean’, ‘ocean_r’, ‘pink’, ‘pink_r’, ‘plasma’, ‘plasma_r’, ‘prism’, ‘prism_r’,
‘rainbow’, ‘rainbow_r’, ‘rocket’, ‘rocket_r’, ‘seismic’, ‘seismic_r’, ‘spring’, ‘spring_r’, ‘summer’, ‘summer_r’,
‘tab10’, ‘tab10_r’, ‘tab20’, ‘tab20_r’, ‘tab20b’, ‘tab20b_r’, ‘tab20c’, ‘tab20c_r’, ‘terrain’, ‘terrain_r’,
‘twilight’, ‘twilight_r’, ‘twilight_shifted’, ‘twilight_shifted_r’, ‘viridis’, ‘viridis_r’, ‘vlag’, ‘vlag_r’, ‘winter’, ‘winter_r’




The (pairplot) method

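Create a grid of plots that shows the pairwise relationships between the numeric columns: each off-diagonal panel plots one column against another, and each diagonal panel shows the distribution of a single column.

NOTE: when running a Python script, seaborn plots are not displayed until you call plt.show() after creating them. In Jupyter notebooks, plots are usually displayed automatically.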


Standard pairplot

sns.pairplot(data=iris)  # create the plot
plt.show()  # display the plot 


Categorized pairplot

Color the points by category, using the column "species":

sns.pairplot(data=iris, hue='species')  
plt.show() 


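KDE pairplot

kind="kde" draws the off-diagonal panels as kernel density estimates (KDEs) instead of scatter plots. kind only controls the off-diagonal panels; the diagonal is controlled by diag_kind, which by default also switches to KDEs here.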

sns.pairplot(data=iris, kind="kde") 


Histogram pairplot

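kind="hist" draws the off-diagonal panels as two-dimensional histograms and the diagonal panels as ordinary histograms (requires seaborn 0.11 or later).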

sns.pairplot(data=iris, kind="hist") 


Style mapped pairplot

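markers assigns a different marker shape to each level of the hue variable in the off-diagonal scatter plots. Here the shape mapping is redundant with the color mapping (both encode species), which keeps the categories distinguishable even in grayscale. The list gives one marker per species: circles ("o"), squares ("s"), and diamonds ("D").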

sns.pairplot(data=iris, hue="species", markers=["o", "s", "D"])




The (jointplot) method

Create a jointplot that shows the relationship between two columns: a bivariate plot (a scatter plot by default) in the center, with the distribution of each column drawn along the top and right margins.


Standard jointplot

Choose the two columns with x and y. The hue attribute colors the points by the categories of another column; with hue set, the marginal plots show one density curve per category.

sns.jointplot(data=iris, x='sepal_length', y='sepal_width', hue='species')


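Hexagon jointplot

kind="hex" bins the points into hexagonal cells and colors each cell by the number of points it contains, which shows where the data is densest even when points overlap. This kind does not support hue.

sns.jointplot(data=iris, x='sepal_length', y='sepal_width', kind="hex")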







The (histplot) method

Create a histogram (histplot) that shows the distribution of a numeric column.


Standard histplot

Create a histogram for a single column:

sns.histplot(data=iris, x='sepal_length')


Relational histplot

Use hue to split the histogram by the categories of another column. By default, the groups are drawn as overlapping, semi-transparent layers.

sns.histplot(data=iris, x='sepal_length', hue='species')


Sided histplot

Use multiple="dodge" to place the bars of each category side by side.

sns.histplot(data=iris, x='sepal_length', hue='species', multiple='dodge')


Stacked histplot

Use multiple="stack" to stack the bars on top of each other instead of overlapping them transparently.

sns.histplot(data=iris, x='sepal_length', hue='species', multiple='stack')


Binwidth histplot

binwidth sets the width of each bin in the units of the x column (here 0.2 cm), which changes the width and the number of the bars.

sns.histplot(data=iris, x='sepal_length', hue='species', binwidth=0.2)


Matplotlib.hist stacked

matplotlib's plt.hist can also stack several histograms on top of each other. First split the data by category, then pass one array of values per category:

labels = iris.species.unique()  # get all categories of the column (species)
# distribute the data over the categories and get only the (sepal_length) values
newList = [iris[iris.species == i].sepal_length for i in labels] 
    
# plot the 3 independent graphs over each other 
plt.hist(newList, bins=30, stacked=True, label=labels)  
plt.legend()
plt.show()
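The same split can be written with pandas groupby, which yields one Series of values per category. Note that groupby sorts the categories, whereas unique() keeps their order of appearance. A minimal sketch on a small made-up frame with the iris column names:

```python
import pandas as pd

# Small made-up frame with the same column names as iris
iris_like = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# One group of sepal_length values per species, in sorted species order
groups = iris_like.groupby("species")["sepal_length"]
labels = [name for name, _ in groups]
newList = [values for _, values in groups]
print(labels)
```

The resulting labels and newList can be passed to plt.hist exactly like the ones built above.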


rwidth=0.8 makes each bar 80% as wide as its bin, leaving gaps between the bars:

plt.hist(newList, bins=30, stacked=True, label=labels, histtype='bar', rwidth=0.8) 


histtype='step' draws only the outline of each histogram, without filling it:

plt.hist(newList, bins=30, stacked=True, label=labels, histtype='step') 





The (distplot) method

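Create a distplot for numeric data: a histogram with a kernel density estimate (KDE) curve drawn over it. Note that distplot is deprecated since seaborn 0.11 and may be removed in a future release; histplot with kde=True, or displot, replace it.

distplot expects one-dimensional data, so pass a single numeric column rather than the whole DataFrame.


Standard distplot

sns.distplot(iris.sepal_length)


Bins of distplot

We can change the number of bars (bins) in the histogram.

sns.distplot(iris.sepal_length, bins=30)


Hidden histogram distplot

hist=False hides the histogram and keeps only the KDE curve.

sns.distplot(iris.sepal_length, bins=30, hist=False)


Colored distplot

We can change its color.

sns.distplot(iris.sepal_length, bins=30, color="Red")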




The (lmplot) method

Create an lmplot (linear model plot) for numeric data: a scatter plot of two columns with a fitted linear regression line and a shaded 95% confidence interval around it.


Standard lmplot

sns.lmplot(data=iris, x="sepal_length", y="petal_length")
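The regression line is an ordinary least-squares fit, so its slope should match what numpy.polyfit computes for the same columns. A minimal check on made-up, roughly linear data (not the iris measurements); ci=None skips the bootstrapped confidence band.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data with a roughly linear relationship (NOT the iris measurements)
rng = np.random.default_rng(6)
x = rng.uniform(4.5, 8.0, 100)
iris_like = pd.DataFrame({"sepal_length": x,
                          "petal_length": 1.8 * x - 7.0 + rng.normal(0, 0.5, 100)})

# Least-squares line: petal_length ≈ slope * sepal_length + intercept
slope, intercept = np.polyfit(iris_like.sepal_length, iris_like.petal_length, deg=1)

g = sns.lmplot(data=iris_like, x="sepal_length", y="petal_length", ci=None)
line = g.ax.lines[0]  # the regression line drawn by lmplot
xs, ys = line.get_xdata(), line.get_ydata()
drawn_slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
print(f"polyfit slope {slope:.3f}, lmplot slope {drawn_slope:.3f}")
plt.close("all")
```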


Scatter lmplot

Hide the linear regression line and leave the scatter plot.

sns.lmplot(data=iris, x="sepal_length", y="petal_length", fit_reg=False)


Custom diagonal to the lmplot

Add the diagonal line y = x to see which points lie above it (petal_length greater than petal_width) and which lie below it:

sns.lmplot(data=iris, x="petal_width", y="petal_length", fit_reg=False)
plt.plot((0, 8), (0, 8), 'k--')
plt.show()
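The same question can be answered numerically: with petal_width on x and petal_length on y, a point lies above the dashed line exactly when petal_length > petal_width. A minimal sketch using only the five rows printed by iris.head() earlier:

```python
import pandas as pd

# The five rows from iris.head(), typed in so the sketch runs offline
head = pd.DataFrame({
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
})

# True for every point above the line y = x
above = head["petal_length"] > head["petal_width"]
print(f"{above.mean():.0%} of the points lie above the diagonal")
```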


Specified lmplot

Use hue to fit and draw a separate regression line for each category of species:

sns.lmplot(data=iris, x="sepal_length", y="petal_length", hue="species")




The (kdeplot) method

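Create a KDE (kernel density estimate) plot for numeric data. A KDE replaces the individual data points with a smooth curve that estimates the density of the data. Given two columns, kdeplot draws contour lines around the regions where the points are densest, like a smoothed version of a scatter plot.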
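To make the idea concrete, here is a from-scratch sketch of a one-dimensional Gaussian KDE in NumPy: one Gaussian bump is centered on each sample and the bumps are averaged. gaussian_kde_1d is a hypothetical helper, not part of seaborn; it uses Scott's rule for the bandwidth, which is also seaborn's default, although seaborn's own implementation differs in details. The samples are random numbers, not iris data.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: evaluate a 1-D Gaussian KDE on `grid`.

    Bandwidth follows Scott's rule: h = std(samples) * n ** (-1/5).
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # one standard-normal bump per sample, scaled by the bandwidth h
    z = (grid[:, None] - samples[None, :]) / h
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.sum(axis=1) / (n * h)


rng = np.random.default_rng(7)
samples = rng.normal(3.0, 0.4, 150)  # random stand-in for iris.sepal_width
grid = np.linspace(0.0, 6.0, 2001)
density = gaussian_kde_1d(samples, grid)

# A density integrates to (approximately) 1; trapezoidal rule over the grid
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(f"area under the curve: {area:.4f}")
```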


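Standard kdeplot

Since seaborn 0.11, the columns are passed with the data, x, and y keywords.

sns.kdeplot(data=iris, x="sepal_width", y="sepal_length")


Colored kdeplot

sns.kdeplot(data=iris, x="sepal_width", y="sepal_length", cmap="Reds")


Shaded kdeplot

sns.kdeplot(data=iris, x="sepal_width", y="sepal_length", fill=True)


Background shaded kdeplot

thresh=0 also fills the lowest density level, which shades the whole background. (Older seaborn versions used shade=True and shade_lowest=True for these two options.)

sns.kdeplot(data=iris, x="sepal_width", y="sepal_length", fill=True, thresh=0)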




The (heatmap) method

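Create a heatmap: a grid of colored cells, one per value of a matrix. It is commonly used to show the correlation matrix of the numeric columns.


Standard heatmap

annot=True writes the value inside each cell. numeric_only=True makes corr() skip the non-numeric species column (required in pandas 2.0 and later).

sns.heatmap(iris.corr(numeric_only=True), annot=True, cmap='RdBu_r')


Hidden heatmap

We can hide the annotations from the heatmap.

sns.heatmap(iris.corr(numeric_only=True), annot=False, cmap='RdBu_r')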
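By default the color scale stretches between the smallest and largest values in the matrix. For correlations it is often clearer to pin it to the full range with vmin=-1 and vmax=1, so that 0 always maps to the middle color of a diverging colormap like RdBu_r; fmt=".2f" rounds the annotations. A minimal sketch on random stand-in data:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in for the iris data (NOT the real measurements)
rng = np.random.default_rng(8)
iris_like = pd.DataFrame(rng.normal(size=(150, 4)),
                         columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
iris_like["species"] = np.repeat(["setosa", "versicolor", "virginica"], 50)

# numeric_only=True skips the text column
corr = iris_like.corr(numeric_only=True)

# Fixed color range [-1, 1] so that 0 is the neutral middle color
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="RdBu_r", vmin=-1, vmax=1)
plt.close("all")
```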




The (boxplot) method

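Create a boxplot between categorical and numerical data. The box spans the first to third quartile of a numeric column, the line inside it marks the median, the whiskers extend to the most extreme points within 1.5 times the interquartile range (IQR), and points beyond the whiskers are drawn individually as outliers.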


Standard boxplot

Plot one box per numeric column (non-numeric columns such as species are skipped).

sns.boxplot(data=iris) 


Horizontal boxplot

sns.boxplot(data=iris, orient="h") 


Specified boxplot

Plot a single numeric column split by a categorical one, here one box of sepal_length per species:

sns.boxplot(data=iris, orient="h", x='sepal_length', y='species')
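The numbers a box plot draws can be computed by hand. This sketch uses made-up values with one extreme point and the default 1.5 × IQR whisker rule; np.percentile uses linear interpolation between data points.

```python
import numpy as np

# Made-up values with one extreme point
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers end at the most extreme points still inside the fences;
# anything outside the fences is drawn as a separate outlier marker
inside = values[(values >= low_fence) & (values <= high_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < low_fence) | (values > high_fence)]
print(q1, median, q3, whisker_low, whisker_high, outliers)
```

Here the box runs from 3.25 to 7.75 with the median at 5.5, the whiskers stop at 1 and 9, and 100 is drawn as an outlier.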


The boxenplot

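boxenplot (a "letter-value" plot) is similar to a boxplot, but instead of stopping at the quartiles it draws a series of progressively narrower boxes for further quantiles. This shows more detail about the tails of the distribution, which is especially useful for large datasets.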

sns.boxenplot(data=iris)




The (violinplot) method

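Create a violinplot between categorical and numerical data. Each violin shows a KDE of the distribution mirrored on both sides, with a small box plot drawn inside by default.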


Standard violinplot

Plot one violin per numeric column:

sns.violinplot(data=iris)


Specified violinplot

Plot a numeric column grouped by a categorical column. Here x holds the categories (species) and y holds the numeric values.

sns.violinplot(data=iris, x='species', y='petal_length')
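The categorical column does not have to be on the x-axis: putting it on y and the numeric column on x draws the same violins horizontally. A minimal sketch on random stand-in data:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in for the iris data (NOT the real measurements)
rng = np.random.default_rng(9)
iris_like = pd.DataFrame({
    "petal_length": rng.normal(3.8, 1.7, 150),
    "species": np.repeat(["setosa", "versicolor", "virginica"], 50),
})

# Numeric column on x, categories on y: horizontal violins
ax = sns.violinplot(data=iris_like, x="petal_length", y="species")
plt.close("all")
```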




The (swarmplot) method

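Create a swarmplot: each data point is drawn as a dot, and the dots are shifted sideways so that they do not overlap, which shows both the individual observations and the shape of the distribution.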


Standard swarmplot

Plot one swarm per numeric column:

sns.swarmplot(data=iris, orient="v")  
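A common pattern is to draw a swarm on top of a violin plot, so the smooth distribution and the individual points appear together. Both functions accept an ax argument to share one set of axes; inner=None removes the inner box so the dots stay visible. A minimal sketch on a small random stand-in dataset:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Small random stand-in for the iris data (NOT the real measurements)
rng = np.random.default_rng(10)
iris_like = pd.DataFrame({
    "petal_length": rng.normal(3.8, 1.7, 60),
    "species": np.repeat(["setosa", "versicolor", "virginica"], 20),
})

# Light-gray violins without the inner box, then black dots on the same axes
ax = sns.violinplot(data=iris_like, x="species", y="petal_length", inner=None, color="0.9")
sns.swarmplot(data=iris_like, x="species", y="petal_length", color="k", size=3, ax=ax)
plt.close("all")
```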





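The (FacetGrid) method

FacetGrid splits the data into subsets by the values of a categorical column and draws the same plot for each subset in its own panel (facet): one panel per column with col, or one panel per row with row.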


FacetGrid using a column

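iris = sns.load_dataset("iris")  # reload the full dataset, including the species column
myPlot = sns.FacetGrid(data=iris, col="species")  # one panel per species, side by side
myPlot = myPlot.map(plt.hist, "sepal_length")  # draw a sepal_length histogram in each panel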



FacetGrid using a row

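iris = sns.load_dataset("iris")  # reload the full dataset, including the species column
myPlot = sns.FacetGrid(data=iris, row="species")  # one panel per species, stacked vertically
myPlot = myPlot.map(plt.hist, "sepal_width")  # draw a sepal_width histogram in each panel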





