Seaborn Module - Python
What is seaborn?
"seaborn" is a Python module for data visualization. It is built on top of the matplotlib module and adds a high-level interface for statistical plots. It can combine common plots like histograms, scatter plots, and line plots in one figure to give us some insights into the dataset.
Installing and Importing
Run the following command in a terminal (for example, CMD on Windows) to install the module:
pip install seaborn
Import the required modules:
import seaborn as sns
import matplotlib.pyplot as plt
Before starting
Preparing data
Check some of the pre-defined datasets from seaborn
sns.set()
sns.get_dataset_names()
# ['anagrams','anscombe','attention','brain_networks','car_crashes',
# 'diamonds','dots','exercise','flights','fmri','gammas','geyser','iris',
# 'mpg','penguins','planets','taxis','tips','titanic']
Select one of the data sets for testing
# load the "iris" example dataset
iris = sns.load_dataset("iris")
Change the theme
sns.set_theme(style="dark")
Investigate the dataset
Get info about the data:
The dataset is loaded as a pandas DataFrame, so we can apply all pandas methods to it.
iris.info() # print a summary of the columns, their dtypes, and the memory usage
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
Take a look at the data
iris.head() # return the first 5 rows
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
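Because the dataset is an ordinary pandas DataFrame, the usual pandas summaries work on it before any plotting. The sketch below uses a small hand-written frame (a hypothetical stand-in for a few iris rows, so it runs without downloading anything) and shows two common checks: the per-category means and the number of rows per category.

```python
import pandas as pd

# Hypothetical stand-in for a few iris rows so the sketch runs offline
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Mean of every numeric column, computed separately for each species
means = df.groupby("species").mean(numeric_only=True)
print(means)

# Number of rows that belong to each species
counts = df["species"].value_counts()
print(counts)
```

The same calls should work on the full frame returned by sns.load_dataset("iris").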
Global plots attributes
We have some attributes that control the look of the plot. Many plotting functions accept them, although not every function accepts every attribute.
sns.pairplot(data=iris, attr1=val1, attr2=val2, ...)
Marker
It changes the shape of the points (only for plots that draw points)
markers=["o", "s", "D"]
Size
It changes the size of the plot by setting the height, in inches, of each subplot. This attribute was called size before seaborn 0.9.
height=5
Aspect
It changes the aspect ratio of the plot
aspect=2 # means 2:1
aspect=0.5 # means 1:2
Color
It changes the color of the plot
color="Red"
Palette
It changes the color palette of the plot
palette="Oranges"
We can choose one of the following values for the palette attribute:
‘Accent’, ‘Accent_r’, ‘Blues’, ‘Blues_r’, ‘BrBG’, ‘BrBG_r’, ‘BuGn’, ‘BuGn_r’, ‘BuPu’, ‘BuPu_r’, ‘CMRmap’, ‘CMRmap_r’, ‘Dark2’, ‘Dark2_r’, ‘GnBu’, ‘GnBu_r’, ‘Greens’, ‘Greens_r’, ‘Greys’, ‘Greys_r’, ‘OrRd’, ‘OrRd_r’, ‘Oranges’, ‘Oranges_r’, ‘PRGn’, ‘PRGn_r’, ‘Paired’, ‘Paired_r’, ‘Pastel1’, ‘Pastel1_r’, ‘Pastel2’, ‘Pastel2_r’, ‘PiYG’, ‘PiYG_r’, ‘PuBu’, ‘PuBuGn’, ‘PuBuGn_r’, ‘PuBu_r’, ‘PuOr’, ‘PuOr_r’, ‘PuRd’, ‘PuRd_r’, ‘Purples’, ‘Purples_r’, ‘RdBu’, ‘RdBu_r’, ‘RdGy’, ‘RdGy_r’, ‘RdPu’, ‘RdPu_r’, ‘RdYlBu’, ‘RdYlBu_r’, ‘RdYlGn’, ‘RdYlGn_r’, ‘Reds’, ‘Reds_r’, ‘Set1’, ‘Set1_r’, ‘Set2’, ‘Set2_r’, ‘Set3’, ‘Set3_r’, ‘Spectral’, ‘Spectral_r’, ‘Wistia’, ‘Wistia_r’, ‘YlGn’, ‘YlGnBu’, ‘YlGnBu_r’, ‘YlGn_r’, ‘YlOrBr’, ‘YlOrBr_r’, ‘YlOrRd’, ‘YlOrRd_r’, ‘afmhot’, ‘afmhot_r’, ‘autumn’, ‘autumn_r’, ‘binary’, ‘binary_r’, ‘bone’, ‘bone_r’, ‘brg’, ‘brg_r’, ‘bwr’, ‘bwr_r’, ‘cividis’, ‘cividis_r’, ‘cool’, ‘cool_r’, ‘coolwarm’, ‘coolwarm_r’, ‘copper’, ‘copper_r’, ‘cubehelix’, ‘cubehelix_r’, ‘flag’, ‘flag_r’, ‘gist_earth’, ‘gist_earth_r’, ‘gist_gray’, ‘gist_gray_r’, ‘gist_heat’, ‘gist_heat_r’, ‘gist_ncar’, ‘gist_ncar_r’, ‘gist_rainbow’, ‘gist_rainbow_r’, ‘gist_stern’, ‘gist_stern_r’, ‘gist_yarg’, ‘gist_yarg_r’, ‘gnuplot’, ‘gnuplot2’, ‘gnuplot2_r’, ‘gnuplot_r’, ‘gray’, ‘gray_r’, ‘hot’, ‘hot_r’, ‘hsv’, ‘hsv_r’, ‘icefire’, ‘icefire_r’, ‘inferno’, ‘inferno_r’, ‘magma’, ‘magma_r’, ‘mako’, ‘mako_r’, ‘nipy_spectral’, ‘nipy_spectral_r’, ‘ocean’, ‘ocean_r’, ‘pink’, ‘pink_r’, ‘plasma’, ‘plasma_r’, ‘prism’, ‘prism_r’, ‘rainbow’, ‘rainbow_r’, ‘rocket’, ‘rocket_r’, ‘seismic’, ‘seismic_r’, ‘spring’, ‘spring_r’, ‘summer’, ‘summer_r’, ‘tab10’, ‘tab10_r’, ‘tab20’, ‘tab20_r’, ‘tab20b’, ‘tab20b_r’, ‘tab20c’, ‘tab20c_r’, ‘terrain’, ‘terrain_r’, ‘twilight’, ‘twilight_r’, ‘twilight_shifted’, ‘twilight_shifted_r’, ‘viridis’, ‘viridis_r’, ‘vlag’, ‘vlag_r’, ‘winter’, ‘winter
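A palette name can also be inspected directly before it is passed to a plot. The sketch below asks seaborn for three colors from "Oranges" and from its reversed variant "Oranges_r"; the number 3 is an arbitrary choice for illustration.

```python
import seaborn as sns

# Three colors sampled from the "Oranges" palette, light to dark
oranges = sns.color_palette("Oranges", 3)

# The "_r" suffix gives the same palette in reverse order, dark to light
oranges_r = sns.color_palette("Oranges_r", 3)

# Each entry is an (r, g, b) tuple with values between 0 and 1
for color in oranges:
    print(tuple(round(channel, 2) for channel in color))
```

In a notebook, sns.palplot(oranges) draws the colors as a strip, which makes it easier to compare palettes by eye.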
The (pairplot) method
Create a pair plot that shows the relations between data.
NOTE: when the code runs as a script, seaborn plots are not displayed until you call plt.show() after them. Notebook environments such as Jupyter usually display them automatically.
Standard pairplot
sns.pairplot(data=iris) # create the plot
plt.show() # display the plot
Categorized pairplot
We want to classify the data into categories based on the column “species”
sns.pairplot(data=iris, hue='species')
plt.show()
KDE pairplot
Draws the off-diagonal graphs as KDE (kernel density estimate) contours instead of scatter plots.
sns.pairplot(data=iris, kind="kde")
Histogram pairplot
Draws the off-diagonal graphs as two-dimensional histograms instead of scatter plots.
sns.pairplot(data=iris, kind="hist")
Style mapped pairplot
Applies a style mapping to the off-diagonal axes by changing the shape of the points. Currently, it is redundant with the hue variable. The markers attribute defines the shape of the points for each category (circles, squares, diamonds).
sns.pairplot(data=iris, hue="species", markers=["o", "s", "D"])
The (jointplot) method
Create a jointplot that shows the relationship between two columns. It is a combination of scatter plots and histogram plots.
Standard jointplot
We can join plots together using specific columns. The hue attribute specifies the column we want to study with respect to the X and Y.
sns.jointplot(data=iris, y='sepal_width', x='sepal_length', hue='species')
Hexagon jointplot
We can represent the density of the points using hexagons: the darker a hexagon is, the more points fall inside it
sns.jointplot(data=iris, y='sepal_width', x='sepal_length', kind="hex")
The (histplot) method
Create a histogram that shows the distribution of a numeric column.
Standard histplot
Create a histogram for a specific column
sns.histplot(data=iris, x='sepal_length')
Relational histplot
We can split the histogram by the categories of another column
sns.histplot(data=iris, x='sepal_length', hue='species')
Sided histplot
Use "dodge" to place the bars beside each other.
sns.histplot(data=iris, x='sepal_length', hue='species', multiple='dodge')
Stacked histplot
Use "stack" to stack the bars on top of each other instead of overlapping them with transparency.
sns.histplot(data=iris, x='sepal_length', hue='species', multiple='stack')
Binwidth histplot
Change the width of each bin, that is, the range of values that each bar covers.
sns.histplot(data=iris, x='sepal_length', hue='species', binwidth=0.2)
Matplotlib.hist stacked
We can use this method to plot multiple histograms on top of each other. We usually split the data by category and plot each category individually.
labels = iris.species.unique() # get all categories of the column (species)
# distribute the data over the categories and get only the (sepal_length) values
newList = [iris[iris.species == i].sepal_length for i in labels]
# plot the 3 independent graphs over each other
plt.hist(newList, bins=30, stacked=True, label=labels)
plt.legend()
plt.show()
We can turn them into thin bars
plt.hist(newList, bins=30, stacked=True, label=labels, histtype='bar', rwidth=0.8)
We can show only the steps
plt.hist(newList, bins=30, stacked=True, label=labels, histtype='step')
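The (distplot) method
Create a distribution plot for numeric data. It is a combination of a KDE line and a histogram. The distplot function has been deprecated since seaborn 0.11, so the examples here use its replacements: histplot with kde=True, and kdeplot for the line alone.
iris_numeric = iris.iloc[:, :-1] # keep only the numeric columns without overwriting iris
Standard distplot
sns.histplot(data=iris_numeric, kde=True)
Bins of distplot
We can change the number of bars (bins) for the graph.
sns.histplot(data=iris_numeric, kde=True, bins=30)
Hidden histo distplot
We can hide the histogram and keep only the density lines.
sns.kdeplot(data=iris_numeric)
Colored distplot
We can change its color. A single color applies when only one column is plotted.
sns.histplot(data=iris_numeric, x="sepal_length", kde=True, bins=30, color="red")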
The (lmplot) method
Create an lmplot (linear model plot) for numeric data. It is a combination of a scatter plot and a fitted linear regression line.
Standard lmplot
sns.lmplot(data=iris, x="sepal_length", y="petal_length")
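The line drawn by lmplot is, by default, a straight line fitted to the points by least squares. As a rough illustration of what such a fit computes, the sketch below uses np.polyfit on hypothetical points; this is a swapped-in stand-in for the idea, not seaborn's internal code.

```python
import numpy as np

# Hypothetical points that lie exactly on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Fit a degree-1 polynomial (a straight line) by least squares
slope, intercept = np.polyfit(x, y, deg=1)
print(round(float(slope), 3), round(float(intercept), 3))

# Predict y for a new x value using the fitted line
predicted = slope * 6.0 + intercept
print(round(float(predicted), 3))
```

With real, noisy data the points do not fall exactly on the line, and lmplot also draws a shaded confidence band around it.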
Scatter lmplot
Hide the linear regression line and leave the scatter plot.
sns.lmplot(data=iris, x="sepal_length", y="petal_length", fit_reg=False)
Custom diagonal to the lmplot
Add a custom diagonal line to see which points lie above the diagonal and which lie below it
sns.lmplot(data=iris, x="petal_width", y="petal_length", fit_reg=False)
plt.plot((0, 8), (0, 8), 'k--')
plt.show()
Specified lmplot
Fit a separate regression line for each category. The hue attribute specifies the column we want to study with respect to the X and Y.
sns.lmplot(data=iris, x="sepal_length", y="petal_length", hue="species")
The (kdeplot) method
Create a KDE (kernel density estimate) plot for numeric data. It turns the scattered data points into smooth contour lines that show where the data is most dense.
Standard kdeplot
sns.kdeplot(data=iris, x="sepal_width", y="sepal_length")
Colored kdeplot
sns.kdeplot(data=iris, x="sepal_width", y="sepal_length", cmap="Reds")
Shaded kdeplot
sns.kdeplot(data=iris, x="sepal_width", y="sepal_length", fill=True) # fill replaces the deprecated shade
Background shaded kdeplot
sns.kdeplot(data=iris, x="sepal_width", y="sepal_length", fill=True, thresh=0) # thresh=0 replaces the deprecated shade_lowest=True
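The idea behind a kernel density estimate can be sketched in a few lines: place a small Gaussian bump on every data point and average the bumps. The code below is a simplified one-dimensional version with a hand-picked bandwidth of 0.3 and hypothetical sample values; seaborn chooses the bandwidth automatically and also handles the two-dimensional case shown in the plots.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps, one centered on each sample."""
    samples = np.asarray(samples, dtype=float)[:, None]  # shape (n, 1)
    grid = np.asarray(grid, dtype=float)[None, :]        # shape (1, m)
    z = (grid - samples) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=0)                            # shape (m,)

# Hypothetical measurements
samples = [2.0, 2.3, 2.9, 3.0, 3.1, 3.4, 3.8]
grid = np.linspace(0.0, 6.0, 601)

density = gaussian_kde_1d(samples, grid, bandwidth=0.3)

# A density should integrate to about 1 (simple Riemann-sum check)
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

A larger bandwidth makes the curve smoother and flatter, while a smaller one follows the individual points more closely.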
The (heatmap) method
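Create a heatmap for numeric data. Here it shows the correlation matrix of the numeric columns.
Standard heatmap
sns.heatmap(iris.select_dtypes("number").corr(), annot=True, cmap='RdBu_r')
Unannotated heatmap
We can hide the annotations from the heatmap.
sns.heatmap(iris.select_dtypes("number").corr(), annot=False, cmap='RdBu_r')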
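It can help to look at the correlation matrix itself before drawing it. The sketch below builds it for a small hypothetical frame and selects the numeric columns first, since recent pandas versions raise an error when corr() meets a text column.

```python
import pandas as pd

# Hypothetical stand-in for a few iris rows
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Keep only the numeric columns, then compute pairwise correlations
corr = df.select_dtypes("number").corr()
print(corr.round(2))
```

The result is a square, symmetric DataFrame with 1.0 on the diagonal, which is exactly the table that the heatmap colors cell by cell.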
The (boxplot) method
Create a boxplot between categorical and numerical data.
Standard boxplot
Plot one box for every numeric column.
sns.boxplot(data=iris)
Horizontal boxplot
sns.boxplot(data=iris, orient="h")
Specified boxplot
Plot a specific numeric column as one box per category
sns.boxplot(data=iris, orient="h", x='sepal_length', y='species')
The boxenplot
It is similar to the boxplot, but it draws a series of progressively narrower boxes for additional quantiles, so it shows more of the distribution in the tails.
sns.boxenplot(data=iris)
The (violinplot) method
Create a violinplot between categorical and numerical data.
Standard violinplot
Plot a general violinplot for all numeric columns.
sns.violinplot(data=iris)
Specified violinplot
Plot a violinplot for two columns. For vertical violins, the X-axis holds the categorical data and the Y-axis holds the numerical data points.
sns.violinplot(data=iris, x="species", y="petal_length")
The (swarmplot) method
Create a swarmplot for numeric data. It is a scatter plot whose points are shifted sideways so that they do not overlap.
Standard swarmplot
Plot a general swarmplot for all features
sns.swarmplot(data=iris, orient="v")
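The (FacetGrid) class
Create a grid of subplots, one for each value of a categorical column, and draw the same kind of plot in each of them.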
FacetGrid using a column
iris = sns.load_dataset("iris")
myPlot = sns.FacetGrid(data=iris, col="species", hue="petal_length")
myPlot = myPlot.map(plt.hist, "sepal_length")
FacetGrid using a row
iris = sns.load_dataset("iris")
myPlot = sns.FacetGrid(data=iris, row="species", hue="petal_width")
myPlot = myPlot.map(plt.hist, "sepal_width")