Plotting and exploring data – harnessing the power of Seaborn
Let's start our analysis with pairplot, Seaborn's canned routine for visualizing pairwise feature relationships. You can use it to hunt down relationships, candidate groupings, and possible outliers, and to build an intuition for which downstream analysis strategies are worth investigating. Each off-diagonal cell is a pairwise scatter plot, and each diagonal cell shows the univariate distribution of one feature:
# explore with Seaborn pairplot
import seaborn as sns
sns.pairplot(df, hue='species')
You will see the following output after executing the preceding code:
[Figure: pair plot of df colored by species, with density plots on the diagonal]
Sometimes a histogram is easier to read than a probability-density plot when you are trying to understand a distribution. With Seaborn, we can pass the diag_kind argument and re-plot to show histograms on the diagonal.
We can also change the aesthetics with the palette and markers arguments. Refer to the Seaborn documentation for the other available arguments; let's re-plot as follows:
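# add histograms to diagonals of Seaborn pairplot
# (all-filled markers: newer Seaborn releases raise an error when filled
# markers such as 'o' are mixed with line-art markers such as 'x' here)
sns.pairplot(df, hue='species', diag_kind='hist',
             palette='bright', markers=['o', 's', 'v'])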
You will see the following output after executing the preceding code:
[Figure: pair plot of df colored by species, with histograms on the diagonal]
At this point, we can choose two variables and plot them against each other with Seaborn's lmplot. If your dataset has more than five features, the pair plot grid may not fit in a single window, so important variable relationships can be hard to view together. You can use this bivariate scatter plot to isolate and view important pairings. Passing fit_reg=False turns off the regression line that lmplot fits by default, leaving a plain scatter plot:
# plot bivariate scatter with Seaborn
sns.lmplot(x='petal length in cm', y='petal width in cm',
           hue='species', data=df, fit_reg=False,
           palette='bright', markers=['o', 'x', 'v'])
You will see the following output after executing the preceding code:
[Figure: scatter plot of petal width versus petal length, colored by species]
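When there are too many features to scan by eye, one possible way to decide which pairing to isolate is to rank feature pairs by absolute correlation and plot the strongest one. This is a sketch of that idea, not a method prescribed by the text: the data is synthetic, and demo_df and the feature_3 to feature_6 columns are hypothetical names introduced for illustration. Correlation only captures linear relationships, so treat the ranking as a starting point:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in (assumption): six numeric features, two strongly related.
rng = np.random.default_rng(1)
n = 150
base = rng.normal(size=n)
demo_df = pd.DataFrame({
    'petal length in cm': base,
    'petal width in cm': 0.9 * base + rng.normal(scale=0.2, size=n),
    'feature_3': rng.normal(size=n),  # hypothetical unrelated features
    'feature_4': rng.normal(size=n),
    'feature_5': rng.normal(size=n),
    'feature_6': rng.normal(size=n),
})
demo_df['species'] = np.repeat(['setosa', 'versicolor', 'virginica'], n // 3)

# Absolute pairwise correlations between the numeric columns.
corr = demo_df.drop(columns='species').corr().abs()

# Keep each pair once (upper triangle, no diagonal) and rank by strength.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
ranked = corr.where(upper).stack().dropna().sort_values(ascending=False)
x_name, y_name = ranked.index[0]

# Isolate the strongest pairing in its own bivariate scatter plot.
facet_grid = sns.lmplot(x=x_name, y=y_name, hue='species', data=demo_df,
                        fit_reg=False, palette='bright')
print(x_name, '|', y_name)
```

If you would rather stay within the pair plot, pairplot also accepts a vars argument that restricts the grid to a chosen subset of columns.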
A popular quick view of a single feature is the violin plot. Many practitioners prefer violins for understanding raw value distributions and class spreads on a single plot. Each violin is the univariate distribution of the values within a given class, displayed as a probability density and plotted vertically like a box plot. This may sound convoluted, but one look at the plot should get the idea across with ease. The more violin plots you see, the more you will learn to love them:
# plot a violin of a single feature for each class
sns.violinplot(x='species', y='petal length in cm', data=df)
You will see the following output after executing the preceding code:
[Figure: violin plot of petal length for each species]
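To make the idea of a per-class probability density concrete, here is a rough sketch that computes a Gaussian kernel density estimate for each class by hand with NumPy. It is an approximation of the kind of curve a violin outlines, not Seaborn's actual implementation. The helper gaussian_kde_curve, the demo_df name, the class labels, and the synthetic values are all assumptions for illustration, and the bandwidth uses Scott's rule of thumb as one common choice:

```python
import numpy as np
import pandas as pd

def gaussian_kde_curve(values, grid):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    values = np.asarray(values, dtype=float)
    n = values.size
    bandwidth = values.std(ddof=1) * n ** (-1 / 5)  # Scott's rule of thumb
    # One Gaussian bump per observation, evaluated at every grid point.
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

# Synthetic stand-in for one feature of df (assumption, not the book's data).
rng = np.random.default_rng(2)
demo_df = pd.DataFrame({
    'species': np.repeat(['setosa', 'versicolor', 'virginica'], 50),
    'petal length in cm': np.concatenate([
        rng.normal(1.5, 0.2, 50),
        rng.normal(4.0, 0.5, 50),
        rng.normal(5.5, 0.5, 50),
    ]),
})

# One density curve per class: this is the shape each violin mirrors.
grid = np.linspace(0, 8, 400)
densities = {
    name: gaussian_kde_curve(group['petal length in cm'], grid)
    for name, group in demo_df.groupby('species')
}

# Location of each curve's peak, i.e. where that violin would be widest.
peaks = {name: grid[np.argmax(curve)] for name, curve in densities.items()}
print(peaks)
```

A violin draws such a curve twice, mirrored around a vertical axis, so the widest part of the violin marks the most common values for that class.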