Python – Seaborn and pd.scatter_matrix() plot color issues

data-visualization matplotlib pandas python seaborn

I am making a pd.scatter_matrix() plot from a DataFrame based on the Iris dataset, colored by the target variable (plant species). When I run the code below, I get a scatter matrix whose points are colored black, grey and white (!), which hinders visualization. The grid seems inconsistent too: apparently only the plots next to the axes get grid lines. I wanted a nice grid and a scatter matrix that follows the sns default color palette (blue, green, red).

Why do the seaborn plot style and pd.scatter_matrix() enforce a different (awful!) color palette than the default for the scatter plots, along with inconsistent grid lines? How can I solve these visualization issues?

I already updated seaborn to a fairly recent version (0.8, from July 2017). I also tried the non-deprecated version of the pandas scatter matrix plot, pd.plotting.scatter_matrix(), and had no luck. If I use the 'ggplot' style, the color palette is correct for the scatter plots but the grids are still inconsistent.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn')
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X, columns = iris.feature_names)

pd.scatter_matrix(df, c=y, figsize = [8,8],
                      s=80, marker = 'D');


Package versions:

pandas version: 0.20.1
matplotlib version: 2.0.2
seaborn version: 0.8.0

Best Answer

I am not sure if this answers your question, but you could use seaborn's pairplot. Let me know.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X, columns = iris.feature_names)

pd.plotting.scatter_matrix(df, c=y, figsize = [8,8],
                      s=80, marker = 'D');
df['y'] = y

sns.pairplot(df,hue='y')

which gives you a pair plot with the points colored by the target class.

If you want to avoid the last row of plots in the visualization, then:

import seaborn as sns
sns.set(style="ticks", color_codes=True)
%matplotlib inline

iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species")

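
If you would rather keep scatter_matrix(), here is a minimal sketch of one possible workaround. It rests on an assumption that is not confirmed in this thread: when c is a list of integer labels, matplotlib maps them through the active colormap, and the 'seaborn' style sheet appears to set a grey default colormap, which would explain the black, grey and white points. Passing one explicit color per point avoids the colormap entirely. The data below is a synthetic stand-in for the iris features, and the names point_colors and cycle are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the iris features and target (hypothetical data):
# three classes of 20 points each, four numeric columns.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 20)
df = pd.DataFrame(
    rng.normal(loc=y[:, None], scale=0.5, size=(60, 4)),
    columns=["a", "b", "c", "d"],
)

# Take one color per class from the color cycle of whichever style is active,
# so the points follow that style's palette instead of a colormap.
cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
point_colors = [cycle[label % len(cycle)] for label in y]

# Explicit colors are used as given; no colormap lookup is involved.
axes = pd.plotting.scatter_matrix(
    df, c=point_colors, figsize=(8, 8), s=80, marker="D"
)
```

With the real data you would build point_colors from iris.target in the same way. This only addresses the point colors; it does not change how scatter_matrix() draws the grid.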