import pandas as pd
import altair as alt
from vega_datasets import data
df = data.iris()
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   sepalLength  150 non-null    float64
 1   sepalWidth   150 non-null    float64
 2   petalLength  150 non-null    float64
 3   petalWidth   150 non-null    float64
 4   species      150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
Just as we can distinguish categorical variables by setting a color encoding, we can break a plot into subplots by category by using a column or row encoding.
The iris dataset is made up of three species. Here we'll plot each species in its own subplot within a single chart, first arranged by column, and then by row.
alt.Chart(df).mark_circle().encode(
    x='petalLength',
    y='petalWidth',
    column='species').properties(
    width=200,
    height=200)
alt.Chart(df).mark_circle().encode(
    x='petalLength',
    y='petalWidth',
    row='species').properties(
    width=200,
    height=200)
Here we create a histogram of petal length:
h = alt.Chart(df).mark_bar().encode(
    x=alt.X('petalLength', bin=alt.Bin(maxbins=20)),
    y='count()')
h
And here we create a boxplot of the same data:
b = alt.Chart(df).mark_boxplot().encode(
    x="petalLength:Q")
b
h + b
We can easily change the fill colours to make the plots more distinguishable.
h = alt.Chart(df).mark_bar(color='lightblue').encode(
    x=alt.X('petalLength', bin=alt.Bin(maxbins=20)),
    y='count()')
b = alt.Chart(df).mark_boxplot(color='crimson').encode(
    x="petalLength:Q")
h + b
There are two different methods of combining plots, as shown in the table: an operator and a function. There is no difference between them. a & b == alt.vconcat(a,b) will always return True. So will a | b == alt.hconcat(a,b) and a + b == alt.layer(a,b).
Code | Result | Alternative Code |
---|---|---|
a & b | Stack plots vertically, one per row. | alt.vconcat(a,b) |
a \| b | Place plots side by side, one per column. | alt.hconcat(a,b) |
a + b | Overlay plots. | alt.layer(a,b) |
Repeat plots are the Altair equivalent of the pandas pd.plotting.scatter_matrix() function.
pd.plotting.scatter_matrix(df);
To create an Altair repeat plot, we need lists of the fields that we want repeated across rows and columns. Here these are the numerical columns in the iris dataset.
numerical_columns = df.columns[:-1].to_list()
numerical_columns
['sepalLength', 'sepalWidth', 'petalLength', 'petalWidth']
Note that numerical_columns is a plain Python list. Using a pd.Series or a NumPy array will just throw an error.
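Slicing by position with `df.columns[:-1]` relies on the non-numerical column being last. As a hedged alternative, the columns could be selected by dtype and then converted to a list. The sketch below uses a made-up two-row frame, `toy`, with the same column layout as the iris data; the name `numeric_cols` is illustrative.

```python
import pandas as pd

# Small stand-in frame with the same column layout as the iris data
# (the values are made up for illustration).
toy = pd.DataFrame({
    "sepalLength": [5.1, 7.0],
    "sepalWidth": [3.5, 3.2],
    "petalLength": [1.4, 4.7],
    "petalWidth": [0.2, 1.4],
    "species": ["setosa", "versicolor"],
})

# Select columns by dtype rather than by position, then convert the
# resulting pandas Index to a plain Python list.
numeric_cols = toy.select_dtypes(include="number").columns.to_list()

print(numeric_cols)        # ['sepalLength', 'sepalWidth', 'petalLength', 'petalWidth']
print(type(numeric_cols))  # <class 'list'>
```

This keeps working if the columns are reordered or if more non-numerical columns are added.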
tooltip = df.columns.to_list()
alt.Chart(df).mark_circle().encode(
    x=alt.X(alt.repeat('row'), type='quantitative'),
    y=alt.Y(alt.repeat('column'), type='quantitative'),
    color='species',
    tooltip=tooltip).properties(
    width=150,
    height=150).repeat(
    row=numerical_columns,
    column=numerical_columns)
More colorful, more interactive, more customizable. Winner all the way.