import pandas as pd
import altair as alt
The Seattle Weather dataset records the daily weather in Seattle, WA, USA, over four years, 2012 to 2015 inclusive.
In this example I'm using a slightly modified version of the data with four extra columns showing the year, month, day and dayOfYear of each row, because they make it easier to create useful and informative graphs, which is what we're all about.
df = pd.read_csv('seattle_weather.csv')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1461 entries, 0 to 1460
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   date           1461 non-null   object
 1   precipitation  1461 non-null   float64
 2   temp_max       1461 non-null   float64
 3   temp_min       1461 non-null   float64
 4   wind           1461 non-null   float64
 5   weather        1461 non-null   object
 6   year           1461 non-null   int64
 7   month          1461 non-null   int64
 8   day            1461 non-null   object
 9   dayOfYear      1461 non-null   int64
dtypes: float64(4), int64(3), object(3)
memory usage: 114.3+ KB
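The four extra columns aren't part of the original dataset, and how they were added isn't shown here. Below is a minimal sketch of one plausible way to derive them from the date column with pandas, assuming the dates are ISO strings like '2012-01-01' and that day holds abbreviated weekday names (which the sort list used later in this post suggests). The sample dates are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: a few ISO date strings shaped like the 'date' column
raw = pd.DataFrame({'date': ['2012-01-01', '2013-06-15', '2015-12-31']})

# Parse the strings once, then pull each extra column from the .dt accessor
dates = pd.to_datetime(raw['date'])
raw['year'] = dates.dt.year                  # calendar year, e.g. 2012
raw['month'] = dates.dt.month                # month number, 1-12
raw['day'] = dates.dt.day_name().str[:3]     # abbreviated weekday: 'Mon' ... 'Sun'
raw['dayOfYear'] = dates.dt.dayofyear        # 1-366

print(raw)
```

Using day_name() rather than strftime('%a') keeps the weekday names in English regardless of the machine's locale.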
We're going to create a stacked bar chart from this data. Our metric is average precipitation, which we'll break down by year and by day of the week: the years run along the x-axis and the days are the stacked segments of each bar. Here we go:
c = alt.Chart(df).mark_bar().encode(
x='year:O',
y='mean(precipitation)',
color='day',
tooltip=['day', 'year', 'mean(precipitation)']).interactive()
c
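The shorthand 'mean(precipitation)' asks altair (via Vega-Lite) to group the rows by the other encoded fields, here year and day, and average precipitation within each group. A rough pandas equivalent, run on a small made-up frame rather than the real file, shows what each stacked segment represents:

```python
import pandas as pd

# Made-up rows standing in for the real data
sample = pd.DataFrame({
    'year': [2012, 2012, 2012, 2013, 2013],
    'day': ['Mon', 'Mon', 'Tue', 'Mon', 'Tue'],
    'precipitation': [1.0, 3.0, 0.5, 2.0, 4.0],
})

# One row per (year, day) pair holding its mean precipitation:
# each row corresponds to one stacked segment in the chart
segments = (sample.groupby(['year', 'day'], as_index=False)['precipitation']
                  .mean()
                  .rename(columns={'precipitation': 'mean_precipitation'}))
print(segments)
```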
What you'll notice is that the graph, like most altair default graphs, is a bit on the squished side. That's because altair's defaults are compact: with an ordinal x-axis, each category gets a narrow fixed band (20 pixels in Vega-Lite's default configuration), so four years make for a thin chart. But not to worry: there's a .properties() method that's going to fix that for us.
.properties() Method
The properties method accepts a number of chart-level settings, and here we use three of them: title, height and width. We add those values (height and width are in pixels), and already we see an improvement.
c = alt.Chart(df).mark_bar().encode(
x='year:O',
y='mean(precipitation)',
color='day',
tooltip=['day', 'year', 'mean(precipitation)']).interactive().properties(
title='Average Precipitation in Seattle by Day, by Year',
height=370,
width=600)
c
Instead of plain strings, the x and y parameters can take alt.X() and alt.Y() objects, which give us more control over how each channel is encoded. In this case, we're going to set x=alt.X("year:O", sort="-y"), which sorts the x-axis by the descending value of the y channel. So instead of appearing in yearly order, the years run from the highest to the lowest average precipitation.
To be clear, this breakdown isn't of any real use to anyone; it's just useful for showing how to make the most of altair. Here's that sorted plot:
c = alt.Chart(df).mark_bar().encode(
x=alt.X('year:O', sort='-y'),
y='mean(precipitation)',
color='day',
tooltip=['day', 'year', 'mean(precipitation)']).interactive().properties(
title='Average Precipitation in Seattle by Day, by Year',
height=370,
width=600)
c
We can also give a custom sort order, which we do here. This time the days go on the x-axis and the mean precipitation for each year is stacked and colored by year. Our custom sort is a simple list of day names: x=alt.X('day:O', sort=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']). Without this list the axis would sort the day names alphabetically by default, which looks awful.
c = alt.Chart(df).mark_bar().encode(
x=alt.X('day:O', sort=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']),
y='mean(precipitation)',
color='year',
tooltip=['day', 'year', 'mean(precipitation)']).interactive().properties(
title='Average Precipitation in Seattle by Day, by Year',
height=370,
width=600)
c
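To see why the explicit list matters: day holds strings, so a default ascending sort orders them alphabetically. Python's own sort shows the result. As an aside, a pandas ordered Categorical is one way to keep the weekday order on the pandas side too; the chart above still relies on its explicit sort list.

```python
import pandas as pd

week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Plain string sorting scrambles the week
alphabetical = sorted(week)
print(alphabetical)  # ['Fri', 'Mon', 'Sat', 'Sun', 'Thu', 'Tue', 'Wed']

# An ordered categorical sorts by the listed category order instead
days = pd.Series(['Sun', 'Mon', 'Fri', 'Wed'])
ordered = days.astype(pd.CategoricalDtype(categories=week, ordered=True)).sort_values()
print(ordered.tolist())  # ['Mon', 'Wed', 'Fri', 'Sun']
```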
You'll have noticed the tooltip gives an intimidating number of decimal places. We don't need that, and we can fix it.
Just as there are alt.X() and alt.Y(), there is alt.Tooltip(). So instead of putting 'mean(precipitation)' in our tooltip list as a plain string, we pass an alt.Tooltip() with two arguments, the field first and then the formatting: alt.Tooltip('mean(precipitation)', format=".2f"). Here ".2f" means fixed-point notation with two decimal places.
c = alt.Chart(df).mark_bar().encode(
x=alt.X('day:O', sort=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']),
y='mean(precipitation)',
color='year',
tooltip=['day', 'year', alt.Tooltip('mean(precipitation)', format=".2f")]).interactive().properties(
title='Average Precipitation in Seattle by Day, by Year',
height=370,
width=600)
c
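The format string is a d3-format specifier, and d3-format is modelled on Python's format specification mini-language, so for simple fixed-point cases like this one you can preview the result in plain Python. This is a rough check on made-up values, not a guarantee that every specifier (or every rounding tie) behaves identically in the browser.

```python
# Made-up means with long decimal tails, like the raw tooltip values
means = [3.0172413793103448, 0.07999999999999999, 12.3456]

# '.2f' = fixed-point notation, two digits after the decimal point
formatted = [format(value, '.2f') for value in means]
print(formatted)
```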
What's left? Now that we've enlarged the chart, its axis text looks small by comparison. But not to worry, we can fix that. Both alt.X() and alt.Y() take an axis parameter, which accepts an alt.Axis() object for styling the axis: its title, the title's font size (titleFontSize) and the size of the tick labels (labelFontSize).
It's not always obvious what these sizes should be, but it's not hard to experiment with different values. Here's how we'll style the y-axis:
y=alt.Y('mean(precipitation)', axis=alt.Axis(title="Average Precipitation", labelFontSize=12, titleFontSize=14))
c = alt.Chart(df).mark_bar().encode(
x=alt.X('day:O', sort=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
axis=alt.Axis(title="Days of the Week", labelFontSize=12, titleFontSize=14)),
y=alt.Y('mean(precipitation)', axis=alt.Axis(title="Average Precipitation", labelFontSize=12, titleFontSize=14)),
color='year',
tooltip=['day', 'year', alt.Tooltip('mean(precipitation)', format=".2f")]).interactive().properties(
title='Average Precipitation in Seattle by Day, by Year',
height=370,
width=600)
c
You'll notice that the years aren't displayed correctly in the color legend. That's because we've set color='year', and altair infers the integer year column as a quantitative (continuous) field, so it draws a gradient legend. We can fix that by declaring year as ordinal with color='year:O', and our chart is now finally fully dressed and ready for the Lord Mayor's Ball.
c = alt.Chart(df).mark_bar().encode(
x=alt.X('day:O', sort=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
axis=alt.Axis(title="Days of the Week", labelFontSize=12, titleFontSize=14)),
y=alt.Y('mean(precipitation)', axis=alt.Axis(title="Average Precipitation", labelFontSize=12, titleFontSize=14)),
color='year:O',
tooltip=['day', 'year', alt.Tooltip('mean(precipitation)', format=".2f")]).interactive().properties(
title='Average Precipitation in Seattle by Day, by Year',
height=370,
width=600)
c