import pandas as pd
import altair as alt
from vega_datasets import data
Pie charts are a relatively recent addition to Altair (their standing is not high in charting and plotting academia), so we need to check our Altair version to make sure it's up to date. If it isn't, updating is easy: just open a terminal and run `pip install -U altair`.
alt.__version__
'4.2.0'
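The same check can be scripted. The following is a minimal sketch, assuming that `.mark_arc()` first shipped in Altair 4.2.0; the helper names `version_tuple` and `supports_mark_arc` are hypothetical and are not part of Altair.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: mark_arc first appeared in Altair 4.2.0.
MIN_ARC_VERSION = (4, 2, 0)


def version_tuple(text):
    """Turn '4.2.0' into (4, 2, 0), ignoring suffixes such as 'rc1' or 'dev'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as '5.0' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_mark_arc(text):
    """Return True if the given version string is at least MIN_ARC_VERSION."""
    return version_tuple(text) >= MIN_ARC_VERSION


try:
    installed = version("altair")
    print(installed, "supports mark_arc:", supports_mark_arc(installed))
except PackageNotFoundError:
    print("Altair is not installed; run: pip install -U altair")
```

Reading the version through `importlib.metadata` means the check also works when Altair is missing, rather than failing on import.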
The Seattle weather data has six fields. To these, we're going to add a seventh, `year`, as we're going to use that `year` category to slice our pie.
Note that we're converting our `year` field to `int` from `str`. That'll be important later.
df = data.seattle_weather()
df['year'] = df.date.apply(lambda x: int(x.strftime('%Y')))
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1461 entries, 0 to 1460
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   date           1461 non-null   datetime64[ns]
 1   precipitation  1461 non-null   float64
 2   temp_max       1461 non-null   float64
 3   temp_min       1461 non-null   float64
 4   wind           1461 non-null   float64
 5   weather        1461 non-null   object
 6   year           1461 non-null   int64
dtypes: datetime64[ns](1), float64(4), int64(1), object(1)
memory usage: 80.0+ KB
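To make the `str` to `int` step concrete, here is a small sketch of what the lambda does to a single date. It uses the standard library's `datetime` as a stand-in for the values in the `date` column, and the sample dates are hypothetical.

```python
from datetime import datetime

# Hypothetical sample dates, standing in for values in the date column.
dates = [datetime(2012, 1, 1), datetime(2013, 6, 15), datetime(2015, 12, 31)]


def year_via_strftime(d):
    # Same steps as the lambda: format the year as text, then parse it as an int.
    return int(d.strftime('%Y'))


for d in dates:
    text_year = d.strftime('%Y')  # a str such as '2012'
    print(repr(text_year), '->', year_via_strftime(d), '| d.year =', d.year)
```

The result matches the date's own `.year` attribute, so in pandas `df['date'].dt.year` should be an equivalent, shorter route to an integer year.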
While the other `.mark_*()` methods we've seen so far all use `x` and `y` parameters, `.mark_arc()` doesn't. It uses:
- `theta`, which is set to the numerical field that will be shown in the chart, and
- `color`, which is set to the categorical field by which we'll slice the chart.

It's really a very simple chart.
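As a rough picture of what `theta` does with an aggregated field, each category's total becomes a share of the full circle. The sketch below works through that arithmetic by hand; the yearly totals are made-up illustration values, not the real Seattle figures, and `slice_angles` is a hypothetical helper rather than anything Altair exposes.

```python
import math

# Hypothetical yearly precipitation totals, for illustration only.
totals = {2012: 1200.0, 2013: 800.0, 2014: 1000.0, 2015: 1000.0}


def slice_angles(totals):
    """Map each category's total to its share of a full circle, in radians."""
    grand_total = sum(totals.values())
    return {key: 2 * math.pi * value / grand_total for key, value in totals.items()}


angles = slice_angles(totals)
for year, angle in angles.items():
    print(year, f"{math.degrees(angle):.1f} degrees")
```

Altair handles the aggregation and scaling itself, which is why the chart code needs only the two encodings.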
c = alt.Chart(df).mark_arc().encode(
    theta='sum(precipitation)',
    color='year')
c
This pie chart isn't quite what you were expecting, or at least it shouldn't be. Altair has recognised `year` as a number. It has thousand-separated the digits ("2,015" rather than "2015") and used a continuous color scale instead of discrete colors. This isn't what we want: we want it to see `year` as a discrete category. There are two ways to do this.
The first way is to change the data type (`dtype`) in the data frame itself or, in this particular case, to have left the year as a string rather than converting it to an integer when we called our `lambda` function. That's fine, and often the best way to do it.
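A minimal sketch of that first approach, using a tiny hypothetical data frame in place of the weather data. The column names `year_int` and `year_from_int` are made up for the illustration.

```python
import pandas as pd

# Hypothetical stand-in for the weather data: a few dates and rainfall values.
df = pd.DataFrame({
    'date': pd.to_datetime(['2012-01-01', '2013-06-15', '2015-12-31']),
    'precipitation': [0.0, 2.5, 1.3],
})

# Option 1: keep the year as text from the start.
df['year'] = df['date'].dt.strftime('%Y')

# Option 2: convert an existing integer column to text.
df['year_int'] = df['date'].dt.year
df['year_from_int'] = df['year_int'].astype(str)

print(df[['year', 'year_from_int']])
```

With the year stored as text, Altair should infer a discrete category rather than a number, so no type hint is needed in the encoding.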
Altair's Own Encoding Data Types

The second way is to tell Altair directly how to treat the field. Altair has a shorthand that covers five data types and lets users state explicitly how a particular field should be treated, by suffixing a colon and one of five letters to the column name when setting an encoding parameter: `x='something:O'`, `y='something:N'`, `color='something:Q'`, or whatever. These are the five abbreviations.
Abbreviation | Meaning | Description |
---|---|---|
N | Nominal | These are the names of things, any thing at all. Cats, dogs, apples, oranges. |
O | Ordinal | These are the names of things that we associate as having a certain order. Days of the week, months of the year, gold-silver-bronze, those sorts of things. |
Q | Quantitative | Quantitative data is any numerical data at all. |
T | Temporal | Temporal data is datetime data. |
G | Geographical | Geographical data is .geojson data, latitudes and longitudes. |
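To show how the suffix reads, here is a small sketch that splits a shorthand string into a column name and a spelled-out type. `expand_shorthand` and `TYPE_CODES` are hypothetical names for this illustration, not Altair's own parser, and the sketch only handles plain column names.

```python
# Assumption: these are the five type codes and their long-form names.
TYPE_CODES = {
    'N': 'nominal',
    'O': 'ordinal',
    'Q': 'quantitative',
    'T': 'temporal',
    'G': 'geojson',
}


def expand_shorthand(shorthand):
    """Split 'column:X' into a column name and a spelled-out type (hypothetical helper)."""
    field, sep, code = shorthand.rpartition(':')
    if not sep:
        # No suffix given: leave the type unset so it can be inferred from the data.
        return {'field': shorthand, 'type': None}
    return {'field': field, 'type': TYPE_CODES[code]}


print(expand_shorthand('year:N'))
print(expand_shorthand('date:T'))
print(expand_shorthand('year'))
```

In Altair itself, the long form of `'year:N'` is written as `alt.Color('year', type='nominal')`.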
And here's the pie chart proper, with `color` set to `year:N`.
c = alt.Chart(df).mark_arc().encode(
    theta='sum(precipitation)',
    color='year:N',
    tooltip=['year', 'sum(precipitation)'])
c