import pandas as pd
import altair as alt
from vega_datasets import data
The .mark_text() method doesn't work quite like the other mark methods in Altair (.mark_circle(), .mark_bar(), etc.). Sometimes it's quite simple, as in this example in the documentation. Other times it can be quite complex.
We're going to use the data.iowa_electricity() dataset because it's very straightforward. There are three columns:
df = data.iowa_electricity()
df.head()
| | year | source | net_generation |
|---|---|---|---|
| 0 | 2001-01-01 | Fossil Fuels | 35361 |
| 1 | 2002-01-01 | Fossil Fuels | 35991 |
| 2 | 2003-01-01 | Fossil Fuels | 36234 |
| 3 | 2004-01-01 | Fossil Fuels | 36205 |
| 4 | 2005-01-01 | Fossil Fuels | 36883 |
c = alt.Chart(df).mark_line().encode(
x='year:T',
y='net_generation',
color='source',
tooltip=df.columns.to_list()).properties(
title="Iowa Electricity",
width=600,
height=377)
c
Modern charting theory suggests that it's a bit of a pain having to check colors against a legend in a line chart. Better to label the lines directly, so the labels are found as the eye naturally tracks each line from left to right. But how to attach the labels? In the documentation example, labelling the data was easy because the chart was a bar chart. It's not so easy with a line chart. To correctly label these lines, we first have to find the last data point of each line (the most recent year for each source), since that's where each label will sit:
holder = []
grouper = df.groupby('source')
# For each source, keep the row(s) from its most recent year
for source, group in grouper:
    holder.append(group[group.year == group.year.max()])
df2 = pd.concat(holder)
df2
| | year | source | net_generation |
|---|---|---|---|
| 16 | 2017-01-01 | Fossil Fuels | 29329 |
| 33 | 2017-01-01 | Nuclear Energy | 5214 |
| 50 | 2017-01-01 | Renewables | 21933 |
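The loop above works, but pandas can do the same job in one line with GroupBy.idxmax(), which returns the index label of each group's latest year. A minimal sketch, using a few rows copied from the two tables above rather than the full dataset. One caveat: if a source had two rows tied for the latest year, idxmax() keeps only the first, while the loop keeps both.

```python
import pandas as pd

# A few rows copied from the tables above; not the full dataset.
df = pd.DataFrame({
    "year": pd.to_datetime(["2001-01-01", "2002-01-01", "2017-01-01",
                            "2017-01-01", "2017-01-01"]),
    "source": ["Fossil Fuels", "Fossil Fuels", "Fossil Fuels",
               "Nuclear Energy", "Renewables"],
    "net_generation": [35361, 35991, 29329, 5214, 21933],
})

# Loop version, as in the post: keep each group's row(s) with the latest year.
holder = []
for source, group in df.groupby("source"):
    holder.append(group[group.year == group.year.max()])
df2_loop = pd.concat(holder)

# One-line alternative: idxmax() gives the index label of the latest year
# in each group, and .loc pulls those rows out of the original frame.
df2 = df.loc[df.groupby("source")["year"].idxmax()]

print(df2)
```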
c2 = alt.Chart(df2).mark_point().encode(
x='year:T',
y='net_generation')
c2
c + c2
c3 = c2.mark_text().encode(text='source:N')
c3
c + c2 + c3
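Labels sitting right on top of the points drawn by mark_point() aren't very pretty, so let's nudge the text to the right with the dx parameter, an offset in pixels:
c3 = c2.mark_text(dx=40).encode(text='source:N')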
c3
A mistake I made in the video was not realising that the Nuclear Energy label started further to the left than the other two labels. That's because mark_text() centers text on its anchor point by default, something I should have realised just by looking at the labels: the longest label sticks out furthest to the left. It's an easy fix with align='left'. Setting align='left' also means reducing the dx parameter, because dx now offsets the leftmost edge of the text rather than its center.
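To see why the longest label drifts left, here is a rough sketch of the arithmetic. The helper label_left_edge is hypothetical, and the label widths assume about 7 pixels per character, which is not measured; real widths depend on the font. With center alignment the left edge is the anchor plus dx minus half the text width, so a wider label starts further left. With left alignment every label starts exactly at the anchor plus dx.

```python
def label_left_edge(anchor_x, dx, text_width, align="center"):
    """Hypothetical helper: approximate pixel x-position of a label's left edge.

    anchor_x is the data point's x-position, dx the mark_text offset and
    text_width the rendered label width (an assumed value here).
    """
    x = anchor_x + dx
    if align == "center":
        return x - text_width / 2   # text is centered on x
    if align == "right":
        return x - text_width       # text ends at x
    return x                        # align == "left": text starts at x


CHAR_WIDTH = 7  # assumed average glyph width in pixels, not measured

for label in ["Fossil Fuels", "Nuclear Energy", "Renewables"]:
    width = len(label) * CHAR_WIDTH
    print(label,
          label_left_edge(600, 40, width, "center"),
          label_left_edge(600, 10, width, "left"))
```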
c3 = c2.mark_text(dx=10,
align='left').encode(text='source:N')
c3
c + c2 + c3
Hiding the points is as easy as setting size=0 as a parameter in .mark_point():
c2 = alt.Chart(df2).mark_point(size=0).encode(
x='year:T',
y='net_generation')
c + c2 + c3
Just as the x parameter in .encode() can take an alt.X() object and y an alt.Y(), the color parameter can take an alt.Color(), which accepts a legend parameter. Here we set color=alt.Color('source', legend=None) to drop the legend, and our properly-labelled line chart is now complete.
c = alt.Chart(df).mark_line().encode(
x='year:T',
y='net_generation',
color=alt.Color('source', legend=None),
tooltip=df.columns.to_list()).properties(
title="Iowa Electricity",
width=600,
height=377)
c + c2 + c3