Standard Imports¶

In [1]:

import altair as alt
import pandas as pd

Seattle Weather Data¶

The Seattle Weather dataset is a daily weather record covering four years, 2012 to 2015 inclusive, in Seattle, WA, USA.

In this example I'm using a slightly modified version of the data with four extra columns (year, month, day and dayOfYear) on each row. They make it easier to create useful and informative graphs, which is what we're all about.
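The extra columns can all be derived from the date column. The following is a minimal sketch of one way to do that with pandas, not necessarily how the modified file was actually built. The three-row frame is a stand-in for the real CSV, and the three-letter day abbreviation is an assumption based on values such as Sun and Mon in the data.

```python
import pandas as pd

# Stand-in for the raw data: only the date column matters here.
raw = pd.DataFrame({"date": ["2012-01-01", "2012-01-02", "2012-12-31"]})

# Parse the strings once, then read the calendar parts off the .dt accessor.
parsed = pd.to_datetime(raw["date"])
extended = raw.assign(
    year=parsed.dt.year,
    month=parsed.dt.month,
    day=parsed.dt.day_name().str[:3],  # abbreviated weekday name, e.g. "Sun"
    dayOfYear=parsed.dt.dayofyear,     # 1-based day number within the year
)

print(extended)
```

Because 2012 is a leap year, the last row gets a dayOfYear of 366.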

These are the first five rows of data:

In [2]:

df = pd.read_csv('seattle_weather.csv')
df.head()

Out[2]:

	date	precipitation	temp_max	temp_min	wind	weather	year	month	day	dayOfYear
0	2012-01-01	0.0	12.8	5.0	4.7	drizzle	2012	1	Sun	1
1	2012-01-02	10.9	10.6	2.8	4.5	rain	2012	1	Mon	2
2	2012-01-03	0.8	11.7	7.2	2.3	rain	2012	1	Tue	3
3	2012-01-04	20.3	12.2	5.6	4.7	rain	2012	1	Wed	4
4	2012-01-05	1.3	8.9	2.8	6.1	rain	2012	1	Thu	5

And this is the standard summary of the data:

In [3]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1461 entries, 0 to 1460
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   date           1461 non-null   object 
 1   precipitation  1461 non-null   float64
 2   temp_max       1461 non-null   float64
 3   temp_min       1461 non-null   float64
 4   wind           1461 non-null   float64
 5   weather        1461 non-null   object 
 6   year           1461 non-null   int64  
 7   month          1461 non-null   int64  
 8   day            1461 non-null   object 
 9   dayOfYear      1461 non-null   int64  
dtypes: float64(4), int64(3), object(3)
memory usage: 114.3+ KB

A Bar Chart¶

We can create a kind of histogram here using temp_min, for two reasons: (a) temp_min takes a limited set of discrete values, so counting the rows at each value works well, and (b) it's a good way of understanding how altair makes histograms.
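To get a feel for what y='count()' produces for each distinct x value, here is a rough pandas equivalent. It is only a sketch on a handful of made-up temp_min values, not the real dataset, and it illustrates the idea of the aggregation rather than how altair computes it internally.

```python
import pandas as pd

# A few made-up temp_min readings for illustration.
temps = pd.DataFrame({"temp_min": [5.0, 2.8, 7.2, 5.6, 2.8, 5.0, 5.0]})

# One entry per distinct temp_min value; the count is the number of rows
# holding that value, which is what each bar's height represents.
counts = temps["temp_min"].value_counts().sort_index()

print(counts)
```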

In [4]:

c = alt.Chart(df).mark_bar().encode(
    x='temp_min',
    y='count()',
    tooltip=['temp_min', 'count()'])

c

Out[4]:


`alt.X()` and Bins¶

So what happens when the data isn't discrete in the way temp_min is? Well. It turns out that the x=someColumn construction is shorthand for the real works. The x and y parameters actually accept alt.X() and alt.Y() objects, and we can use these to pass additional parameters along with our values.

To create a histogram, then, is just the same as making a bar chart in that we call .mark_bar() as above. The difference is that we pass bin=True to alt.X().

In [5]:

c = alt.Chart(df).mark_bar().encode(
    x=alt.X('temp_min', bin=True),
    y='count()',
    tooltip=['temp_min', 'count()'])

c

Out[5]:

Those bins are a little on the portly side. We can tweak that by changing bin=True to bin=alt.Bin(maxbins=50).

In [6]:

c = alt.Chart(df).mark_bar().encode(
    x=alt.X('temp_min', bin=alt.Bin(maxbins=50)),
    y='count()',
    tooltip=['temp_min', 'count()'])

c

Out[6]:

Histograms show the central tendency of the data. If we want to see the flip side of that, and identify the outliers, we look to boxplots.

Boxplots¶

Making a boxplot in altair is a piece of cake. Call mark_boxplot() on alt.Chart(df), where df is the data frame with your data, and set either the x or the y parameter to whichever numerical column you wish to explore. Setting the x parameter returns a horizontal boxplot; setting the y parameter returns a vertical one. The sizing is done automatically.

In [7]:

alt.Chart(df).mark_boxplot().encode(
    x='temp_min')

Out[7]:

Outliers¶

If we add a tooltip, we can easily identify the outliers in the data.
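For a sense of which points end up drawn as outliers, here is a pandas sketch of the usual 1.5 × IQR rule, which is the default whisker extent for altair boxplots. The dates and precipitation values are made up for illustration, and pandas' quantile interpolation may differ slightly from what the chart computes.

```python
import pandas as pd

# Made-up readings; the last one is deliberately extreme.
rain = pd.DataFrame({
    "date": ["day 1", "day 2", "day 3", "day 4",
             "day 5", "day 6", "day 7", "day 8"],
    "precipitation": [0.0, 0.0, 0.8, 1.3, 2.5, 4.3, 10.9, 55.9],
})

# Quartiles and interquartile range of the column.
q1 = rain["precipitation"].quantile(0.25)
q3 = rain["precipitation"].quantile(0.75)
iqr = q3 - q1

# Anything beyond 1.5 * IQR from the box counts as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
is_outlier = (rain["precipitation"] < lower) | (rain["precipitation"] > upper)
outliers = rain[is_outlier]

print(outliers)
```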

In [8]:

alt.Chart(df).mark_boxplot().encode(
    x='precipitation',
    tooltip=['date', 'precipitation'])

Out[8]:

Categorical Breakdown¶

If you wish to break out the data by a categorical parameter, feel free. Here we break out precipitation by year by setting y to the categorical column (note the year:N rather than year, which tells altair to treat it as a nominal category), while x remains the numerical column, precipitation. Again, if we wanted a vertical chart we'd just switch those around.
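The per-year boxes boil down to one set of quartiles per category. A rough pandas sketch of that grouping, on made-up numbers rather than the real dataset, looks like this:

```python
import pandas as pd

# Made-up values: three readings for each of two years.
sample = pd.DataFrame({
    "year": [2012, 2012, 2012, 2013, 2013, 2013],
    "precipitation": [0.0, 2.0, 10.0, 1.0, 3.0, 5.0],
})

# One row per year, one column per quartile (0.25, 0.5, 0.75).
summary = (
    sample.groupby("year")["precipitation"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

print(summary)
```

Each row of the summary corresponds to one box in the chart: the box edges are the outer two columns and the line inside the box is the middle one.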

In [9]:

alt.Chart(df).mark_boxplot().encode(
    y='year:N',
    x='precipitation',
    tooltip=['date', 'precipitation'])

Out[9]:

Altair does boxplots so well it makes you feel like cheering.

Histograms and Boxplots in Python's Altair Package
