Unemployment Heatmap by County

[Figure: heatmap of unemployment rate by U.S. county]

Using data from the United States Department of Agriculture, I created heatmaps like the one above, which shows the unemployment rate for each county in the United States. In this post I’ll walk through my methodology. The end result is a monochromatic gif of unemployment rates from 2007 through 2015, included at the bottom of the post. The code is a quick-and-dirty version I threw together in Jupyter, but it should serve as a workable guide to the techniques involved.

Preliminary Steps

Importing all the necessary libraries:

import pandas as pd
import math
import statistics
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.basemap import Basemap
from matplotlib.patches import Polygon
import matplotlib.font_manager as font_manager
%matplotlib inline

Pruning the data to include only unemployment rates and some necessary labels:

data = pd.read_excel('../Unemployment.xls', skiprows = [0,1,2,3,4,5])

data = data.drop(data.columns[3:10], axis=1)
for i in range(1,9):
    data = data.drop(data.columns[3+i:6+i], axis=1)
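
The positional drops above depend on the exact column order of the spreadsheet. A more defensive variant, sketched below, picks columns by name instead. It assumes the first three columns are the label fields and that every rate column starts with `Unemployment_rate_`, as the later code suggests. The helper name `columns_to_keep` and the sample column names other than `FIPS_Code` and the rate columns are hypothetical.

```python
def columns_to_keep(columns, n_label_cols=3, prefix='Unemployment_rate_'):
    """Return the leading label columns plus every column whose name starts with prefix."""
    columns = list(columns)
    labels = columns[:n_label_cols]
    rates = [c for c in columns[n_label_cols:] if str(c).startswith(prefix)]
    return labels + rates

# Toy header for illustration only; most of these names are made up.
sample = ['FIPS_Code', 'label_a', 'label_b', 'extra_code',
          'Labor_force_2007', 'Unemployment_rate_2007',
          'Labor_force_2008', 'Unemployment_rate_2008']
print(columns_to_keep(sample))
```

With the real sheet this would be applied as `data = data[columns_to_keep(data.columns)]`.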

To standardize the data later, I averaged the per-year mean unemployment rates and the per-year standard deviations across all nine years:

#Columns holding each year's unemployment rate, 2007 through 2015
rate_cols = ['Unemployment_rate_' + str(year) for year in range(2007, 2016)]

#Average the per-year standard deviations and the per-year means
std = statistics.mean([data[col].std() for col in rate_cols])
mean = statistics.mean([data[col].mean() for col in rate_cols])

I had to convert the “FIPS_Code” field from a 4- or 5-digit number into a 5-digit string, adding a leading zero where needed. The string is used later to match each row to the corresponding county polygon.

def addZero(s):
    if (len(s) == 4):
        return '0' + s
    return s

data['FIPS_Code'] = data['FIPS_Code'].astype(str)
data['FIPS_Code'] = data['FIPS_Code'].apply(addZero)
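
The same padding can be written with Python's built-in `str.zfill`, which also covers codes shorter than four digits. This is a minimal sketch that assumes the codes arrive as integers or digit strings; `pad_fips` is a hypothetical name.

```python
def pad_fips(code, width=5):
    """Left-pad a FIPS code with zeros to a fixed width."""
    return str(code).zfill(width)

print(pad_fips(1001))
print(pad_fips('48201'))
```

On the DataFrame itself, the equivalent would be `data['FIPS_Code'].astype(str).str.zfill(5)`.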

 

Defining Color Functions

Across all nine years, county unemployment rates stayed between 1% and 30%. To build a consistent color scale, I standardized the rates and then rescaled them so that 1% maps to white and 30% maps to black. From there it was easy to create functions that take a county's unemployment rate and return RGB values for the map:

r_max = (30 - mean)/std
r_min = (1 - mean)/std

def get_color(r):
    stand_r = (r - mean)/std
    val = (stand_r - r_min)/(r_max - r_min)
    return (int(255-255 * val), int(255-255 * val), int(255-255 * val))

def get_color_float(r):
    stand_r = (r - mean)/std
    val = (stand_r - r_min)/(r_max - r_min)
    return ((255 - 255 * val)/255,(255-255 * val)/255,(255-255 * val)/255)
#The color_list variable is used later for the map's key
#The range runs through 30 so the key's darkest color matches a 30% rate
color_list = []
for i in range(1,31):
    color_list.append(get_color_float(i))
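
Because `r_min`, `r_max`, and `stand_r` all use the same mean and standard deviation, those terms cancel, and the scale position reduces to (r - 1) / (30 - 1). The sketch below checks that equivalence and computes the gray level directly. The names `val_standardized`, `val_direct`, and `gray_level` are hypothetical, the mean and std passed in are arbitrary placeholders, and the clamping of out-of-range rates is an addition that the functions above do not have.

```python
import math

LO, HI = 1.0, 30.0  # bounds of the color scale, in percent

def val_standardized(r, mean, std):
    """Scale position computed as in get_color: standardize, then min-max rescale."""
    r_min = (LO - mean) / std
    r_max = (HI - mean) / std
    return ((r - mean) / std - r_min) / (r_max - r_min)

def val_direct(r):
    """The same quantity after the mean and std cancel."""
    return (r - LO) / (HI - LO)

def gray_level(r):
    """0-255 gray value; clamping is an addition, not part of the original functions."""
    val = min(max(val_direct(r), 0.0), 1.0)
    return int(255 - 255 * val)

# Arbitrary placeholder statistics; any nonzero std gives the same scale position.
for rate in (1, 7.5, 15, 30):
    assert math.isclose(val_standardized(rate, mean=6.0, std=2.5),
                        val_direct(rate), abs_tol=1e-12)
```

The standardization step is therefore harmless but not required for this particular color scale.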

 

Generating the Map

This is the densest section, since creating the map takes several libraries and a lot of visual tweaks. The county polygons come from a shapefile provided by the US Census Bureau. I’ve added comments to make this segment clearer, but there’s certainly a lot going on:

def makemap(m_idx):
    #Creating a box around the US
    lon_min, lon_max = -128,-67
    lat_min, lat_max = 25,50

    plt.figure(1, figsize=(24,10))

    #Create the Basemap
    m = Basemap(projection='merc',
                 llcrnrlat=lat_min,
                 urcrnrlat=lat_max,
                 llcrnrlon=lon_min,
                 urcrnrlon=lon_max)

    #Add some visual parameters
    m.fillcontinents(color='#ffffff', lake_color = '#ffffff')
    m.drawcoastlines(linewidth= 0.2, color= '#ffffff')
    m.drawcountries(linewidth= 0.2, color = '#ffffff')
    m.drawstates(linewidth= 0.2, color = '#ffffff')
    m.drawmapboundary(color = '#ffffff', fill_color = '#ffffff')

    #Open the shapefile of county lines downloaded from the US Census
    m.readshapefile('../cb_2015_us_county_5m/cb_2015_us_county_5m', name='counties', drawbounds = False)
    ax = plt.gca()

    #Create a list of the GEOIDs of counties
    county_nums = []
    for shape_dict in m.counties_info:
        county_nums.append(shape_dict['GEOID'])

    #Create polygons using county lines, coloring them using the custom color function
    for count, c in enumerate(data['FIPS_Code']):
        rate = data.iloc[count, m_idx+2]
        if c in county_nums and not math.isnan(rate):
            seg = m.counties[county_nums.index(c)]
            poly = Polygon(seg, facecolor= '#%02x%02x%02x' % get_color(rate), edgecolor= '#ffffff')
            ax.add_patch(poly)

    #Add in the color key on the right side
    mymap = mpl.colors.LinearSegmentedColormap.from_list('mycolors',color_list)
    mn, mx = (1, 30)
    step = 1
    Z = [[1,1],[30,30]]
    levels = range(mn,mx+step,step)
    CS3 = plt.contourf(Z, levels, cmap=mymap)
    bar = plt.colorbar(CS3, shrink = 0.9)

    #Custom labels for the color key
    tick_locs   = [1,5,10,15,20,25,30]
    tick_labels = ['1%','5%','10%','15%','20%','25%','30%']
    bar.locator     = mpl.ticker.FixedLocator(tick_locs)
    bar.formatter   = mpl.ticker.FixedFormatter(tick_labels)
    bar.update_ticks()

    #Fonts for both the title and key
    title_font = dict(family='sans-serif', style = 'normal', weight = 'bold', size=20)
    font = dict(family='sans-serif', style = 'normal', weight = 'bold', size=8)
    mpl.rc('font', **font)  

    #Plot graphs, save files
    plt.title('Unemployment Rate by County (' + str(m_idx + 2006) + ')', fontdict = title_font)
    #Save before showing, since some backends clear the figure once it is shown
    fig = plt.gcf()
    fig.savefig('unemployment_' + str(m_idx + 2006) + '.png', transparent = False)
    plt.show()
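
One thing to watch in the polygon loop: `county_nums.index(c)` returns only the first matching shape. As far as I can tell, Basemap stores each part of a multi-part county (one with islands, for example) as its own entry in `m.counties`, with a parallel entry in `m.counties_info`, so only one part would be filled; treat that as an assumption to verify. The sketch below groups shape positions by GEOID so every part can be colored. `index_by_geoid` is a hypothetical helper, and the toy records stand in for `m.counties_info`.

```python
from collections import defaultdict

def index_by_geoid(shape_infos):
    """Map each GEOID to the list of positions where it appears."""
    positions = defaultdict(list)
    for i, info in enumerate(shape_infos):
        positions[info['GEOID']].append(i)
    return dict(positions)

# Toy records standing in for m.counties_info; the GEOID values are illustrative.
toy_info = [{'GEOID': '01001'}, {'GEOID': '02013'}, {'GEOID': '02013'}]
lookup = index_by_geoid(toy_info)
print(lookup)
```

Inside `makemap`, the loop body could then iterate over `lookup.get(c, [])` and add one `Polygon` per index. The dictionary lookup also avoids scanning the whole list for every county.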

To create the final product, a loop runs through the map-making function for each year. This yields the images necessary to create the gif below:

for i in range(1,10):
    makemap(i)
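
Stitching the saved frames into a gif is not shown here. One possible approach, sketched below, uses Pillow; that is an assumption, and not necessarily how the gif below was produced. The names `frame_paths` and `build_gif`, the output file name, and the frame duration are all hypothetical.

```python
def frame_paths(first_year=2007, last_year=2015):
    """File names written by makemap, in chronological order."""
    return ['unemployment_' + str(year) + '.png'
            for year in range(first_year, last_year + 1)]

def build_gif(paths, out_path='unemployment.gif', ms_per_frame=800):
    """Combine the PNG frames into one looping gif using Pillow."""
    from PIL import Image  # imported here so frame_paths works without Pillow
    frames = [Image.open(p).convert('RGB') for p in paths]
    frames[0].save(out_path, save_all=True, append_images=frames[1:],
                   duration=ms_per_frame, loop=0)

print(frame_paths()[0], frame_paths()[-1])
```

Calling `build_gif(frame_paths())` after the loop above would write the animation next to the PNG files.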

[Animated gif: unemployment rate by county, 2007 through 2015]

 

(c) Copyright 2016 Keith Selover, all rights reserved.

 
