Choropleth Maps in Python

Choropleth map of the French population, by department

Choropleth maps can be used to immediately convey important information about a geographical dataset.

Like heat maps, they show the local variations of a measurement, such as population density. However, while heat maps average measurements over arbitrary bins, choropleth maps aggregate them within predefined boundaries, such as country or state borders.

In this post, you will use state-of-the-art python visualization libraries to draw choropleth maps. You will learn:

  • How to install holoviz, geopandas, and geoviews in a conda environment
  • How to create your first maps with world geographical boundaries
  • Where to find definitions of geographical boundaries to make maps
  • How to connect your own data to geographical boundaries, to plot whatever you like.

Installation

First, install Anaconda if you have not done so already.

Then, create a new conda environment for this exercise, named holoviz:

conda create -n holoviz

Activate it:

conda activate holoviz

And install holoviz, which is a set of high-level tools for visualization in python:

conda install -c pyviz holoviz

We now need to install geopandas and geoviews, which are additional packages for analysis and visualization of geographical data, respectively:

conda install geopandas geoviews
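Before going further, it can be worth checking that the packages are actually visible from the new environment. The snippet below is a minimal sketch, not part of the original notebook: the helper name installed_versions is hypothetical, and it assumes that the conda packages ship standard Python metadata, which is usually but not necessarily the case.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages used in this tutorial; run this with the holoviz environment activated.
for name, version in installed_versions(
    ["holoviews", "geoviews", "geopandas", "bokeh"]
).items():
    print(f"{name}: {version or 'NOT FOUND'}")
```

If a package is reported as NOT FOUND, the most likely cause is that the holoviz environment is not the active one.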

Finally, here are the files needed for this tutorial (jupyter notebook, input dataset), in case you need them.

The python visualization landscape: orientation

All in all, we have installed a large number of python packages that are (or might be) needed for geographical data analysis and visualization.

Here is a simplified description of the dependencies between some of these packages:

  • geoviews : geographical visualization
    • holoviews : high-level visualization of multidimensional data
      • bokeh : low-level visualization backend, based on JavaScript
    • geopandas : describe and analyze geographical data
      • pandas : describe and analyze data
        • numpy : efficient manipulation of multidimensional data arrays, and fundamental package for scientific computing in python
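To explore these relationships on your own machine, you can ask each installed package which requirements it declares. This is only a rough sketch: the helper names requirement_name and declared_dependencies are hypothetical, optional extras are skipped, and the result depends on the versions you installed, so it may not match the simplified tree above.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the bare package name from a requirement string such as 'numpy>=1.0'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0).lower() if match else None


def declared_dependencies(name):
    """Return the sorted names of the non-optional requirements declared by `name`.

    Returns an empty list if the package is not installed or declares nothing.
    """
    try:
        requirements = metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []
    names = set()
    for requirement in requirements:
        _, _, marker = requirement.partition(";")
        if "extra" in marker:
            continue  # skip requirements that only apply to optional extras
        dependency = requirement_name(requirement)
        if dependency:
            names.add(dependency)
    return sorted(names)


for package in ["geoviews", "holoviews", "geopandas", "pandas"]:
    print(package, "->", declared_dependencies(package))
```

Only direct requirements are listed, so you need to call the function again on each dependency to walk further down the tree.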

This might seem a bit complicated, and indeed it is!

Currently, at the end of 2019, the landscape of python visualization is transforming rapidly, and it can be quite difficult to choose and learn the right tools. Personally, here's what I'm looking for:

  1. a concise syntax: I want to make plots without wasting time writing code, to get quick insight into my data.
  2. interactive plots in the browser: this basically calls for JavaScript under the hood.
  3. big data: display lots of information without killing the client browser.
  4. python: so JavaScript should remain mostly hidden from me.

Formalizing these four points helped me a lot in choosing my tools. For example, point 4 rules out pure JavaScript libraries such as D3.js. I still believe that these libraries are the way to go for professional, large-scale display in the browser. But D3, for example, has quite a steep learning curve... I have already spent at least a week on it! When I really need a D3-based display, I will probably hire an expert rather than invest time in learning it.

Bokeh, on the other hand, drives JavaScript from python code (without relying on D3). So that's exactly what I need to address point 2.

Many times, I use bokeh directly, like in Show your Data in a Google Map with Python or Interactive Visualization with Bokeh in a Jupyter Notebook. But making a single plot in bokeh can require a dozen lines of code or more.

Holoviews, which can be used as a high-level interface to bokeh or matplotlib, makes it easy to create complex plots in just two lines of code, and thus addresses point 1.

When I need to display big data, I use datashader, a library that compresses big data into an image dynamically before sending it to a bokeh plot. Again, bokeh and holoviews are the way to go here.

In June 2019, the holoviz project was launched. The holoviz team is packaging the tools I need, and seems to share my views of how data visualization in python should evolve. So I'm now using holoviz as my main entry point to visualization. If you want to know more about this project, you can refer to their FAQ.

First maps with geoviews

We import the tools we need, and we initialize geoviews:

import geoviews as gv
import geoviews.feature as gf
import xarray as xr
from cartopy import crs

gv.extension('bokeh')