Create an interactive display for geographical data with Python: real-estate prices near Geneva.

In this post, you will learn how to use Python to overlay your data on top of a dynamic Google map.

As an example, we will use a dataset containing all the real-estate sales that occurred in 2018 and 2019 in France, near the Swiss city of Geneva. If you just want to see the prices, you'll find a ready-to-use interactive plot at the end of the post.

In real-world data science, geographical datasets are everywhere. In fact, as soon as measurements are taken at a given place in the world, the dataset becomes geographical. Think about census data, real estate, a distributed system of IoT sensors, geological or weather data, etc.

To gain insight into such datasets, you need to be able to display or segment them as a function of geographical coordinates. As soon as you do that, obvious features will jump out at you. You'll see and fix bugs in your data processing, and you'll start thinking about ways to extract valuable information from these datasets.

• Installation: set up Python for this exercise.
• Get a Google Map API key: this is necessary to display Google maps in your applications.
• How to prepare your data for geographical display: we will use pandas to read the dataset from a file, and take a first look at the data before displaying it.
• Dynamic Google Map with data overlay: we will create a nice interactive plot with bokeh.

## Installation

In the article Interactive Visualization with Bokeh in a Jupyter Notebook, we have seen how to use bokeh to easily create interactive and engaging visualizations. And in Simple Text Mining with Pandas, you can see how pandas can be used to process and analyse data efficiently, in a few lines of code.

As mentioned above, we will need pandas for data analysis and bokeh for visualization. So we are now going to set up a new Anaconda environment with both tools.

First, install Anaconda if you haven't done so yet, then create the new environment and activate it:

conda create --name geovis python=3.7
conda activate geovis


Then, install the additional packages that we need:

conda install pandas bokeh jupyter
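
Before going further, it can be worth checking that the packages are visible from the environment you just activated. The snippet below is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, not part of pandas or bokeh.

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]

# An empty list means that both packages are installed.
print(missing_packages(['pandas', 'bokeh']))
```

If the list is not empty, the most likely cause is that the geovis environment is not the active one.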


## Get a Google Map API Key

The API key is necessary to be able to create a Google Map from an application or a website such as this one.

Before getting started, please note that the Google Map API is NOT free. However, Google offers 200 dollars of free credit per month, which is more than enough to follow this tutorial, and even to use the API as a hobby. For example, this web page is not going to cost me anything, given the amount of traffic I'm currently getting.

After you get your key, put it in an environment variable (we will read this variable later on to draw the maps):

export GOOGLE_API_KEY=<your_key>


## How to Prepare Your Data for Geographical Display

First, download the dataset CSV file here, and save it as dvf_gex.csv.

We start by importing pandas and by setting up bokeh for integrated display within the jupyter notebook:

In [1]:
import pandas as pd
from bokeh.io import output_notebook
output_notebook()
bokeh_width, bokeh_height = 500,400


Then, we load our data into a pandas dataframe, and we print the first rows:

In [2]:
df = pd.read_csv('dvf_gex.csv')
df.head()

Out[2]:
   Unnamed: 0     price  area_tot  area_build       lon        lat
0       83897  741139.0     386.0         0.0  6.072922  46.319225
1       83912  716500.0    2731.0         0.0  6.072922  46.319225
2       83927   15000.0     727.0         0.0  6.072922  46.319225
3       83957  741139.0     338.0         0.0  6.068902  46.323598
4       83997  582000.0    4643.0         0.0  6.072211  46.316697

Each row in the data frame corresponds to a single transfer of real-estate ownership. And here is a description of the columns:

• price : price at which the property was sold
• area_tot : total surface of the property in m2, including buildings, garden, etc.
• area_build : surface of the buildings. If it's 0, it means that this property has no buildings yet.
• lon : longitude
• lat : latitude

The first column is the index of the df dataframe, and the second column is the former index of the dataframe from which I extracted this small sample. You can just forget about them.
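
Since the `Unnamed: 0` column carries no information, one option is to drop it and run a quick sanity check on the coordinates before plotting. The sketch below rebuilds the five rows shown above in memory so that it runs without the CSV file; with the real file you would start from `pd.read_csv('dvf_gex.csv')` instead. The bounding box is my own rough assumption for the area around Geneva, not something taken from the dataset.

```python
import pandas as pd

# The five rows printed above, rebuilt by hand for illustration.
sample = pd.DataFrame({
    'Unnamed: 0': [83897, 83912, 83927, 83957, 83997],
    'price': [741139.0, 716500.0, 15000.0, 741139.0, 582000.0],
    'area_tot': [386.0, 2731.0, 727.0, 338.0, 4643.0],
    'area_build': [0.0, 0.0, 0.0, 0.0, 0.0],
    'lon': [6.072922, 6.072922, 6.072922, 6.068902, 6.072211],
    'lat': [46.319225, 46.319225, 46.319225, 46.323598, 46.316697],
})

# Drop the leftover index column.
clean = sample.drop(columns=['Unnamed: 0'])

# Assumed bounding box (in degrees) loosely covering the Geneva area.
in_box = clean['lon'].between(5.5, 6.5) & clean['lat'].between(45.8, 46.8)
print(clean.shape, in_box.all())
```

Rows falling outside the box would be worth a closer look, as they often point to swapped or badly parsed coordinates.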

We will start by simply displaying a dynamic Google Map, and we will gradually improve our plot by adding more and more features. Finally, we will overlay our data.

## Dynamic Google Map in the Jupyter notebook

First, we need to choose the coordinates of the center of the map. I decided to use those of Saint-Genis-Pouilly, France, which is in the middle of the area we're interested in. To find the coordinates of a place, you can search Google for the name of this place, followed by the keywords "lat lon". Here is what I got:

In [3]:
lat, lon = 46.2437, 6.0251
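
Coordinates typed by hand are easy to get wrong. A small guard like the one below can catch values that are out of range; `check_coordinates` is a hypothetical helper written for this post, not a bokeh function. Note that it cannot detect a latitude/longitude swap when both values are valid in either position, which is the case for the coordinates used here.

```python
def check_coordinates(lat, lon):
    """Raise ValueError if lat or lon is outside its valid range (degrees)."""
    if not -90 <= lat <= 90:
        raise ValueError(f'latitude out of range: {lat}')
    if not -180 <= lon <= 180:
        raise ValueError(f'longitude out of range: {lon}')
    return lat, lon

lat, lon = check_coordinates(46.2437, 6.0251)
```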


We now have to read the Google Map API key from the environment variable (see above):

In [4]:
import os
api_key = os.environ['GOOGLE_API_KEY']
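
If the variable was not exported in the shell that started Jupyter, a plain dictionary lookup on `os.environ` fails with a `KeyError` that is not very explicit. As a sketch, the key can be read through a small function that fails with a clearer message; `read_api_key` is a hypothetical name, and the variable name is the one exported earlier.

```python
import os

def read_api_key(name='GOOGLE_API_KEY'):
    """Return the API key stored in the environment variable `name`."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f'{name} is not set: export it before starting Jupyter')
    return key
```

You would then write `api_key = read_api_key()` in the notebook.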


Then, we import the bokeh tools needed to show a simple dynamic map, and we write a small function to show the map:

In [5]:
from bokeh.io import show
from bokeh.plotting import gmap
from bokeh.models import GMapOptions

def plot(lat, lng, zoom=10, map_type='roadmap'):
    gmap_options = GMapOptions(lat=lat, lng=lng,
                               map_type=map_type, zoom=zoom)
    p = gmap(api_key, gmap_options, title='Pays de Gex',
             width=bokeh_width, height=bokeh_height)
    show(p)
    return p


And we call this function:

In [6]:
p = plot(lat, lon)


You can now try calling the function again with different arguments. For example, you could use different coordinates for the center (maybe the ones of your place?), a different zoom level, or different map types (try satellite or terrain).

Now, let's add a marker showing the center of the map:

In [7]:
def plot(lat, lng, zoom=10, map_type='roadmap'):
    gmap_options = GMapOptions(lat=lat, lng=lng,
                               map_type=map_type, zoom=zoom)
    p = gmap(api_key, gmap_options, title='Pays de Gex',
             width=bokeh_width, height=bokeh_height)
    # beware, longitude is on the x axis ;-)
    center = p.circle([lng], [lat], size=10, alpha=0.5, color='red')
    show(p)
    return p

p = plot(lat, lon, map_type='terrain')


You can use the toolbar on the right side of the map to activate the pan, wheel zoom, and reset tools.

## Google Map with Data Overlay in the Jupyter Notebook

That's actually not much more difficult than what we've already done!

We just need to declare a bokeh ColumnDataSource for the data we want to overlay, from our dataframe. Once this is done, we just need to tell bokeh which columns to use for the x and y coordinates.

But before we do this, we first need to check how many points we're going to display. Indeed, you should keep in mind that bokeh will send these points to the client browser. If you send too many, you're just going to kill it. Let's see:

In [8]:
df.shape

Out[8]:
(3031, 6)

Only 3000 points or so, that's perfect. As a rule of thumb, you can display up to 50 000 points. If you have more, you will need to resort to other strategies, and we will see how to do that in a future post.
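
If your own dataset goes over that limit, one simple stopgap is to plot a random subset. This is only a sketch, under the assumption that a uniform random sample is representative enough for a first look; it is not the strategy announced for the future post. `cap_points` and `max_points` are hypothetical names.

```python
import pandas as pd

def cap_points(df, max_points=50_000, seed=0):
    """Return df unchanged if it is small enough, else a random sample of rows."""
    if len(df) <= max_points:
        return df
    return df.sample(n=max_points, random_state=seed)

# Toy example: 10 rows capped to 4.
toy = pd.DataFrame({'lon': range(10), 'lat': range(10)})
print(len(cap_points(toy, max_points=4)))
```

Fixing the seed keeps the same subset from one run to the next, which makes the plots easier to compare.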

In [9]:
from bokeh.models import ColumnDataSource

def plot(lat, lng, zoom=10, map_type='roadmap'):
    gmap_options = GMapOptions(lat=lat, lng=lng,
                               map_type=map_type, zoom=zoom)
    p = gmap(api_key, gmap_options, title='Pays de Gex',
             width=bokeh_width, height=bokeh_height)
    # definition of the column data source:
    source = ColumnDataSource(df)
    # see how we specify the x and y columns as strings,
    # and how to declare as a source the ColumnDataSource:
    center = p.circle('lon', 'lat', size=4, alpha=0.2,
                      color='yellow', source=source)
    show(p)
    return p

p = plot(lat, lon, map_type='satellite')


Nice! We're now ready to make this plot actually useful.

The first thing we're going to do is to add a bit of interactivity: wouldn't it be nice to get information about a point by just hovering over the point with the mouse?

Then, we can encode information in the point display style. For now, we show all points in yellow, and with the same size. But we could use size and color to display the property price and surface, for example.

## Bokeh HoverTool and ToolTips

We can choose and configure the tools that appear on the top right side of the plot. By default, we get the pan, wheel zoom, and reset tools. Let's add the hover tool:

In [10]:
def plot(lat, lng, zoom=10, map_type='roadmap'):
    gmap_options = GMapOptions(lat=lat, lng=lng,
                               map_type=map_type, zoom=zoom)
    # the tools are defined below:
    p = gmap(api_key, gmap_options, title='Pays de Gex',
             width=bokeh_width, height=bokeh_height,
             tools=['hover', 'reset', 'wheel_zoom', 'pan'])
    source = ColumnDataSource(df)
    center = p.circle('lon', 'lat', size=4, alpha=0.5,
                      color='yellow', source=source)
    show(p)
    return p

p = plot(lat, lon, map_type='satellite', zoom=12)


You can now move your mouse to a point, and a tooltip will appear. But the information from the tooltip is still very limited. Let's improve this. For that, we stop using the default hover tool, and we define our own.

In [11]:
from bokeh.models import HoverTool

def plot(lat, lng, zoom=10, map_type='roadmap'):
    gmap_options = GMapOptions(lat=lat, lng=lng,
                               map_type=map_type, zoom=zoom)
    # the tools are defined below:
    hover = HoverTool(
        tooltips = [
            # @price refers to the price column
            # in the ColumnDataSource.
            ('price', '@price euros'),
            ('building', '@area_build m2'),
            ('terrain', '@area_tot m2'),
        ]
    )
    # below we replaced 'hover' (the default hover tool),
    # by our custom hover tool
    p = gmap(api_key, gmap_options, title='Pays de Gex',
             width=bokeh_width, height=bokeh_height,
             tools=[hover, 'reset', 'wheel_zoom', 'pan'])
    source = ColumnDataSource(df)
    center = p.circle('lon', 'lat', size=4, alpha=0.5,
                      color='yellow', source=source)
    show(p)
    return p

p = plot(lat, lon, map_type='satellite', zoom=12)


And you can now interactively inspect any point with the hover tool.
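
The tooltips are just a list of `(label, template)` tuples, in which `@name` is replaced by the value of the column `name` for the hovered point. If you have many columns, you could build this list from a dictionary. The helper below is a hypothetical convenience, written in plain Python so that it does not depend on bokeh; its output can be passed as the `tooltips` argument of `HoverTool`.

```python
def make_tooltips(columns):
    """Build a tooltips list from a {label: (column, unit)} dictionary."""
    return [(label, f'@{column} {unit}')
            for label, (column, unit) in columns.items()]

tooltips = make_tooltips({
    'price': ('price', 'euros'),
    'building': ('area_build', 'm2'),
    'terrain': ('area_tot', 'm2'),
})
print(tooltips)
```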

## Variable marker size in bokeh

Marker size and marker color are great ways to immediately convey information about the dataset. We can assign any piece of information to these visual attributes.

For example, I'd like to find out which properties are the most expensive, and which ones sold at a price that is way too high.

So I will relate the marker size to the price, and the color to the price per square meter.

First, we define a radius column in our dataframe, related to the price:

In [12]:
import numpy as np
df['radius'] = np.sqrt(df['price'])/200.
df.head()

Out[12]:
   Unnamed: 0     price  area_tot  area_build       lon        lat    radius
0       83897  741139.0     386.0         0.0  6.072922  46.319225  4.304472
1       83912  716500.0    2731.0         0.0  6.072922  46.319225  4.232316
2       83927   15000.0     727.0         0.0  6.072922  46.319225  0.612372
3       83957  741139.0     338.0         0.0  6.068902  46.323598  4.304472
4       83997  582000.0    4643.0         0.0  6.072211  46.316697  3.814446

Two things to note:

• I made the radius proportional to the square root of the price, so that the surface of each circle is proportional to the price (since the surface is equal to $\pi R^2$). We could have made a different choice.
• I divided by 200 to get a radius that is in the right ball park (see below).
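
To check the first point, here is a short sketch in plain Python (using `math` rather than numpy, so that it runs on its own). It applies the formula described above, the square root of the price divided by 200, and shows that doubling the price doubles the area of the circle. The function names are mine.

```python
import math

def radius(price, scale=200.):
    """Radius proportional to the square root of the price."""
    return math.sqrt(price) / scale

def area(price):
    """Area of the circle drawn for a given price."""
    return math.pi * radius(price) ** 2

# Doubling the price doubles the area (up to rounding errors).
print(area(1_000_000) / area(500_000))
```

Had the radius been made proportional to the price itself, a property twice as expensive would have covered four times the area, which exaggerates the differences.
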

In [13]:
def plot(lat, lng, zoom=10, map_type='roadmap'):
    gmap_options = GMapOptions(lat=lat, lng=lng,
                               map_type=map_type, zoom=zoom)
    hover = HoverTool(
        tooltips = [
            ('price', '@price euros'),
            ('building', '@area_build m2'),
            ('terrain', '@area_tot m2'),
        ]
    )
    p = gmap(api_key, gmap_options, title='Pays de Gex',
             width=bokeh_width, height=bokeh_height,
             tools=[hover, 'reset', 'wheel_zoom', 'pan'])
    source = ColumnDataSource(df)
    # we use the radius column for the circle size:
    center = p.circle('lon', 'lat', size='radius',
                      alpha=0.5, color='yellow', source=source)
    show(p)
    return p

p = plot(lat, lon, map_type='satellite', zoom=11)


Now try to zoom in and out a bit. You'll see that the size of the circles does not change. So the circles will start to overlap if you zoom out too much. To cure this, we just need to make a very small change. Instead of setting the size of the circles, we will set their radius, which is expressed in the units of x and y (longitude and latitude).
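
To see why this helps, recall that the scale of a Google map doubles each time the zoom level increases by one. A marker whose radius is fixed in data units therefore doubles on screen at each zoom step and always covers the same patch of ground, whereas a marker with a fixed `size` keeps the same number of pixels. The sketch below only illustrates this arithmetic; `on_screen_radius` is a hypothetical function, and the reference values are arbitrary.

```python
def on_screen_radius(radius_px_at_ref, ref_zoom, zoom):
    """Pixel radius of a marker whose radius is fixed in data units.

    radius_px_at_ref is its pixel radius at zoom level ref_zoom.
    """
    return radius_px_at_ref * 2 ** (zoom - ref_zoom)

for zoom in (10, 11, 12):
    print(zoom, on_screen_radius(4, ref_zoom=11, zoom=zoom))
```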

In [14]:
# I need to change the radius coefficient
# so that the points are visible

def plot(lat, lng, zoom=10, map_type='roadmap'):
    gmap_options = GMapOptions(lat=lat, lng=lng,
                               map_type=map_type, zoom=zoom)
    hover = HoverTool(
        tooltips = [
            ('price', '@price euros'),
            ('building', '@area_build m2'),
            ('terrain', '@area_tot m2'),
        ]
    )
    p = gmap(api_key, gmap_options, title='Pays de Gex',
             width=bokeh_width, height=bokeh_height,
             tools=[hover, 'reset', 'wheel_zoom', 'pan'])
    source = ColumnDataSource(df)