100 Million places visualized with geopandas, datashader and geoparquet ¶
Alexander Dunkel*, Dominik Weckmüller**
*Leibniz Institute of Ecological Urban and Regional Development, Transformative Capacities & Research Data Centre (IÖR-FDZ); **TU Dresden, Institute of Cartography
TL;DR Visualization of 100 Million places posts from Foursquare with geoparquet and datashader.
Sources:
- do-me/foursquare_places_100M
- Foursquare Open Source Places: A new foundational dataset for the geospatial community
Some of the code in this notebook is based on two previous explorations:
- Twitter 395 Million reactions visualized with datashader and parquet
- 65 Million reactions in Germany visualized with datashader and parquet
Preparations¶
Use the lbsn Jupyterlab Docker container as a starting point.
Also, you need to install either pyarrow
or fastparquet
(recommended) from conda-forge
import numpy as np
import pandas as pd
import geopandas as gp
import dask.diagnostics as diag
import datashader.transfer_functions as tf
import datashader as ds
from datashader.utils import lnglat_to_meters
from IPython.display import clear_output
from pathlib import Path
Preparations
OUTPUT = Path.cwd().parents[0] / "out"
OUTPUT.mkdir(exist_ok=True)
INPUT = Path.cwd().parents[0] / "00_data"
parquet_input = INPUT / "foursquare_places.parquet"
datasize = parquet_input.stat().st_size/(1024*1024*1024)
print(
f"Size: {datasize:,.1f} GB")
Read the parquet file, but limit reading to column geometry
. This will read the full dataset, no chunking or streamed processing. Keep an eye on your memory.
df = gp.read_parquet(
parquet_input, columns=["geometry"])
Project to web mercator
gdf = df.to_crs(3857)
gdf.head()
Visualize with datashader¶
Define areas
def bounds(x_range, y_range):
x,y = lnglat_to_meters(x_range, y_range)
return dict(x_range=x, y_range=y)
Earth = ((-180.00, 180.00), (-59.00, 74.00))
France = ((-12.00, 16.00), (41.26, 51.27))
Berlin = ((12.843018, 14.149704), (52.274880, 52.684292))
USA = ((-126, -64), (24.92, 49.35))
Paris = ((2.05, 2.65), (48.76, 48.97))
Dresden = ((13.415680, 14.703827), (50.740090, 51.194905))
DE = ((4.605469, 15.372070), (46.697243, 55.065885))
plot_width = 1000
plot_height = int(plot_width*0.5)
canvas = ds.Canvas(plot_width=plot_width, plot_height=plot_height, **bounds(*Earth))
with diag.ProgressBar(), diag.Profiler() as prof, diag.ResourceProfiler(0.5) as rprof:
agg = canvas.points(gdf, geometry="geometry")
tf.shade(agg.where(agg > 1), cmap=["lightblue", "darkblue"])
def plot(x_range, y_range, plot_width: int = None):
"""Plot df using tf-shade()
Calculates aspect ratio based on
web mercator distance ratio lat/lng
"""
if plot_width is None:
plot_width = 1000
lng_width = x_range[1]-x_range[0]
lat_height = y_range[1]-y_range[0]
plot_height = int(((plot_width * lat_height) / lng_width)*1.5)
cvs = ds.Canvas(plot_width=plot_width, plot_height=plot_height, **bounds(x_range, y_range))
with diag.ProgressBar(), diag.Profiler() as prof, diag.ResourceProfiler(0.5) as rprof:
agg = cvs.points(gdf, geometry="geometry")
return tf.shade(agg, cmap=["lightblue", "darkblue"])
def save_image(img, output_name, return_img: bool = True, ):
"""Saves image as PNG"""
ds.utils.export_image(
img=img, filename= str(OUTPUT / output_name), fmt=".png", background='white')
if return_img:
return img
%time save_image(plot(*France), output_name='France_map')
%time save_image(plot(*Paris), output_name='Paris_map')
%time save_image(plot(*Berlin), output_name='Berlin_map')
%time save_image(plot(*USA), output_name='USA_map')
%time save_image(plot(*DE), output_name='DE_map')
%time save_image(plot(*Dresden), output_name='Dresden_map')
Create Notebook HTML¶
!jupyter nbconvert --to html_toc \
--output-dir=../resources/html/ ./2024-11-25_foursquare_parquet.ipynb \
--template=../nbconvert.tpl \
--ExtractOutputPreprocessor.enabled=False >&- 2>&- # create single output file