Identifying city influence spheres in Germany

Alexander Dunkel, Leibniz Institute of Ecological Urban and Regional Development,
Transformative Capacities & Research Data Centre (IÖR-FDZ)

There are 83 "Großstädte" (big cities, i.e. cities with a population above 100,000) in Germany as of December 31, 2023. The thesis we investigate here is whether these cities have an influence on user topic perception and, further, on Transformative Capacities (TC) in regions.

Fig. 1: Location of cities with more than 50,000 inhabitants in Germany, area proportional to the number of inhabitants (Wikipedia)

People from rural areas who visit cities will (sometimes) perceive the increased transformation they come into contact with there. Vice versa, people who live in bigger cities and visit rural areas will (sometimes) interact with the local population. Both types of exchange may have measurable effects on TC in these areas. Our hypothesis is that areas visited by many people from different regions have a higher TC than areas whose population mostly remains without such contact.

The data approach we use is a mapping of users and frequentation patterns:

  1. Select all users who visited larger cities at least once, based on at least a single geosocial media post shared from within these cities.
    1. Optionally (to test): filter users based on some type of topic criteria (e.g. interested in sustainability, transformation, etc.)
  2. Get all other posts from these users, including those outside these cities (i.e. other places these people travelled to and communicated online from).
  3. Return as a list of lat/lng coordinates. Calculate a Kernel Density Estimation (KDE). This is our influence sphere. We can freely define a cutoff value for the minimum KDE class.
  4. Calculate and overlay the KDEs for all big cities. Count the number of overlapping classes as a measure of city influence spheres (see the sketch after this list).
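
To make steps 3 and 4 concrete, the following is a minimal sketch, not the final implementation: it computes a KDE surface per city from projected post coordinates, classifies it into equal-interval classes with a cutoff, and counts per grid cell how many city spheres overlap. It uses scipy.stats.gaussian_kde as a stand-in for whatever KDE implementation is used later; the function name, grid extent, and randomly generated example coordinates are hypothetical placeholders.

import numpy as np
from scipy.stats import gaussian_kde

def kde_class_grid(coords_xy, grid_x, grid_y, n_classes=9, cutoff_class=2):
    """Compute a KDE surface on a regular grid and classify it into
    equal-interval classes; cells below cutoff_class are set to 0."""
    kde = gaussian_kde(coords_xy.T)                       # gaussian_kde expects shape (2, n)
    xx, yy = np.meshgrid(grid_x, grid_y)
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    bins = np.linspace(density.min(), density.max(), n_classes + 1)
    classes = np.digitize(density, bins[1:-1]) + 1        # classes 1..n_classes
    classes[classes < cutoff_class] = 0                   # minimum KDE class cutoff
    return classes

# Overlay: count, per grid cell, how many city KDEs exceed the cutoff.
grid_x = np.linspace(280000, 920000, 200)                 # example extent in EPSG:25832
grid_y = np.linspace(5230000, 6110000, 200)
rng = np.random.default_rng(42)
city_coords = {                                           # hypothetical example data
    "Berlin": rng.normal([800000, 5830000], 20000, (500, 2)),
    "Hamburg": rng.normal([565000, 5935000], 20000, (500, 2)),
}
overlap = sum(
    (kde_class_grid(xy, grid_x, grid_y) > 0).astype(int)
    for xy in city_coords.values()
)
print(overlap.max(), "overlapping city influence spheres at the densest cell")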

Preparations

List of package versions used in this notebook:

package          version
python           3.12.5
bokeh            3.4.2
colorcet         3.1.0
fiona            1.10.0
geopandas        1.0.1
geoviews         1.12.0
holoviews        1.19.1
hvplot           0.10.0
ipywidgets       8.1.5
mapclassify      2.8.0
matplotlib       3.9.2
matplotlib-venn  1.1.1
numpy            1.26.4
pandas           2.2.2
python-dotenv    1.0.1
shapely          2.0.6
xarray           2024.7.0

Load dependencies:

In [5]:
import os
import csv
import sys
import colorcet
import psycopg2 # Postgres API
import geoviews as gv
import holoviews as hv
import mapclassify as mc
import geopandas as gp
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geoviews.feature as gf
from pathlib import Path
from collections import namedtuple
from typing import List, Tuple, Dict, Optional
from pyproj import Transformer, CRS, Proj
from geoviews import opts
from shapely.geometry import shape, Point, Polygon
from shapely.ops import transform
from holoviews import dim
from cartopy import crs
from matplotlib import colors
from IPython.display import clear_output, display, HTML
from bokeh.models import HoverTool, FuncTickFormatter, FixedTicker
# optionally, enable shapely.speedups,
# which makes some spatial queries run faster
import shapely.speedups as speedups
from shapely import geometry

Activate autoreload of changed python files:

In [6]:
%load_ext autoreload
%autoreload 2

Load helper module.

In [8]:
module_path = str(Path.cwd().parents[0] / "py")
if module_path not in sys.path:
    sys.path.append(module_path)
from modules.base import tools, preparations

Initialize Bokeh and shapely.speedups

In [9]:
preparations.init_imports()

Parameters

Define initial parameters that affect processing

In [24]:
WORK_DIR = Path.cwd().parents[0] / "tmp"     # Working directory
EPSG_CODE = 25832                            # Target projection EPSG Code
CRS_PROJ = f"epsg:{EPSG_CODE}"               # Target projection
CRS_WGS = "epsg:4326"                        # Input projection (WGS84 lat/lng)
OUTPUT = Path.cwd().parents[0] / "out"       # Define path to output directory (figures etc.)
SHAPE_DIR = OUTPUT / "shapes"                # Shapefiles
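
For reference, the input and target projections defined above can be combined into a pyproj Transformer. A minimal example; the Berlin lon/lat coordinate is only an illustration:

from pyproj import Transformer

# Transform from WGS84 lon/lat (CRS_WGS) to the target projection (CRS_PROJ, EPSG:25832)
PROJ_TRANSFORMER = Transformer.from_crs(CRS_WGS, CRS_PROJ, always_xy=True)
x, y = PROJ_TRANSFORMER.transform(13.405, 52.520)  # lon, lat -> easting, northing (m)
print(f"{x:.0f}, {y:.0f}")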
In [25]:
WORK_DIR.mkdir(exist_ok=True)
In [26]:
tools.create_paths(OUTPUT)

City shapes

We will get the city shape areas from the BKG product "VG250-EW" (see Verwaltungsgebiete).

In [49]:
zip_citybd = "https://daten.gdz.bkg.bund.de/produkte/vg/vg250-ew_ebenen_1231/aktuell/vg250-ew_12-31.utm32s.gpkg.ebenen.zip"
In [50]:
tools.get_zip_extract(
    uri_filename=zip_citybd, output_path=SHAPE_DIR)
Loaded 71.81 MB of 71.82 (100%)..
Extracting zip..
Retrieved vg250-ew_12-31.utm32s.gpkg.ebenen.zip, extracted size: 0.01 MB
In [51]:
vg_shapes = SHAPE_DIR / "vg250-ew_12-31.utm32s.gpkg.ebenen" / "vg250-ew_ebenen_1231" / "DE_VG250.gpkg"

Load from file and plot preview

In [45]:
shapes = gp.read_file(vg_shapes, layer="vg250_vwg")
In [59]:
ax = shapes.plot()
ax.set_axis_off()
Out[59]:
(248324.99351113, 953338.4367755986, 5192574.431712993, 6144768.422681536)

Select all larger Cities

In [57]:
tools.drop_cols_except(df=shapes, columns_keep=["geometry", "EWZ", "GEN"])
In [58]:
shapes.head()
Out[58]:
GEN EWZ geometry
0 Flensburg 92550 MULTIPOLYGON (((526513.753 6075133.412, 526547...
1 Kiel 247717 MULTIPOLYGON (((575841.569 6032148.032, 575869...
2 Lübeck 218095 MULTIPOLYGON (((623056.151 5983746.445, 623191...
3 Neumünster 79502 MULTIPOLYGON (((565015.652 6000637.513, 565128...
4 Brunsbüttel 12603 MULTIPOLYGON (((510789.928 5977425.102, 510965...
In [72]:
lcities_shapes = shapes[shapes["EWZ"]>100000]

Load the Bundesländer shapes, for plotting reference.

In [69]:
de_shapes = tools.get_shapes("de", shape_dir=SHAPE_DIR)
de_shapes = de_shapes.to_crs(f"EPSG:{EPSG_CODE}")
Already exists

Plot a preview in two layers, highlighting all larger cities (red).

In [102]:
plt_kwags = {
    "color":'none',
    "edgecolor":'black',
    "linewidth":0.2,
    "figsize":(2, 4),
}
ax = de_shapes.plot(**plt_kwags)
ax = lcities_shapes.plot(ax=ax, color="red")
ax.set_axis_off()
In [74]:
lcities_shapes.head()
Out[74]:
GEN EWZ geometry
1 Kiel 247717 MULTIPOLYGON (((575841.569 6032148.032, 575869...
2 Lübeck 218095 MULTIPOLYGON (((623056.151 5983746.445, 623191...
172 Hamburg 1892122 MULTIPOLYGON (((548446.373 5934566.984, 548491...
173 Braunschweig 251804 MULTIPOLYGON (((603674.529 5801815.287, 603852...
174 Salzgitter 104548 MULTIPOLYGON (((596376.613 5786390.436, 596380...

Dissolve shapes by the GEN column (city name reference).

In [78]:
lcities_shapes = lcities_shapes.dissolve(by='GEN')
In [79]:
len(lcities_shapes)
Out[79]:
83

This matches the 83 Großstädte specified in the Wikipedia article. We now have our base for measuring city influence spheres.

Intersect with Social Media Data

Test city: Berlin

Select a single city from our previous dataset: Berlin

In [100]:
berlin_shp = lcities_shapes[lcities_shapes.index=="Berlin"]
In [103]:
ax = berlin_shp.plot(**plt_kwags)
ax.set_axis_off()

Simplify shape with a 1000 m tolerance.

In [119]:
berlin_shp = berlin_shp.simplify(tolerance=1000)
In [120]:
ax = berlin_shp.plot(**plt_kwags)
ax.set_axis_off()

Get the Well-Known Text (WKT) representation of the shape, for use in the PostGIS query.

In [126]:
berlin_wkt = berlin_shp.iloc[0].wkt
In [127]:
berlin_wkt[:50]
Out[127]:
'POLYGON ((802831.6866067576 5845501.783151679, 805'
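
Note that this WKT is still in the projected CRS (EPSG:25832). The PostGIS geography type used in the query below assumes lon/lat coordinates (SRID 4326), so the simplified shape may need to be reprojected before its WKT is extracted; a minimal sketch:

# Sketch (assumption): reproject the simplified shape to WGS84 before taking the
# WKT, since ST_GeographyFromText with SRID=4326 expects lon/lat coordinates.
berlin_wkt_wgs = berlin_shp.to_crs(CRS_WGS).iloc[0].wkt
berlin_wkt_wgs[:50]

The result could then be used in place of berlin_wkt in the query below.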

Get posts for Berlin

Connect to DB

In [134]:
from dotenv import load_dotenv
load_dotenv(
    Path.cwd().parents[0] / '.env', override=True)
Out[134]:
False
In [135]:
db_user = "postgres"
db_pass = os.getenv('POSTGRES_PASSWORD')
db_host = "hlldb"
db_port = "5432"
db_name = "hlldb"
In [136]:
db_connection_hll = psycopg2.connect(
        host=db_host,
        port=db_port,
        dbname=db_name,
        user=db_user,
        password=db_pass
)
db_conn_hll = tools.DbConn(db_connection_hll)
cur_hll = db_connection_hll.cursor()
cur_hll.execute("SELECT 1;")
db_conn = tools.DbConn(db_connection_hll)
print(cur_hll.statusmessage)
SELECT 1
In [137]:
MVIEWS_REF = "all_posts_de"

Check if foreign table has been imported already:

In [138]:
sql_query = f"""
SELECT EXISTS (
   SELECT FROM information_schema.tables 
   WHERE  table_schema = 'mviews'
   AND    table_name   = '{MVIEWS_REF}'
   );
"""
result = db_conn_hll.query(sql_query)
In [139]:
result["exists"][0]
Out[139]:
True

Query the PostGIS DB

Select all posts whose post_latlng lies within berlin_wkt. Remove the LIMIT 10 after testing (it is commented out below).

In [ ]:
%%time
cols = "origin_id, post_body, hashtags, emoji"
sql_query = f"""
SELECT {cols}
FROM mviews.{MVIEWS_REF} p1
WHERE
    ST_Intersects(p1.post_latlng,
        ST_GeographyFromText(
        'SRID=4326; 
        {berlin_wkt}
        '))
--LIMIT 10
"""
samples = db_conn_hll.query(sql_query)
In [143]:
samples.head()
Out[143]:
origin_id post_body hashtags emoji
0 3 #LitEvent in #Berlin: Bürger vs. Künstler: Tho... [Berlin, LitEvent] []
1 3 I'm at Giatarget in Köln, Nordrhein Westfalen ... [] []
2 3 #moniseum #iphoneonly #berlin #igersberlin #ip... [berlin, berliner, berlino, berlinstagram, ige... []
3 3 I'm at iHOT in Thayngen https://t.co/GU4lYodAcw [] []
4 3 I'm at Niedertor in Bad Neuenahr-Ahrweiler w/ ... [] []
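
The sample above corresponds to step 1 of the approach (posts shared from within Berlin). For step 2, all other posts from these users would be retrieved, for example with a sub-select on the user identifier. The sketch below assumes a user_guid column and that post_latlng can be cast to geometry for extracting lat/lng; both are assumptions about the mviews schema, not verified here.

# Hypothetical sketch for step 2: all posts (incl. outside Berlin) from users who
# posted at least once from within Berlin. user_guid and the geometry cast are
# assumptions about the mviews.all_posts_de schema.
sql_query = f"""
SELECT p2.user_guid,
       ST_Y(p2.post_latlng::geometry) AS latitude,
       ST_X(p2.post_latlng::geometry) AS longitude
FROM mviews.{MVIEWS_REF} p2
WHERE p2.user_guid IN (
    SELECT DISTINCT p1.user_guid
    FROM mviews.{MVIEWS_REF} p1
    WHERE ST_Intersects(p1.post_latlng,
        ST_GeographyFromText('SRID=4326; {berlin_wkt}'))
    )
"""
user_posts = db_conn_hll.query(sql_query)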

Create notebook HTML

In [2]:
!jupyter nbconvert --to html_toc \
    --output-dir=../resources/html/ ./04_influence_sphere_cities.ipynb \
    --template=../nbconvert.tpl \
    --ExtractOutputPreprocessor.enabled=False >&- 2>&-

IOER RDC Jupyter Base Template v0.10.0