Snippet: Query Ollama DeepSeek through Python in JupyterLab ¶
Alexander Dunkel, Leibniz Institute of Ecological Urban and Regional Development, Transformative Capacities & Research Data Centre (IÖR-FDZ)
Prerequisites:
- ollama serve must listen on localhost on port 11434
- the Jupyter Docker container must run in networking: host mode

See my related blog post for the setup behind this.
1 Test connect¶
import requests

url = "http://localhost:11434/api/generate"
data = {
    "model": "deepseek-r1:32b",
    "prompt": "What is the meaning of life?",
    "stream": False
}
response = requests.post(url, json=data)
print(response.json())
We can get the answer directly, formatted as Markdown:
from IPython.display import display, Markdown
display(Markdown(response.json().get('response')))
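DeepSeek-R1 models prepend their chain-of-thought inside `<think>...</think>` tags, which clutters the rendered Markdown. A small helper (a sketch; the function name `strip_think` is my own) can remove that block before display:

```python
import re

def strip_think(text):
    """Remove the <think>...</think> reasoning block that deepseek-r1
    prepends to its answer, returning only the final response text."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Example with a mock response string (no running server needed):
raw = "<think>Pondering the question...</think>The answer is 42."
print(strip_think(raw))  # → The answer is 42.
```

With a live response, pass `response.json().get('response')` through `strip_think()` before handing it to `Markdown()`.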
2 Load Geosocial Media (DE) data¶
import pandas as pd
This CSV file contains all geosocial media posts in Germany, sorted by (pseudonymized) user ID, starting with the user with the most posts.
from pathlib import Path
file_path = Path.cwd() / "00_data" / "2025-03-24_DE_All.csv"
size_gb = file_path.stat().st_size / (1024 ** 3)
print(f"File size: {size_gb:.2f} GB")
The data includes posts from Twitter, Flickr, Instagram, and iNaturalist.
Query the first entries:
# Read the file in chunks
chunk_size = 100
chunk_iterator = pd.read_csv(file_path, chunksize=chunk_size)
# Get the first 100 rows
df_first_100 = next(chunk_iterator)
# Display the first few rows
print(df_first_100.head())
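Before iterating over individual users, it can help to see how chunked aggregation works in miniature. This sketch counts non-empty posts per user across chunks, using a synthetic in-memory CSV as a stand-in for the real file (column names match the dataset; everything else is invented for illustration):

```python
import io
import pandas as pd

# Synthetic stand-in for the much larger CSV, with the two key columns
csv_text = """user_guid,post_body
u1,hello
u1,world
u2,hi
u1,again
u2,
"""

counts = pd.Series(dtype="int64")
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Count non-empty post bodies per user in each chunk, then accumulate
    valid = chunk[chunk["post_body"].notna()]
    counts = counts.add(valid["user_guid"].value_counts(), fill_value=0)

print(counts.sort_values(ascending=False).astype(int))
```

The same pattern scales to the full file by swapping `io.StringIO(csv_text)` for `file_path` and raising `chunksize`.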
%%time
def process_user_guid(df):
    """Process all rows of a single user_guid."""
    user_guid = df["user_guid"].iloc[0]
    print(f"Processing user_guid: {user_guid}, Rows: {len(df)}")

processed_rows_count = 11505815  # Rows already processed; used as the skiprows offset below
chunk_size = 100000  # Process larger chunks to reduce overhead
processed_users = 0
max_users = 1  # Stop after this many unique user GUIDs
buffer = []
current_user_guid = None
# Read only the 'user_guid' and 'post_body' columns, ignore the rest
columns_to_read = ['user_guid', 'post_body']
header = pd.read_csv(file_path, nrows=1)  # Read just the first row to recover column names
for chunk in pd.read_csv(
        file_path, chunksize=chunk_size, usecols=columns_to_read, header=None,
        names=header.columns, low_memory=False, skiprows=processed_rows_count):
    chunk['is_valid'] = chunk['post_body'].notna()  # Mark rows as valid (True) or invalid (False)
    for _, row in chunk.iterrows():
        processed_rows_count += 1
        user_guid = row["user_guid"]
        # Skip rows where 'post_body' is NaN
        if not row['is_valid']:
            continue  # Skip this row
        if current_user_guid is None:
            # First row case
            current_user_guid = user_guid
        if user_guid == current_user_guid:
            # Accumulate rows for the same user_guid
            if row['post_body']:
                # skip empty strings as well as NaN
                buffer.append(row)  # Collect rows in the buffer (list)
        else:
            # A new user_guid starts: process the previous one
            process_user_guid(pd.DataFrame(buffer))
            processed_users += 1
            if processed_users >= max_users:
                print(f"Processed {processed_rows_count} rows so far.")
                # Stop once we've processed `max_users` user GUIDs
                break
            # Start a new buffer for the next user_guid
            buffer = [row]  # Start with the new row
            current_user_guid = user_guid
    if processed_users >= max_users:
        break  # Stop processing further chunks
# Process any remaining user_guid in the buffer (if within max_users)
if buffer and processed_users < max_users:
    process_user_guid(pd.DataFrame(buffer))  # Process remaining buffered rows
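Because the file is sorted by user_guid, the buffering above can also be expressed with `itertools.groupby`, which collapses consecutive rows sharing a key into one group. A minimal sketch on a synthetic CSV (the data is invented; with real chunked reading, a group can span a chunk boundary, which the manual buffer above handles and this sketch does not):

```python
import io
from itertools import groupby
import pandas as pd

csv_text = """user_guid,post_body
u1,first post
u1,second post
u2,only post
"""

df = pd.read_csv(io.StringIO(csv_text))
df = df[df["post_body"].notna()]  # drop empty post bodies, as above

grouped = {}
# Consecutive rows with the same user_guid form one complete group,
# since the file is sorted by that column
for user_guid, rows in groupby(df.itertuples(index=False), key=lambda r: r.user_guid):
    grouped[user_guid] = [r.post_body for r in rows]
    print(f"{user_guid}: {len(grouped[user_guid])} posts")
```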
The most active Social Media user in Germany has 584688 posts. The second most active user has 305. And so on. Note: this excludes empty post_body values, so the second user may have many more posts, just without text (maybe only emoji, or only pictures, etc.).
Let's look at some posts from user 4.
for ix in range(20):
    print(buffer[ix].post_body)
3 Classify using prompt¶
Let's write a prompt to classify some of these posts. We first need to group several posts, so there is a better base for evaluation.
social_media_posts = "__next_post__: ".join([row["post_body"] for row in buffer[:20]])
social_media_posts
llm_prompt = f"""
You are an expert in urban planning and sustainability. Your task is to evaluate geosocial media posts and classify them based on the degree to which they demonstrate "Urban Transformative Capacity" (UTC). UTC refers to the collective ability of stakeholders in urban development to conceive of, prepare for, initiate, and perform path-deviant change towards sustainability within and across multiple complex systems that constitute cities.
**Here's a breakdown of the key components of UTC, which you should use to inform your classification:**
* **C1: Inclusive and Multiform Urban Governance:** Does the post suggest broad participation, diverse governance modes, and effective intermediaries?
* **C2: Transformative Leadership:** Does the post indicate visionary, inclusive, and collaborative leadership that drives change?
* **C3: Empowered and Autonomous Communities of Practice:** Does the post highlight the empowerment of communities to address social needs and exercise autonomy?
* **C4: System(s) Awareness and Memory:** Does the post show understanding of urban systems, path dependencies, and the need for collective learning?
* **C5: Urban Sustainability Foresight:** Does the post suggest the creation of collective visions and alternative scenarios for a sustainable urban future?
* **C6: Diverse Community-Based Experimentation:** Does the post showcase innovative and disruptive solutions developed at the community level?
* **C7: Innovation Embedding and Coupling:** Does the post indicate efforts to integrate new ideas into routines, organizations, plans, and legal frameworks?
* **C8: Reflexivity and Social Learning:** Does the post suggest ongoing monitoring, assessment, and evaluation of transformative initiatives?
* **C9: Working Across Agency Levels:** Does the post highlight actions that involve individuals, households, groups, organizations, networks, and society at large?
* **C10: Working Across Political-Administrative Levels and Geographical Scales:** Does the post show consideration of interactions between local, regional, national, and global levels?
**Classification Instructions:**
Based on the content of the geosocial media post, assign a numerical score between 0 and 1, where:
* **0:** The post shows no evidence of UTC and does not relate to the issues mentioned above.
* **0.2:** The post touches upon one or two isolated elements related to any of the TC components.
* **0.4:** The post refers to a few of the TC components and reflects a certain awareness of them.
* **0.6:** The post refers to a good understanding of at least one TC component.
* **0.8:** The post exhibits a deeper understanding and reflects a relationship between at least a few TC components.
* **1:** The post shows a comprehensive understanding and reflects relationships across many of the TC components and the holistic concept of TC.
**Output:**
Your response should be a single number between 0 and 1, representing the degree of Urban Transformative Capacity demonstrated in the post. Do not include any text or explanation beyond the numerical score.
**Examples:**
* **Post:** "Another new car park is planned in the City. More cars, more traffic."
**Classification:** 0
* **Post:** "Attending a town hall meeting about improving our city's recycling program."
**Classification:** 0.2
* **Post:** "Great turnout at the community garden workshop! So many people eager to learn how to grow their own food."
**Classification:** 0.4
* **Post:** "Excited to be part of the city council's new initiative to promote energy efficiency in homes and businesses."
**Classification:** 0.6
* **Post:** "Our neighborhood association is working with local businesses to create a green business certification program. We're building a more sustainable community, one business at a time!"
**Classification:** 0.8
* **Post:** "Delighted to see the city adopting a multi-stakeholder approach, fostering a common vision, embedding innovative concepts to address our sustainability concerns by providing reflexive governance to the city's system!"
**Classification:** 1
Now, classify the following geosocial media post: {social_media_posts}
"""
data = {
    "model": "deepseek-r1:32b",
    "prompt": llm_prompt,
    "stream": False
}
response = requests.post(url, json=data)
print(response.json())
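Even when instructed to return only a number, deepseek-r1 typically emits its reasoning inside `<think>...</think>` before the score. A hedged post-processing sketch (the function `extract_score` and its regex are my own, not part of the Ollama API) strips that block and pulls the first 0..1 value from the remainder:

```python
import re

def extract_score(raw):
    """Strip the <think> block, then pull the first 0..1 score from
    what remains. Returns None if no score is found."""
    answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    match = re.search(r"\b(0(?:\.\d+)?|1(?:\.0+)?)\b", answer)
    return float(match.group(1)) if match else None

# Example with a mock model response (no running server needed):
print(extract_score("<think>Weighing C1 and C6...</think>\n0.4"))  # → 0.4
```

With a live response, call `extract_score(response.json().get('response'))` to get a float suitable for storing alongside each post batch.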
4 Convert notebook to HTML¶
!jupyter nbconvert --to html_toc \
--output-dir=../resources/html/ ./2025-03-25_Deepseek.ipynb \
--template=../nbconvert.tpl \
--ExtractOutputPreprocessor.enabled=False >&- 2>&- # create single output file