Alexander Dunkel, Leibniz Institute of Ecological Urban and Regional Development,
Transformative Capacities & Research Data Centre (IÖR-FDZ)

Dunkel, A., Burghardt, D. (2024). Assessing perceived landscape change from opportunistic spatio-temporal occurrence data. Land 2024

In this notebook, the Reddit API is used to query posts and comments for selected subreddits (National Parks).

Prepare environment

List of package versions used in this notebook:

package        version
python         3.9.16
praw           7.7.0
python-dotenv  1.0.0

Load dependencies:

In [33]:
import os
from pathlib import Path
import pandas as pd
from typing import List, Tuple, Dict, Optional
from IPython.display import clear_output, display, HTML, Markdown

Activate autoreload of changed python files:

In [2]:
%load_ext autoreload
%autoreload 2


Define initial parameters that affect processing

In [5]:
WORK_DIR = Path.cwd().parents[0] / "tmp"     # Working directory
OUTPUT = Path.cwd().parents[0] / "out"       # Define path to output directory (figures etc.)
In [4]:

Environment setup

We use praw, the Python Reddit API Wrapper. Have a look at the Reddit API Rules. Reddit allows 60 requests per minute. Requests for multiple resources at a time are always better than requests for single resources in a loop. Recently introduced limits of the Reddit API further restrict us to the 1000 most recent submissions in a subreddit.
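To make the rate limit concrete, the following sketch compares the number of requests needed to fetch a set of submissions one by one versus in batches. It assumes a batch size of 100 items per request (the limit we believe applies to Reddit's multi-item info endpoint); the helper names are hypothetical and not part of praw.

```python
import math
from typing import Iterator, List

REQUESTS_PER_MINUTE = 60  # rate limit quoted above
BATCH_SIZE = 100          # assumption: max items per multi-resource request


def to_fullnames(ids: List[str], prefix: str = "t3_") -> List[str]:
    """Turn base36 submission ids into fullnames (t3_ is the submission prefix)."""
    return [i if i.startswith(prefix) else prefix + i for i in ids]


def batches(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def requests_needed(n_items: int, batch_size: int = 1) -> int:
    """Number of API calls needed to fetch n_items with the given batch size."""
    return math.ceil(n_items / batch_size)


def minutes_at_rate_limit(n_requests: int, rpm: int = REQUESTS_PER_MINUTE) -> float:
    """Lower bound on wall-clock minutes when staying within the rate limit."""
    return n_requests / rpm


if __name__ == "__main__":
    n = 1000
    single = requests_needed(n)               # one request per submission
    batched = requests_needed(n, BATCH_SIZE)  # 100 submissions per request
    print(single, minutes_at_rate_limit(single))    # 1000 requests, ~16.7 min
    print(batched, minutes_at_rate_limit(batched))  # 10 requests, ~0.17 min
    print(to_fullnames(["12ierfu", "t3_12idfgt"]))
```

With praw, such a list of fullnames can be passed to `reddit.info(fullnames=...)`, which returns a generator over the matching objects, so the batching does not have to be done by hand.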

We first prepare a conda environment in Carto-Lab Docker, using --prefix so that it persists.

In [5]:
%%bash
DIR="/envs/praw"
if [ ! -d "$DIR" ]; then
  echo "Installing environment in ${DIR}..."
  conda create \
      --prefix "$DIR" \
      --channel conda-forge \
      python=3.9 pip praw ipykernel python-dotenv \
      --yes > /dev/null 2>&1
else
  echo "Environment already exists."
fi
Environment already exists.

Install the kernelspec into Jupyter:

In [6]:
%%bash
if [ ! -d "/root/.local/share/jupyter/kernels/praw_env" ]; then
    echo "Linking environment to jupyter"
    /envs/praw/bin/python -m ipykernel install --user --name=praw_env
fi

Reload the browser page (CTRL+F5) and select praw_env in the kernel selector in the top-right corner of JupyterLab.

Reddit API

Check the "Authenticating via OAuth" section of the praw docs.

Create/Update Refresh Token

Add this to your docker-compose.yml:

version: '3.6'

Since we are running in Carto-Lab Docker, we need to connect the script in py/modules/ (source), running on the Docker-internal localhost:8063, to the outside.

If you're working with JupyterLab on a remote computer, you need to add an SSH tunnel, e.g. ssh user@ -L :8063: -p 22 -N -v
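The refresh-token script in py/modules/ is not reproduced here. As a minimal sketch of one step of such a flow, assuming the redirect URI is http://localhost:8063 and Reddit appends the standard OAuth2 query parameters `state` and `code` (or `error` if the user declines), the authorization code could be extracted like this. `extract_auth_code` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def extract_auth_code(redirect_url: str, expected_state: str) -> str:
    """Return the OAuth2 'code' from a redirect URL after verifying 'state'."""
    params = parse_qs(urlparse(redirect_url).query)
    if "error" in params:
        # e.g. error=access_denied when the user declines the authorization
        raise PermissionError(f"Authorization failed: {params['error'][0]}")
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("State mismatch: the redirect does not belong to this request")
    code = params.get("code", [None])[0]
    if not code:
        raise ValueError("Redirect URL contains no 'code' parameter")
    return code


if __name__ == "__main__":
    url = "http://localhost:8063/?state=abc123&code=XYZ"
    print(extract_auth_code(url, expected_state="abc123"))  # XYZ
```

In praw, the code would then be exchanged with `reddit.auth.authorize(code)`, which returns the refresh token when the authorization URL was created with `duration="permanent"`. Checking `state` guards against accepting a redirect that was not triggered by our own request.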

In [19]:
from dotenv import load_dotenv
load_dotenv(
    Path.cwd().parents[0] / '.env', override=True)
In [15]:
if [ -z "$REFRESH_TOKEN" ]; then
    /envs/praw/bin/python {Path.cwd().parents[0]}/py/modules/
fi
Now open this url in your browser:
New refresh token written to .env file (REFRESH_TOKEN).


In [20]:
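CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")  # assumption: stored in .env under this name
REFRESH_TOKEN = os.getenv("REFRESH_TOKEN")  # written to .env by the script above
USER_AGENT = os.getenv("USER_AGENT")
In [21]:
import praw
reddit = praw.Reddit(
    client_id=CLIENT_ID, client_secret=CLIENT_SECRET,
    refresh_token=REFRESH_TOKEN, user_agent=USER_AGENT)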
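If a variable is missing from .env, `os.getenv` silently returns None and the problem only surfaces at the first API call. A small guard that fails early is sketched below; `require_env` is a hypothetical helper, not part of praw or python-dotenv.

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the requested environment variables, raising if any is missing or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: str(value) for name, value in values.items()}
```

For example, `require_env(["CLIENT_ID", "USER_AGENT"])` could run right after `load_dotenv(...)`, before the `praw.Reddit` instance is created.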
In [22]:

Sample queries

In [24]:
sub_yosemite = reddit.subreddit("yosemite")
In [25]:
for submission in reddit.subreddit("test").hot(limit=10):
    print(submission.title)
Format a sample submission and its top-level comments:

In [62]:
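sample = list(sub_yosemite.hot(limit=3))[2]  # assumption: third post of a short r/yosemite listing
display(Markdown(
    f'<div style="width:500px"> \n\n**Original submission**:\n> {sample.selftext.replace(f"{os.linesep}{os.linesep}", f"{os.linesep}{os.linesep}>")} \n\n</div>'))
sample.comments.replace_more(limit=0)  # drop "load more comments" stubs, which have no .body
for ix, top_level_comment in enumerate(sample.comments):
    display(Markdown(f'<div style="width:500px"> \n\n**Comment #{ix:02}**:\n>> {top_level_comment.body} \n\n</div>'))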

Original submission: seems to be somewhat contradictory. It says "A waitlist is offered to non-hotel guests based upon availability". If it's offered to non-hotel guests then that would imply that, as a hotel guest, I am not able to get on that waitlist.

The website also says "A reservation at the hotel does not automatically include a dining reservation". Maybe hotel guests can make reservations and non-guests have to get on a waitlist to make reservations?

In any event, when I click on the link to make reservations it does not look like it's possible to actually do so. Like in there's a "Make a reservation" box in the top right but not for

So it's not clear to me how you're supposed to make reservations. Maybe their reservation system is down whilst the Ahwahnee Dining Room is rennovated?

Comment #00:

The best option is usually to call the dining room or inquire upon check in and they will book it for you. It sounds as if you are planning in advance - aside from the special dinner events you can almost always get a dinner or Sunday brunch reservation as a guest.

Comment #01:

I, as a non-hotel guest, made reservations via open table last year in the fall. My guess is the dining room is closed so they aren't taking reservations (but it's a guess).
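The comment cell does not apply the paragraph replacement used for the submission, so only the first paragraph of a multi-paragraph comment ends up quoted. A sketch of a helper that prefixes every line with the desired quote level handles both cases uniformly (`as_blockquote` is hypothetical):

```python
def as_blockquote(text: str, level: int = 1) -> str:
    """Prefix every line with `level` Markdown quote markers.

    Blank lines get the bare marker, so multi-paragraph text stays in one blockquote.
    """
    marker = ">" * level
    return "\n".join(f"{marker} {line}" if line else marker for line in text.splitlines())


if __name__ == "__main__":
    print(as_blockquote("First paragraph.\n\nSecond paragraph.", level=2))
```

In the cell above, it could be used as `display(Markdown(as_blockquote(top_level_comment.body, level=2)))`.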

Download all posts and comments from a single subreddit

First, retrieve all submissions the API returns and count them:

In [68]:
all_submissions = list(sub_yosemite.new(limit=None))
In [70]:
len(all_submissions)

There's an API query limit of 1000. If your subreddit has more than 1000 submissions, you need to find another way to retrieve the entirety of its posts/comments.

Have a look at the available attributes:

In [74]:
import pprint
pprint.pprint(vars(all_submissions[0]))
The available submission attributes are described in the PRAW API docs.

We first write the selected fields to a JSON file.


  • The url field only contains the permalink (and thus the name) of a submission for self posts; link posts point to an external url, so permalink and name need to be queried, too
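The permalink field is a path relative to the Reddit domain (e.g. /r/Yosemite/comments/12ierfu/seasonal_yosemite_start_date_nps/ in the output below). A minimal sketch for turning it into an absolute URL, assuming https://www.reddit.com as the base (`permalink_to_url` is hypothetical):

```python
from urllib.parse import urljoin

REDDIT_BASE = "https://www.reddit.com"  # assumption: canonical base URL


def permalink_to_url(permalink: str) -> str:
    """Turn a relative Reddit permalink into an absolute URL (absolute URLs pass through)."""
    return urljoin(REDDIT_BASE, permalink)


if __name__ == "__main__":
    print(permalink_to_url("/r/Yosemite/comments/12ierfu/seasonal_yosemite_start_date_nps/"))
```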
In [85]:
import json
list_of_items = []
submission_fields = (
    'id', 'created_utc', 'author_flair_text', 'author', 'is_original_content', 'is_self', 
    'link_flair_text', 'name', 'num_comments', 'permalink', 'media', 'over_18', 'score', 
    'selftext', 'title', 'total_awards_received', 'url', 'view_count')

Convert the selected fields of each submission to a dictionary and append it to list_of_items. The author field (a praw Redditor object) needs to be cast to str to be JSON serializable.

In [94]:
for submission in all_submissions:
    to_dict = vars(submission)
    sub_dict = {field:str(to_dict[field]) if field == 'author' else to_dict[field] for field in submission_fields}
    list_of_items.append(sub_dict)
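Instead of casting individual fields, `json.dump`/`json.dumps` also accept a `default` callable that is applied to any object the encoder cannot serialize natively. A sketch using `default=str`, with a hypothetical stand-in class that mimics a praw Redditor whose `str()` is the username:

```python
import json


class FakeRedditor:
    """Hypothetical stand-in for praw's Redditor: str() returns the username."""

    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name


record = {"id": "12ierfu", "author": FakeRedditor("Street_Touch_8732"), "score": 1}
# default=str is only called for objects json cannot encode on its own
serialized = json.dumps(record, default=str)
print(serialized)
```

The trade-off: `default=str` silently stringifies any unexpected object, whereas the explicit cast in the loop above only touches the author field and lets other unserializable values fail loudly.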
In [92]:
print(json.dumps(list_of_items[:3], indent=2))
[
  {
    "id": "12ierfu",
    "created_utc": 1681206754.0,
    "author_flair_text": null,
    "author": "Street_Touch_8732",
    "is_original_content": false,
    "is_self": true,
    "link_flair_text": null,
    "name": "t3_12ierfu",
    "num_comments": 0,
    "permalink": "/r/Yosemite/comments/12ierfu/seasonal_yosemite_start_date_nps/",
    "media": null,
    "over_18": false,
    "score": 1,
    "selftext": "I was given a tentative start date of April 9; of course, with the heavy snowfall, I anticipated a delay. Does anyone have a start date or am I out of the loop?",
    "title": "Seasonal Yosemite start date (NPS)",
    "total_awards_received": 0,
    "url": "",
    "view_count": null
  },
  {
    "id": "12idfgt",
    "created_utc": 1681202417.0,
    "author_flair_text": null,
    "author": "Solid_Ad884",
    "is_original_content": false,
    "is_self": true,
    "link_flair_text": null,
    "name": "t3_12idfgt",
    "num_comments": 2,
    "permalink": "/r/Yosemite/comments/12idfgt/halfdome_permits/",
    "media": null,
    "over_18": false,
    "score": 1,
    "selftext": "Just got my halfdome permit for July 20th. I listed my group as 3 individuals including myself but recently found out two of my friends can\u2019t make it. Any suggestions for finding people in their early-mid 20\u2019s to hike halfdome?",
    "title": "Halfdome Permits",
    "total_awards_received": 0,
    "url": "",
    "view_count": null
  },
  {
    "id": "12id35a",
    "created_utc": 1681201270.0,
    "author_flair_text": null,
    "author": "dogemaster00",
    "is_original_content": false,
    "is_self": true,
    "link_flair_text": null,
    "name": "t3_12id35a",
    "num_comments": 3,
    "permalink": "/r/Yosemite/comments/12id35a/half_dome_permits/",
    "media": null,
    "over_18": false,
    "score": 4,
    "selftext": "Rejected :/\n\nPut down all Sundays in June/July. Curious how easy it is to win Saturday/Sunday permits in the daily lottery.",
    "title": "Half Dome Permits",
    "total_awards_received": 0,
    "url": "",
    "view_count": null
  }
]

Write to file

In [95]:
with open(OUTPUT / 'yosemite_submissions.json', 'w') as f:
    json.dump(list_of_items, f)
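To inspect the written data, it can be loaded into a pandas DataFrame (pandas is imported at the top). A minimal sketch, shown with two records copied from the output above in place of the file; `created_utc` holds Unix seconds, so `unit="s"` with `utc=True` converts it to timezone-aware timestamps:

```python
import json

import pandas as pd

# Two records copied from the output above; stands in for the written JSON file
sample_json = json.dumps([
    {"id": "12ierfu", "created_utc": 1681206754.0, "num_comments": 0, "score": 1},
    {"id": "12idfgt", "created_utc": 1681202417.0, "num_comments": 2, "score": 1},
])
df = pd.DataFrame(json.loads(sample_json))
# created_utc is seconds since the Unix epoch
df["created"] = pd.to_datetime(df["created_utc"], unit="s", utc=True)
print(df[["id", "created", "num_comments"]])
print("span:", df["created"].min(), "to", df["created"].max())
```

For the real file, `pd.read_json(OUTPUT / 'yosemite_submissions.json')` would replace the sample; the `created_utc` column still needs the explicit conversion, since its name is not one that `read_json` converts to dates automatically.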

Print the timestamp of the oldest submission in the dataset (the last item returned by the listing):

In [100]:
from datetime import datetime
print(datetime.utcfromtimestamp(min(item["created_utc"] for item in list_of_items)))

This means that it is not possible to get all posts for this subreddit using the Reddit API, since we are limited to the newest 1000 posts. An alternative way would be to use the
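A rough heuristic for whether a listing hit the cap: if (close to) the maximum number of items came back, older submissions were most likely cut off. A sketch, assuming the 1000-item cap described above; `listing_is_capped` and `covered_days` are hypothetical helpers operating on the `created_utc` values.

```python
from typing import Sequence

LISTING_CAP = 1000  # listing limit described above


def listing_is_capped(n_items: int, cap: int = LISTING_CAP) -> bool:
    """True if the listing returned the maximum number of items, i.e. it was likely truncated."""
    return n_items >= cap


def covered_days(created_utc: Sequence[float]) -> float:
    """Time span in days between the oldest and the newest submission."""
    if not created_utc:
        return 0.0
    return (max(created_utc) - min(created_utc)) / 86400
```

For example, `covered_days([item["created_utc"] for item in list_of_items])` shows how far back the 1000 posts reach. Deleted or removed posts can make a capped listing return slightly fewer items, so the count check is only indicative.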

We continue in the next notebook, pmaw.html.

Create notebook HTML

In [1]:
!jupyter nbconvert --to html_toc \
    --output-dir=../resources/html/ ./02_reddit_api.ipynb \
    --template=../nbconvert.tpl \
    --ExtractOutputPreprocessor.enabled=False >&- 2>&-
