backend-hang-debug

Diagnose and fix FastAPI hangs caused by a blocking ThreadPoolExecutor shutdown in the news stream route; includes a py-spy capture flow and a non-blocking executor pattern.

Install

  • Source · Clone the upstream repo:
    `git clone https://github.com/aiskillstore/marketplace`
  • Claude Code · Install into `~/.claude/skills/`:
    `T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/benderfendor/backend-hang-debug" ~/.claude/skills/aiskillstore-marketplace-backend-hang-debug && rm -rf "$T"`

Manifest: `skills/benderfendor/backend-hang-debug/SKILL.md`

Source content

Backend Hang Debug

Purpose

  • Detect and resolve event-loop hangs where the FastAPI app stops responding (e.g., `curl http://localhost:8000/` times out) due to synchronous executor shutdown in the SSE news stream.
  • Provide a repeatable triage flow using `py-spy` to capture live stacks and pinpoint blocking code.

Scope

  • Backend: `backend/app/api/routes/stream.py` (news stream), `backend/app/services/rss_ingestion.py` (RSS workers), startup processes.
  • Tooling: `py-spy` for live stack dumps; `curl` with timeouts for smoke tests.

Quick Triage

  1. Reproduce the hang: `curl -m 5 http://localhost:8000/` and `curl -m 5 http://localhost:8000/health`; note the timeouts.
  2. Process check: `ss -tlnp | grep 8000` to confirm the listener; `ls /proc/$(pgrep -f "uvicorn app.main")/fd | wc -l` to rule out an FD leak.
  3. Stack capture (inside the backend venv): `uv pip install py-spy`, then `sudo /home/bender/classwork/Thesis/backend/.venv/bin/py-spy dump --pid $(pgrep -f "uvicorn app.main")` (and the worker pid if multiprocess). Look for `ThreadPoolExecutor.shutdown` in `api/routes/stream.py` frames. If installing py-spy is not an option, a stdlib fallback is sketched after this list.
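
Where `py-spy` cannot be installed, the stdlib `faulthandler` module is a rougher substitute: register it against a signal at startup and it prints every thread's stack on demand. A minimal sketch (the signal choice is an assumption):

```python
# Stdlib fallback for stack capture: after registering, `kill -USR1 <pid>`
# prints every thread's traceback to stderr, so a frame stuck in
# ThreadPoolExecutor.shutdown shows up without installing py-spy.
import faulthandler
import signal

faulthandler.register(signal.SIGUSR1, all_threads=True)
```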

Fix Pattern (non-blocking executor)

  • Replace the synchronous context manager `with ThreadPoolExecutor(...):` inside `event_generator` with a long-lived executor plus an explicit non-blocking shutdown (sketched after this list):
    • Create the executor outside the context manager.
    • On client disconnect, cancel pending futures instead of awaiting shutdown.
    • In `finally`, call `executor.shutdown(wait=False, cancel_futures=True)`.
  • Rationale: the context manager calls `shutdown(wait=True)`, which blocks the event loop whenever an RSS worker thread hangs on network I/O.
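
A minimal sketch of the pattern; `SOURCES` and `fetch_source` are hypothetical stand-ins for the real source list and `_process_source_with_debug` in `stream.py`:

```python
import asyncio
import concurrent.futures

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

SOURCES = ["feed-a", "feed-b"]  # hypothetical stand-in for the real source list


def fetch_source(source: str) -> str:
    """Hypothetical stand-in for the blocking per-source RSS fetch."""
    return f"data: processed {source}\n\n"


@app.get("/news/stream")
async def news_stream(request: Request) -> StreamingResponse:
    async def event_generator():
        loop = asyncio.get_running_loop()
        # Long-lived executor created outside a `with` block: the context
        # manager's __exit__ calls shutdown(wait=True), which blocks the
        # event loop if a worker thread hangs on network I/O.
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
        futures = []
        try:
            for source in SOURCES:
                if await request.is_disconnected():
                    break
                future = loop.run_in_executor(executor, fetch_source, source)
                futures.append(future)
                yield await future
        finally:
            # Non-blocking teardown: cancel anything still pending and
            # return immediately rather than waiting on hung threads.
            for future in futures:
                future.cancel()
            executor.shutdown(wait=False, cancel_futures=True)

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```

The crucial detail is `wait=False`: a client disconnect now costs the event loop nothing, while `cancel_futures=True` keeps queued work from running against a dead stream.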

Implementation Steps

  1. Update executor usage in `backend/app/api/routes/stream.py`:
    • Instantiate `executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)`.
    • Dispatch work via `loop.run_in_executor(executor, _process_source_with_debug, ...)`.
    • On disconnect, `cancel()` pending futures.
    • In `finally`, call `executor.shutdown(wait=False, cancel_futures=True)`.
  2. Keep the RSS executor in `rss_ingestion.py` as-is, since it runs in background threads, but ensure per-request timeouts stay reasonable (currently 60s per RSS `requests.get`; see the sketch after this list).
  3. Retest:
    • Restart uvicorn; `curl -m 5 http://localhost:8000/health` should respond.
    • Start a stream request and abort the client; the server must stay responsive.
    • Re-run `py-spy dump` and verify there are no `ThreadPoolExecutor.shutdown(wait=True)` frames in the main thread.
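
For step 2, a minimal sketch of a bounded fetch in `rss_ingestion.py`; `fetch_feed` is a hypothetical name, and the (connect, read) timeout split is an assumption layered on the current 60s budget:

```python
import requests


def fetch_feed(url: str) -> bytes:
    """Hypothetical RSS fetch with bounded network I/O.

    timeout=(connect, read): fail fast on unreachable hosts while keeping
    the current 60s read budget, so a dead feed cannot pin a worker thread.
    """
    response = requests.get(url, timeout=(5, 60))
    response.raise_for_status()
    return response.content
```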

Verification Checklist

  • `curl -m 5 http://localhost:8000/` returns a response (no hang).
  • `curl -m 5 http://localhost:8000/health` succeeds.
  • Aborting `/news/stream` does not freeze subsequent requests.
  • `py-spy dump` shows the event loop is not blocked in `ThreadPoolExecutor.shutdown`.
  • The frontend no longer stalls on root/health while the backend is busy with streams.
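
The checklist above can be scripted. A minimal sketch, assuming `httpx` is available and the backend is on `localhost:8000`; it aborts a stream mid-read and then asserts the loop still answers:

```python
import httpx


def check_responsive_after_stream_abort() -> None:
    with httpx.Client(base_url="http://localhost:8000", timeout=5.0) as client:
        # Open the stream, read one chunk, then abort by leaving the block,
        # which simulates a client disconnect mid-stream.
        with client.stream("GET", "/news/stream") as response:
            next(response.iter_bytes(), None)
        # With the non-blocking shutdown in place, these must not time out.
        assert client.get("/health").status_code == 200
        client.get("/")  # raises httpx.TimeoutException if the loop is stuck


if __name__ == "__main__":
    check_responsive_after_stream_abort()
    print("server stayed responsive after stream abort")
```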

Notes & Future Hardening

  • Consider adding request-timeout middleware to fail fast on slow handlers (sketched below).
  • Add per-source network timeouts and shorter retries for RSS feeds to reduce long-lived threads.
  • If multi-worker uvicorn is used, run `py-spy` on each worker pid when diagnosing hangs.
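
For the first note, a minimal sketch of timeout middleware built on `asyncio.wait_for` (the 10s budget is an assumption). One caveat: it only fails fast while the event loop is alive; it cannot interrupt a loop already blocked inside synchronous code such as `shutdown(wait=True)`.

```python
import asyncio

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()


@app.middleware("http")
async def timeout_middleware(request: Request, call_next):
    # Cancel handlers that exceed the budget and answer 504 instead of
    # letting clients hang; streaming responses pass through quickly
    # because call_next returns once the response starts.
    try:
        return await asyncio.wait_for(call_next(request), timeout=10.0)
    except asyncio.TimeoutError:
        return JSONResponse({"detail": "handler timed out"}, status_code=504)
```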