Marketplace backend-hang-debug
Diagnose and fix FastAPI hangs caused by blocking ThreadPoolExecutor shutdown in the news stream route; includes py-spy capture and a non-blocking executor pattern.
install
source · Clone the upstream repo
git clone https://github.com/aiskillstore/marketplace
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/benderfendor/backend-hang-debug" ~/.claude/skills/aiskillstore-marketplace-backend-hang-debug && rm -rf "$T"
manifest:
skills/benderfendor/backend-hang-debug/SKILL.md
Backend Hang Debug
Purpose
- Detect and resolve event-loop hangs where the FastAPI app stops responding (e.g., `curl http://localhost:8000/` times out) due to synchronous executor shutdown in the SSE news stream.
- Provide a repeatable triage flow using `py-spy` to capture live stacks and pinpoint the blocking code.
Scope
- Backend: `backend/app/api/routes/stream.py` (news stream), `backend/app/services/rss_ingestion.py` (RSS workers), startup processes.
- Tooling: `py-spy` for live stack dumps; `curl` with timeouts for smoke tests.
Quick Triage
- Reproduce hang: `curl -m 5 http://localhost:8000/` and `curl -m 5 http://localhost:8000/health`; note timeouts.
- Process check: `ss -tlnp | grep 8000` to confirm the listener; `ls /proc/$(pgrep -f "uvicorn app.main")/fd | wc -l` to rule out an FD leak.
- Stack capture (inside the backend venv): `uv pip install py-spy`, then `sudo /home/bender/classwork/Thesis/backend/.venv/bin/py-spy dump --pid $(pgrep -f "uvicorn app.main")` (plus each worker pid if multiprocess). Look for `ThreadPoolExecutor.shutdown` in `api/routes/stream.py` frames.
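For convenience, the triage pass above can be run as one script. This is a minimal sketch that assumes a single uvicorn process and the venv path shown above; loop over pids for multi-worker setups:

```bash
#!/usr/bin/env bash
# Quick triage for a suspected event-loop hang.
set -u

echo "== Smoke tests (5s timeout) =="
curl -m 5 -sS http://localhost:8000/ || echo "root timed out"
curl -m 5 -sS http://localhost:8000/health || echo "health timed out"

echo "== Listener and FD count =="
ss -tlnp | grep 8000
PID=$(pgrep -f "uvicorn app.main")   # assumes a single matching process
ls "/proc/$PID/fd" | wc -l

echo "== Live stacks: look for ThreadPoolExecutor.shutdown =="
sudo /home/bender/classwork/Thesis/backend/.venv/bin/py-spy dump --pid "$PID"
```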
Fix Pattern (non-blocking executor)
- Replace the synchronous context manager `with ThreadPoolExecutor(...):` inside `event_generator` with a long-lived executor plus an explicit non-blocking shutdown:
  - Create the executor outside the context manager.
  - On client disconnect, cancel pending futures instead of awaiting shutdown.
  - In `finally`, call `executor.shutdown(wait=False, cancel_futures=True)`.
- Rationale: the context manager's exit calls `shutdown(wait=True)`, which blocks the event loop if RSS worker threads hang on network I/O.
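A minimal sketch of the pattern in an SSE route; the route path, source list, and `_process_source_with_debug` body are placeholders inferred from the descriptions above, not the repo's actual code:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

def _process_source_with_debug(source: str) -> str:
    """Placeholder for the blocking RSS fetch done in a worker thread."""
    return f"data: processed {source}\n\n"

@app.get("/news/stream")
async def news_stream(request: Request) -> StreamingResponse:
    async def event_generator():
        # Long-lived executor created outside a `with` block, so no
        # implicit shutdown(wait=True) can block the event loop.
        executor = ThreadPoolExecutor(max_workers=5)
        loop = asyncio.get_running_loop()
        futures = []
        try:
            for source in ("feed-a", "feed-b"):  # placeholder sources
                if await request.is_disconnected():
                    break
                fut = loop.run_in_executor(
                    executor, _process_source_with_debug, source
                )
                futures.append(fut)
                yield await fut
        finally:
            # Non-blocking cleanup: cancel what we can and return
            # immediately rather than waiting on hung worker threads.
            for fut in futures:
                fut.cancel()
            executor.shutdown(wait=False, cancel_futures=True)

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```

Note that `cancel()` cannot interrupt a thread already blocked in network I/O; `wait=False` simply stops the event loop from waiting on it, which is the point of the fix.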
Implementation Steps
- Update stream executor usage in `backend/app/api/routes/stream.py` (the sketch above mirrors these steps):
  - Instantiate `executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)`.
  - Dispatch work via `loop.run_in_executor(executor, _process_source_with_debug, ...)`.
  - On disconnect, `cancel()` pending futures.
  - In `finally`, call `executor.shutdown(wait=False, cancel_futures=True)`.
- Keep the RSS executor in `rss_ingestion.py` as-is, since it runs in background threads, but ensure request timeouts remain reasonable (currently 60s per RSS `requests.get`); see the timeout sketch after this list.
- Retest:
  - Restart uvicorn; `curl -m 5 http://localhost:8000/health` should respond.
  - Start a stream request and abort the client; the server must stay responsive.
  - Re-run `py-spy dump` to verify there are no `ThreadPoolExecutor.shutdown(wait=True)` frames in the main thread.
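On the RSS timeout point, a minimal sketch of a tighter per-source budget; `fetch_feed` and the split timeout values are illustrative, not existing code:

```python
import requests

def fetch_feed(url: str) -> bytes:
    # Separate connect/read timeouts keep a dead host from pinning a
    # worker thread for the full 60s the current code allows.
    resp = requests.get(url, timeout=(5, 30))  # (connect, read) seconds
    resp.raise_for_status()
    return resp.content
```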
Verification Checklist
- `curl -m 5 http://localhost:8000/` returns a response (no hang).
- `curl -m 5 http://localhost:8000/health` succeeds.
- Aborting `/news/stream` does not freeze subsequent requests.
- `py-spy dump` shows the event loop is not blocked in `ThreadPoolExecutor.shutdown`.
- The frontend no longer stalls waiting on root/health while the backend is busy with streams.
Notes & Future Hardening
- Consider adding request timeout middleware to fail fast on slow handlers (a sketch follows this list).
- Add per-source network timeouts and shorter retries for RSS feeds to reduce long-lived threads.
- If multi-worker uvicorn is used, run `py-spy` against each worker pid when diagnosing hangs.
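A minimal sketch of such timeout middleware, assuming a plain `BaseHTTPMiddleware` wrapper and an illustrative 10s budget:

```python
import asyncio

from fastapi import FastAPI
from fastapi.responses import JSONResponse
from starlette.middleware.base import BaseHTTPMiddleware

class TimeoutMiddleware(BaseHTTPMiddleware):
    """Return 504 instead of letting a slow handler hang the client."""

    def __init__(self, app, timeout: float = 10.0):
        super().__init__(app)
        self.timeout = timeout

    async def dispatch(self, request, call_next):
        try:
            return await asyncio.wait_for(call_next(request), timeout=self.timeout)
        except asyncio.TimeoutError:
            return JSONResponse({"detail": "handler timed out"}, status_code=504)

app = FastAPI()
app.add_middleware(TimeoutMiddleware, timeout=10.0)
```

Because `call_next` resolves once the response starts, this bounds time-to-first-response rather than total stream duration, so long-lived SSE streams are not cut off mid-stream.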