Archon validate-ui
git clone https://github.com/coleam00/Archon
T=$(mktemp -d) && git clone --depth=1 https://github.com/coleam00/Archon "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/validate-ui" ~/.claude/skills/coleam00-archon-validate-ui && rm -rf "$T"
.claude/skills/validate-ui/SKILL.md
Archon Web UI: Comprehensive E2E Validation
Run exhaustive end-to-end browser automation tests and codebase review of the Archon Web UI. The goal: determine whether Archon is doing the best it possibly can to solve the problem of managing parallel agents, executing custom workflows, and providing full visibility into agent work.
Optional focus argument: `$ARGUMENTS` (e.g., "workflows", "chat", "projects"). If empty, run ALL sections.
Phase 0: Environment Setup
0.1 Kill Old Archon Processes
    # Kill any running Archon dev servers (backend + frontend)
    pkill -f "bun.*dev:server" 2>/dev/null || true
    pkill -f "bun.*dev:web" 2>/dev/null || true
    pkill -f "bun.*packages/server" 2>/dev/null || true
    pkill -f "bun.*packages/web" 2>/dev/null || true
    pkill -f "vite.*5173" 2>/dev/null || true
    # Kill any leftover processes on our ports
    lsof -ti:3090 | xargs kill -9 2>/dev/null || true
    lsof -ti:5173 | xargs kill -9 2>/dev/null || true
    # Wait for ports to free up
    sleep 2
    # Verify ports are free
    ! lsof -i:3090 && ! lsof -i:5173 && echo "Ports 3090 and 5173 are free" || echo "WARNING: Ports still in use"
0.2 Install agent-browser (if needed)
    # Check if agent-browser is available
    which agent-browser 2>/dev/null || npx agent-browser --version 2>/dev/null
    # If not installed globally, install it:
    # npm install -g agent-browser && agent-browser install
    # On WSL2/Linux, use --with-deps to get Chromium system dependencies:
    # agent-browser install --with-deps
    # IMPORTANT: Do NOT use bunx. Bun skips postinstall scripts that agent-browser needs.
    # Use npx or global npm install.
0.3 Start Archon Backend + Frontend
Start both services. Backend must be up before frontend SSE connections work.
    # From the repo root: /path/to/archon
    # Start backend (port 3090)
    cd /path/to/archon && bun run dev:server &
    sleep 5  # Wait for server initialization + DB
    # Verify backend is healthy
    curl -s http://localhost:3090/api/health | head -c 200
    # Start frontend (port 5173)
    cd /path/to/archon && bun run dev:web &
    sleep 5  # Wait for Vite dev server
    # Verify frontend is serving
    curl -s http://localhost:5173 | head -c 200
URLs:
- Frontend: `http://localhost:5173`
- Backend API: `http://localhost:3090/api`
- SSE streams: `http://localhost:3090/api/stream/{conversationId}` (bypasses Vite proxy in dev)
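The fixed `sleep 5` waits in step 0.3 can be too short on a slow machine and wasteful on a fast one. As a hedged alternative, the sketch below polls until a probe succeeds or a timeout expires. The helper names `http_ok` and `wait_until_ready`, and the 30 second default timeout, are assumptions made for illustration; only the two URLs come from this document.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error status: not ready yet.
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 30.0,
                     interval: float = 0.5) -> bool:
    """Call `probe` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A possible usage is `wait_until_ready(lambda: http_ok("http://localhost:3090/api/health"))` for the backend, followed by the same call with `http://localhost:5173` for the frontend.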
0.4 Seed Test Data (if needed)
Check if there are existing codebases and conversations. If empty, create test data:
    # Check existing codebases
    curl -s http://localhost:3090/api/codebases | python3 -m json.tool 2>/dev/null || curl -s http://localhost:3090/api/codebases
    # Register the current repo as a codebase (if none exist)
    curl -s -X POST http://localhost:3090/api/codebases \
      -H "Content-Type: application/json" \
      -d '{"path": "/path/to/archon"}'
    # Create a test conversation
    curl -s -X POST http://localhost:3090/api/conversations \
      -H "Content-Type: application/json" \
      -d '{}' | python3 -m json.tool 2>/dev/null
Phase 1: Browser Automation — End-to-End Testing
Use the `agent-browser` CLI for all browser interactions. Follow the snapshot-refs workflow:
- `agent-browser open <url>`: navigate
- `agent-browser snapshot -i`: get interactive elements with refs
- Interact using refs (click, fill, etc.)
- Re-snapshot after navigation or DOM changes
Take screenshots at each major test point:
    agent-browser screenshot /tmp/archon-test-{name}.png
Test Suite 1: Dashboard (Route: `/`)
1.1 Initial Load
- Open `http://localhost:5173`
- Verify dashboard renders: stats cards (Running Workflows, Conversations, System Status)
- Check system health indicator shows "Healthy" (green)
- Screenshot the full dashboard
1.2 Stats Accuracy
- Compare "Running Workflows" count against `GET /api/workflows/runs?status=running`
- Compare "Conversations" count against `GET /api/conversations`
- Verify numbers update after creating new data
1.3 Recent Items
- Verify "Recent Conversations" list shows up to 10 items
- Verify "Recent Workflow Runs" list shows up to 10 items
- Click a conversation: verify navigation to `/chat/{id}`
- Click a workflow run: verify navigation to `/workflows/runs/{id}`
- Use browser back button: verify return to dashboard
1.4 Empty State
- If no conversations/runs exist: verify the empty state with "New Chat" CTA renders
- Click "New Chat" from empty state: verify navigation to `/chat`
Test Suite 2: Project Management
2.1 Add Project (GitHub URL)
- Click the `+` button next to "Projects" in the sidebar
- Fill in a GitHub URL (e.g., `https://github.com/anthropics/claude-code`)
- Submit and verify the project appears in the sidebar
- Verify the project is auto-selected
2.2 Add Project (Local Path)
- Click `+` again
- Fill in a local path (e.g., `/path/to/archon`)
- Submit and verify the project appears
- Verify deduplication: if the path was already registered, it should not create a duplicate
2.3 Select/Deselect Project
- Click a project in the sidebar — verify it becomes selected (highlighted)
- Verify the sidebar content switches to `ProjectDetail` view (shows project name, repo URL, conversations scoped to project, workflow runs)
- Click "All Projects": verify sidebar switches to `AllConversationsView` (all conversations, no project filter)
- Verify `localStorage` persists selection across page refresh
2.4 Delete Project
- Hover over a project — verify the trash icon appears
- Click trash — verify confirmation dialog appears
- Confirm deletion — verify project is removed from list
- Verify conversations and runs associated with the project are handled gracefully
2.5 Project Selector in Collapsible
- When a project is selected, verify the collapsible header shows the project name
- Click the chevron to expand — verify other projects are listed
- Switch projects via the collapsible — verify the view updates
Test Suite 3: Chat Interface
3.1 New Chat (No Project)
- Click "New Chat" in sidebar (with no project selected)
- Verify empty chat interface renders with message input
- Type a message and send
- Verify: user message appears right-aligned, assistant "thinking" dots appear
- Verify: conversation is created and URL updates to `/chat/{conversationId}`
- Verify: conversation appears in sidebar
3.2 New Chat (With Project)
- Select a project first
- Click "New Chat"
- Send a message (e.g., `/status`)
- Verify: conversation is scoped to the selected project
- Verify: project context (cwd, codebase) is attached
3.3 Slash Commands
- Send `/status`: verify response shows session status
- Send `/help`: verify help text renders in markdown
- Send `/commands`: verify command list renders
- Send `/getcwd`: verify working directory is shown
- Verify: commands execute instantly (no "thinking" animation needed)
3.4 Message Rendering
- Send a message that triggers a markdown response from the AI
- Verify: code blocks render with syntax highlighting
- Verify: tables render properly in assistant messages
- Verify: links open in new tabs (`target="_blank"`)
- Verify: blockquotes render with left border
- Verify: inline code renders with monospace font
- Send a very long message — verify no layout overflow
3.5 Streaming & Real-time Updates
- Send a message that triggers an AI response
- Verify: blinking cursor appears during streaming
- Verify: text appears incrementally (not all at once)
- Verify: lock indicator shows "Agent is working..."
- Verify: lock indicator hides when response completes
- Verify: message `isStreaming` flag clears on completion
3.6 Tool Call Cards
- Send a message that triggers tool usage (e.g., a code question in a project context)
- Verify: tool call cards appear below the assistant message
- Verify: card shows tool name and input summary
- Click to expand a tool card — verify full input JSON and output render
- Verify: running tools show spinner animation and primary border
- Verify: completed tools show duration badge
- Test "Show N more lines" for long tool outputs
3.7 Error Handling
- Trigger an error condition (e.g., send a message with no AI credentials configured)
- Verify: error card renders with AlertCircle icon
- Verify: error classification badge shows (transient/fatal)
- Verify: suggested actions are listed
- Verify: the chat remains functional after an error
3.8 Queue Position
- If possible, trigger multiple concurrent messages to the same conversation
- Verify: queue position indicator appears ("Position N in queue")
- Verify: the lock indicator updates when the queue advances
3.9 Auto-scroll Behavior
- Scroll up during a streaming response
- Verify: auto-scroll stops (respects user scroll position)
- Verify: "Jump to bottom" button appears
- Click "Jump to bottom" — verify scroll snaps to latest message
- Scroll back to bottom manually — verify auto-scroll resumes
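The auto-scroll checks in 3.9 depend on a notion of "at the bottom". The review table later in this document mentions a 50px scroll threshold, so a minimal sketch of the expected decision, with the hypothetical function name `is_near_bottom`, could look like this. The inputs correspond to the DOM properties `scrollTop`, `clientHeight`, and `scrollHeight`, which can be read through `agent-browser eval`.

```python
def is_near_bottom(scroll_top: float, client_height: float,
                   scroll_height: float, threshold: float = 50.0) -> bool:
    """Return True when the viewport is within `threshold` px of the end.

    Auto-scroll is expected to stay active while this is True, and the
    "Jump to bottom" button is expected to appear once it becomes False.
    """
    distance_from_bottom = scroll_height - (scroll_top + client_height)
    return distance_from_bottom <= threshold
```

Comparing this value before and after scrolling up during a stream gives a concrete pass/fail signal for the auto-scroll and "Jump to bottom" steps.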
3.10 Conversation Navigation
- Create multiple conversations
- Click between them in the sidebar
- Verify: each conversation loads its own message history
- Verify: messages are not leaked between conversations
- Verify: the correct conversation is highlighted in the sidebar
Test Suite 4: Conversation Management
4.1 Rename Conversation
- Hover over a conversation in the sidebar — verify pencil icon appears
- Click pencil — verify inline edit input appears
- Type a new title and press Enter
- Verify: title updates in sidebar and in the chat header
- Press Escape during rename — verify it cancels without saving
4.2 Delete Conversation
- Hover over a conversation — verify trash icon appears
- Click trash — verify confirmation dialog appears
- Confirm deletion — verify conversation is removed
- If the deleted conversation was active: verify redirect to `/`
- Verify: soft-delete (conversation is hidden, not destroyed)
4.3 Auto-title
- Create a new conversation and send a non-command message
- Wait 2-3 seconds
- Verify: the conversation title updates automatically based on the first message
- Verify: title is truncated to ~80 characters
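To make the "~80 characters" check in 4.3 concrete, here is a small sketch of one plausible truncation rule. It is an assumption for illustration only: the function name `auto_title`, the whitespace collapsing, and the `...` suffix are not taken from Archon's code, so compare against the real implementation before treating a mismatch as a bug.

```python
def auto_title(first_message: str, max_len: int = 80) -> str:
    """Derive a sidebar title from the first message (hypothetical rule)."""
    # Collapse newlines and repeated spaces so the title fits on one line.
    collapsed = " ".join(first_message.split())
    if len(collapsed) <= max_len:
        return collapsed
    # Reserve three characters for the ellipsis so the result stays within max_len.
    return collapsed[: max_len - 3].rstrip() + "..."
```

Whatever rule the app uses, the observable property to test is the same: the rendered title should never be much longer than 80 characters.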
4.4 Search
- Type in the sidebar search bar
- Verify: conversations are filtered by title match
- Clear search — verify all conversations reappear
- Press `/` key: verify search input focuses
- Press Escape: verify search clears
Test Suite 5: Workflow Management
5.1 Workflow List Page (`/workflows`)
- Navigate to `/workflows`
- Verify: "Available Workflows" tab shows all discovered workflows
- Verify: each workflow card shows name and description
- Verify: "Recent Runs" tab shows recent workflow runs
- Verify: running workflows show a pulsing dot on the "Recent Runs" tab label
5.2 Invoke Workflow from Workflows Page
- Click on a workflow card (e.g., `archon-assist`)
- Verify: inline run panel expands with project selector and message input
- Select a project from the dropdown
- Type a message and click "Run"
- Verify: conversation is created and navigation goes to `/chat/{conversationId}`
- Verify: workflow execution begins (messages appear from the AI)
5.3 Invoke Workflow from Sidebar (WorkflowInvoker)
- Select a project in the sidebar
- Verify: workflow dropdown appears in `ProjectDetail` view
- Select a workflow from the dropdown
- Type a message and submit
- Verify: new conversation created, navigation to chat, workflow runs
5.4 Workflow Router (Agent Orchestrator)
- In a project chat, send a natural language message (e.g., "Help me understand the authentication flow")
- Verify: the router detects the intent and routes to the appropriate workflow
- Verify: workflow dispatch status message appears (e.g., "Dispatching workflow: archon-assist (background)")
- Verify: `WorkflowDispatchInline` badge appears with spinner
- Verify: clicking the dispatch badge navigates to the workflow run or worker conversation
5.5 Workflow Progress in Chat
- While a workflow is running, verify `WorkflowProgressCard` appears in the chat
- Verify: compact mode shows workflow name, step count, elapsed time
- Verify: elapsed timer updates every second
- Click "Open Full View": verify navigation to `/workflows/runs/{runId}`
- Verify: returning to chat still shows the progress card
5.6 Workflow Execution Page (`/workflows/runs/:runId`)
- Navigate to an active or completed workflow run
- Verify: header shows workflow name, status, and elapsed time
- Verify: step progress panel (left side) shows all steps with status icons
- Click different steps — verify the log panel (right side) updates
- Verify: "Chat" link back to parent conversation works
- For dispatched workflows: verify `WorkflowLogs` renders the worker conversation messages
5.7 Parallel Agent Steps
- Run a workflow with parallel agents (e.g., `archon-comprehensive-pr-review` has 5 parallel agents)
- Verify: `ParallelBlockView` renders showing parent step and nested agent list
renders showing parent step and nested agent listParallelBlockView - Verify: each agent shows its own status (pending/running/completed/failed)
- Verify: overall block status derives correctly (any failed = failed, any running = running, all complete = complete)
- Verify: progress counter shows completed/total agents
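The status rule in 5.7 (any failed = failed, any running = running, all complete = complete) can be written down as a reference oracle and compared with what the UI shows. This is a sketch under assumptions: the function names are hypothetical, and falling back to "pending" when no other rule matches is a guess, since the document does not define that case.

```python
from typing import Iterable


def derive_block_status(agent_statuses: Iterable[str]) -> str:
    """Derive the parallel block status from its agents' statuses."""
    statuses = list(agent_statuses)
    if any(s == "failed" for s in statuses):
        return "failed"
    if any(s == "running" for s in statuses):
        return "running"
    if statuses and all(s == "completed" for s in statuses):
        return "completed"
    # Assumption: anything else (e.g. some agents not started) counts as pending.
    return "pending"


def progress_counter(agent_statuses: Iterable[str]) -> str:
    """Format the completed/total counter, e.g. '2/5'."""
    statuses = list(agent_statuses)
    done = sum(1 for s in statuses if s == "completed")
    return f"{done}/{len(statuses)}"
```

Feeding the statuses read from a snapshot into both functions gives the expected block status and counter for that moment.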
5.8 Loop Iterations
- Run a loop workflow (e.g., `archon-test-loop` or `archon-ralph-fresh`)
- Verify: `LoopIterationView` renders with iteration counter
- Verify: progress bar fills proportionally (current/max)
- Verify: each iteration shows status
- Verify: completion signal (`<promise>COMPLETE</promise>`) ends the loop
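Two of the loop checks in 5.8 reduce to small calculations: the progress bar fraction (current/max, capped at the maximum) and detection of the completion signal. The sketch below illustrates both. The function names are hypothetical, and the case-insensitive match with optional whitespace inside the tag is an assumption loosely based on the "case-insensitive matching" note in the backend review table, not a copy of Archon's regex.

```python
import re

# Assumed pattern: tolerate case differences and whitespace around COMPLETE.
COMPLETION_SIGNAL = re.compile(r"<promise>\s*COMPLETE\s*</promise>", re.IGNORECASE)


def loop_progress(current: int, max_iterations: int) -> float:
    """Return the progress bar fill as a fraction between 0.0 and 1.0."""
    if max_iterations <= 0:
        return 0.0
    capped = min(max(current, 0), max_iterations)
    return capped / max_iterations


def has_completion_signal(output: str) -> bool:
    """Return True if the agent output contains the completion signal."""
    return COMPLETION_SIGNAL.search(output) is not None
```

A bar that exceeds 100% or a loop that keeps iterating after the signal appears would both be findings worth recording.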
5.9 Workflow Artifacts
- Wait for a workflow that produces artifacts (PR URLs, commits, branches) to complete
- Verify: `ArtifactSummary` component renders at the bottom
- Verify: URLs are clickable links opening in new tabs
- Verify: artifact type icons are correct (PR, Commit, Branch, File)
5.10 Workflow Stale Detection
- During a running workflow, if the SSE connection drops briefly
- Verify: `stale` indicator appears on the workflow card
- Verify: polling fallback kicks in (checks every 15 seconds)
- Verify: stale state clears when fresh data arrives
5.11 Cancel Workflow
- While a workflow is running, look for "Cancel" button
- If present: click and verify the workflow status changes to failed/cancelled
- If not present: note this as a UX gap
Test Suite 6: Project-Scoped Views
6.1 Project Detail — Conversations
- Select a project
- Verify: only conversations scoped to that project appear
- Create a new chat within the project
- Verify: the new conversation appears in the filtered list
- Verify: conversations from other projects are NOT shown
6.2 Project Detail — Workflow Runs
- Verify: workflow runs scoped to the selected project appear
- Verify: runs are sorted by priority: failed > running > completed
- Click a run: verify navigation to `/workflows/runs/{id}`
- Verify: conversation status dots show on conversations with active runs
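The expected ordering in 6.2 (failed > running > completed) can be reproduced from API data and compared with the order rendered in the sidebar. This is only a sketch: the `status` and `id` field names are assumptions about the run objects, and unknown statuses are placed last by choice.

```python
from typing import Dict, List

# Lower number sorts first: failed, then running, then completed.
STATUS_PRIORITY = {"failed": 0, "running": 1, "completed": 2}


def sort_runs(runs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort workflow runs by status priority, keeping input order within a status."""
    return sorted(
        runs,
        key=lambda run: STATUS_PRIORITY.get(run["status"], len(STATUS_PRIORITY)),
    )
```

Because `sorted` is stable, runs with the same status keep the order the API returned them in, which makes the comparison with the UI deterministic.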
6.3 Cross-Project Navigation
- Start a workflow in Project A
- Switch to Project B in the sidebar
- Verify: Project A's workflow is not visible in Project B's view
- Switch back to Project A — verify the workflow run is still visible
- Click "All Projects" — verify you can see conversations from all projects
Test Suite 7: SSE & Real-time Infrastructure
7.1 SSE Connection
- Open browser DevTools Network tab (via `agent-browser eval` or console)
- Verify: EventSource connection to `/api/stream/{conversationId}` is established
- Verify: heartbeat events arrive every ~30 seconds
- Verify: connection state is OPEN (readyState 1)
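For the heartbeat check in 7.1, one option is to record the arrival time of each heartbeat and test the gaps afterwards. The sketch below assumes the timestamps (in seconds) have already been collected by some other means; the function names and the 5 second tolerance are illustrative choices, not values from Archon.

```python
from typing import List, Sequence


def heartbeat_gaps(timestamps: Sequence[float]) -> List[float]:
    """Return the time between consecutive heartbeats."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def heartbeats_on_schedule(timestamps: Sequence[float], expected: float = 30.0,
                           tolerance: float = 5.0) -> bool:
    """Return True if every gap is within `tolerance` of `expected` seconds."""
    gaps = heartbeat_gaps(timestamps)
    # At least two heartbeats are needed to measure a gap at all.
    return bool(gaps) and all(abs(gap - expected) <= tolerance for gap in gaps)
```

Observing for roughly two minutes yields three or four heartbeats, which is enough to exercise the check.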
7.2 SSE Reconnection
- Kill the backend server temporarily
- Verify: the UI shows a disconnected state (grey dot in header)
- Restart the backend
- Verify: SSE reconnects automatically
- Verify: the connection indicator turns green again
- Verify: buffered messages are delivered on reconnect
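The last check in 7.2 relies on the server buffering events while no stream is connected. The backend review table mentions buffer limits of "100 msg / 200 conv"; the sketch below reads that as at most 100 messages per conversation across at most 200 conversations, which is an interpretation rather than a confirmed fact. The class name and the drop-oldest eviction policy are likewise assumptions.

```python
from collections import OrderedDict, deque
from typing import Any, Deque, List


class ReconnectBuffer:
    """Bounded per-conversation buffer for events sent while disconnected."""

    def __init__(self, max_messages: int = 100, max_conversations: int = 200) -> None:
        self._max_messages = max_messages
        self._max_conversations = max_conversations
        self._buffers: "OrderedDict[str, Deque[Any]]" = OrderedDict()

    def push(self, conversation_id: str, message: Any) -> None:
        if conversation_id not in self._buffers:
            if len(self._buffers) >= self._max_conversations:
                # Assumed policy: evict the conversation that was buffered first.
                self._buffers.popitem(last=False)
            # deque(maxlen=...) silently drops the oldest message when full.
            self._buffers[conversation_id] = deque(maxlen=self._max_messages)
        self._buffers[conversation_id].append(message)

    def drain(self, conversation_id: str) -> List[Any]:
        """Return and clear everything buffered for a conversation."""
        buffered = self._buffers.pop(conversation_id, None)
        return list(buffered) if buffered is not None else []
```

Under this model, a test that generates more than 100 events during the outage should expect only the most recent 100 after reconnect.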
7.3 Multiple Tabs
- Open the same conversation in two browser tabs (use `agent-browser --session` for parallel)
- Send a message from tab 1
- Verify: the response streams in the most recently connected tab, and record what the other tab shows
- Note: the web adapter replaces old streams on new connections (stream registry replacement), so only the latest tab gets live SSE; do not expect fan-out to BOTH tabs
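The note about stream replacement is easier to reason about with a small model. The sketch below is not Archon's web adapter; `StreamRegistry` and `FakeStream` are hypothetical names that only illustrate the behavior described in the note: registering a new stream for a conversation closes and replaces the previous one, so only the latest connection receives events.

```python
from typing import Any, Dict, List


class FakeStream:
    """Stand-in for one browser tab's SSE connection."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False
        self.received: List[Any] = []

    def write(self, event: Any) -> None:
        self.received.append(event)

    def close(self) -> None:
        self.closed = True


class StreamRegistry:
    """Keeps at most one live stream per conversation."""

    def __init__(self) -> None:
        self._streams: Dict[str, FakeStream] = {}

    def register(self, conversation_id: str, stream: FakeStream) -> None:
        old = self._streams.get(conversation_id)
        if old is not None and old is not stream:
            old.close()  # the earlier tab stops receiving live events
        self._streams[conversation_id] = stream

    def send(self, conversation_id: str, event: Any) -> bool:
        stream = self._streams.get(conversation_id)
        if stream is None or stream.closed:
            return False
        stream.write(event)
        return True
```

In this model, if tab 1 connects first and tab 2 second, a message sent afterwards reaches tab 2 only, which is the outcome the two-tab test should look for.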
Test Suite 8: UI/UX Quality Audit
8.1 Visual Hierarchy & Dark Theme
- Screenshot the full app at different states
- Verify: text hierarchy (primary/secondary/tertiary) is readable
- Verify: interactive elements have clear hover states
- Verify: accent colors (blue-purple) are used consistently
- Verify: success (green), warning (amber), error (red) colors are correct
- Verify: borders and dividers create clear visual separation
8.2 Loading States
- Observe loading states when:
- Dashboard is loading
- Conversation messages are loading
- Workflows list is loading
- Workflow runs are fetching
- Verify: all loading states show appropriate feedback (spinners, skeletons, or text)
- Verify: no blank/flash-of-unstyled-content moments
8.3 Empty States
- Check empty states for:
- No conversations (dashboard + sidebar)
- No projects registered
- No workflows available
- No workflow runs
- No messages in a conversation
- Verify: each empty state has a helpful message and CTA
8.4 Responsiveness
- Set viewport to different sizes:
    agent-browser set viewport 1920 1080  # Desktop
    agent-browser set viewport 1366 768   # Laptop
    agent-browser set viewport 1024 768   # Tablet landscape
    agent-browser set viewport 768 1024   # Tablet portrait
    agent-browser set viewport 375 812    # Mobile
- At each size: screenshot and check for layout breakage, overflow, truncation
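To avoid typing the viewport and screenshot commands by hand for every size, the command list can be generated. The sketch below only builds strings using the `set viewport` and `screenshot` commands already shown in this document; the helper name `viewport_commands` and the short viewport labels are made up for illustration.

```python
from typing import List, Tuple

# (label, width, height) taken from the sizes listed in 8.4.
VIEWPORTS: List[Tuple[str, int, int]] = [
    ("desktop", 1920, 1080),
    ("laptop", 1366, 768),
    ("tablet-landscape", 1024, 768),
    ("tablet-portrait", 768, 1024),
    ("mobile", 375, 812),
]


def viewport_commands(section: str = "responsive") -> List[str]:
    """Return the shell commands to resize and screenshot at each viewport."""
    commands: List[str] = []
    for label, width, height in VIEWPORTS:
        commands.append(f"agent-browser set viewport {width} {height}")
        commands.append(
            f"agent-browser screenshot /tmp/archon-test-{section}-{label}.png"
        )
    return commands
```

Printing the result with `print("\n".join(viewport_commands()))` gives a list that can be pasted into the Bash tool, with screenshots named after the `/tmp/archon-test-{section}-{name}.png` convention.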
8.5 Sidebar Resize
- Drag the sidebar resize handle
- Verify: sidebar width changes smoothly (240-400px range)
- Verify: width persists in localStorage across refresh
- Verify: content reflows properly at different sidebar widths
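The 240-400px range and the localStorage persistence in 8.5 suggest two behaviors worth probing: dragging past either limit should clamp, and a corrupt stored value should not break the layout. The sketch below models both under assumptions: the function names are hypothetical, and the default of 280 is a placeholder, not Archon's actual default width.

```python
from typing import Optional


def clamp_sidebar_width(requested: int, min_width: int = 240,
                        max_width: int = 400) -> int:
    """Keep the sidebar width inside the allowed range."""
    return max(min_width, min(max_width, requested))


def restore_sidebar_width(stored: Optional[str], default: int = 280) -> int:
    """Parse a width read from localStorage, falling back to a default.

    localStorage values are strings (or missing), so both cases are handled.
    """
    try:
        return clamp_sidebar_width(int(stored))  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return default
```

A useful edge-case test is to write an out-of-range or non-numeric width into localStorage with `agent-browser eval`, refresh, and confirm the sidebar still renders within 240-400px.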
8.6 Keyboard Navigation
- Press `/`: verify search focuses
- Press `Escape`: verify search clears
- Press `Enter` in message input: verify sends message
- Press `Shift+Enter`: verify inserts newline (does NOT send)
- Tab through interactive elements: verify focus order is logical
8.7 Copy/Clipboard
- Click the working directory path in the chat header
- Verify: path copies to clipboard
- Verify: visual feedback (tooltip or flash) indicates copy succeeded
8.8 External Links
- Click "Open in IDE" button (VSCode link)
- Verify: `vscode://file/...` URL is constructed correctly
- Click links in assistant messages: verify they open in new tabs
Test Suite 9: Edge Cases & Stress Tests
9.1 Rapid Message Sending
- Send multiple messages in quick succession (before previous responses complete)
- Verify: messages are queued properly (no duplicate or lost messages)
- Verify: lock indicator shows queue position
- Verify: responses arrive in order
9.2 Long Content
- Send a message that produces very long output (e.g., "List all files in the project")
- Verify: markdown renders without layout overflow
- Verify: code blocks have horizontal scroll
- Verify: `WorkflowResultCard` truncation works (500 chars / 8 lines with "Show more")
- Verify: tool call output truncation works (20 lines shown, expandable)
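The two truncation limits in 9.2 can be expressed as reference functions for predicting what the UI should display. This is a sketch with assumed semantics: the names are hypothetical, and applying the line limit before the character limit is a guess about how "500 chars / 8 lines" combine.

```python
from typing import Tuple


def truncate_preview(text: str, max_chars: int = 500,
                     max_lines: int = 8) -> Tuple[str, bool]:
    """Return (preview, truncated) for a result card."""
    lines = text.split("\n")
    truncated = len(lines) > max_lines or len(text) > max_chars
    # Assumed order: cut to the line limit first, then to the character limit.
    preview = "\n".join(lines[:max_lines])[:max_chars]
    return preview, truncated


def truncate_tool_output(output: str, max_lines: int = 20) -> Tuple[str, int]:
    """Return (visible_text, hidden_line_count) for a tool call card."""
    lines = output.split("\n")
    hidden = max(0, len(lines) - max_lines)
    return "\n".join(lines[:max_lines]), hidden
```

The hidden line count corresponds to the N in the "Show N more lines" control tested in 3.6, so a 25-line output should offer to show 5 more lines.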
9.3 Special Characters
- Send messages with special characters: `<script>alert('xss')</script>`, markdown chars `*_[]()`, emoji
- Verify: no XSS vulnerability (HTML is escaped)
- Verify: markdown renders correctly
- Verify: emoji displays properly
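For the XSS check in 9.3, "HTML is escaped" can be tested mechanically on the rendered markup of the message. The sketch below assumes the message's HTML has been captured as a string (for example through `agent-browser eval`); the function name `rendered_safely` is hypothetical. It passes only when the raw payload is absent from the markup but reappears after unescaping entities, which means the payload was shown as text rather than injected as an element.

```python
import html


def rendered_safely(rendered_html: str, payload: str) -> bool:
    """Return True if `payload` appears only in entity-escaped form."""
    raw_present = payload in rendered_html
    text_present = payload in html.unescape(rendered_html)
    return text_present and not raw_present
```

For example, markup containing `&lt;script&gt;alert('xss')&lt;/script&gt;` passes, while markup containing a literal `<script>` element fails.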
9.4 Browser Refresh During Streaming
- While AI is streaming a response, refresh the page
- Verify: on reload, historical messages are loaded from the API
- Verify: any in-progress response is not lost (persisted segments appear)
- Verify: SSE reconnects and picks up new events
9.5 Concurrent Workflows
- Launch 2-3 workflows simultaneously (different projects or same project)
- Verify: each workflow tracks independently
- Verify: workflow progress cards in respective chats are correct
- Verify: no cross-contamination of events between workflows
9.6 Network Latency
- Add artificial network latency if possible
- Verify: UI remains responsive during slow responses
- Verify: loading indicators appear for slow API calls
- Verify: no timeout errors in normal usage
Phase 2: Codebase Review
Read the source code of every component and module listed below. For each, evaluate:
- Correctness: Are there logic bugs, race conditions, or broken state transitions?
- UX quality: Does the component provide good feedback, handle edge cases, feel polished?
- Performance: Are there unnecessary re-renders, missing memoization, or expensive operations?
- Accessibility: Are interactive elements properly labeled? Keyboard navigable?
- Error handling: Are errors caught, displayed, and recoverable?
Frontend Files to Review
| File | Focus Areas |
|---|---|
| | Route config, error boundary, QueryClient settings |
| | SSE handler correctness, message state management, new-chat flow, workflow dispatch handling |
| | Auto-scroll, WorkflowDispatchInline polling, WorkflowResultCard truncation |
| | Markdown rendering, streaming cursor, thinking dots |
| | Expand/collapse, running state animation, output truncation |
| | Timer accuracy, compact vs full mode, stale indicator |
| | Show/hide transitions, queue position display |
| | Enter vs Shift+Enter, auto-resize, disabled state |
| | Resize drag, project add flow, search, new chat, localStorage persistence |
| | Scoped queries, conversation status dots, workflow run sorting |
| | Workflow fetch, create conversation + run flow, error handling |
| | Search filtering, codebase map construction, "New Chat" |
| | Delete confirmation, "All Projects" button |
| | Rename inline edit, delete flow, active state highlighting |
| | Two-tab layout, inline run panel, running indicator pulse |
| | Initial data reconstruction from events, live SSE overlay, worker vs parent flows |
| | Read-only chat view, SSE handlers, message filtering by timestamp |
| | Step list rendering, parallel block delegation, active step highlight |
| | Virtual scrolling, auto-scroll, metadata header |
| | Artifact type icons, URL links, path display |
| | Progress bar math, max iteration capping |
| | Overall status derivation, nested agent list |
| | Text batching (50ms flush), reconnection, handler ref stability |
| | Workflow state map, polling fallback (15s), stale detection |
| | Scroll threshold (50px), user scroll-up detection |
| | SSE_BASE_URL calculation, error handling, 404 swallowing |
| | SSEEvent union completeness, ChatMessage fields, WorkflowState shape |
| | Cache key correctness, memory management |
| | Stale project ID cleanup, codebase polling interval |
Backend Files to Review
| File | Focus Areas |
|---|---|
| | Endpoint correctness, CORS, SSE heartbeat loop, workflow run endpoint, codebase deduplication |
| | sendMessage category filtering, structured event handling, lock event flushing |
| | Segment splitting logic, tool call duration tracking, flush timing, 50-segment cap |
| | Stream replacement race condition fix, buffer limits (100 msg / 200 conv), zombie reaper |
| | Event mapping completeness, bridge subscription lifecycle, parent forwarding |
| | Router prompt construction, background dispatch fire-and-forget, isolation resolution |
| | Stale workflow detection (15min), step session continuity, parallel Promise.all, loop completion signal |
| | Case-insensitive matching, multiline regex, fallback behavior |
| | Listener error isolation, max listener cap, run registration lifecycle |
Review Checklist
For every file reviewed, note findings in these categories:
- Bugs — Logic errors, race conditions, state inconsistencies, crashes
- UX Issues — Missing feedback, confusing interactions, unclear states, dead ends
- Performance — Unnecessary re-renders, missing React.memo/useMemo/useCallback, expensive computations in render
- Accessibility — Missing ARIA labels, focus management gaps, screen reader issues
- Error Handling — Unhandled promise rejections, missing try/catch, silent failures
- Code Quality — Dead code, TODOs, inconsistent patterns, missing types
Phase 3: Report
After completing all tests and reviews, produce a structured report:
Report Format
    # Archon Web UI Validation Report
    **Date**: {date}
    **Tester**: Claude Code (agent-browser + codebase review)
    **Archon Version**: {git commit hash}
    **Screenshots**: /tmp/archon-test-*.png
    ## Executive Summary
    {2-3 sentences: overall quality assessment, critical issues count, UX rating}
    ## Critical Bugs (P0)
    {Bugs that break core functionality or lose data}
    ## Major Issues (P1)
    {Issues that significantly degrade the experience}
    ## Minor Issues (P2)
    {Polish items, edge cases, visual inconsistencies}
    ## UX Recommendations
    {Suggestions for improving the user experience: not just bugs but "could be better"}
    ## Accessibility Findings
    {Keyboard nav gaps, ARIA issues, contrast problems}
    ## Performance Observations
    {Slow renders, unnecessary work, optimization opportunities}
    ## Codebase Quality Notes
    {Dead code, inconsistencies, architectural concerns}
    ## What's Working Well
    {Positive findings: features that are solid, patterns that are good}
    ## Detailed Test Results
    ### Dashboard Tests
    | Test | Status | Notes |
    |------|--------|-------|
    | 1.1 Initial Load | PASS/FAIL | ... |
    ...
    ### Project Management Tests
    ...
    ### Chat Interface Tests
    ...
    ### Workflow Management Tests
    ...
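The per-suite tables under "Detailed Test Results" are tedious to keep consistent by hand. As one possible approach, results collected during the run can be rendered into the same three-column layout. The function name `render_results_table` and the `(name, passed, notes)` tuple shape are assumptions made for this sketch.

```python
from typing import Iterable, Tuple


def render_results_table(results: Iterable[Tuple[str, bool, str]]) -> str:
    """Render (test name, passed, notes) tuples as a markdown table."""
    lines = ["| Test | Status | Notes |", "|------|--------|-------|"]
    for name, passed, notes in results:
        status = "PASS" if passed else "FAIL"
        # Escape pipes and flatten newlines so notes cannot break the table.
        safe_notes = notes.replace("|", "\\|").replace("\n", " ")
        lines.append(f"| {name} | {status} | {safe_notes} |")
    return "\n".join(lines)
```

Appending a tuple right after each test, including failures, fits the instruction to document a failed test immediately and continue.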
Key Question to Answer
Is Archon currently doing the best it possibly can to solve the problem of managing a lot of agents in parallel and executing custom workflows with full visibility?
Specifically evaluate:
- Can users easily see what all their agents are doing at a glance?
- Is workflow status visible and understandable without clicking through multiple pages?
- Can users quickly navigate between the orchestrator chat, individual workflow runs, and task logs?
- Is the experience of kicking off a workflow through the router intuitive?
- Are parallel agents presented clearly with their individual status?
- Does the UI surface errors and issues prominently enough?
- Is the overall information architecture logical for someone managing 5-10 concurrent agents?
Execution Notes
- Run all `agent-browser` commands via the Bash tool
- Use `npx agent-browser` if not installed globally
- After each navigation, re-snapshot (`agent-browser snapshot -i`) to get fresh refs
- Take screenshots liberally and save them to `/tmp/archon-test-{section}-{name}.png`
- If a test fails, document it immediately and continue to the next test
- Use `agent-browser wait --load networkidle` after actions that trigger API calls
- For SSE testing, use `agent-browser eval` to check EventSource state
- Remember: WSL2 headless mode works fine; no display server is needed
- Close the browser session when done: `agent-browser close`