Luke Davis

Spoke, part 3: in production

This post was autogenerated by Claude Code as part of an ongoing personal knowledgebase project.

At 5:25am on April 22nd, a progress log entry landed in the project file:

ENTIRE K3-K8 MVP CRITICAL PATH NOW COMPLETE - PHASE-PLAN Phase 8 (Kanban app) shipped in one night.

Two consecutive nights past 4am. Around 13 commits in this final stretch alone. K3 through K8 - board UI, orchestrator, live log tail, session monitor, approvals, reviewer loop, subagent spawning, host-access permissions - all of it, done.

Then I went to sleep.

That afternoon I came back and used the thing for 3.5 hours.

The reviewer loop (K7)

The last meaningful feature of the night was K7, and it’s the one that changed what Spoke actually is.

Before K7, the Kanban board was a pretty good session monitor with drag-to-spawn. You’d drag a ticket to IN PROGRESS, an agent would run, and when it finished the ticket moved to DONE. That’s useful, but it’s not qualitatively different from a fancy task queue. The agent either did the work or it didn’t, and you’d find out when it was over.

K7 adds a reviewer agent. When a work agent finishes a ticket, a second agent automatically spawns to assess the work. The reviewer reads the predecessor’s full event log - every tool call, every model message, every decision - via the session_events table. It produces a structured verdict: { kind: "pass" } or { kind: "needs_changes", scope: string, reasoning: string }.
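As a rough illustration, the verdict maps naturally onto a TypeScript discriminated union. This is a minimal sketch, not Spoke's actual code: only the two verdict shapes come from the description above, and the parseVerdict helper is a hypothetical name introduced here to show how an untrusted payload could be checked before the orchestrator acts on it.

```typescript
// The two verdict shapes described in the post.
type Verdict =
  | { kind: "pass" }
  | { kind: "needs_changes"; scope: string; reasoning: string };

// Hypothetical helper: validate an untrusted value (for example, parsed
// tool-call arguments) before treating it as a Verdict.
// Returns null when the shape does not match either variant.
function parseVerdict(raw: unknown): Verdict | null {
  if (typeof raw !== "object" || raw === null) return null;
  const v = raw as Record<string, unknown>;
  if (v.kind === "pass") return { kind: "pass" };
  if (
    v.kind === "needs_changes" &&
    typeof v.scope === "string" &&
    typeof v.reasoning === "string"
  ) {
    return { kind: "needs_changes", scope: v.scope, reasoning: v.reasoning };
  }
  return null;
}
```

Because kind is the discriminant, any code that switches on it gets scope and reasoning typed as strings only in the needs_changes branch.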

If the verdict is pass, the ticket moves to DONE. That’s the happy path.

If the verdict is needs_changes, the work bounces. The orchestrator takes the reviewer’s scope and reasoning, creates a new work session with those as a refocus prompt, increments the attempt counter, and moves the ticket back to IN PROGRESS. The card shows “Attempt 2 of 3.” The cycle repeats - work agent, reviewer agent, verdict. Up to three times. If the third reviewer still rejects the work, the ticket parks itself in a needs_human state with a red banner carrying the final verdict text. At that point the system is telling you: I tried three times, here’s exactly what’s wrong, I need you.
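The bounce logic can be sketched as a pure state transition. This is an illustration under assumptions, not the real orchestrator: the Ticket fields, the applyVerdict name, and the way scope and reasoning are joined into a single brief are all hypothetical. Only the three-attempt cap, the refocus prompt, and the needs_human parking state come from the description above.

```typescript
type Verdict =
  | { kind: "pass" }
  | { kind: "needs_changes"; scope: string; reasoning: string };

type TicketStatus = "in_progress" | "done" | "needs_human";

// Hypothetical ticket shape for illustration only.
interface Ticket {
  status: TicketStatus;
  attempt: number;        // 1-based: "Attempt 2 of 3" means attempt === 2
  refocusPrompt?: string; // written brief for the next work session
  banner?: string;        // final verdict text shown when the ticket parks
}

const MAX_ATTEMPTS = 3;

// Apply a reviewer verdict to a ticket and return the next ticket state.
function applyVerdict(ticket: Ticket, verdict: Verdict): Ticket {
  if (verdict.kind === "pass") {
    return { ...ticket, status: "done" };
  }
  const brief = `${verdict.scope}\n\n${verdict.reasoning}`;
  if (ticket.attempt >= MAX_ATTEMPTS) {
    // Third rejection: stop retrying and surface the verdict to a human.
    return { ...ticket, status: "needs_human", banner: brief };
  }
  // Otherwise bounce: new attempt, fresh session, plain-language brief.
  return {
    ...ticket,
    status: "in_progress",
    attempt: ticket.attempt + 1,
    refocusPrompt: brief,
  };
}
```

Keeping the transition pure means the retry policy can be unit-tested without spawning any agents.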

What makes this feel like a team rather than a script is the artifact-based handoff. The reviewer doesn’t get some pre-digested summary of what happened. It reads the actual event stream - the same stream you’d see in the session monitor, every tool call and tool result. And the next work session doesn’t get a handed-off state object. It gets a plain-language scope description from the reviewer, injected as a refocus prompt, and starts fresh from there. No parentSessionId. No shared memory. Just a clear written brief.

The board’s visual system was already built to support this. The SCORE row (derived from kanban_submit_verdict events), the amber progress banner, the attempt counter - all of it was wired up in K3.10 with fixture data. K7 was the phase where real reviewer verdicts started driving those slots instead of fake ones. The infrastructure was waiting; K7 turned it on.
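To make the "derived from events" idea concrete, here is a minimal sketch of computing a card slot by folding over the event stream instead of reading a stored column. The post does not say what the SCORE row actually displays, so the ScoreSlot shape (review count plus latest verdict kind), the SessionEvent fields, and the deriveScoreSlot name are all assumptions. Only the kanban_submit_verdict event name comes from the text above.

```typescript
// Hypothetical event shape; the real session_events schema is not shown in the post.
interface SessionEvent {
  type: string;
  payload: Record<string, unknown>;
}

// Assumed contents of the SCORE slot, for illustration only.
interface ScoreSlot {
  reviews: number;
  latest: "pass" | "needs_changes" | null;
}

// Fold over the event stream, in order, to compute the slot on demand.
function deriveScoreSlot(events: SessionEvent[]): ScoreSlot {
  let reviews = 0;
  let latest: ScoreSlot["latest"] = null;
  for (const event of events) {
    if (event.type !== "kanban_submit_verdict") continue;
    const kind = event.payload.kind;
    if (kind === "pass" || kind === "needs_changes") {
      reviews += 1;
      latest = kind;
    }
  }
  return { reviews, latest };
}
```

With this shape, swapping fixture events for real reviewer verdicts changes the input, not the rendering code.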

Subagent spawning (K8)

K8 gives agents the ability to create sub-tickets, and sub-tickets spawn their own agents.

The mechanic is simple. An in-progress agent calls kanban_create_subticket with a title, body, workspace, and an optional wait_for_outcome: true flag. That creates a child ticket in the database. That child ticket spawns its own work agent, runs through the full reviewer loop, and completes. If the parent passed wait_for_outcome: true, the system tracks the dependency - when the child reaches a terminal state, it nudges the parent session with a summary of what happened, and the parent’s loop resumes with that information in context.
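The wait_for_outcome dependency can be sketched with a small in-memory tracker. Everything here except the tool's input fields (title, body, workspace, wait_for_outcome) is hypothetical: the SubticketTracker class, the parentTicketId link, and the nudge text format are illustration only, and the real system stores tickets in a database rather than a Map.

```typescript
// Input fields named in the post for kanban_create_subticket.
interface SubticketInput {
  title: string;
  body: string;
  workspace: string;
  wait_for_outcome?: boolean;
}

type TerminalState = "done" | "needs_human";

interface ChildRecord {
  parentTicketId: number;
  title: string;
  waitForOutcome: boolean;
}

// Hypothetical in-memory stand-in for the dependency tracking.
class SubticketTracker {
  private nextId = 1;
  private children = new Map<number, ChildRecord>();
  // Nudges queued per parent ticket, to be injected into the parent's session.
  readonly nudges = new Map<number, string[]>();

  create(parentTicketId: number, input: SubticketInput): number {
    const id = this.nextId++;
    this.children.set(id, {
      parentTicketId,
      title: input.title,
      waitForOutcome: input.wait_for_outcome === true,
    });
    return id;
  }

  // Called when a child ticket reaches a terminal state.
  complete(childId: number, state: TerminalState, summary: string): void {
    const child = this.children.get(childId);
    if (!child) throw new Error(`unknown sub-ticket ${childId}`);
    this.children.delete(childId);
    if (!child.waitForOutcome) return; // fire-and-forget child: no nudge
    const queue = this.nudges.get(child.parentTicketId) ?? [];
    queue.push(`Sub-ticket "${child.title}" finished as ${state}: ${summary}`);
    this.nudges.set(child.parentTicketId, queue);
  }
}
```

The parent only ever receives a written summary, which matches the artifact-based handoff used by the reviewer loop.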

What this enables is work decomposition at runtime. A ticket like “research X and then implement it” can self-decompose: the agent creates a research sub-ticket, waits on it, gets back a summary, then proceeds to the implementation with actual findings. The board shows sub-ticket cards indented under their parent. The parent’s reviewer, when it runs, can see the sub-ticket outcomes in the parent session’s event stream - no special coupling required.

There’s a depth limit at three levels of nesting. If an agent tries to create a fourth level, it gets a typed error back. This is partly a guard against runaway recursion, and partly a forcing function to think carefully about how you structure work.
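One way a typed depth error could look, as a sketch. The post does not say how depth is counted or what the error is called, so this assumes a top-level ticket is depth 0 and sub-tickets may sit at depths 1 through 3. The MAX_DEPTH_EXCEEDED code and the checkSubticketDepth name are made up for illustration.

```typescript
const MAX_DEPTH = 3;

// Result type: either the depth the new sub-ticket would have, or a typed
// error the calling agent can inspect. Error code is hypothetical.
type DepthCheck =
  | { ok: true; depth: number }
  | {
      ok: false;
      error: { code: "MAX_DEPTH_EXCEEDED"; maxDepth: number; attemptedDepth: number };
    };

// Assumption: top-level tickets are depth 0, so children land at parentDepth + 1.
function checkSubticketDepth(parentDepth: number): DepthCheck {
  const depth = parentDepth + 1;
  if (depth > MAX_DEPTH) {
    return {
      ok: false,
      error: { code: "MAX_DEPTH_EXCEEDED", maxDepth: MAX_DEPTH, attemptedDepth: depth },
    };
  }
  return { ok: true, depth };
}
```

Returning the error as data rather than throwing lets the agent read the limit and restructure its plan instead of crashing the session.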

K8 also included kanban_search_history - a tsvector full-text search over the entire session_events corpus, exposed as a tool. Any agent can search across all past sessions. That’s cross-session memory without a dedicated memory system: the event log is already there, already indexed, and now it’s queryable.
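For readers unfamiliar with tsvector search, the query behind a tool like this might look roughly as follows. This is a sketch under assumptions: the session_id, payload, and search_vector column names are hypothetical, as is the buildSearchHistoryQuery helper. The PostgreSQL pieces (the @@ match operator, websearch_to_tsquery, ts_rank) are standard, with websearch_to_tsquery requiring PostgreSQL 11 or later.

```typescript
// A parameterized query in the { text, values } shape most Postgres clients accept.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Hypothetical helper: build a ranked full-text search over session_events.
// Column names are assumptions; search_vector is assumed to be an indexed tsvector.
function buildSearchHistoryQuery(query: string, limit = 20): SqlQuery {
  // Clamp the limit so an agent cannot request an unbounded result set.
  const safeLimit = Math.max(1, Math.min(Math.floor(limit), 100));
  const text = `
    SELECT session_id, payload,
           ts_rank(search_vector, websearch_to_tsquery('english', $1)) AS rank
    FROM session_events
    WHERE search_vector @@ websearch_to_tsquery('english', $1)
    ORDER BY rank DESC
    LIMIT $2`;
  return { text, values: [query, safeLimit] };
}
```

The search text travels as a bound parameter rather than being spliced into the SQL, which matters when the caller is an agent producing arbitrary strings.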

#107 - host-access permissions

The last thing to ship before sleep was #107, which ports Claude Code’s pipe-permissions pattern into Spoke. Agents running against a host_path workspace - where they have access to the actual host filesystem - can now make tool calls that cross the container boundary back to the host, but with a permission layer in front of them.

The implementation classifies commands as allow, deny, or abstain. A SAFE_COMMANDS list covers the obvious read-only operations: ls, cat, rg, git log, git status, git diff, bun --version. A set of DENY_PATTERNS catches the dangerous ones: rm -rf /, curl | bash, force-pushes to main, find with -exec rm or -delete, fork bombs, SQL DROP. Anything not clearly safe or clearly dangerous lands as abstain, which routes through the K6 approval gate - the same modal you’d see in the session monitor for any other requiresApproval tool.
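The three-way classification can be sketched like this. The SAFE_COMMANDS entries and the categories of DENY_PATTERNS come from the description above, but the specific regular expressions, the classifyCommand name, and the rule that any shell operator forces abstain are my assumptions, not the real implementation. The patterns are deliberately simplified and would miss variants such as rm -fr /.

```typescript
type Decision = "allow" | "deny" | "abstain";

const SAFE_COMMANDS = [
  "ls", "cat", "rg", "git log", "git status", "git diff", "bun --version",
];

// Simplified, hypothetical patterns for the dangerous categories listed above.
const DENY_PATTERNS: RegExp[] = [
  /\brm\s+-rf\s+\/(\s|$)/,                                   // rm -rf /
  /\bcurl\b[^|]*\|\s*(ba)?sh\b/i,                            // curl ... | bash
  /\bgit\s+push\b(?=.*(?:--force|-f)\b)(?=.*\bmain\b)/i,     // force-push to main
  /\bfind\b.*(-delete\b|-exec\s+rm\b)/i,                     // find with -delete or -exec rm
  /:\(\)\s*\{.*\};\s*:/,                                     // classic fork bomb
  /\bdrop\s+(table|database)\b/i,                            // SQL DROP
];

function classifyCommand(command: string): Decision {
  const cmd = command.trim();
  // Deny is checked first so a dangerous command can never be allowed by prefix.
  if (DENY_PATTERNS.some((pattern) => pattern.test(cmd))) return "deny";
  // Chaining, pipes, substitution, and redirects could hide a second command
  // behind a safe-looking prefix, so send those to the approval gate.
  if (/[;&|`$<>]/.test(cmd)) return "abstain";
  const isSafe = SAFE_COMMANDS.some(
    (safe) => cmd === safe || cmd.startsWith(safe + " "),
  );
  return isSafe ? "allow" : "abstain";
}
```

For example, classifyCommand("git status") returns "allow", classifyCommand("rm -rf /") returns "deny", and classifyCommand("npm install") returns "abstain", which is the case that would route through the approval gate.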

This is the same mental model Claude Code uses for its own tool permissions. If you’ve used the “allow this command” flow in Claude Code, you already understand how Spoke’s host-access layer works.

5:25am

The progress log entry that marked the build complete landed at 5:25am. By then K3.10 through #107 had all shipped in a single overnight push - the pretty card system, reviewer loop, approvals gate, live log tail, session monitor, subagent spawning, and host permissions. The test count stood at 392 passing across @spoke/core and the Kanban app.

The note in the log said: “Every ticket in Phase 8’s critical-path block closed.”

Then I went to sleep for a few hours.

3.5 hours of actually using it

That afternoon - April 22nd, starting around 4:48pm - I used the finished Kanban to do real work. Not a demo. Not a “let’s create a ticket that says hello world.” I used it to plan and execute actual Spoke development work, with the board running against the Spoke repository itself.

The dogfood loop was the whole point. Kanban was never meant to be a finished product; it was meant to prove that @spoke/core could support a real workflow built on top of it. If it could, the architecture was sound. If it couldn’t, better to find out by using it than by shipping it.

What worked well was the board as a thinking surface. Creating tickets, assigning workspaces, setting up the work queue - that process forced me to be clearer about what I actually wanted done. A ticket you’re about to hand to an agent has to be specific. Vague tickets get vague work. That sharpening effect was real.

The live session monitor held up. Watching the event stream as an agent ran - seeing each tool call come in, the model messages, the status transitions - felt like useful transparency. Not overwhelming, but not opaque either. The log panel on the IN PROGRESS card gave you enough to know what was happening without making you stare at a wall of text.

The reviewer loop was slower than I expected, in a good way. The reviewer agent was deliberate: it actually read the event log and produced reasoning, so it felt like it was doing something real rather than rubber-stamping. A couple of verdicts came back as needs_changes with specific scope descriptions that were genuinely useful for the retry prompt.

What felt rough was the workspace setup. Creating and configuring workspaces, wiring tickets to the right one, handling the edge cases where an agent runs in a workspace it doesn’t have full context about - that part required more manual supervision than I wanted. It’s the right problem to have, because the solution is better defaults and better agent briefing, not a different architecture.

What building Spoke taught me

The thing I didn’t expect was how much the architecture decisions early in the build constrained and enabled everything that came later. The choice to make session_events the canonical corpus - every card slot derived from the event stream, no snapshot columns - meant that K7’s reviewer could read a rich history without any special plumbing. The choice to never use parentSessionId anywhere meant that handoff had to be explicit, which made it more robust.

The constraint I’d go back and relax is the sandbox coupling. The K3.13 Docker sandbox work was necessary and it came out well, but the decision to make agents provision sandboxes on-demand rather than always providing one adds friction to tickets where the agent actually does need code execution. A smarter approach might be to provision a lightweight sandbox by default and let agents opt out, rather than opt in.

Next up is Phase 8.5 - the eval loop. The idea is to take real sessions from the session_events corpus, replay them against modified system prompts or different models, and grade the results mechanically. Kanban-builds-Kanban is the dogfood target. It’ll be the first real test of whether the reviewer loop is calibrated, or whether I’ve been watching it work on easy cases.

Phase 9 is Channels: Slack-like rooms with agent personas. Same @spoke/core, different surface.


Spoke is a 3-part series. Part 1: The Overnight Sprint | Part 2: Building the Kanban