Luke Davis

Spoke, part 2: building the kanban

This post was autogenerated by Claude Code as part of an ongoing personal knowledgebase project.

The overnight sprint that built @spoke/core wrapped at 3:56am. By the time I came back to the machine, the library was feature-complete: events, session store, sandbox, tool registry, model adapters, the full agent loop. 222 tests passing. The core was done.

Now came the harder question: does it actually work when you build something real on top of it?

The answer to that was a full kanban board. One build day. From nothing to running agents.

Scaffold in 90 minutes

The first commit on packages/kanban landed that morning. Less than two hours later, the skeleton was done and smoke-tested end to end.

The stack: Hono for the API server, Vite plus React and TanStack Router/Query for the frontend, Drizzle ORM against Postgres 17, Tailwind for styling, Bun workspaces tying it all together. There was nothing novel about any individual piece. What mattered was how fast it all came together once you had a clear spec.

K1 was eight sub-tasks: scaffold the package directory, get a Hono health endpoint responding, write the Drizzle schema, wire up ticket CRUD routes, stand up the Vite/React frontend, get Tailwind in place, configure Docker Compose for local development, and prove the full stack round-trip against a real Postgres instance. Each step was a commit. By K1.8, I could POST a ticket and GET it back from a real database.

One small annoyance: the kanban stack needed Postgres on port 5433 because 5432 was already in use by agent-chatroom. Different project names, different named volumes, completely isolated. Docker Compose's project isolation is a genuinely underrated feature.

K1 through K3: board knows it’s a board

The K-ticket notation came from the kanban sub-phases in the phase plan - K1, K2, K3 and so on. I was tracking the kanban build kanban-style, filing a ticket like K1.4 or K3.8 for each step. The board was self-aware before it had any agent on it.

After the skeleton, I swapped the build order. K3 (board UI) moved before K2 (orchestrator) so there would be something visual to look at sooner. K3 only needed the CRUD API. K2 needed the full orchestrator running.

The board UI came together column by column. Four columns: BACKLOG, IN PROGRESS, REVIEW, DONE. Each got its own card component - not one component with fifteen optional props, but four distinct components that composed the right slots for their lifecycle state. BacklogCard showed priority and body excerpt. InProgressCard added a timer and a live event log stub. ReviewCard had a verdict badge. DoneCard was read-only.

K3.8 was drag-and-drop: HTML5 DnD, no library. Drag a BACKLOG card to IN PROGRESS and it fires a PATCH request. Optimistic update with rollback on failure. The column glows with an accent outline when you drag over it. Agent-browser verified the whole flow with screenshots.
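The optimistic-update-with-rollback pattern can be sketched roughly like this. It is a minimal illustration, not the board's actual code: BoardState, moveTicket, and the patch callback are hypothetical names, and the real frontend presumably keeps this state in its query cache rather than a plain Map.

```typescript
type Status = "backlog" | "in_progress" | "review" | "done";

interface Ticket {
  id: number;
  status: Status;
}

// Hypothetical in-memory stand-in for the board's client-side state.
class BoardState {
  private tickets = new Map<number, Ticket>();

  constructor(initial: Ticket[]) {
    for (const t of initial) this.tickets.set(t.id, { ...t });
  }

  get(id: number): Ticket | undefined {
    return this.tickets.get(id);
  }

  set(ticket: Ticket): void {
    this.tickets.set(ticket.id, { ...ticket });
  }
}

// `patch` stands in for the PATCH request fired on drop (assumed signature).
async function moveTicket(
  board: BoardState,
  id: number,
  next: Status,
  patch: (id: number, status: Status) => Promise<void>,
): Promise<boolean> {
  const previous = board.get(id);
  if (!previous) return false;

  // Optimistic step: show the card in its new column immediately.
  board.set({ ...previous, status: next });
  try {
    await patch(id, next);
    return true;
  } catch {
    // Rollback step: the server refused, so restore the old column.
    board.set(previous);
    return false;
  }
}
```

The point of the shape is that the UI never waits on the network to move the card; it only pays a visible cost when the request fails.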

K2 wired the real orchestrator: a StandardAgentLoop session spawning when a ticket moves to IN PROGRESS, SSE streams for the event log, a ticket bus publishing status changes to all connected clients. By K2.6, a real smoke test passed - “read package.json, tell me the name, don’t write or exec” - the agent read the file, reported back kanban, and the ticket moved to done in about two seconds.
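A ticket bus of this kind is essentially a small publish/subscribe fan-out whose events get serialized as SSE frames. The sketch below is an assumption about the general shape, not the kanban package's real API: TicketBus, TicketEvent, and the "ticket" event name are all made up for illustration.

```typescript
// Hypothetical event shape; the real payload likely carries more fields.
type TicketEvent = { ticketId: number; status: string };
type Listener = (event: TicketEvent) => void;

class TicketBus {
  private listeners = new Set<Listener>();

  // Each connected client registers a listener and gets back an unsubscribe function.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Fan the status change out to every currently connected client.
  publish(event: TicketEvent): void {
    for (const listener of Array.from(this.listeners)) listener(event);
  }
}

// Serialize an event as one Server-Sent Events frame:
// an `event:` line, a `data:` line, and a blank line to terminate the frame.
function toSseFrame(event: TicketEvent): string {
  return `event: ticket\ndata: ${JSON.stringify(event)}\n\n`;
}
```

In a server, each SSE connection would subscribe on open, write toSseFrame(event) to its response stream, and call the unsubscribe function when the client disconnects.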

The 200K-token blowout

This is the part I’ll remember.

K3.6 had shipped the full kanban tool kit and MCP server. The orchestrator was wired. Real sessions, real models. I dragged ticket #1 to IN PROGRESS against the live spoke repo workspace and watched it go.

The agent spawned. It called kanban_list_my_sessions. Fine. It called kanban_get_ticket. Fine. Then it called exec(find .).

Find. On the full repository.

The tool result came back as the entire directory tree - every file, every path, thousands of lines. That output hit the API as a tool result payload. The next Claude request was 212,000 tokens. The API returned a 400: prompt_too_long.

The session errored out. The ticket landed in needs_human. And because I hadn’t built the needs_human column yet, the card just vanished from the board entirely.

Three bugs surfaced simultaneously:

  • No stdout cap on the exec builtin - any command could dump unbounded output
  • No output truncation before it hit the model
  • No error recovery for a prompt_too_long API error - the session died rather than degrading gracefully

The workspace wiring had worked perfectly. The orchestrator had spawned correctly, handed off the right tools, run in the right directory. The architecture was fine. The failure was in the assumptions about what agents would actually do when left alone with an exec tool and a large repository.

The recovery

I had three things to fix. The roadmap got reprioritized in real time.

First: provider switch. Iterating against premium Anthropic models while debugging tool behavior is expensive. K3.8 switched the default to OpenRouter with moonshotai/kimi-k2.6 - cheap, tool-capable, good enough for shakedown work. The lazy credential check meant the server still booted without API keys configured. Claude stayed available behind SPOKE_PROVIDER=anthropic when needed.
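A lazy credential check can look something like the following. This is a sketch under assumptions: SPOKE_PROVIDER and the Kimi model id come from the text above, while resolveProvider, ProviderConfig, and the ANTHROPIC_API_KEY / OPENROUTER_API_KEY variable names are illustrative guesses, not confirmed Spoke configuration.

```typescript
type Env = Record<string, string | undefined>;
type ProviderName = "openrouter" | "anthropic";

interface ProviderConfig {
  provider: ProviderName;
  // Left undefined for Anthropic because the post does not name a model.
  model: string | undefined;
  // Deferred lookup: only throws when a session actually needs the key.
  getApiKey: () => string;
}

function resolveProvider(env: Env): ProviderConfig {
  // OpenRouter is the default; Anthropic is opt-in via SPOKE_PROVIDER.
  const provider: ProviderName =
    env.SPOKE_PROVIDER === "anthropic" ? "anthropic" : "openrouter";

  // Assumed environment variable names for each provider's key.
  const keyVar =
    provider === "anthropic" ? "ANTHROPIC_API_KEY" : "OPENROUTER_API_KEY";

  return {
    provider,
    model: provider === "openrouter" ? "moonshotai/kimi-k2.6" : undefined,
    getApiKey: () => {
      const key = env[keyVar];
      if (!key) throw new Error(`${keyVar} is not set`);
      return key;
    },
  };
}
```

Because the key is read inside getApiKey rather than at startup, calling resolveProvider with an empty environment succeeds and the server can boot; the error only surfaces when the first model request is made.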

After the switch: dragged ticket #1 again. Kimi spawned, ran typecheck and tests (267 passing), added notes, requested review, moved through the full loop. No token blowout.

Second: exec stdout cap. K3.11 added a maxOutputBytes option to builtinExecTool in @spoke/core, defaulting to 32KB per stream. Anything over that gets truncated with a trailer: \n[truncated N bytes from stdout]. The cap applies to stdout and stderr independently. Any future find . stays bounded. Four new tests in core, 228/228 passing.
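The truncation itself is simple to sketch. The version below is an illustration rather than the @spoke/core implementation: capOutput is a hypothetical helper, and the stderr trailer wording is assumed to mirror the stdout one quoted above.

```typescript
const DEFAULT_MAX_OUTPUT_BYTES = 32 * 1024; // 32KB per stream

// Cap one stream's output at maxOutputBytes and append a trailer saying how much was dropped.
function capOutput(
  text: string,
  stream: "stdout" | "stderr",
  maxOutputBytes: number = DEFAULT_MAX_OUTPUT_BYTES,
): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxOutputBytes) return text;

  // Cutting on a byte boundary can split a multi-byte character;
  // TextDecoder replaces the broken tail with U+FFFD instead of throwing.
  const kept = new TextDecoder().decode(bytes.subarray(0, maxOutputBytes));
  const dropped = bytes.length - maxOutputBytes;
  return `${kept}\n[truncated ${dropped} bytes from ${stream}]`;
}

// Each stream is capped on its own, so noisy stdout cannot crowd out stderr.
function capExecResult(
  stdout: string,
  stderr: string,
  maxOutputBytes: number = DEFAULT_MAX_OUTPUT_BYTES,
): { stdout: string; stderr: string } {
  return {
    stdout: capOutput(stdout, "stdout", maxOutputBytes),
    stderr: capOutput(stderr, "stderr", maxOutputBytes),
  };
}
```

The trailer matters as much as the cap: it tells the model that output is missing, so it can narrow the command instead of reasoning from a silently incomplete listing.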

Third: spawn prompt design. The smoke-test agent had flailed because it had no framing, no scope, no workspace briefing. K3.12 rebuilt the initial prompt from scratch: a ticket markdown document as the opening user message (YAML frontmatter, title, description, conditional handoff summary from the prior session), a system prompt with a Kanban-agent persona and explicit handoff discipline rules, and a three-layer handoff enforcement system:

  • Layer 1: the agent must call kanban_handoff before exiting.
  • Layer 2: if there is no handoff on exit, the orchestrator nudges with a reminder message and re-wakes the loop.
  • Layer 3: if there is still no handoff on the second exit, a synthetic handoff event gets generated from the last model message.

Exactly one handoff per session, guaranteed.
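The three layers reduce to a short piece of control flow. Here is one way it could be written, as a sketch only: enforceHandoff, RunLoop, and SessionResult are hypothetical names, the nudge text is invented, and the real orchestrator works with events rather than return values.

```typescript
interface SessionResult {
  // The summary passed to kanban_handoff, or null if the agent never called it.
  handoff: string | null;
  lastModelMessage: string;
}

// Runs (or re-wakes) the agent loop, optionally with a reminder message.
type RunLoop = (nudge?: string) => Promise<SessionResult>;

interface Handoff {
  summary: string;
  source: "agent" | "nudged" | "synthetic";
}

async function enforceHandoff(runLoop: RunLoop): Promise<Handoff> {
  // Layer 1: the agent hands off on its own.
  const first = await runLoop();
  if (first.handoff !== null) {
    return { summary: first.handoff, source: "agent" };
  }

  // Layer 2: remind the agent and re-wake the loop once.
  const second = await runLoop(
    "You exited without calling kanban_handoff. Call it now.",
  );
  if (second.handoff !== null) {
    return { summary: second.handoff, source: "nudged" };
  }

  // Layer 3: synthesize a handoff from the last thing the model said.
  return { summary: second.lastModelMessage, source: "synthetic" };
}
```

Every path returns exactly one Handoff, which is where the one-per-session guarantee comes from.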

Sandbox as capability

The last major piece that day was K3.13: sandboxes.

The original design had every session running directly against the host working copy. That works for one ticket at a time. It breaks the moment two tickets run concurrently, or an agent makes a mistake, or you want to review what changed before it hits main.

The reframe: sandbox is a capability, not a fixture. Default sessions get the kanban tools and the ticket prompt only - no container, nothing provisioned. The agent reads the ticket, thinks, and decides whether it actually needs a code environment. Most sessions never do. Research, triage, board management tasks are all tool calls against the kanban API, not filesystem work.

When an agent does need a code environment, it calls kanban_provision_sandbox. That tool generates a Docker Compose YAML from a Zod-validated spec, creates a named volume per workspace (clone happens once, git worktree add per session), spins up the container, and registers scoped sandbox tools on the session’s tool registry: sandbox_exec, sandbox_read, sandbox_write, sandbox_git_status, sandbox_git_commit, sandbox_git_push.

The base image bakes in a pre-push hook that rejects any push whose ref doesn’t start with refs/heads/claude/. Agents can push their work. They cannot push to main.
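The check such a hook performs is small. Git passes a pre-push hook one line per ref on stdin, in the form "<local ref> <local sha> <remote ref> <remote sha>". The sketch below shows the filtering logic in TypeScript; the actual hook in the base image is not shown in this post (it may well be a shell script), and rejectedRefs is a hypothetical helper that assumes the remote ref is the one being checked.

```typescript
const ALLOWED_PREFIX = "refs/heads/claude/";

// Return every remote ref in the push that falls outside the allowed prefix.
// A real hook would exit non-zero when this list is non-empty.
function rejectedRefs(stdin: string): string[] {
  return stdin
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    // Field 3 is the remote ref; a malformed line fails closed.
    .map((line) => line.split(/\s+/)[2] ?? "(malformed)")
    .filter((ref) => !ref.startsWith(ALLOWED_PREFIX));
}
```

For example, a push that updates both refs/heads/claude/ticket-1 and refs/heads/main would come back with only refs/heads/main in the rejected list, and the hook would abort the whole push.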

kanban_provision_sandbox is flagged requiresApproval: true. Every provision call requires a human to approve it before the container comes up. That approval gate was K6, built later the same night.
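An approval gate like this can be pictured as a thin wrapper around tool dispatch. The following is a sketch, assuming a much simpler tool shape than the real registry: only the requiresApproval flag comes from the post, while Tool, Approver, and callTool are illustrative names.

```typescript
interface Tool {
  name: string;
  requiresApproval?: boolean;
  run: (args: unknown) => Promise<string>;
}

// Resolves to true once a human approves the call, false if they decline.
type Approver = (toolName: string, args: unknown) => Promise<boolean>;

async function callTool(
  tool: Tool,
  args: unknown,
  approve: Approver,
): Promise<string> {
  if (tool.requiresApproval) {
    const approved = await approve(tool.name, args);
    if (!approved) {
      // Return a tool result instead of throwing, so the agent can react to the denial.
      return `denied: ${tool.name} was not approved`;
    }
  }
  return tool.run(args);
}
```

Tools without the flag skip the approver entirely, so the gate adds no friction to ordinary kanban calls.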

End of day

By the time K3.13 shipped (technically at 12:52am the following morning - night-owl accounting), the critical path was in place.

The orchestrator spawned real sessions. Agents ran in isolated sandboxes with git worktrees on named branches. The exec cap prevented token blowouts. The spawn prompt gave agents scope and handoff discipline. The board showed four columns and live card state. SSE streams kept everything in sync.

What wasn’t done yet: the reviewer loop, subagent spawning, the session monitor page, approval UI, and the dogfood smoke test that would prove the whole thing end to end against a real ticket on the real codebase. That’s what Part 3 covers.


Spoke is a 3-part series. Part 1: The Overnight Sprint | Part 3: In Production