Spoke, part 1: the overnight sprint

This post was autogenerated by Claude Code as part of an ongoing personal knowledgebase project.

I’ve been running Claude Code agents for months - multiple sessions at once, ten or more, different branches, different parts of the codebase. What I kept hitting wasn’t a model problem or a context problem. It was a runtime problem. There was no runtime.

Each agent I spawned was stateless from the outside. It did its work, exited, and its history evaporated. If I wanted agents doing sustained work inside individual tickets - a kanban board where a work session lives, where a reviewer agent can actually see what the worker did, where retries pick up with real context - I couldn’t just spawn Claude Code processes and hope. I needed a runtime I owned.

Not a cloud service. Not a framework locking you into one model. An SDK: TypeScript interfaces, a default agent loop, pluggable arms. The kind of thing you bun add, wire up, and build your actual app on top of.

That’s what I decided to build the evening of April 20.

The architecture clicked into place

A conversation about Anthropic’s managed agents architecture crystallized the design. By evening it had resolved into four arms:

SessionStore - where events live. Pure append-only log. append(), getEvents(), nothing else. No lifecycle management, no session creation - just a dumb persistent store.

Sandbox - where code runs. exec(), read(), write(). It knows nothing about sessions or workspaces. LocalSandbox uses Bun.spawn. Later there would be DockerSandbox.

ToolRegistry - what tools the agent can call. ajv-compiled validators, MCP server registration, allow/deny list gating.

AgentLoop - the runner. wake(), pause(), onEvent(). Think-act loop that reads the session store, calls the model, dispatches tools, writes results back. StandardAgentLoop covers the common case; the interface exists for everything else.
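To make the boundaries concrete, here is a rough sketch of the four arms as TypeScript interfaces. The method names come from the descriptions above. Every parameter and return type (SpokeEvent, ExecResult, ToolDefinition, the sessionId argument, and the register/call methods on the registry) is an assumption for illustration, not Spoke's actual signatures.

```typescript
// Hypothetical shapes for the four arms. Method names follow the post;
// all parameter and return types are illustrative assumptions.

interface SpokeEvent {
  id: string; // ULID assigned by the store on append
  timestamp: number; // milliseconds since the epoch
  type: string;
  payload: unknown;
}

// Where events live: an append-only log with no lifecycle management.
interface SessionStore {
  append(sessionId: string, event: Pick<SpokeEvent, "type" | "payload">): Promise<SpokeEvent>;
  getEvents(sessionId: string): Promise<readonly SpokeEvent[]>;
}

// Where code runs: knows nothing about sessions or workspaces.
interface ExecResult {
  exitCode: number;
  stdout: string;
  stderr: string;
  timedOut?: true; // present only when a timeout fired
}

interface Sandbox {
  exec(command: string[], options?: { env?: Record<string, string>; timeoutMs?: number }): Promise<ExecResult>;
  read(path: string): Promise<string>;
  write(path: string, contents: string): Promise<void>;
}

// What the agent can call: validated, gated tools.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: object; // JSON Schema; the post says the real registry compiles these with ajv
  run(input: unknown): Promise<unknown>;
}

interface ToolRegistry {
  register(tool: ToolDefinition): void;
  call(name: string, input: unknown): Promise<unknown>;
}

// The runner: reads the store, calls the model, dispatches tools, writes results back.
interface AgentLoop {
  wake(sessionId: string): Promise<void>;
  pause(): void;
  onEvent(listener: (event: SpokeEvent) => void): void;
}
```

Keeping SessionStore down to two methods is what lets any backend, in memory or on disk, slot in without touching the loop.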

Four focused runtime dependencies - ulid, ajv, and the MCP and Anthropic SDKs - no framework, no ORM, no magic. Just TypeScript interfaces and a default loop. I wrote TDD-GUIDE.md before a single line of implementation.

The rules before the build

TDD-GUIDE.md came first. Red-green-refactor only. One failing test at a time. RED commits contain tests only; GREEN commits contain implementation only - never both in the same diff. Data-driven test.each tables over repetitive blocks. Mock at system boundaries only.

The anti-fraud section was explicit: deleting a failing test to go green is fraud. Weakening an assertion is fraud. Writing implementation first and backfilling tests is tests-after wearing a TDD costume.

That document was the contract. Then the build started.

88 minutes

2:28am - Phase 0

Workspace scaffold. git init. A Bun workspaces package.json covering packages/* and apps/*. Strict TypeScript base config. @spoke/core skeleton: empty src/index.ts, tsconfig extending base, tsc --noEmit clean.

Two tasks. Three minutes.

[2026-04-21T02:31:41-0500]: Phase 0 complete. commit 5646954.

2:31am - Phase 1: Events

ULID helper first - 4 tests covering format, sortability, strict monotonicity within a millisecond, uniqueness across 1000 IDs. All passing.
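The ulid package ships a monotonicFactory for exactly this guarantee. As a dependency-free illustration of what "strict monotonicity within a millisecond" means, the sketch below hand-rolls a generator: a 48-bit timestamp plus 80 random bits in Crockford base32, incrementing the random part whenever the clock has not advanced. The function names and the injectable clock are hypothetical, and Math.random stands in for a cryptographic source.

```typescript
// Dependency-free sketch of a monotonic ULID generator (hypothetical helper).
// Layout: 10 chars of timestamp + 16 chars of 80-bit randomness in Crockford
// base32, whose alphabet is in ASCII order, so string order matches numeric order.
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";
const MAX_RANDOM = (1n << 80n) - 1n;

// Encode a non-negative bigint as a fixed-width base32 string.
function encodeBase32(value: bigint, length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out = CROCKFORD.charAt(Number(value & 31n)) + out;
    value >>= 5n;
  }
  return out;
}

// 80 random bits built from five 16-bit chunks.
// Math.random is not cryptographically secure; acceptable only for a sketch.
function randomBits80(): bigint {
  let bits = 0n;
  for (let i = 0; i < 5; i++) {
    bits = (bits << 16n) | BigInt(Math.floor(Math.random() * 0x10000));
  }
  return bits;
}

// If the clock has not moved past the last ID's timestamp, reuse that
// timestamp and add one to the random part, so IDs strictly increase.
function monotonicUlid(now: () => number = Date.now): () => string {
  let lastTime = -1;
  let lastRandom = 0n;
  return () => {
    const time = now();
    if (time <= lastTime) {
      if (lastRandom === MAX_RANDOM) throw new Error("ULID random component overflowed");
      lastRandom += 1n;
    } else {
      lastTime = time;
      lastRandom = randomBits80();
    }
    return encodeBase32(BigInt(lastTime), 10) + encodeBase32(lastRandom, 16);
  };
}
```

Injecting the clock is what lets a test pin several IDs to the same millisecond and assert strict ordering.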

Then the discriminated union event type. Fourteen frozen variants: user_message, model_message, tool_call, tool_result, approval_request, approval_granted, approval_denied, outcome, handoff, system_provisioned, system_destroyed, harness_woke, harness_exited, note. Frozen at 14 - apps extend via payload metadata, not new event types.

Then assertNever for exhaustiveness. Data-driven test.each over all 14 values. A missing branch is a tsc failure, not a runtime surprise.
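A sketch of a frozen union with assertNever. The 14 type names are the ones listed above. The loose payload type and the category() consumer are hypothetical, added only to show where exhaustiveness checking bites.

```typescript
// The 14 event types named in the post. Payload shapes are not specified there,
// so every variant gets a loose payload here (an assumption).
const EVENT_TYPES = [
  "user_message", "model_message", "tool_call", "tool_result",
  "approval_request", "approval_granted", "approval_denied",
  "outcome", "handoff", "system_provisioned", "system_destroyed",
  "harness_woke", "harness_exited", "note",
] as const;

type EventType = (typeof EVENT_TYPES)[number];

// Mapped type turned into a discriminated union: one member per event type.
type SpokeEvent = { [K in EventType]: { type: K; payload: Record<string, unknown> } }[EventType];

// At compile time, a missing case leaves `value` typed as something other than
// never, so tsc rejects the call. At runtime it guards against unexpected data.
function assertNever(value: never): never {
  throw new Error(`Unhandled event: ${JSON.stringify(value)}`);
}

// Hypothetical consumer: grouping events for display.
type Category = "conversation" | "tool" | "approval" | "lifecycle" | "other";

function category(event: SpokeEvent): Category {
  switch (event.type) {
    case "user_message":
    case "model_message":
      return "conversation";
    case "tool_call":
    case "tool_result":
      return "tool";
    case "approval_request":
    case "approval_granted":
    case "approval_denied":
      return "approval";
    case "system_provisioned":
    case "system_destroyed":
    case "harness_woke":
    case "harness_exited":
      return "lifecycle";
    case "outcome":
    case "handoff":
    case "note":
      return "other";
    default:
      return assertNever(event);
  }
}
```

Deriving the union from a const array also gives tests a runtime list to iterate, which is the shape a test.each table over all 14 values needs.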

[2026-04-21T02:46:45-0500]: Phase 1 complete. 37/37 tests pass.

2:46am - Phase 2: SessionStore

InMemorySessionStore backed by a Map. append() assigns a ULID and timestamp. Filters covered by a 15-case test.each table: type, timestamp bounds, ULID bounds (inclusive), limit. Events are cloned with structuredClone and deep-frozen on append - you can't mutate what comes back from getEvents(). Characterization tests pin ULID ordering under Promise.all concurrency.
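A minimal sketch of the store's core behaviors: per-session append-only arrays, clone-then-deep-freeze on append, and inclusive filters. The filter field names (types, since, until, fromId, toId, limit), the injected ID generator and clock, and the deepFreeze helper are all assumptions. The real implementation and its 15 filter cases may differ.

```typescript
// Minimal sketch of an append-only, in-memory store. Event shape, filter
// field names, and the injected id/clock functions are assumptions.
interface StoredEvent {
  id: string;
  timestamp: number;
  type: string;
  payload: unknown;
}

interface EventFilter {
  types?: string[];
  since?: number; // inclusive timestamp bounds
  until?: number;
  fromId?: string; // inclusive ID bounds; ULIDs compare correctly as strings
  toId?: string;
  limit?: number;
}

// Freeze an object graph recursively so stored events cannot be mutated.
function deepFreeze<T>(value: T): T {
  if (value !== null && typeof value === "object" && !Object.isFrozen(value)) {
    Object.freeze(value);
    for (const child of Object.values(value)) deepFreeze(child);
  }
  return value;
}

class InMemorySessionStore {
  private readonly sessions = new Map<string, StoredEvent[]>();
  private readonly nextId: () => string;
  private readonly now: () => number;

  constructor(nextId: () => string, now: () => number = Date.now) {
    this.nextId = nextId;
    this.now = now;
  }

  async append(sessionId: string, event: { type: string; payload: unknown }): Promise<StoredEvent> {
    // Clone so the caller's object stays independent, then freeze the copy.
    const stored: StoredEvent = deepFreeze(
      structuredClone({ type: event.type, payload: event.payload, id: this.nextId(), timestamp: this.now() }),
    );
    const log = this.sessions.get(sessionId) ?? [];
    log.push(stored);
    this.sessions.set(sessionId, log);
    return stored;
  }

  async getEvents(sessionId: string, f: EventFilter = {}): Promise<readonly StoredEvent[]> {
    const matches = (this.sessions.get(sessionId) ?? []).filter(
      (e) =>
        (f.types === undefined || f.types.includes(e.type)) &&
        (f.since === undefined || e.timestamp >= f.since) &&
        (f.until === undefined || e.timestamp <= f.until) &&
        (f.fromId === undefined || e.id >= f.fromId) &&
        (f.toId === undefined || e.id <= f.toId),
    );
    return f.limit === undefined ? matches : matches.slice(0, f.limit);
  }
}
```

Because append() never awaits between assigning an ID and pushing, appends launched together under Promise.all land in call order. That is the behavior a characterization test can pin.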

[2026-04-21T03:09:58-0500]: Phase 2 complete. 66/66 tests pass.

3:09am - Phase 3: Sandbox

LocalSandbox via Bun.spawn. Path containment on read() and write() - absolute paths, ~/ paths, ../ escape attempts all rejected. Tests in fresh mkdtemp dirs, cleaned in afterEach. Three-layer env merging: process, sandbox-level, call-level. CWD locked to sandbox root. A timeout watchdog kills the process at the deadline and marks the result timedOut: true, typed under exactOptionalPropertyTypes.
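The containment rule and env layering can be sketched with node:path, which Bun also supports. resolveInside() and mergeEnv() are hypothetical names, and the precedence (call-level over sandbox-level over process) is an assumption based on the order the layers are listed.

```typescript
import * as path from "node:path";

// Hypothetical helper: map a sandbox-relative path to an absolute one,
// rejecting absolute paths, home-directory paths, and ../ escapes.
function resolveInside(root: string, requested: string): string {
  if (path.isAbsolute(requested) || requested === "~" || requested.startsWith("~/")) {
    throw new Error(`path must be relative to the sandbox: ${requested}`);
  }
  const base = path.resolve(root);
  const target = path.resolve(base, requested);
  // Using path.relative avoids a prefix bug: a plain startsWith check would
  // wrongly accept /tmp/sbx-other for a root of /tmp/sbx.
  const rel = path.relative(base, target);
  if (rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
    throw new Error(`path escapes the sandbox: ${requested}`);
  }
  return target;
}

// Hypothetical three-layer merge: process env, then sandbox-level, then
// call-level, with later layers overriding earlier ones (assumed precedence).
function mergeEnv(
  processEnv: Record<string, string | undefined>,
  sandboxEnv: Record<string, string> = {},
  callEnv: Record<string, string> = {},
): Record<string, string> {
  const base: Record<string, string> = {};
  for (const [key, value] of Object.entries(processEnv)) {
    if (value !== undefined) base[key] = value; // drop unset variables
  }
  return { ...base, ...sandboxEnv, ...callEnv };
}
```

This check is lexical only. A symlink inside the root could still point outside it, so a hardened sandbox may also need a realpath check.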

[2026-04-21T03:15:16-0500]: Phase 3 complete. 93/93 tests pass.

3:15am - Phase 4: Tools

ToolRegistry with ajv-compiled per-tool validators - schema validation before the tool ever runs. Three typed errors. Builtin exec, read, write factory functions wrap a Sandbox. Allow/deny list gating: deny wins, empty allow list allows nothing. connectMcpServer() takes a pre-connected MCP SDK Client and auto-registers every tool with an mcp:<namespace>: prefix. InMemoryTransport-based tests - no real MCP server needed.
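A sketch of the gating rule and the MCP naming scheme. The ToolPolicy shape, exact-match names (no globs), the dispatch() helper and its check order are assumptions. The validate field stands in for an ajv-compiled validator, which for synchronous schemas is a function returning a boolean.

```typescript
// Hypothetical policy shape with exact-match names (no glob support assumed).
interface ToolPolicy {
  allow: readonly string[];
  deny: readonly string[];
}

// Deny wins; an empty allow list allows nothing.
function isToolAllowed(name: string, policy: ToolPolicy): boolean {
  if (policy.deny.includes(name)) return false;
  return policy.allow.includes(name);
}

// MCP tools are registered under an mcp:<namespace>: prefix.
function mcpToolName(namespace: string, toolName: string): string {
  return `mcp:${namespace}:${toolName}`;
}

interface RegisteredTool {
  validate: (input: unknown) => boolean; // stands in for an ajv-compiled validator
  run: (input: unknown) => Promise<unknown>;
}

// Assumed order: look up, gate, validate, and only then run.
async function dispatch(
  tools: Map<string, RegisteredTool>,
  policy: ToolPolicy,
  name: string,
  input: unknown,
): Promise<unknown> {
  const tool = tools.get(name);
  if (tool === undefined) throw new Error(`unknown tool: ${name}`);
  if (!isToolAllowed(name, policy)) throw new Error(`tool not allowed: ${name}`);
  if (!tool.validate(input)) throw new Error(`invalid input for tool: ${name}`);
  return tool.run(input);
}
```

Making an empty allow list deny everything means a misconfigured registry fails closed rather than exposing every tool.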

[2026-04-21T03:26:18-0500]: Phase 4 complete. 133/133 tests pass across 14 files.

3:26am - Phase 5: Model adapters

ModelAdapter interface with normalized ModelInput and ModelResponse. A ScriptedAdapter for tests: feed it scripted responses, it returns them in order. ScriptedAdapterOverconsumedError when you run off the end.
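A sketch of the scripted adapter. ModelInput and ModelResponse here are simplified, hypothetical shapes. Recording each input in a calls array is an extra convenience this sketch adds so a test can inspect what the loop actually sent.

```typescript
// Hypothetical normalized shapes; Spoke's real ModelInput/ModelResponse may differ.
interface ModelInput {
  system?: string;
  messages: { role: "user" | "assistant"; content: unknown }[];
}

interface ModelResponse {
  content: ({ type: "text"; text: string } | { type: "tool_use"; id: string; name: string; input: unknown })[];
  stopReason: "end_turn" | "tool_use" | "max_tokens";
}

interface ModelAdapter {
  complete(input: ModelInput): Promise<ModelResponse>;
}

class ScriptedAdapterOverconsumedError extends Error {
  constructor(callNumber: number) {
    super(`ScriptedAdapter has no response for call ${callNumber}`);
    this.name = "ScriptedAdapterOverconsumedError";
  }
}

// Returns canned responses in order and records every input it received.
class ScriptedAdapter implements ModelAdapter {
  readonly calls: ModelInput[] = [];
  private index = 0;
  private readonly script: readonly ModelResponse[];

  constructor(script: readonly ModelResponse[]) {
    this.script = script;
  }

  async complete(input: ModelInput): Promise<ModelResponse> {
    this.calls.push(input);
    const next = this.script[this.index++];
    if (next === undefined) throw new ScriptedAdapterOverconsumedError(this.calls.length);
    return next;
  }
}
```

Throwing on overconsumption, rather than returning a default, turns "the loop called the model more times than expected" into a loud test failure.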

OpenRouter adapter: handles system injection, tool_use to tool_calls, tool_result to role:tool, finish_reason mapping. Fetch-injectable - no network in tests. Claude SDK adapter: cache-aware system prompts, cache_control on the last tool definition, token normalization. Duck-typed ClaudeAnthropicLike keeps tests hermetic.
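A sketch of a few of the translations the adapters perform. On the OpenRouter side, finish_reason values follow the OpenAI chat-completions convention (stop, length, tool_calls, and so on), tool_use blocks become tool_calls with JSON-string arguments, and each tool_result becomes a role:"tool" message. On the Claude side, a cache_control breakpoint caches everything up to and including the block it marks, so putting it on the last tool caches the whole tool list. The normalized StopReason values, the fallback mapping, and all function names are assumptions.

```typescript
// Hypothetical normalized stop reasons for the adapter layer.
type StopReason = "end_turn" | "tool_use" | "max_tokens" | "error";

// OpenRouter returns OpenAI-style finish_reason values.
function mapFinishReason(finishReason: string | null): StopReason {
  switch (finishReason) {
    case "stop":
      return "end_turn";
    case "tool_calls":
      return "tool_use";
    case "length":
      return "max_tokens";
    default:
      return "error"; // content_filter, error, or anything unexpected (assumed mapping)
  }
}

type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };

type OpenAIToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };

type OpenAIMessage =
  | { role: "assistant"; content: string | null; tool_calls?: OpenAIToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Assistant turn: text blocks become content; tool_use blocks become tool_calls
// whose arguments are a JSON string, as OpenAI-style APIs expect.
function assistantToOpenAI(blocks: AnthropicBlock[]): OpenAIMessage {
  const text = blocks.flatMap((b) => (b.type === "text" ? [b.text] : [])).join("");
  const toolCalls: OpenAIToolCall[] = blocks.flatMap((b) =>
    b.type === "tool_use"
      ? [{ id: b.id, type: "function" as const, function: { name: b.name, arguments: JSON.stringify(b.input) } }]
      : [],
  );
  return toolCalls.length > 0
    ? { role: "assistant", content: text === "" ? null : text, tool_calls: toolCalls }
    : { role: "assistant", content: text };
}

// Each tool_result block becomes its own role:"tool" message.
function toolResultsToOpenAI(blocks: AnthropicBlock[]): OpenAIMessage[] {
  return blocks.flatMap((b): OpenAIMessage[] =>
    b.type === "tool_result" ? [{ role: "tool", tool_call_id: b.tool_use_id, content: b.content }] : [],
  );
}

// Claude side: mark only the last tool; the cached prefix then covers all tools.
function withToolCache<T extends object>(tools: readonly T[]): Array<T | (T & { cache_control: { type: "ephemeral" } })> {
  return tools.map((tool, i) =>
    i === tools.length - 1 ? { ...tool, cache_control: { type: "ephemeral" as const } } : tool,
  );
}
```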

[2026-04-21T03:34:14-0500]: Phase 5 complete. 172/172 tests pass across 17 files.

3:34am - Phase 6: AgentLoop

StandardAgentLoop built in six sub-tasks, each its own red-green cycle:

buildMessages() - event-to-AnthropicMessage assembly, tool_result grouping, 11 filtered types via test.each.

Single-turn model call plus model_message emission - end_turn maps to outcome(success) + harness_exited(completed).

Tool dispatch - tool_use fires tool_call, runs the tool, emits tool_result (ok or error), and the next turn starts.

Unknown and denied tools - both produce error results, and the loop continues.

maxTurns ceiling, with pause() checked at the top of each turn.

Error handling - if model.complete() throws, the error is caught, stringified, and emitted as harness_exited(error, message).
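A compressed sketch of the loop's control flow, folding those sub-tasks into one function. An array stands in for the SessionStore, a Map holds only the allowed tools, and pausing is modeled as an isPaused callback. The event shapes and the paused and max_turns exit reasons are hypothetical. The post only confirms completed and error.

```typescript
// Hypothetical, compressed sketch of the think-act loop. An array stands in for
// the SessionStore; event shapes and exit reasons are illustrative assumptions.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface Reply {
  content: Block[];
  stopReason: "end_turn" | "tool_use";
}

type LoopEvent =
  | { type: "model_message"; content: Block[] }
  | { type: "tool_call"; id: string; name: string; input: unknown }
  | { type: "tool_result"; id: string; ok: boolean; output: string }
  | { type: "outcome"; result: "success" }
  | { type: "harness_exited"; reason: "completed" | "paused" | "max_turns" | "error"; message?: string };

type Tool = (input: unknown) => Promise<string>;

async function runLoop(
  complete: (history: readonly LoopEvent[]) => Promise<Reply>,
  tools: Map<string, Tool>, // holds only allowed tools; anything else counts as unknown or denied
  options: { maxTurns: number; isPaused?: () => boolean },
): Promise<LoopEvent[]> {
  const log: LoopEvent[] = [];
  try {
    for (let turn = 0; turn < options.maxTurns; turn++) {
      if (options.isPaused?.()) {
        // Pause is checked at the top of each turn.
        log.push({ type: "harness_exited", reason: "paused" });
        return log;
      }
      const reply = await complete(log);
      log.push({ type: "model_message", content: reply.content });
      if (reply.stopReason === "end_turn") {
        log.push({ type: "outcome", result: "success" }, { type: "harness_exited", reason: "completed" });
        return log;
      }
      for (const block of reply.content) {
        if (block.type !== "tool_use") continue;
        log.push({ type: "tool_call", id: block.id, name: block.name, input: block.input });
        const tool = tools.get(block.name);
        if (tool === undefined) {
          // Unknown or denied tools yield an error result; the loop continues.
          log.push({ type: "tool_result", id: block.id, ok: false, output: `unknown or denied tool: ${block.name}` });
          continue;
        }
        try {
          log.push({ type: "tool_result", id: block.id, ok: true, output: await tool(block.input) });
        } catch (err) {
          log.push({ type: "tool_result", id: block.id, ok: false, output: String(err) });
        }
      }
    }
    log.push({ type: "harness_exited", reason: "max_turns" });
  } catch (err) {
    // A throwing model call ends the run with the error stringified.
    log.push({ type: "harness_exited", reason: "error", message: String(err) });
  }
  return log;
}
```

Tool failures are caught inside the dispatch loop while model failures fall through to the outer catch, which mirrors the split above: a bad tool call is data for the next turn, a failed model call ends the run.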

[2026-04-21T03:53:42-0500]: Phase 6 complete. 212/212 tests pass across 24 files.

3:53am - Phase 7: Integration

E2E smoke test with real LocalSandbox, real builtin tools, and a scripted model: write-then-read-then-summarize, plus a deny-list path. runSessionStoreContract() extracted as a reusable harness: any future SessionStore implementation gets the same 8 contract tests for free. CLI proof-of-concept at packages/core/examples/loop.ts, runnable with bun run example:loop.
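The contract-suite idea can be sketched without a test runner. Each case receives a fresh store from a factory, so any implementation can be checked against the same expectations. This version, runStoreContract(), is hypothetical: it carries three illustrative cases rather than the real eight, and returns pass/fail results instead of registering tests.

```typescript
// Hypothetical, framework-free contract suite for session stores.
interface MinimalStore {
  append(sessionId: string, event: { type: string; payload: unknown }): Promise<{ id: string; type: string }>;
  getEvents(sessionId: string): Promise<readonly { id: string; type: string }[]>;
}

type ContractCase = { name: string; run: (store: MinimalStore) => Promise<void> };

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

const contractCases: ContractCase[] = [
  {
    name: "returns events in append order",
    run: async (s) => {
      await s.append("s1", { type: "user_message", payload: {} });
      await s.append("s1", { type: "note", payload: {} });
      const events = await s.getEvents("s1");
      check(events.map((e) => e.type).join() === "user_message,note", "append order");
    },
  },
  {
    name: "assigns ids in ascending order",
    run: async (s) => {
      const a = await s.append("s1", { type: "note", payload: {} });
      const b = await s.append("s1", { type: "note", payload: {} });
      check(a.id < b.id, "ids should ascend");
    },
  },
  {
    name: "isolates sessions",
    run: async (s) => {
      await s.append("a", { type: "note", payload: {} });
      check((await s.getEvents("b")).length === 0, "session isolation");
    },
  },
];

// Each case gets a fresh store from the factory, so cases cannot leak state.
async function runStoreContract(makeStore: () => MinimalStore): Promise<{ name: string; passed: boolean }[]> {
  const results: { name: string; passed: boolean }[] = [];
  for (const c of contractCases) {
    try {
      await c.run(makeStore());
      results.push({ name: c.name, passed: true });
    } catch {
      results.push({ name: c.name, passed: false });
    }
  }
  return results;
}
```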

[2026-04-21T03:56:45-0500]: Phase 7 complete. 222/222 tests pass across 26 files. @spoke/core is ready for app development.

3:56am. The sprint started at 2:28am. 88 minutes, start to finish, to a fully tested TypeScript SDK.

What existed at dawn

222 tests across 26 files, every one hermetic. Four runtime dependencies, no framework, no ORM. A library you could bun add and build an agent system on top of, with clean interfaces for all four arms and a default implementation that covered the common case.

Later that morning I ran it against real Claude. A 17-check E2E validator passed 17/17. The SDK did what it said it would do.

The kanban app was next. That’s Part 2.

Spoke is a 3-part series. Part 2: Building the Kanban | Part 3: In Production