The Agent Harness Blueprint¶
How to Build an AI Agent CLI from Scratch — Patterns from Claude Code
This document is a comprehensive engineering tutorial for developers who want to build their own AI agent harness — a system like Claude Code, Cursor, or Codex that wraps an LLM API with tool execution, permission control, streaming, session management, and a conversation loop. Every pattern here is extracted from Claude Code's real TypeScript source. If you can read TypeScript and have called an LLM API at least once, you have enough background to follow along and build your own.
Master Architecture¶
User Input --> Bootstrap --> QueryEngine --> queryLoop()
                                  |
      +---------+----------+------+------+-----------+
      |         |          |             |           |
   API Call   Tools   Permissions     State      Sessions
      |         |          |             |           |
   Stream   Dispatch     Rules       AppState    Persist
   Events   Pipeline   +Classify      Store     Transcript
      |         |          |             |           |
      +---------+-----+----+-------------+-----------+
                      |
                 tool_result
                  messages
                  back to
                 queryLoop()
Conventions¶
- Source: references point to TypeScript files in Claude Code's source tree.
- All code snippets are real TypeScript, cleaned up for readability (imports, UI rendering, and analytics stripped).
- "Build It Yourself" sections at the end of each chapter give a minimal recipe you can implement in any language.
Chapter 1: Tool System¶
Tools are the agent's hands. Everything the agent does in the real world — reading files, running commands, searching code — goes through the tool interface.
1.1 The Tool Interface¶
Every tool in the system implements this interface. The methods below are the essential contract — rendering, analytics, and documentation noise have been stripped.
interface Tool {
// Identity
name: string
aliases?: string[]
searchHint?: string
// Schemas
inputSchema: ZodSchema
outputSchema?: ZodSchema
// Core execution
call(
input: unknown,
context: ToolUseContext,
canUseTool: CanUseTool,
parentMessage: AssistantMessage,
onProgress?: (progress: ProgressEvent) => void
): Promise<ToolResult>
// Permission & validation gates
checkPermissions(
input: unknown,
context: ToolUseContext
): Promise<PermissionResult>
validateInput(
input: unknown,
context: ToolUseContext
): Promise<ValidationResult>
// Capability flags
isEnabled(): boolean
isReadOnly(input: unknown): boolean
isConcurrencySafe(input: unknown): boolean
isDestructive(input: unknown): boolean
// Output control
maxResultSizeChars: number
// Result formatting
mapToolResultToToolResultBlockParam(
content: ToolResultContent,
toolUseID: string
): ToolResultBlockParam
}
Key observations:
- `call()` receives a `canUseTool` callback — tools can spawn sub-tool-calls (the Agent tool does this), and the permission system flows through.
- `onProgress` is optional — long-running tools (Bash, Agent) emit intermediate progress; simple tools (Glob, Grep) return in one shot.
- `checkPermissions` is separate from `validateInput` — validation catches malformed input, permissions check policy. They run in sequence, never merged.
1.2 buildTool() Factory¶
Tools are defined as partial objects and completed by buildTool(), which spreads fail-closed defaults:
const TOOL_DEFAULTS = {
isEnabled: () => true,
isConcurrencySafe: () => false,
isReadOnly: () => false,
isDestructive: () => false,
checkPermissions: (input) =>
Promise.resolve({ behavior: "allow", updatedInput: input }),
toAutoClassifierInput: () => "",
}
function buildTool(def: Partial<Tool> & { name: string }): Tool {
return { ...TOOL_DEFAULTS, userFacingName: () => def.name, ...def }
}
Why fail-closed defaults matter:
| Default | Value | Effect |
|---|---|---|
| `isConcurrencySafe` | `false` | New tools run serially until proven safe |
| `isReadOnly` | `false` | New tools are treated as mutating |
| `isDestructive` | `false` | Not flagged as dangerous (opt-in) |
| `checkPermissions` | `allow` | Open by default — tightened at registration |
The caller (tool author) spreads their overrides on top. A tool that only defines name, inputSchema, and call() gets safe defaults for everything else. This means forgetting to set isConcurrencySafe results in serial execution, not accidental parallel mutation — the system fails toward safety.
1.3 Tool Registration¶
getAllBaseTools()¶
Built-in tools are registered in a single function. Feature-gated tools use conditional spreads:
function getAllBaseTools(): Tool[] {
return [
AgentTool,
BashTool,
FileReadTool,
FileEditTool,
FileWriteTool,
GlobTool,
GrepTool,
WebFetchTool,
WebSearchTool,
SkillTool,
...(isWorktreeEnabled() ? [EnterWorktreeTool, ExitWorktreeTool] : []),
...(isTodoV2() ? [TaskCreateTool, TaskUpdateTool, TaskListTool] : []),
...(isToolSearchEnabled() ? [ToolSearchTool] : []),
]
}
This pattern keeps the tool list declarative. Feature flags gate entire tools — there is no runtime if inside each tool checking whether it should exist.
assembleToolPool()¶
The final tool list merges built-in and MCP (Model Context Protocol) tools:
function assembleToolPool(
permissionContext: PermissionContext,
mcpTools: Tool[]
): Tool[] {
// Built-in tools, filtered by deny rules and isEnabled()
const builtIn = getTools(permissionContext)
// MCP tools, also filtered by deny rules
const allowedMcp = filterToolsByDenyRules(mcpTools, permissionContext)
// Sort each partition for prompt-cache stability, built-ins first
return uniqBy(
[...builtIn].sort(byName).concat(allowedMcp.sort(byName)),
"name"
)
}
Why sort for cache stability? The tool list is serialized into the system prompt. If tools appear in a different order between calls, the prompt changes, and the API's prompt cache misses. Sorting built-ins first (alphabetically) and MCP tools second (alphabetically) ensures the same tool set always produces the same prompt prefix. uniqBy on name means a built-in tool always wins over an MCP tool with the same name.
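A sketch of that merge step, assuming tools are plain named objects (`uniqByName` keeps the *first* occurrence of each name, which is why built-ins, concatenated first, win over same-named MCP tools):

```typescript
// Illustrative shapes — not the real source types.
type NamedTool = { name: string; source: "builtin" | "mcp" }

const byName = (a: NamedTool, b: NamedTool) => a.name.localeCompare(b.name)

function uniqByName(tools: NamedTool[]): NamedTool[] {
  const seen = new Set<string>()
  return tools.filter((t) => (seen.has(t.name) ? false : (seen.add(t.name), true)))
}

function assemblePool(builtIn: NamedTool[], mcp: NamedTool[]): NamedTool[] {
  // Sort each partition separately so the serialized prompt prefix is stable.
  return uniqByName([...builtIn].sort(byName).concat([...mcp].sort(byName)))
}

const pool = assemblePool(
  [{ name: "Grep", source: "builtin" }, { name: "Bash", source: "builtin" }],
  [{ name: "Bash", source: "mcp" }, { name: "Fetch", source: "mcp" }]
)
// Built-ins first (alphabetical), then MCP; the duplicate "Bash" resolves to the builtin.
```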
1.4 Reference Implementation: GlobTool¶
Here is GlobTool — one of the simplest real tools — cleaned up to show the pattern without noise:
const GlobTool = buildTool({
name: "Glob",
inputSchema: z.object({
pattern: z.string().describe("Glob pattern to match files against"),
path: z
.string()
.optional()
.describe("Directory to search in. Defaults to cwd."),
}),
outputSchema: z.object({
filenames: z.array(z.string()),
truncated: z.boolean(),
}),
isReadOnly: () => true,
isConcurrencySafe: () => true,
async checkPermissions(input, context) {
return checkReadPermissionForTool(
input.path ?? context.workingDirectory,
context
)
},
async validateInput(input, context) {
const targetPath = input.path ?? context.workingDirectory
if (!existsSync(targetPath)) {
return { result: false, message: `Path does not exist: ${targetPath}` }
}
const stat = statSync(targetPath)
if (!stat.isDirectory()) {
return { result: false, message: `Path is not a directory: ${targetPath}` }
}
return { result: true }
},
async call(input, context) {
const cwd = input.path ?? context.workingDirectory
const matches = await glob(input.pattern, {
cwd,
nodir: true,
absolute: true,
maxResults: MAX_GLOB_RESULTS,
})
// Sort by modification time, most recent first
const sorted = matches
.map((f) => ({ file: f, mtime: statSync(f).mtimeMs }))
.sort((a, b) => b.mtime - a.mtime)
.map((entry) => entry.file)
const truncated = sorted.length >= MAX_GLOB_RESULTS
return {
data: { filenames: sorted, truncated },
}
},
mapToolResultToToolResultBlockParam(content, toolUseID) {
const { filenames, truncated } = content.data
const text = filenames.join("\n") + (truncated ? "\n(results truncated)" : "")
return {
type: "tool_result",
tool_use_id: toolUseID,
content: text || "No files matched the pattern.",
}
},
})
What to notice:
- `isReadOnly: true` and `isConcurrencySafe: true` — multiple Glob calls can run in parallel, and they never mutate the filesystem.
- `checkPermissions` delegates to a shared read-permission helper — it does not reinvent permission logic.
- `validateInput` checks preconditions (path exists, is a directory) before `call()` runs.
- `call()` returns `{ data: ... }` — the data is typed by `outputSchema`.
- `mapToolResultToToolResultBlockParam` controls what the model sees — it joins filenames with newlines, a format the model parses easily.
1.5 Build It Yourself¶
5-step recipe for a minimal tool system:
1. Define a Tool protocol/interface with at minimum: `name`, `inputSchema`, `call()`, `checkPermissions()`. Add `isReadOnly()` and `isConcurrencySafe()` from the start — you will need them for the executor.
2. Create a `buildTool()` factory that applies fail-closed defaults. Every boolean capability defaults to the safe value (not concurrent, not read-only). Every permission check defaults to allow (tightened at the pool level).
3. Register tools in a flat array. Use feature flags with conditional spreads for optional tools. Do not use a registry class or dependency injection — a function returning an array is simpler and easier to debug.
4. Write `assembleToolPool()` that merges built-in tools with external (MCP) tools. Sort deterministically for prompt-cache stability. Deduplicate by name, built-ins winning.
5. Start with GlobTool-level simplicity. Your first tool should have a Zod schema (or equivalent), a permission check that delegates to a shared helper, and a single async function body. Get one tool working end-to-end before building complex ones.
ToolDef (partial) --> buildTool() --> Tool (complete)
|
getAllBaseTools() --> [Tool, Tool, ...]
|
getTools(permCtx) --> filter deny rules + isEnabled
|
assembleToolPool() --> dedup, sort, merge MCP
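Steps 1 and 2 of the recipe, condensed into a sketch (all names here are illustrative, not the real source):

```typescript
// A minimal Tool shape plus a factory that spreads fail-closed defaults.
type MiniTool = {
  name: string
  isReadOnly: () => boolean
  isConcurrencySafe: () => boolean
  call: (input: unknown) => Promise<{ data: unknown }>
}

const DEFAULTS = {
  isReadOnly: () => false,        // treated as mutating until proven otherwise
  isConcurrencySafe: () => false, // runs serially until proven safe
}

function buildMiniTool(
  def: { name: string; call: MiniTool["call"] } & Partial<MiniTool>
): MiniTool {
  // Author overrides land on top of the safe defaults.
  return { ...DEFAULTS, ...def }
}

const echoTool = buildMiniTool({
  name: "Echo",
  call: async (input) => ({ data: input }),
})
// echoTool gets the safe defaults: not read-only, not concurrency-safe.
```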
Chapter 2: Tool Execution Pipeline¶
Between the model saying "use this tool" and the tool actually running, six phases of validation, permission, and hook processing ensure nothing executes without authorization.
2.1 The 6-Phase Pipeline¶
Every tool call passes through these phases in order. Failure at any phase short-circuits — the remaining phases do not run.
Phase 1: Schema Validation
const parsed = tool.inputSchema.safeParse(input)
if (!parsed.success) {
return createErrorResult(toolUseID, formatZodError(parsed.error))
}
The model sometimes sends malformed JSON or missing required fields. Schema validation catches this before any side effects occur. The error message goes back to the model so it can retry with correct input.
Phase 2: Semantic Validation
const valid = await tool.validateInput(parsed.data, context)
if (!valid.result) {
return createErrorResult(toolUseID, valid.message)
}
Schema validation checks structure; semantic validation checks meaning. "Does this file path exist?" "Is this a directory, not a file?" "Is the timeout within bounds?" These are business rules the schema cannot express.
Phase 3: Pre-Tool Hooks
for await (const result of runPreToolUseHooks(context, tool, input)) {
if (result.hookPermissionResult) {
// Hook can approve or deny the tool call
hookDecision = result.hookPermissionResult
}
if (result.updatedInput) {
// Hook can transform input (e.g., rewrite paths, add defaults)
input = result.updatedInput
}
}
Hooks are user-defined code that runs before and after tools. A pre-tool hook can approve a tool call (skipping the interactive permission prompt), deny it outright, or transform the input. Hooks are configured in project settings and can implement auto-accept rules, audit logging, or input sanitization.
Phase 4: Permission Decision
const decision = await resolvePermission(hookResult, tool, input, canUseTool)
if (decision.behavior !== "allow") {
return createDenialResult(toolUseID, decision.message)
}
This phase resolves the final permission. Sources of permission (in priority order): hook approval/denial, tool's own checkPermissions(), interactive user prompt via canUseTool(). If no hook has decided, and checkPermissions() returns ask, the system calls canUseTool() to prompt the user.
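One way to sketch that priority chain — simplified to synchronous functions for clarity; `Decision`, `resolvePermission`, and `promptUser` here are illustrative shapes, not the real signatures:

```typescript
type Decision = { behavior: "allow" | "deny" | "ask"; message?: string }

// Priority order: hook decision > tool's own checkPermissions() > user prompt.
function resolvePermission(
  hookDecision: Decision | null,
  toolDecision: Decision,
  promptUser: () => Decision
): Decision {
  if (hookDecision) return hookDecision          // hooks override everything
  if (toolDecision.behavior !== "ask") return toolDecision
  return promptUser()                            // only "ask" reaches the user
}
```

The key property: the interactive prompt only fires when every earlier source declined to decide.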
Phase 5: Tool Execution
The tool runs. This is the only phase with side effects. onProgress streams intermediate output for long-running tools. canUseTool is passed through so tools that spawn sub-tools (like the Agent tool) can request permissions for those sub-calls.
Phase 6: Post-Tool Hooks
for await (const hookResult of runPostToolUseHooks(context, tool, result)) {
if (hookResult.updatedMCPToolOutput) {
result = hookResult.updatedMCPToolOutput
}
}
Post-tool hooks can transform the output before it is sent back to the model. Use cases: auto-formatting code output, redacting sensitive data, adding metadata.
2.2 runToolUse() -- The Async Generator¶
The pipeline is wrapped in an async generator so progress events and the final result flow through the same channel:
async function* runToolUse(
toolUse: ToolUseBlock,
assistantMessage: AssistantMessage,
canUseTool: CanUseTool,
context: ToolUseContext
): AsyncGenerator<ToolUpdate> {
// Find tool by name (or alias)
const tool = findToolByName(context.tools, toolUse.name)
if (!tool) {
yield {
message: createErrorResult(toolUse.id, `No such tool: ${toolUse.name}`),
}
return
}
// Run the 6-phase pipeline, yielding progress along the way
for await (const update of checkPermissionsAndCallTool(
tool,
toolUse.id,
toolUse.input,
assistantMessage,
canUseTool,
context
)) {
yield update
}
}
Why an async generator? The tool execution pipeline produces multiple events over time: permission prompts, progress updates, partial output, and the final result. An async generator naturally models this sequence without callbacks or event emitters. The caller (StreamingToolExecutor) consumes updates as they arrive and forwards them to the UI.
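A toy version of the pattern, with a hypothetical `fakeToolRun` standing in for the pipeline — progress events and the final result flow through the same `for await` loop:

```typescript
type ToolUpdate =
  | { kind: "progress"; text: string }
  | { kind: "result"; text: string }

async function* fakeToolRun(): AsyncGenerator<ToolUpdate> {
  yield { kind: "progress", text: "scanning..." }
  yield { kind: "progress", text: "50%" }
  yield { kind: "result", text: "done" }
}

async function consume(): Promise<string[]> {
  const seen: string[] = []
  for await (const update of fakeToolRun()) {
    seen.push(`${update.kind}:${update.text}`) // the UI would render these live
  }
  return seen
}
```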
2.3 StreamingToolExecutor¶
The model can request multiple tool calls in a single response. The StreamingToolExecutor manages concurrent execution with a safety invariant: tools marked isConcurrencySafe run in parallel; all others run exclusively.
class StreamingToolExecutor {
private tools: TrackedTool[] = []
addTool(block: ToolUseBlock, assistantMessage: AssistantMessage) {
const tool = findToolByName(this.allTools, block.name)
const isConcurrencySafe = tool?.isConcurrencySafe(block.input) ?? false
this.tools.push({
id: block.id,
name: block.name,
input: block.input,
status: "queued",
isConcurrencySafe,
assistantMessage,
})
this.processQueue()
}
private canExecuteTool(isConcurrencySafe: boolean): boolean {
const executing = this.tools.filter((t) => t.status === "executing")
if (executing.length === 0) return true
// Parallel only if ALL executing tools AND the new tool are concurrency-safe
return isConcurrencySafe && executing.every((t) => t.isConcurrencySafe)
}
private processQueue() {
for (const tool of this.tools) {
if (tool.status !== "queued") continue
if (this.canExecuteTool(tool.isConcurrencySafe)) {
tool.status = "executing"
this.executeTool(tool)
} else if (!tool.isConcurrencySafe) {
break // Must wait — this tool needs exclusive access
}
// If concurrency-safe but blocked by a non-safe tool, skip and check next
}
}
private async executeTool(tracked: TrackedTool) {
for await (const update of runToolUse(tracked, ...)) {
tracked.updates.push(update)
}
tracked.status = "completed"
this.processQueue() // Unblock waiting tools
}
getCompletedResults(): ToolUpdate[] {
return this.tools
.filter((t) => t.status === "completed")
.flatMap((t) => t.updates)
}
}
Concurrency rules:
| Currently executing | New tool | Action |
|---|---|---|
| Nothing | Any | Execute immediately |
| Concurrency-safe tools only | Concurrency-safe | Execute in parallel |
| Concurrency-safe tools only | Non-safe | Wait for all to finish |
| Non-safe tool | Any | Wait for it to finish |
This means three concurrent Glob calls (all isConcurrencySafe: true) run in parallel, but a BashTool call (isConcurrencySafe: false) waits for exclusive access. The queue processes in order, so a non-safe tool blocks everything behind it.
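The concurrency table reduces to one predicate. A sketch, assuming all we know about each tool is its safety flag:

```typescript
// Returns true if a new tool may start given the flags of currently
// executing tools: empty queue runs anything; otherwise everyone involved
// (running tools AND the newcomer) must be concurrency-safe.
function canExecute(executingSafe: boolean[], newToolSafe: boolean): boolean {
  if (executingSafe.length === 0) return true
  return newToolSafe && executingSafe.every((s) => s)
}

canExecute([], false)           // true  — nothing running, run anything
canExecute([true, true], true)  // true  — safe tool joins a safe pool
canExecute([true], false)       // false — non-safe tool waits for exclusivity
canExecute([false], true)       // false — blocked behind a non-safe tool
```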
2.4 Progress Callbacks¶
Long-running tools use onProgress to stream intermediate output to the UI. Here is the pattern from BashTool:
async call(input, context, canUseTool, parentMessage, onProgress) {
const generator = runShellCommand({
command: input.command,
timeout: input.timeout,
cwd: context.workingDirectory,
abortSignal: context.abortSignal,
})
let counter = 0
let result
do {
result = await generator.next()
if (!result.done && onProgress) {
onProgress({
toolUseID: `bash-progress-${counter++}`,
data: {
type: "bash_progress",
output: result.value.output,
isStderr: result.value.isStderr,
},
})
}
} while (!result.done)
return {
data: {
exitCode: result.value.exitCode,
stdout: result.value.stdout,
stderr: result.value.stderr,
},
}
}
The shell command itself is an async generator that yields output chunks as they arrive from the subprocess. BashTool consumes these chunks and re-emits them as progress events. The UI renders them as live terminal output. When the command finishes, BashTool returns the final result with exit code.
2.5 Build It Yourself¶
4-step recipe for a minimal execution pipeline:
1. Implement the 6-phase pipeline as a function. Each phase returns early on failure. Start with just schema validation + call; add permission checks and hooks incrementally.
2. Wrap the pipeline in an async generator (or your language's equivalent — channels in Go, streams in Rust). This gives you a single return type for progress events, permission prompts, and the final result.
3. Build `StreamingToolExecutor` with a queue and concurrency check. The invariant is simple: count executing tools, check if all are concurrency-safe. Start with serial-only execution and add parallelism once your tests pass.
4. Add `onProgress` to your tool interface but make it optional. Only long-running tools (shell commands, agent sub-calls) need it. Short tools (file read, glob) return immediately.
tool_use block from API
|
runToolUse() <-- async generator
|
6-phase pipeline
1. Schema validate
2. Semantic validate
3. Pre-tool hooks
4. Permission decision
5. Tool execution <-- side effects here
6. Post-tool hooks
|
tool_result message --> back to query loop
Chapter 3: Query Engine & Conversation Loop¶
The query engine is the brain's control loop: send messages to the API, collect tool calls, execute them, feed results back, and repeat until the model stops or a budget is exhausted.
3.1 QueryEngine Class¶
The QueryEngine owns the conversation state and provides the entry point for all interactions:
class QueryEngine {
private mutableMessages: Message[]
private totalUsage: Usage
private permissionDenials: PermissionDenial[]
private readFileState: FileStateCache
constructor(config: QueryEngineConfig) {
this.mutableMessages = config.initialMessages ?? []
this.totalUsage = {
input_tokens: 0,
output_tokens: 0,
cache_creation_input_tokens: 0,
cache_read_input_tokens: 0,
}
this.permissionDenials = []
this.readFileState = config.readFileCache
}
}
Key state:
- `mutableMessages` — the full conversation history. This is the one piece of mutable state in the system (necessary because the API expects the full history on each call).
- `totalUsage` — accumulated token counts across all turns, used for budget enforcement.
- `permissionDenials` — tracks which permissions the user denied, so the system does not re-ask within the same session.
- `readFileState` — caches file contents so the system can detect when a file has changed between reads (stale read detection).
3.2 submitMessage() Entry Point¶
submitMessage() is the public API. It takes a user prompt and yields messages as they arrive:
async *submitMessage(
prompt: string,
options?: SubmitOptions
): AsyncGenerator<NormalizedMessage> {
// 1. Build system prompt from context
const { systemPrompt, userContext, systemContext } =
await fetchSystemPromptParts(this.config, this.readFileState)
// 2. Process user input (check for slash commands like /help, /clear)
const { messages, shouldQuery } = await processUserInput(
prompt,
this.mutableMessages,
this.config
)
this.mutableMessages.push(...messages)
// 3. Early return for slash commands that don't need the API
if (!shouldQuery) {
yield { type: "command_result", content: messages }
return
}
// 4. Persist transcript BEFORE API call (crash recovery)
await recordTranscript(this.mutableMessages)
// 5. Enter query loop
const tools = assembleToolPool(this.config.permissionContext, this.mcpTools)
for await (const message of queryLoop({
messages: this.mutableMessages,
systemPrompt,
tools,
canUseTool: this.canUseTool.bind(this),
maxTurns: this.config.maxTurns,
maxBudgetUsd: this.config.maxBudgetUsd,
abortSignal: this.config.abortSignal,
})) {
// 6. Accumulate usage, track messages
this.totalUsage = accumulateUsage(this.totalUsage, message.usage)
this.mutableMessages.push(message)
yield normalizeMessage(message)
}
// 7. Yield final result with totals
yield {
type: "result",
usage: this.totalUsage,
stop_reason: this.lastStopReason,
cost_usd: calculateCost(this.totalUsage),
}
}
Why persist before the API call? If the process is killed during a long API stream, the transcript file contains the user's message. On restart with --resume, the system loads the transcript and replays from the last user message. If persistence happened after the API call, a crash would lose the user's prompt entirely.
3.3 queryLoop() -- The Heart¶
This is the central loop. It calls the API, executes tools, handles errors, and decides whether to continue or stop.
type QueryLoopState = {
messages: Message[]
toolUseContext: ToolUseContext
turnCount: number
maxOutputTokensRecoveryCount: number
hasAttemptedReactiveCompact: boolean
}
async function* queryLoop(
params: QueryLoopParams
): AsyncGenerator<Message> {
let state: QueryLoopState = {
messages: params.messages,
toolUseContext: params.toolUseContext,
turnCount: 1,
maxOutputTokensRecoveryCount: 0,
hasAttemptedReactiveCompact: false,
}
while (true) {
const { messages, turnCount } = state
// ── Budget gates ──
if (turnCount >= params.maxTurns) {
yield createErrorMessage("error_max_turns")
return
}
if (params.totalCostUsd >= params.maxBudgetUsd) {
yield createErrorMessage("error_max_budget")
return
}
if (params.abortSignal?.aborted) {
return
}
// ── 1. STREAMING API CALL ──
let needsFollowUp = false
let promptTooLong = false
let hitMaxOutputTokens = false
let stopReason: string | null = null
const toolResults: ToolResultMessage[] = []
const executor = new StreamingToolExecutor(state.toolUseContext.tools)
for await (const event of callModel({
messages,
systemPrompt: params.systemPrompt,
tools: params.tools,
abortSignal: params.abortSignal,
})) {
if (event.type === "assistant") {
// ── 2. COLLECT TOOL USE BLOCKS ──
const toolBlocks = event.content.filter(
(c) => c.type === "tool_use"
)
if (toolBlocks.length > 0) {
needsFollowUp = true
for (const block of toolBlocks) {
executor.addTool(block, event)
}
}
stopReason = event.stop_reason
}
// ── 3. YIELD COMPLETED TOOL RESULTS ──
for (const result of executor.getCompletedResults()) {
yield result.message
toolResults.push(result.message)
}
// ── 4. WITHHOLD RECOVERABLE ERRORS ──
// Record WHICH error occurred so the recovery branches below can
// distinguish the two cases.
if (isPromptTooLong(event)) {
promptTooLong = true
}
if (isMaxOutputTokens(event)) {
hitMaxOutputTokens = true
}
if (!promptTooLong && !hitMaxOutputTokens) {
yield event
}
}
// Drain any remaining completed tools after stream ends
for (const result of executor.getCompletedResults()) {
yield result.message
toolResults.push(result.message)
}
// ── 5. RECOVERY: prompt too long ──
if (promptTooLong && !state.hasAttemptedReactiveCompact) {
const compacted = await compactMessages(messages)
state = {
...state,
messages: compacted,
hasAttemptedReactiveCompact: true,
}
continue // Retry with compressed history
}
// ── 5b. RECOVERY: max output tokens ──
if (hitMaxOutputTokens && state.maxOutputTokensRecoveryCount < 3) {
state = {
...state,
maxOutputTokensRecoveryCount: state.maxOutputTokensRecoveryCount + 1,
}
continue // Retry — model will continue where it left off
}
// ── 6. FEED TOOL RESULTS BACK ──
if (needsFollowUp && turnCount < params.maxTurns) {
const updatedMessages = [
...messages,
createUserMessage({ content: toolResults }),
]
state = {
...state,
messages: updatedMessages,
turnCount: turnCount + 1,
}
continue // Next API call with tool results
}
// ── 7. TERMINAL — no more tool calls, model is done ──
return
}
}
The immutable state pattern: Notice that state is reassigned, never mutated. Each iteration creates a new state object with the spread operator. This makes the loop's behavior predictable — you can inspect any state snapshot without worrying about hidden mutations from earlier in the iteration.
The continue/return discipline: Every branch at the bottom of the loop either continues (more work to do) or returns (done). There is no fallthrough. This makes the control flow explicit and prevents accidental infinite loops.
3.4 Recovery Mechanisms¶
The query loop handles four categories of recoverable failure:
| Scenario | Detection | Recovery | Limit |
|---|---|---|---|
| Prompt too long | API returns 400 with `prompt_too_long` | Reactive compact: summarize older messages, keep recent context | 1 attempt |
| Max output tokens | `stop_reason === "max_tokens"` | Truncate partial output, retry — model continues from truncation point | 3 attempts |
| Context collapse | Token count crosses threshold | Drain oldest messages from history proactively | Feature-gated |
| Model fallback | Streaming error (network, server) | Insert tombstone message explaining gap, optionally switch model | 1 fallback |
Reactive compaction deserves detail: when the prompt exceeds the model's context window, the system summarizes the conversation history into a condensed form. Recent messages (last N turns) are kept verbatim. Older messages are replaced with a system-generated summary. This preserves the model's understanding of recent context while freeing token budget.
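A sketch of the compaction shape, with a plain function standing in for the LLM summarizer:

```typescript
type Msg = { role: "user" | "assistant" | "system"; content: string }

// Keep the last `keepRecent` messages verbatim; collapse everything older
// into a single system-generated summary message.
function compact(
  messages: Msg[],
  keepRecent: number,
  summarize: (older: Msg[]) => string
): Msg[] {
  if (messages.length <= keepRecent) return messages
  const older = messages.slice(0, messages.length - keepRecent)
  const recent = messages.slice(messages.length - keepRecent)
  return [{ role: "system", content: summarize(older) }, ...recent]
}
```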
Max output token recovery works because the API returns partial output. The system keeps the partial output, appends it to the history, and calls the API again. The model sees its own partial response and continues from where it stopped. Three retries are allowed before the system gives up.
3.5 Budget Gates¶
Budget checks run at the top of each loop iteration, before the API call:
// Turn limit — prevents runaway loops
if (turnCount >= maxTurns) {
yield createErrorMessage("error_max_turns")
return
}
// Dollar cost limit — prevents surprise bills
if (totalCostUSD >= maxBudgetUsd) {
yield createErrorMessage("error_max_budget")
return
}
// Token budget — finer-grained than dollar cost
if (tokenBudget.exceeded) {
yield createErrorMessage("error_max_budget")
return
}
// User cancellation — Ctrl+C or programmatic abort
if (abortSignal.aborted) {
return
}
// Structured output retries — prevents infinite retry loops
if (structuredOutputRetries >= 5) {
yield createErrorMessage("error_max_structured_output_retries")
return
}
These gates are checked before every API call, so the system cannot spend more than one turn's worth of tokens over budget. The dollar cost is calculated from token counts using the model's pricing table.
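A sketch of the cost calculation, using made-up per-million-token prices (illustrative numbers, not real pricing):

```typescript
type Usage = {
  input_tokens: number
  output_tokens: number
  cache_creation_input_tokens: number
  cache_read_input_tokens: number
}

// Hypothetical pricing table, USD per million tokens.
const PRICE_PER_MTOK = { input: 3, output: 15, cacheWrite: 3.75, cacheRead: 0.3 }

function calculateCost(u: Usage): number {
  return (
    (u.input_tokens * PRICE_PER_MTOK.input +
      u.output_tokens * PRICE_PER_MTOK.output +
      u.cache_creation_input_tokens * PRICE_PER_MTOK.cacheWrite +
      u.cache_read_input_tokens * PRICE_PER_MTOK.cacheRead) /
    1_000_000
  )
}
```

Each loop iteration accumulates `Usage` from the latest message and compares `calculateCost(totalUsage)` against `maxBudgetUsd` before calling the API again.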
3.6 Transcript Persistence Strategy¶
Transcript writes are strategically timed for crash resilience without sacrificing performance:
BEFORE query: user messages --> BLOCKING write (~4ms on SSD)
DURING query: assistant messages --> fire-and-forget (non-blocking)
AFTER query: final flush --> BLOCKING write for safety
The reasoning:
- User messages are irreplaceable — they represent the human's intent. A blocking write ensures they are on disk before the API call starts. If the process crashes during the API call, `--resume` has the user's message and can replay.
- Assistant messages are reproducible — the API can regenerate them. Fire-and-forget writes are fast and usually succeed, but losing one is recoverable by re-calling the API.
- Final flush is blocking — at the end of a turn, the full conversation (including tool results) is persisted. This ensures `--resume` starts from a complete state.
The transcript file is append-only JSON Lines. Each line is a complete message object. This format survives partial writes — a crash mid-write loses at most one message, and the file parser skips malformed trailing lines.
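A sketch of a JSON Lines loader with that skip-malformed-tail behavior:

```typescript
// Parse an append-only JSON Lines transcript. Each line is one message.
// A torn trailing write (crash mid-append) produces one malformed final
// line; everything before it is intact, so we stop there instead of failing.
function parseTranscript(raw: string): object[] {
  const messages: object[] = []
  for (const line of raw.split("\n")) {
    if (!line.trim()) continue
    try {
      messages.push(JSON.parse(line))
    } catch {
      break // torn trailing write — keep what parsed so far
    }
  }
  return messages
}

// Second line was cut off mid-write; only the first message survives.
const recovered = parseTranscript('{"role":"user"}\n{"role":"assi')
```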
3.7 Build It Yourself¶
5-step recipe for a minimal query engine:
1. Start with a `while (true)` loop. Call the API, check for tool_use blocks in the response, execute them, append results, and call the API again. Break when the response has no tool_use blocks.
2. Add budget gates at the top of the loop. At minimum: turn count limit and abort signal. Add dollar cost limits once you have token counting.
3. Implement `submitMessage()` as the public entry point. It builds the system prompt, appends the user message, enters the query loop, and returns the accumulated result.
4. Add transcript persistence. Write user messages to disk before the API call. Write everything else after. Use append-only JSON Lines — it is the simplest crash-safe format.
5. Add recovery. Start with max-output-token retry (easiest — just call the API again). Add reactive compaction later when you hit context window limits.
submitMessage(prompt)
|
build system prompt
append user message
persist transcript <-- blocking write
|
+-- queryLoop() --+
| |
v |
call API |
| |
tool_use? ---no----> return result
| ^
yes |
| |
execute tools |
(6-phase pipeline) |
| |
append tool_results |
to messages |
| |
budget check |
| |
continue -----------+
Chapter 4: Bootstrap & Startup¶
The bootstrap pipeline gets the agent from a cold process to a warm query loop in under 500ms, using parallel I/O and deferred initialization to hide latency.
4.1 The 7-Stage Pipeline¶
// ── Stage 1: Top-level prefetch ──
// These start BEFORE any imports finish loading.
// The goal is to overlap I/O with module initialization.
startMdmRawRead() // MDM subprocess — reads device config in parallel
startKeychainPrefetch() // Keychain read (~65ms) — overlaps with import time
// ── Stage 2: Warning handler + environment guards ──
process.on("warning", suppressExperimentalWarnings)
if (getNodeMajorVersion() < 18) {
exitWithError("Node.js 18+ required")
}
validatePlatform() // Linux, macOS, WSL — not raw Windows
// ── Stage 3: CLI parser + trust gate ──
const program = createCommand("claude")
.option("--print", "Non-interactive mode")
.option("--resume", "Resume last session")
.option("--model", "Model override")
.hook("preAction", authenticateUser) // Auth check before any action
// ── Stage 4: setup() + parallel loads ──
await Promise.all([
getCommands(), // Lazy-load all slash command definitions
loadAgentsDir(), // Discover agent YAML files from ~/.claude/agents/
initSessionMemory(), // Register memory hooks for session persistence
connectMcpServers(), // Start MCP server connections (can be slow)
])
// ── Stage 5: Deferred init after trust ──
if (userIsTrusted) {
initPlugins() // Load plugins from settings
initSkills() // Load skills from 5 directories:
// project, user, agents, built-in, MCP
initMcpPrefetch() // Pre-fetch MCP tool schemas in background
initHooks() // Register PreToolUse, PostToolUse, Stop hooks
}
// ── Stage 6: Mode routing ──
switch (mode) {
case "local":
startLocalRepl(queryEngine) // Standard interactive REPL
break
case "remote":
startRemoteSession(queryEngine) // Remote session control
break
case "ssh":
startSshProxy(queryEngine) // SSH tunneled session
break
case "teleport":
resumeTeleportedSession(config) // Resume from another machine
break
case "bridge":
startBridgeMode(queryEngine) // IDE extension communication
break
}
// ── Stage 7: Query engine submit loop ──
if (singlePrompt) {
// --print mode: one prompt, one response, exit
for await (const msg of queryEngine.submitMessage(prompt)) {
render(msg)
}
process.exit(0)
} else {
// REPL mode: read prompt, submit, render, repeat
while (true) {
const input = await readUserInput()
for await (const msg of queryEngine.submitMessage(input)) {
render(msg)
}
}
}
Latency hiding strategy: Stages 1 and 4 are the key performance wins. By starting keychain reads and MDM subprocess reads before imports finish, the I/O completes during time that would otherwise be spent loading modules. Stage 4 uses Promise.all to load commands, agents, memory, and MCP connections in parallel — any one of these can take 50-200ms, but they overlap.
Trust gating (Stage 5): Plugins, skills, and hooks only load after authentication succeeds. This prevents untrusted code from executing during the bootstrap — a plugin cannot run before the user's identity is verified.
4.2 System Prompt Assembly¶
The system prompt is built from two memoized context functions. Memoization ensures expensive operations (git status, file system reads) run at most once per session:
const getSystemContext = memoize(async (): Promise<SystemContext> => {
const parts: SystemContext = {}
// Git context — expensive subprocess call, cached
const gitStatus = await getGitStatus()
if (gitStatus) {
parts.gitStatus = gitStatus
}
// Working directory, platform info
parts.cwd = process.cwd()
parts.platform = process.platform
parts.shell = process.env.SHELL ?? "bash"
return parts
})
const getUserContext = memoize(async (): Promise<UserContext> => {
const parts: UserContext = {}
// CLAUDE.md files — project instructions from multiple directories
const memoryFiles = await getMemoryFiles()
const claudeMds = getClaudeMds(memoryFiles)
if (claudeMds.length > 0) {
parts.claudeMd = claudeMds
}
// Current date — models do not know today's date
parts.currentDate = `Today's date is ${getLocalISODate()}.`
return parts
})
Why two separate functions? System context (git status, platform) changes rarely and is project-scoped. User context (CLAUDE.md files, date) is user-scoped and includes personal instructions. Separating them allows different cache invalidation strategies — system context can be refreshed on directory change, while user context stays stable within a session.
CLAUDE.md resolution order: The system searches for instruction files in multiple directories: the project root, parent directories up to the git root, the user's home directory, and the ~/.claude/ config directory. Files found in more specific locations (project root) take precedence over general ones (home directory). This lets users set global defaults that projects can override.
4.3 Build It Yourself¶
3-step recipe for a minimal bootstrap:
1. Parse CLI arguments, then authenticate. Use any CLI parser (Commander, clap, argparse). Check authentication before loading any user-defined code (plugins, hooks, skills). The auth check should be a "pre-action" hook on the CLI parser so it runs before any command handler.
2. Load configuration in parallel. Identify your independent I/O operations (reading config files, connecting to external servers, discovering plugins) and run them concurrently. Even in a simple system, `Promise.all([loadConfig(), loadTools(), loadInstructions()])` can save 100-300ms on startup.
3. Build the system prompt from memoized functions. Separate context into categories (system, user, project) with independent cache lifetimes. Assemble the final prompt by concatenating sections. Include the current date — models do not know what day it is.
Process start
|
+-- prefetch I/O (keychain, config) } parallel with
+-- import modules } module loading
|
CLI parse + auth gate
|
Promise.all([
load commands,
load agents,
init memory,
connect MCP
])
|
mode routing --> REPL or single-prompt
|
queryEngine.submitMessage() loop
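The memoized context functions from section 4.2 need only a tiny helper. A minimal async-safe sketch (the `getContext` usage is illustrative):

```typescript
// Cache the first call's promise so repeated and concurrent callers
// share one in-flight computation per session. Note we cache the
// promise itself, not the resolved value — two concurrent calls
// before resolution still trigger only one execution of fn.
function memoize<T>(fn: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => (cached ??= fn())
}

let calls = 0
const getContext = memoize(async () => {
  calls++ // expensive work (git status, file reads) would go here
  return { cwd: process.cwd() }
})
```

To support invalidation (e.g. refresh system context on directory change), return a reset function alongside the getter.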
Chapter 5: Permission System¶
5.1 ToolPermissionContext¶
The permission context is a deep-frozen snapshot of the active rules:
type ToolPermissionContext = DeepImmutable<{
mode: 'default' | 'auto' | 'plan' | 'bypass'
alwaysAllowRules: ToolPermissionRulesBySource
alwaysDenyRules: ToolPermissionRulesBySource
alwaysAskRules: ToolPermissionRulesBySource
additionalWorkingDirectories: Map<string, AdditionalWorkingDirectory>
}>
Rule sources in priority order:
policySettings > projectSettings > flagSettings > localSettings >
userSettings > cliArg > command > session
5.2 Permission Decision Chain¶
A tool use request flows through the following checks, in order:
Tool use request
↓
PreToolUse hooks → hook says allow/deny? → done
↓ (no decision)
Rule-based check (alwaysDeny > alwaysAllow > alwaysAsk)
↓ (no match)
Mode check:
auto → Transcript classifier (side-query to small model)
default → Interactive permission dialog
plan → Deny all mutations
bypass → Allow all
↓
PermissionDecision { behavior: allow|deny|ask, updatedInput?, reason? }
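The chain above can be sketched as a single function. This is a minimal model with name-based matchers (the real rules are pattern-based and source-prioritized), and the hook parameter is simplified to a pre-computed decision:

```typescript
type Behavior = "allow" | "deny" | "ask"
type PermissionDecision = { behavior: Behavior; reason?: string }
type Mode = "default" | "auto" | "plan" | "bypass"
type Rules = { deny: Set<string>; allow: Set<string>; ask: Set<string> }

// Hooks first, then rules (deny > allow > ask), then mode fallback.
function decide(
  toolName: string,
  mode: Mode,
  rules: Rules,
  hookDecision?: PermissionDecision
): PermissionDecision {
  if (hookDecision) return hookDecision // hooks short-circuit everything
  if (rules.deny.has(toolName)) return { behavior: "deny", reason: "rule" }
  if (rules.allow.has(toolName)) return { behavior: "allow", reason: "rule" }
  if (rules.ask.has(toolName)) return { behavior: "ask", reason: "rule" }
  switch (mode) {
    case "bypass": return { behavior: "allow", reason: "bypass mode" }
    case "plan":   return { behavior: "deny", reason: "plan mode denies mutations" }
    default:       return { behavior: "ask", reason: "no rule matched" }
  }
}
```

Note that rule matches beat mode: an alwaysDeny rule blocks a tool even in bypass mode.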
5.3 Auto-Mode Classifier¶
Auto mode consults a rule set plus a small-model classifier:
type AutoModeRules = {
allow: string[] // "git status", "npm test"
soft_deny: string[] // "rm -rf", "DROP TABLE"
environment: string[] // "Node.js project", "uses PostgreSQL"
}
// Side-query to Haiku with the tool input + rules
// Returns: { matches: boolean, confidence: 'high'|'low', matchedDescription? }
5.4 Bash Security¶
- AST parsing via tree-sitter (not regex)
- Dangerous pattern detection: Bash(), python:, node* wildcards
- UNC path blocking (prevents NTLM credential leaks)
- Device file blacklist (/dev/zero, /dev/stdin, /dev/tty)
5.5 Minimal Python Model¶
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ToolPermissionContext:
    deny_names: frozenset[str] = field(default_factory=frozenset)
    deny_prefixes: tuple[str, ...] = ()

    def blocks(self, tool_name: str) -> bool:
        lowered = tool_name.lower()
        return lowered in self.deny_names or any(
            lowered.startswith(p) for p in self.deny_prefixes
        )
5.6 Build It Yourself¶
4-step recipe:
1. Define a PermissionContext with deny-lists (start simple like the Python model)
2. Add a rule-based checker with source priority
3. Implement an interactive dialog fallback for "ask" decisions
4. Optionally add a classifier for auto-mode (side-query to fast model)
Chapter 6: Session & State Management¶
6.1 AppState — Immutable Store¶
The application state is a single immutable object held in a store:
type AppState = DeepImmutable<{
settings: SettingsJson
mainLoopModel: ModelSetting
toolPermissionContext: ToolPermissionContext
verbose: boolean
agent: string | undefined
mcp: {
clients: MCPServerConnection[]
tools: Tool[]
resources: Record<string, ServerResource[]>
}
plugins: {
enabled: LoadedPlugin[]
disabled: LoadedPlugin[]
}
tasks: Record<string, TaskState>
}>
// Zustand-like store
type Store<T> = {
getState(): T
setState(fn: (prev: T) => T): void
subscribe(listener: (state: T) => void): () => void
}
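The `Store<T>` contract above is small enough to implement directly. A minimal sketch (the type is repeated here so the snippet stands alone):

```typescript
type Store<T> = {
  getState(): T
  setState(fn: (prev: T) => T): void
  subscribe(listener: (state: T) => void): () => void
}

// Minimal Zustand-like store: state is replaced, never mutated, and
// every listener is notified after each update.
function createStore<T>(initial: T): Store<T> {
  let state = initial
  const listeners = new Set<(s: T) => void>()
  return {
    getState: () => state,
    setState(fn) {
      state = fn(state) // functional update keeps immutability discipline
      for (const l of listeners) l(state)
    },
    subscribe(listener) {
      listeners.add(listener)
      return () => { listeners.delete(listener) } // unsubscribe handle
    },
  }
}
```

Because `setState` takes `prev => next`, updates compose safely even when several subsystems write to the store.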
6.2 React Context Integration¶
// Provider wraps entire app
<AppStateProvider store={store}>
<REPL />
</AppStateProvider>
// Components access state via hooks
function StatusLine() {
const model = useAppState(s => s.mainLoopModel)
const cost = useAppState(s => s.totalCost)
return <Text>{model} | ${cost.toFixed(4)}</Text>
}
6.3 Context Compression¶
The restoration budgets and the compaction flow:
const POST_COMPACT_TOKEN_BUDGET = 50_000 // total budget for restored context
const POST_COMPACT_MAX_FILES = 5 // max files to restore
const POST_COMPACT_MAX_PER_FILE = 5_000 // tokens per restored file
const POST_COMPACT_MAX_PER_SKILL = 5_000 // tokens per restored skill
const POST_COMPACT_SKILLS_BUDGET = 25_000 // total budget for skills
// Compact flow:
// 1. Strip images from messages (prevent prompt-too-long in compact call)
// 2. Send to compaction API → receive summary
// 3. Replace pre-compact messages with summary + boundary marker
// 4. Restore top-N files and skills within token budget
// 5. Release pre-compact memory for GC (splice mutableMessages)
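The trigger for the flow above is a token threshold. The window size and ratio below are illustrative placeholders, not Claude Code's actual values:

```typescript
// Assumed numbers for illustration — substitute your model's real
// context window and a margin that leaves room for the next response.
const CONTEXT_WINDOW = 200_000
const COMPACT_AT = 0.85 // trigger at 85% of the window

function shouldCompact(estimatedMessageTokens: number): boolean {
  return estimatedMessageTokens >= CONTEXT_WINDOW * COMPACT_AT
}
```

Checking before each API call (rather than after a failure) keeps the "prompt too long" error path exceptional rather than routine.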
6.4 Transcript Persistence¶
Write-before-query pattern:
┌─────────────────────────────────────────────┐
│ User message → recordTranscript() [BLOCK] │ ← crash recovery
│ API call starts... │
│ Assistant streams → void recordTranscript() │ ← fire-and-forget
│ Tool results... │
│ Turn complete → flushSessionStorage() │ ← if EAGER_FLUSH
└─────────────────────────────────────────────┘
Session save/load:
# Python minimal model
import json
from dataclasses import asdict, dataclass
from pathlib import Path

@dataclass(frozen=True)
class StoredSession:
    session_id: str
    messages: tuple[str, ...]
    input_tokens: int
    output_tokens: int

def save_session(session: StoredSession, directory: Path) -> Path:
    path = directory / f'{session.session_id}.json'
    path.write_text(json.dumps(asdict(session)))
    return path
6.5 Build It Yourself¶
4-step recipe:
1. Build an immutable store with getState / setState / subscribe
2. Bind the UI to the store via subscriptions (re-render on change)
3. Persist the transcript write-before-query: block on user messages, fire-and-forget on streamed assistant messages
4. Compact when the context nears the token limit: summarize old messages, insert a boundary marker, restore key files within a budget
Chapter 7: Multi-Agent Orchestration¶
7.1 Agent Spawning¶
The AgentTool input schema:
const inputSchema = z.object({
description: z.string(), // Short task description
prompt: z.string(), // The task for the agent
subagent_type: z.string().optional(), // Specialized agent type
model: z.enum(['sonnet', 'opus', 'haiku']).optional(),
run_in_background: z.boolean().optional(),
isolation: z.enum(['worktree']).optional(),
})
Two execution modes:

Synchronous:
  result = await runAgent(agentDef, prompt, subContext)
  return { status: 'completed', result: result.output }

Asynchronous:
  const agentId = createAgentId()
  registerAsyncAgent(agentId, () => runAgent(...))
  return { status: 'async_launched', agentId }
7.2 Coordinator Mode¶
The topology:
┌─────────────────────────────────────────┐
│ Coordinator (restricted tools) │
│ Tools: Agent, SendMessage, TaskStop │
│ Role: orchestrate, synthesize, report │
├─────────────────────────────────────────┤
│ ↓ spawn ↓ spawn ↓ spawn │
│ Worker 1 Worker 2 Worker 3 │
│ (full tools) (full tools) (full tools) │
│ Bash,Read, Bash,Read, Bash,Read, │
│ Edit,Grep, Edit,Grep, Edit,Grep, │
│ Glob,Web... Glob,Web... Glob,Web... │
└─────────────────────────────────────────┘
The coordinator system prompt (condensed):
You are a coordinator. Your job is to:
- Direct workers to research, implement, and verify code changes
- Synthesize results and communicate with the user
- Answer questions directly when possible
Workers arrive as <task-notification> XML:
<task-id>{agentId}</task-id>
<status>completed|failed</status>
<result>{agent's response}</result>
Parallelism is your superpower. Launch independent workers concurrently.
7.3 Prompt Cache Sharing¶
For a fork to hit the parent's prompt cache, these parameters must match exactly:
type CacheSafeParams = {
systemPrompt: SystemPrompt // Must match parent exactly
userContext: Record<string, string>
systemContext: Record<string, string>
toolUseContext: ToolUseContext
forkContextMessages: Message[] // Parent's message history
}
// Saved after each turn, read by forked agents
let lastCacheSafeParams: CacheSafeParams | null = null
Why: Sub-agents that share CacheSafeParams get cache hits on the parent's prompt prefix → saves tokens/money.
7.4 SendMessage Routing¶
to="worker-name" → direct message to named agent
to="*" → broadcast to all agents
to="uds:<socket>" → Unix domain socket peer
to="bridge:<session>" → IDE bridge session
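The routing table above reduces to a small prefix-dispatch. A minimal sketch (the `Route` union is a hypothetical internal representation):

```typescript
type Route =
  | { kind: "broadcast" }
  | { kind: "uds"; socket: string }
  | { kind: "bridge"; session: string }
  | { kind: "agent"; name: string }

// Parse the SendMessage `to` field into a routing target.
// Order matters: check literal "*" and known prefixes before falling
// back to treating the value as a named agent.
function parseRoute(to: string): Route {
  if (to === "*") return { kind: "broadcast" }
  if (to.startsWith("uds:")) return { kind: "uds", socket: to.slice(4) }
  if (to.startsWith("bridge:")) return { kind: "bridge", session: to.slice(7) }
  return { kind: "agent", name: to }
}
```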
7.5 Build It Yourself¶
4-step recipe:
1. Define AgentTool that clones parent context and runs a nested query loop
2. Share system prompt + message history prefix for cache efficiency
3. Add SendMessage for inter-agent communication via mailbox pattern
4. Implement coordinator mode by restricting the coordinator to Agent + SendMessage only
Chapter 8: MCP Integration¶
8.1 Transport Types¶
type McpTransport = 'stdio' | 'sse' | 'http' | 'ws' | 'sdk'
// stdio: Local subprocess (most common for local tools)
// sse: HTTP Server-Sent Events (remote servers)
// http: Streamable HTTP (newer protocol)
// ws: WebSocket (bidirectional)
// sdk: In-process SDK (no network)
8.2 Connection Lifecycle¶
The connection lifecycle:
Settings (5 config scopes) → Config Discovery
↓
connectToServer(config)
↓
Transport selection:
stdio → spawn subprocess, StdioClientTransport
sse → SSEClientTransport with OAuth
http → StreamableHTTPClientTransport
↓
client.listTools() → MCPTool wrapper → Tool interface
↓
assembleToolPool() merges MCP tools with built-in tools
↓
Tool calls: mcp__server__toolName → route to client.callTool()
8.3 Reconnection Strategy¶
const RECONNECT = {
maxAttempts: 5,
initialBackoffMs: 1000,
maxBackoffMs: 30000, // 30 seconds
giveUpMs: 600000, // 10 minutes
}
// Exponential backoff: delay = min(initialMs * 2^attempt, maxMs)
// On reconnect: re-fetch tools (ToolListChangedNotification)
// On auth failure: transition to 'needs-auth' state
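The backoff formula in the comment is direct to implement. A minimal sketch using the constants above:

```typescript
const RECONNECT = {
  maxAttempts: 5,
  initialBackoffMs: 1000,
  maxBackoffMs: 30_000, // 30 seconds
}

// delay = min(initialMs * 2^attempt, maxMs)
// attempt 0 → 1s, 1 → 2s, 2 → 4s, ... capped at 30s.
function backoffDelayMs(attempt: number): number {
  return Math.min(RECONNECT.initialBackoffMs * 2 ** attempt, RECONNECT.maxBackoffMs)
}
```

Adding random jitter (e.g. multiplying by 0.5-1.5) is a common refinement to avoid thundering-herd reconnects, though it is not shown in the constants here.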
8.4 Connection States¶
type MCPServerConnection =
| { type: 'connected', client: Client, tools: Tool[] }
| { type: 'failed', error: string }
| { type: 'needs-auth', serverName: string }
| { type: 'pending', reconnectAttempt: number }
| { type: 'disabled', reason: string }
8.5 Plugin MCP Dedup¶
- Plugin servers namespaced: plugin:name:server
- Content-based signature: hash(URL + command_hash)
- Manual configs always win over plugin duplicates
- First-loaded plugin wins among plugin servers
8.6 Build It Yourself¶
4-step recipe:
1. Create an MCP client with transport abstraction (start with stdio)
2. Fetch tools from server, wrap each as your Tool interface
3. Merge into assembleToolPool() with built-in tools
4. Add reconnection with exponential backoff
Chapter 9: Slash Command System¶
9.1 Command Types¶
type Command = CommandBase & (PromptCommand | LocalCommand | LocalJSXCommand)
// PromptCommand: Sends prompt to Claude API with restricted tool access
// Example: /commit — AI generates commit message, only git tools allowed
// Returns: ContentBlockParam[] (prompt for the model)
// LocalCommand: Runs JavaScript locally, no API call
// Example: /cost — reads session cost, returns text
// Returns: { type: 'text', value: string } | { type: 'compact' } | 'skip'
// LocalJSXCommand: Renders interactive React/Ink UI
// Example: /config — opens settings panel
// Returns: JSX element with onDone callback
9.2 Command Registry¶
// All commands are lazy-loaded to keep startup fast
import commit from './commands/commit/index.js'
import review from './commands/review/index.js'
// Feature-gated conditional imports
const voiceCommand = feature('VOICE_MODE')
? require('./commands/voice/index.js').default
: null
// Properties that control command behavior:
// isEnabled() — runtime check (auth state, env vars)
// isHidden — hide from typeahead but still loadable
// disableModelInvocation — model can't auto-invoke this command
// userInvocable — available as /command in REPL
// immediate — execute without waiting for stop point
// loadedFrom — 'skills' | 'plugin' | 'managed' | 'bundled'
9.3 Skill Loading¶
type LoadedFrom = 'skills' | 'plugin' | 'managed' | 'bundled' | 'mcp'
// 5 directory sources (priority order):
// 1. Managed → .claude/settings-managed/skills (org-pushed)
// 2. Project → .claude/skills/ (repo-local)
// 3. User → ~/.claude/skills/ (global)
// 4. Plugin → from installed plugins
// 5. Bundled → compiled into the binary
// Skills are Markdown files with frontmatter:
// ---
// name: my-skill
// description: What this skill does
// model: claude-sonnet-4-6
// allowed_tools: [Bash, Read, Edit]
// ---
// Prompt content here...
// Two execution modes:
// Inline → Expand prompt into current conversation context
// Fork → Spawn isolated sub-agent with skill prompt
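Parsing the frontmatter shown above takes only a few lines. This sketch handles flat `key: value` pairs; a real loader should use a YAML library for nested values like `allowed_tools`:

```typescript
// Split a "---"-delimited frontmatter header from the prompt body.
// Returns empty metadata if no frontmatter block is present.
function parseSkill(md: string): { meta: Record<string, string>; prompt: string } {
  const match = md.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/)
  if (!match) return { meta: {}, prompt: md }
  const meta: Record<string, string> = {}
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":")
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim()
  }
  return { meta, prompt: match[2].trim() }
}
```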
9.4 Build It Yourself¶
3-step recipe:
1. Define a Command interface with name, type, and execute/load methods
2. Register commands with lazy loading (import only when invoked)
3. Add skill directory scanning with frontmatter parsing
Chapter 10: Terminal UI (React/Ink)¶
10.1 Why React in a Terminal¶
Ink renders React components to ANSI terminal output. Benefits:
- Component composition (reuse MessageView, ProgressBar, DiffViewer)
- Reactive updates (state change → re-render)
- Hooks (useState, useEffect, custom hooks like useCanUseTool)
10.2 Component Hierarchy¶
<AppStateProvider store={store}>
<REPL>
├── <MessageList>
│ ├── <UserMessage />
│ ├── <AssistantMessage />
│ ├── <ToolUseMessage>
│ │ └── tool.renderToolUseMessage(input)
│ ├── <ToolResultMessage>
│ │ └── tool.renderToolResultMessage(output)
│ └── <ToolProgressMessage>
│ └── tool.renderToolUseProgressMessage(progress)
├── <PermissionDialog />
├── <TaskPanel />
├── <StatusLine />
└── <PromptInput />
</REPL>
</AppStateProvider>
10.3 Per-Tool Rendering¶
Each tool implements its own UI:
// In GlobTool:
renderToolUseMessage(input) {
return <Text>Finding files matching {input.pattern}...</Text>
}
renderToolResultMessage(output) {
return <Box flexDirection="column">
{output.filenames.map(f => <Text key={f}>{f}</Text>)}
<Text dimColor>Found {output.numFiles} files in {output.durationMs}ms</Text>
</Box>
}
// BashTool renders streaming terminal output
// FileEditTool renders colored diffs
// AgentTool renders progress bars for sub-agents
10.4 Build It Yourself¶
4-step recipe:
1. Use Ink (React for terminals) or a simpler TUI framework (blessed, bubbletea for Go)
2. Create a message renderer that dispatches to per-tool render functions
3. Use a reactive store (Zustand, Redux, or signals) for state → UI binding
4. Implement a permission dialog component for interactive approval
Chapter 11: IDE Bridge Protocol¶
11.1 Architecture¶
IDE Extension (VS Code / JetBrains)
↕ WebSocket + SSE
Bridge Server (cloud)
↕ HTTP + WebSocket
CLI Process (local)
11.2 Key Types¶
type BridgeConfig = {
dir: string
machineName: string
branch: string
maxSessions: number
spawnMode: 'single-session' | 'worktree' | 'same-dir'
bridgeId: string // Client-generated UUID
environmentId: string // For idempotent registration
}
type WorkSecret = {
version: number
session_ingress_token: string
api_base_url: string
auth: Array<{ type: string; token: string }>
mcp_config?: unknown
environment_variables?: Record<string, string>
}
type SessionHandle = {
sessionId: string
done: Promise<SessionDoneStatus>
kill(): void
writeStdin(data: string): void
}
11.3 Spawn Modes¶
single-session → One CLI process per bridge connection
worktree → Git worktree per session (full isolation)
same-dir → Multiple sessions in same working directory
11.4 Permission Proxying¶
CLI needs permission → bridge sends control_request to IDE
IDE shows dialog → user decides → bridge sends control_response
CLI receives decision → continues tool execution
11.5 Backoff Configuration¶
const BACKOFF = {
connInitialMs: 2000,
connCapMs: 120000, // 2 minutes
connGiveUpMs: 600000, // 10 minutes
generalInitialMs: 500,
generalCapMs: 30000,
generalGiveUpMs: 600000,
}
11.6 Build It Yourself¶
3-step recipe:
1. Define a message protocol (JSON over WebSocket) for session control
2. Implement spawn modes (start with single-session)
3. Add permission proxying (forward CLI permission requests to IDE UI)
Chapter 12: Memory & Cost Tracking¶
12.1 Memory System (MEMORY.md)¶
const ENTRYPOINT_NAME = 'MEMORY.md'
const MAX_ENTRYPOINT_LINES = 200
const MAX_ENTRYPOINT_BYTES = 25_000 // ~125 chars/line
function truncateEntrypointContent(raw: string): string {
  const lines = raw.trim().split('\n')
  // Dual-cap: lines first (natural boundary), then bytes
  const wasLineTruncated = lines.length > MAX_ENTRYPOINT_LINES
  let truncated = wasLineTruncated
    ? lines.slice(0, MAX_ENTRYPOINT_LINES).join('\n')
    : raw.trim()
  const wasByteTruncated = truncated.length > MAX_ENTRYPOINT_BYTES
  if (wasByteTruncated) {
    // Cut at the last newline before the byte cap, if there is one
    const cutAt = truncated.lastIndexOf('\n', MAX_ENTRYPOINT_BYTES)
    truncated = truncated.slice(0, cutAt > 0 ? cutAt : MAX_ENTRYPOINT_BYTES)
  }
  if (wasLineTruncated || wasByteTruncated) {
    truncated += '\n\n> WARNING: MEMORY.md truncated. Keep index entries short.'
  }
  return truncated
}
Memory types (stored as individual files with frontmatter):
---
name: user-preferences
description: How the user likes to work
type: user | feedback | project | reference
---
Content here...
MEMORY.md is an index (one line per entry, <150 chars each), pointing to topic files.
12.2 Cost Tracking¶
type StoredCostState = {
totalCostUSD: number
totalAPIDuration: number
totalToolDuration: number
totalLinesAdded: number
totalLinesRemoved: number
modelUsage: {
[model: string]: {
inputTokens: number
outputTokens: number
cacheReadInputTokens: number
cacheCreationInputTokens: number
costUSD: number
}
}
}
// Per-API-call: accumulate usage from response
function addToTotalSessionCost(cost, usage, model) {
addToTotalModelUsage(cost, usage, model)
costCounter?.add(cost, { model })
tokenCounter?.add(usage.input_tokens, { model, type: 'input' })
tokenCounter?.add(usage.output_tokens, { model, type: 'output' })
tokenCounter?.add(usage.cache_read_input_tokens, { model, type: 'cacheRead' })
}
// Saved per-session to project config
function saveCurrentSessionCosts() {
saveProjectConfig(current => ({
...current,
lastCost: getTotalCostUSD(),
lastModelUsage: getModelUsage(),
lastSessionId: getSessionId(),
}))
}
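Step 3 of the recipe below (USD cost from token counts) is simple arithmetic against a pricing table. The prices here are placeholders, not real rates — look up current published pricing; cache reads are typically a small fraction of the input price and cache creation a small premium over it:

```typescript
type Usage = {
  input_tokens: number
  output_tokens: number
  cache_read_input_tokens: number
  cache_creation_input_tokens: number
}

// USD per million tokens. PLACEHOLDER values for illustration only.
const PRICING: Record<string, { input: number; output: number; cacheRead: number; cacheWrite: number }> = {
  "example-model": { input: 3, output: 15, cacheRead: 0.3, cacheWrite: 3.75 },
}

// Each token category has its own rate; sum and scale down to USD.
function costUSD(model: string, u: Usage): number {
  const p = PRICING[model]
  return (
    (u.input_tokens * p.input +
      u.output_tokens * p.output +
      u.cache_read_input_tokens * p.cacheRead +
      u.cache_creation_input_tokens * p.cacheWrite) / 1_000_000
  )
}
```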
12.3 Build It Yourself¶
4-step recipe:
1. Create a MEMORY.md file with truncation (200 lines / 25KB dual cap)
2. Track per-model token usage (input, output, cache read, cache creation)
3. Calculate USD cost per model using published pricing tables
4. Persist cost state per-session to project config
Appendix¶
A.1 File Index¶
| System | Key File | Purpose |
|---|---|---|
| Tool types | src/Tool.ts | Core Tool interface, buildTool(), ToolPermissionContext |
| Tool registry | src/tools.ts | getAllBaseTools(), getTools(), assembleToolPool() |
| Tool execution | src/services/tools/toolExecution.ts | runToolUse(), 6-phase pipeline |
| Streaming executor | src/services/tools/StreamingToolExecutor.ts | Parallel tool execution |
| Query engine | src/QueryEngine.ts | submitMessage(), session state |
| Query loop | src/query.ts | queryLoop(), recovery, budget gates |
| Bootstrap | src/main.tsx | Startup prefetch, CLI parsing |
| Setup | src/setup.ts | Initialization sequence |
| Context | src/context.ts | System prompt assembly |
| Permissions | src/utils/permissions/ | Rule checking, classifiers |
| State | src/state/AppStateStore.ts | AppState type, store |
| Compact | src/services/compact/compact.ts | Context compression |
| Coordinator | src/coordinator/coordinatorMode.ts | Multi-agent system prompt |
| Forked agents | src/utils/forkedAgent.ts | CacheSafeParams, cache sharing |
| MCP client | src/services/mcp/client.ts | Connection management |
| MCP types | src/services/mcp/types.ts | Transport, config schemas |
| Skills | src/skills/loadSkillsDir.ts | Skill directory scanning |
| Commands | src/commands.ts | Command registry |
| Bridge | src/bridge/types.ts | IDE bridge protocol |
| Bridge main | src/bridge/bridgeMain.ts | Bridge orchestration |
| Memory | src/memdir/memdir.ts | MEMORY.md management |
| Cost | src/cost-tracker.ts | Token/USD tracking |
| GlobTool | src/tools/GlobTool/GlobTool.ts | Reference tool implementation |
| BashTool | src/tools/BashTool/BashTool.tsx | Complex tool with streaming |
| AgentTool | src/tools/AgentTool/AgentTool.tsx | Sub-agent spawning |
A.2 Glossary¶
| Term | Definition |
|---|---|
| ToolUseContext | Runtime context passed to every tool call -- includes tools list, app state, abort controller, file cache, MCP clients, and permission context |
| AppState | Immutable application state (Zustand-like store) -- settings, model, permissions, MCP, plugins, tasks |
| CacheSafeParams | Parameters that must match between parent and forked agent to share prompt cache -- system prompt, user context, tools, messages |
| ToolPermissionContext | Frozen permission rules from all sources (policy, project, user, CLI, session) with deny/allow/ask behaviors |
| StreamingToolExecutor | Executes tools as they stream in from the API, managing concurrency (safe tools run in parallel, unsafe tools run exclusively) |
| Reactive compact | Automatic context compression triggered when messages exceed token threshold -- summarizes old messages, restores key files |
| MCPTool | Wrapper that adapts an MCP server's tool to the internal Tool interface -- passthrough input/output, 100K char limit |
| Forked agent | Sub-agent that shares the parent's prompt cache via CacheSafeParams -- used for skills, post-turn analysis, and background work |
| Prompt cache | Anthropic API feature where identical system prompt + message prefix = cached (cheaper, faster). Cache key includes: system prompt, tools, model, messages, thinking config |
| Trust gate | Deferred initialization that only runs after workspace trust is established -- plugins, skills, MCP, hooks are blocked until trusted |
A.3 Further Reading¶
- Claude API Documentation: https://docs.anthropic.com/en/docs
- MCP Specification: https://modelcontextprotocol.io
- Ink (React for terminals): https://github.com/vadimdemedes/ink
- Zod (TypeScript schema validation): https://zod.dev