The Claude Code Harness · v1

The harness, fully decomposed.

A visual decomposition of the runtime loop that turns a user message into a sequence of model calls and tool executions. It covers context assembly, the inference call, the output fork, hook intercepts, the permission gate, the dispatch fanout into built-in tools / sub-agents / MCP servers, and the tool result that closes the loop.

INPUT: User-facing edges of the loop

CONTEXT: What the model sees

MODEL: Inference

DECISION: Branch & dispatch

HOOK: User-configured shell intercepts

PERMISSION: Auto-allow / prompt / deny

OUTPUT: Streamed to the user

CONFIG: Settings, memory, compaction

Lifecycle of a tool-using turn

A single trip around the loop, broken into the steps the harness actually performs.

01 · User submits a turn
A slash command resolves to a skill; the UserPromptSubmit hook runs and may add context or block the prompt with exit code 2.
02 · Context assembled
System prompt + environment + CLAUDE.md + tool schemas + deferred-tool index + skills + history are concatenated into one prompt.
03 · Model streams a response
The response streams as interleaved text and tool_use blocks. When extended thinking is enabled, reasoning content may precede the visible reply.
04 · Output fork
No tool calls → text streams to the user, the Stop hook fires, and the turn ends. Otherwise every tool_use enters the dispatch pipeline.
05 · PreToolUse hook
Matcher-scoped shell commands receive the pending call as JSON on stdin. They can approve it, rewrite its input through structured JSON output, or block it (exit 2), in which case stderr is fed back to the model.
06 · Permission check
Pattern-matched allow/deny rules from settings.json across enterprise / user / project scopes. Unmatched calls prompt the user unless the permission mode bypasses the prompt.
07 · Tool dispatch
Built-in tools run in-process. Sub-agents spawn isolated context windows that return a single summary message. MCP calls cross to external servers.
08 · PostToolUse hook + result
The post hook reacts (format, test, stage). The tool_result block is keyed to its tool_use_id and inserted into the next model turn.
09 · Loop closes
The model is called again with the updated context. The cycle continues until a turn emits no tool_use, at which point Stop hooks run.
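The nine steps can be condensed into a small driver loop. The TypeScript sketch below is an illustration, not the harness's code: the block shapes loosely follow the Messages API, `callModel` and `runTool` are hypothetical injected functions (with `runTool` standing in for steps 05 to 08), and `maxSteps` is an invented safety cap. Tool calls run one after another here for simplicity.

```typescript
// Hypothetical block shapes, loosely following Messages API content blocks.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };
type Message = { role: "user" | "assistant"; content: (TextBlock | ToolUseBlock | ToolResultBlock)[] };

type CallModel = (history: Message[]) => Promise<Message>;
// Stands in for steps 05-08: PreToolUse hook, permission check, dispatch, PostToolUse hook.
type RunTool = (call: ToolUseBlock) => Promise<{ content: string; isError: boolean }>;

async function runTurn(history: Message[], callModel: CallModel, runTool: RunTool, maxSteps = 25): Promise<Message[]> {
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(history);            // 03: model produces a response
    history.push(reply);
    const calls = reply.content.filter((b): b is ToolUseBlock => b.type === "tool_use");
    if (calls.length === 0) return history;            // 04: text-only turn; Stop hooks would fire here
    const results: ToolResultBlock[] = [];
    for (const call of calls) {                        // sequential here for simplicity
      const r = await runTool(call);
      results.push({ type: "tool_result", tool_use_id: call.id, content: r.content, is_error: r.isError });
    }
    history.push({ role: "user", content: results });  // 08-09: results keyed by tool_use_id, loop closes
  }
  throw new Error(`no text-only turn after ${maxSteps} model calls`);
}
```

Injecting the model and the tool runner keeps the loop itself free of I/O, so the same shape works against a real API client or a scripted fake.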

Subsystems

Each node from the diagram, expanded.

User Input

Prompt + slash commands + skill invocations

Every turn starts with a user message. A leading slash invokes a registered skill, which the harness loads into context before the model sees the turn.

Plain text, images, file references
Slash command resolves to a skill name and arguments
UserPromptSubmit hook can add context to or block the prompt before context assembly
Hook output is concatenated into context as additional user material

UserPromptSubmit → settings.json hooks.UserPromptSubmit[]
→ exit 0: hook stdout appended to the user turn as context
→ exit 2: turn blocked, prompt erased, stderr shown to the user
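A sketch of how the harness might interpret a leading slash and the UserPromptSubmit exit codes, assuming each hook has already run and its exit code, stdout, and stderr were captured. `HookRun`, `parseSlash`, and `applyUserPromptSubmit` are hypothetical names, and treating every other exit code as a non-blocking error is this sketch's assumption.

```typescript
// Hypothetical record of a hook process that has already finished.
interface HookRun { exitCode: number; stdout: string; stderr: string }

type PromptOutcome =
  | { kind: "blocked"; reason: string }
  | { kind: "proceed"; prompt: string; extraContext: string[]; skill?: { name: string; args: string } };

// "/review src/auth.ts" → { name: "review", args: "src/auth.ts" }; plain prompts → undefined.
function parseSlash(prompt: string): { name: string; args: string } | undefined {
  const m = /^\/(\S+)\s*([\s\S]*)$/.exec(prompt.trim());
  return m ? { name: m[1], args: m[2] } : undefined;
}

function applyUserPromptSubmit(prompt: string, hooks: HookRun[]): PromptOutcome {
  const extraContext: string[] = [];
  for (const h of hooks) {
    if (h.exitCode === 2) return { kind: "blocked", reason: h.stderr };          // exit 2: block the turn
    if (h.exitCode === 0 && h.stdout.trim() !== "") extraContext.push(h.stdout); // exit 0: stdout becomes context
    // any other exit code: treated here as a non-blocking error and ignored
  }
  return { kind: "proceed", prompt, extraContext, skill: parseSlash(prompt) };
}
```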

Context Assembly

System prompt, memory, environment, tool schemas, history

Before each model call the harness builds a single context window: identity + behavior rules, project memory, runtime environment, the catalog of currently-loaded tool schemas, prior turns, and any deferred tool definitions resolved via ToolSearch.

System prompt — identity, tone, code-style rules, doing-tasks rules
Environment block — cwd, platform, model id, git status, today's date
Memory files — CLAUDE.md (project) and ~/.claude/CLAUDE.md (user)
Tool schemas — JSONSchema for every tool in scope this turn
Deferred tools — names only, hydrated on demand via ToolSearch
Skills — names + when-to-use blurbs, full body loaded only when the skill is invoked
Conversation history — earlier turns + previous tool results

context = [
  systemPrompt,
  environmentBlock,
  memoryFiles,
  toolSchemas,
  deferredToolIndex,
  skillsRegistry,
  ...history,
  currentUserTurn,
]
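The `context = [...]` list above is pseudocode. A runnable variant, assuming flattened string segments (the real API sends the system prompt and tool schemas as separate request fields), might look like this; every interface and field name is hypothetical. `hydrate` models what resolving a deferred tool through ToolSearch does to the context: the name leaves the index and its full schema joins the loaded tools.

```typescript
// Hypothetical shapes; not the harness's real types.
interface ToolDef { name: string; description: string; inputSchema: object }
interface Skill { name: string; whenToUse: string }

interface ContextInputs {
  systemPrompt: string;
  environment: Record<string, string>; // cwd, platform, model id, git status, date
  memoryFiles: string[];               // CLAUDE.md contents
  loadedTools: ToolDef[];              // full JSONSchema is sent for these
  deferredToolNames: string[];         // names only until hydrated
  skills: Skill[];                     // blurbs only; bodies load on invocation
  history: string[];
  currentUserTurn: string;
}

function assembleContext(c: ContextInputs): string[] {
  const env = Object.entries(c.environment).map(([k, v]) => `${k}: ${v}`).join("\n");
  return [
    c.systemPrompt,
    env,
    ...c.memoryFiles,
    JSON.stringify(c.loadedTools),
    `Deferred tools (schema via ToolSearch): ${c.deferredToolNames.join(", ")}`,
    c.skills.map((s) => `${s.name}: ${s.whenToUse}`).join("\n"),
    ...c.history,
    c.currentUserTurn,
  ];
}

// Moves one deferred tool from the name-only index into the loaded schema list.
function hydrate(c: ContextInputs, name: string, registry: Map<string, ToolDef>): ContextInputs {
  const def = registry.get(name);
  if (!def || !c.deferredToolNames.includes(name)) return c;
  return {
    ...c,
    loadedTools: [...c.loadedTools, def],
    deferredToolNames: c.deferredToolNames.filter((n) => n !== name),
  };
}
```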

Claude Model

Opus / Sonnet / Haiku inference with optional extended thinking

The assembled context is sent to a Claude model. The harness picks the model family (Opus for hardest reasoning, Sonnet for default, Haiku for fast/cheap). Extended thinking is opt-in via configuration.

Streaming tokens — text and structured tool_use blocks interleaved
Reasoning ('thinking') content may precede the visible reply
Tool-call blocks emit name + JSON input that must match the tool schema
Multiple tool_use blocks in one turn can execute in parallel
Model never sees raw filesystem — only what tools return

opus     claude-opus-4-7        // hardest reasoning
sonnet   claude-sonnet-4-6      // default
haiku    claude-haiku-4-5-20251001  // fast + cheap
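The interleaved text and tool_use blocks arrive as streaming events. The sketch below reassembles them using the Anthropic Messages API's `content_block_start`, `content_block_delta` (`text_delta` / `input_json_delta`), and `content_block_stop` events; other event types such as `message_start` and `message_delta` are omitted, and the `Assembled` type is a hypothetical convenience. Tool input arrives as partial JSON strings, so it is parsed only once its block stops.

```typescript
// Subset of the Messages API streaming events; message-level events are omitted.
type StartBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: object };
type Delta =
  | { type: "text_delta"; text: string }
  | { type: "input_json_delta"; partial_json: string };
type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: StartBlock }
  | { type: "content_block_delta"; index: number; delta: Delta }
  | { type: "content_block_stop"; index: number };

// Hypothetical accumulator type: tool_use keeps its raw JSON until the block stops.
type Assembled =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown; json: string };

function accumulate(events: StreamEvent[]): Assembled[] {
  const blocks: Assembled[] = [];
  for (const ev of events) {
    if (ev.type === "content_block_start") {
      const b = ev.content_block;
      blocks[ev.index] = b.type === "text"
        ? { type: "text", text: b.text }
        : { type: "tool_use", id: b.id, name: b.name, input: {}, json: "" };
    } else if (ev.type === "content_block_delta") {
      const b = blocks[ev.index];
      const d = ev.delta;
      if (d.type === "text_delta" && b.type === "text") b.text += d.text;
      if (d.type === "input_json_delta" && b.type === "tool_use") b.json += d.partial_json;
    } else {
      const b = blocks[ev.index];
      // Partial JSON is only guaranteed to parse once the block is complete.
      if (b.type === "tool_use") b.input = b.json === "" ? {} : JSON.parse(b.json);
    }
  }
  return blocks;
}
```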

Output Decision

Text reply vs. tool call(s)

Each streamed turn ends either with text (the assistant talking to the user) or with one or more tool_use blocks (the assistant asking the harness to do something). The harness branches accordingly.

Text-only turn → flush to UI, fire Stop hook, await next user turn
Tool turn → the response ends with stop_reason "tool_use"; the harness runs each call to completion
Mixed turn → text streamed first, then tool calls executed
Parallel tool_use blocks run concurrently; results stitched back in order
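The branch and the parallel fan-out can be sketched as two small functions. This assumes every tool call in a turn is independent (the real harness may serialize some tools), and `classify` and `runAll` are hypothetical names. `Promise.all` resolves in input order, which keeps results aligned with their calls even when a later call finishes first.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };
type ToolUse = Extract<Block, { type: "tool_use" }>;

function classify(content: Block[]): "text-only" | "tool" | "mixed" {
  const hasText = content.some((b) => b.type === "text");
  const hasTool = content.some((b) => b.type === "tool_use");
  if (!hasTool) return "text-only";
  return hasText ? "mixed" : "tool";
}

// Starts every call at once; Promise.all resolves in input order, so result i belongs to call i.
async function runAll<R>(
  content: Block[],
  run: (call: ToolUse) => Promise<R>,
): Promise<{ tool_use_id: string; result: R }[]> {
  const calls = content.filter((b): b is ToolUse => b.type === "tool_use");
  return Promise.all(calls.map(async (c) => ({ tool_use_id: c.id, result: await run(c) })));
}
```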

Text Reply

Streamed to the user, ends the turn

Plain text the user sees. When the assistant stops without issuing a tool call, the harness fires Stop hooks, persists the turn, and yields back to the user.

GitHub-flavored markdown, rendered in the terminal per the CommonMark spec
Stop hook can block the stop (exit 2) and send the model back to work
SessionEnd hook fires when the session ends (exit, /clear, logout)

Tool Call

Structured request the harness must execute

The model emits a tool_use block with a tool name and JSON input. The harness validates the input against the tool's JSONSchema, then enters the dispatch pipeline.

name + input_schema validation rejects bad calls before they run
Calls can target built-in tools, sub-agents, or MCP servers
Bash supports run_in_background → process tracked, output streamed via Monitor
Up to N parallel calls per turn — each independent

{
  "type": "tool_use",
  "id": "toolu_...",
  "name": "Edit",
  "input": { "file_path": "...", "old_string": "...", "new_string": "..." }
}
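Schema validation before dispatch can be illustrated with a deliberately tiny JSON Schema subset: required keys plus primitive property types. Real tool schemas use full JSON Schema; `MiniSchema`, `validateInput`, and `editSchema` are hypothetical, the Edit schema shown may omit fields the real tool accepts, and rejecting unknown keys mirrors `additionalProperties: false`, which is an assumption here.

```typescript
// A tiny JSON Schema subset: an object with required keys and primitive property types.
interface MiniSchema {
  type: "object";
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
  required?: string[];
}

function validateInput(schema: MiniSchema, input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in input)) errors.push(`missing required property "${key}"`);
  }
  for (const [key, value] of Object.entries(input)) {
    const prop = schema.properties[key];
    if (!prop) errors.push(`unexpected property "${key}"`);
    else if (typeof value !== prop.type) errors.push(`"${key}" should be ${prop.type}`);
  }
  return errors; // empty → dispatch; non-empty → error tool_result back to the model
}

// Hypothetical schema for the Edit call; the real tool may accept more fields.
const editSchema: MiniSchema = {
  type: "object",
  properties: { file_path: { type: "string" }, old_string: { type: "string" }, new_string: { type: "string" } },
  required: ["file_path", "old_string", "new_string"],
};
```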

PreToolUse Hook

Intercept tool calls before they execute

Configured shell commands receive the pending tool call on stdin. They can approve it, modify the input, or block it outright. This is where lint-on-write, format-on-write, and policy gates live.

Configured per-matcher in settings.json (e.g. matcher: 'Edit|Write')
Exit 0 → proceed (the permission check still applies), exit 2 → block with stderr shown to model
JSON on stdout (hookSpecificOutput.updatedInput) can rewrite the tool input before dispatch
Runs before the permission check

"hooks": {
  "PreToolUse": [{
    "matcher": "Edit|Write",
    "hooks": [{ "type": "command", "command": "npm run lint:check" }]
  }]
}
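A sketch of running one PreToolUse hook from TypeScript with Node's `child_process.spawnSync`. Assumptions: the matcher is treated as a regex anchored to the whole tool name (with `""` and `"*"` matching everything), the stdin payload carries only a subset of the real fields, and structured output supplies a replacement input under `hookSpecificOutput.updatedInput`. Check the hooks reference before relying on that field path; `runPreToolUse` itself is hypothetical, and the test commands assume a POSIX shell.

```typescript
import { spawnSync } from "node:child_process";

type PreToolDecision =
  | { action: "proceed"; input: Record<string, unknown> }
  | { action: "block"; feedbackToModel: string };

// Assumption: the matcher is a regex anchored to the full tool name; "" or "*" matches every tool.
function matches(matcher: string, toolName: string): boolean {
  return matcher === "" || matcher === "*" || new RegExp(`^(?:${matcher})$`).test(toolName);
}

// Hypothetical runner for a single PreToolUse hook command.
function runPreToolUse(matcher: string, command: string, toolName: string, toolInput: Record<string, unknown>): PreToolDecision {
  if (!matches(matcher, toolName)) return { action: "proceed", input: toolInput };
  // The pending call goes to the hook as JSON on stdin (only a few of the real payload's fields).
  const payload = JSON.stringify({ hook_event_name: "PreToolUse", tool_name: toolName, tool_input: toolInput });
  const r = spawnSync(command, { shell: true, input: payload, encoding: "utf8" });
  if (r.status === 2) return { action: "block", feedbackToModel: r.stderr };
  if (r.status === 0 && r.stdout.trim().startsWith("{")) {
    try {
      // Assumed field path for a rewritten input in structured hook output.
      const updated = JSON.parse(r.stdout)?.hookSpecificOutput?.updatedInput;
      if (updated && typeof updated === "object") return { action: "proceed", input: updated };
    } catch {
      // Malformed JSON: fall through and keep the original input.
    }
  }
  return { action: "proceed", input: toolInput }; // the permission check still runs after this
}
```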

Permission Check

Auto-allow rules, mode policy, user approval

The harness consults the permission system: calls matching an allow rule execute silently, calls matching a deny rule are rejected, and everything else prompts the user. The permission mode (default / acceptEdits / bypassPermissions / plan) tunes how aggressive auto-approval is.

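settings.json permissions.allow + permissions.deny, pattern-matched (prefix rules like Bash(npm test:*), gitignore-style paths for Read/Edit); deny takes precedence
Enterprise, user, project, and local scopes are merged
Plan mode blocks all writes/edits until the model calls ExitPlanMode and the user approves the plan
Denied calls return an error tool_result (is_error: true) that the model can react to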

"permissions": {
  "allow": ["Bash(npm test:*)", "Read(**)"],
  "deny":  ["Bash(rm -rf:*)", "Edit(//etc/**)"],
  "defaultMode": "acceptEdits"
}
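How a call might be judged against rules like those above, as a simplified sketch. Assumptions: deny rules win over everything, `prefix:*` is a prefix match, `*` and `**` in other specs behave like path globs, plan mode denies file edits, and acceptEdits auto-approves file edits. The real matcher and mode semantics have more cases (path anchoring, read-only commands, extra directories); `ruleMatches` and `decide` are hypothetical names.

```typescript
type Mode = "default" | "acceptEdits" | "plan" | "bypassPermissions";
type Verdict = "allow" | "deny" | "ask";
interface Rules { allow: string[]; deny: string[] }

const EDIT_TOOLS = ["Edit", "Write", "NotebookEdit"];

function ruleMatches(rule: string, tool: string, arg: string): boolean {
  const m = /^([^(]+)(?:\((.*)\))?$/.exec(rule);
  if (!m || m[1] !== tool) return false;
  const spec: string | undefined = m[2];
  if (spec === undefined) return true;                               // bare "Tool" matches every call
  if (spec.endsWith(":*")) return arg.startsWith(spec.slice(0, -2)); // prefix rule
  const pattern = spec
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except *
    .replace(/\*\*/g, "\u0000")            // placeholder so ** survives the next step
    .replace(/\*/g, "[^/]*")               // * stays within one path segment
    .replace(/\u0000/g, ".*");             // ** crosses segments
  return new RegExp(`^${pattern}$`).test(arg);
}

function decide(rules: Rules, mode: Mode, tool: string, arg: string): Verdict {
  if (rules.deny.some((r) => ruleMatches(r, tool, arg))) return "deny";
  if (mode === "bypassPermissions") return "allow";
  if (mode === "plan" && EDIT_TOOLS.includes(tool)) return "deny";
  if (rules.allow.some((r) => ruleMatches(r, tool, arg))) return "allow";
  if (mode === "acceptEdits" && EDIT_TOOLS.includes(tool)) return "allow";
  return "ask";
}
```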

Tool Dispatch

Built-in tools, sub-agents, MCP servers

Approved calls run in three families: built-in tools (file + shell + web), sub-agents spawned via the Agent/Task tool (each gets its own context window), and MCP server tools (external integrations registered at startup).

Built-in: Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch, TodoWrite, NotebookEdit
Sub-agents: general-purpose, Explore, Plan, code-reviewer, statusline-setup — isolated context, return one message
MCP: mcp__github__*, mcp__slack__*, custom servers — stdio or http transport
Background processes tracked separately, surfaced via Monitor
Each tool result becomes a tool_result block in the next model turn

Agent({
  subagent_type: "Explore",
  description: "find auth flow",
  prompt: "Locate session-cookie issuing code..."
})  // → independent context, summarized back
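Routing an approved call to one of the three families can key off the tool name: MCP tools are exposed as `mcp__<server>__<tool>`, and sub-agents go through the Agent/Task tool. The sketch below is illustrative; `route`, `runSubagent`, and the fallback to `general-purpose` when `subagent_type` is missing are assumptions. `runSubagent` shows the isolation property: the sub-agent starts from a fresh history and only its final message returns to the parent.

```typescript
type Route =
  | { family: "mcp"; server: string; tool: string }
  | { family: "subagent"; subagentType: string }
  | { family: "builtin"; tool: string };

function route(name: string, input: Record<string, unknown>): Route {
  const mcp = /^mcp__(.+?)__(.+)$/.exec(name); // mcp__<server>__<tool>
  if (mcp) return { family: "mcp", server: mcp[1], tool: mcp[2] };
  if (name === "Agent" || name === "Task") {
    const t = input.subagent_type;
    // Assumed fallback when no subagent_type is given.
    return { family: "subagent", subagentType: typeof t === "string" ? t : "general-purpose" };
  }
  return { family: "builtin", tool: name };
}

// Hypothetical: `loop` runs a full agent loop on the history it is given and returns the transcript.
async function runSubagent(prompt: string, loop: (history: string[]) => Promise<string[]>): Promise<string> {
  const transcript = await loop([prompt]);        // fresh context: none of the parent's turns
  return transcript[transcript.length - 1] ?? ""; // only the final message goes back to the parent
}
```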

PostToolUse Hook

React to a completed tool call

Mirror of PreToolUse, but runs after the tool returns. Receives the tool result on stdin. Used for auto-format, auto-test, auto-stage, or to feed extra context back into the model.

Same matcher syntax as PreToolUse
stderr on exit 2 (or structured JSON output) is fed back to the model, since the tool has already run; plain stdout only appears in the transcript
Common uses: prettier on Edit, test runner on Write, git status after Bash

Tool Result

Stitched into the next model turn — loop closes

Every tool result is wrapped as a tool_result block keyed to its tool_use_id and inserted into the context for the next inference call. The model sees the result and decides what to do next.

tool_use_id pairs result to call
Errors arrive as is_error: true — model can self-correct
Long output may be truncated; full content stays in cache
Loop continues until the model emits a turn with no tool calls
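Wrapping a tool outcome as a tool_result block, with a truncation step for long output, might look like the sketch below. `MAX_RESULT_CHARS` and the truncation marker are hypothetical (real limits vary by tool), and where the untruncated output is kept is outside this sketch.

```typescript
interface ToolResultBlock { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean }

type Outcome = { ok: true; output: string } | { ok: false; error: string };

// Hypothetical cap; actual limits differ per tool.
const MAX_RESULT_CHARS = 30000;

function toToolResult(toolUseId: string, outcome: Outcome): ToolResultBlock {
  if (outcome.ok === false) {
    // Errors are still results: the model sees them and can retry or change course.
    return { type: "tool_result", tool_use_id: toolUseId, content: outcome.error, is_error: true };
  }
  const out = outcome.output;
  const content = out.length <= MAX_RESULT_CHARS
    ? out
    : `${out.slice(0, MAX_RESULT_CHARS)}\n[... ${out.length - MAX_RESULT_CHARS} characters truncated]`;
  return { type: "tool_result", tool_use_id: toolUseId, content };
}
```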

Settings

settings.json across enterprise / user / project scopes

Config layer that shapes everything else: which permissions auto-approve, which env vars to inject, which hooks fire on which events, which MCP servers to spin up.

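Precedence, highest first: enterprise managed policy → command-line arguments → .claude/settings.local.json → .claude/settings.json → ~/.claude/settings.json
Higher-precedence scopes override lower ones; enterprise policy cannot be overridden
Common keys: model, env, permissions, hooks, apiKeyHelper; MCP servers are declared in .mcp.json (project) or ~/.claude.json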
/config and the update-config skill edit these safely

Memory Files

CLAUDE.md project + user files, surfaced every turn

Long-lived instructions the model should remember. Project CLAUDE.md travels with the repo; user CLAUDE.md follows the developer. Both are injected into context near the top of every turn.

CLAUDE.md — repo conventions, build commands, code-style rules
AGENTS.md — agent-targeted notes, where a repo keeps one
~/.claude/CLAUDE.md — personal preferences across all projects
Edits via /memory or direct file write — picked up next turn
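Project memory is not limited to one file: Claude Code also reads CLAUDE.md files from the working directory's ancestors, stopping before the filesystem root. A simplified discovery sketch follows; `memoryFilePaths` and the injected `exists` predicate are hypothetical, it uses POSIX paths only, and the returned order (user file, then outer directories before inner ones) is an assumption about precedence.

```typescript
import { posix as path } from "node:path";

// Hypothetical helper: lists memory files in an assumed load order, user file first.
function memoryFilePaths(cwd: string, home: string, exists: (p: string) => boolean): string[] {
  const userFile = path.join(home, ".claude", "CLAUDE.md");
  const projectFiles: string[] = [];
  let dir = path.resolve(cwd);
  while (dir !== path.dirname(dir)) { // walk upward, stopping before the filesystem root
    const candidate = path.join(dir, "CLAUDE.md");
    if (exists(candidate)) projectFiles.push(candidate);
    dir = path.dirname(dir);
  }
  // Outer directories first, so files closer to cwd come later.
  return [...(exists(userFile) ? [userFile] : []), ...projectFiles.reverse()];
}
```

Injecting `exists` keeps the walk testable without touching the disk; a real caller could pass `fs.existsSync`.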

Compaction

Summarize history when the window is near full

Before the context window overflows, the harness summarizes earlier turns into a compact recap and replaces the raw history with it. The model keeps working without losing the thread.

Triggered as token budget approaches the configured limit
Recent turns kept verbatim; older turns folded into summary
Tool results may be elided if their content is recoverable from later state
Same context shape, smaller footprint: the model sees the recap in place of the raw turns
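The trigger-and-fold behavior can be sketched as follows. The 4-characters-per-token estimate, the `keepRecent` count, the recap wording, and the injected `summarize` function (in practice a model call) are all assumptions, not the harness's actual parameters.

```typescript
interface Turn { role: "user" | "assistant"; text: string }

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(turns: Turn[]): number {
  return Math.ceil(turns.reduce((n, t) => n + t.text.length, 0) / 4);
}

async function maybeCompact(
  history: Turn[],
  triggerTokens: number,                         // compact once the estimate reaches this
  keepRecent: number,                            // newest turns kept verbatim
  summarize: (older: Turn[]) => Promise<string>, // in practice, a model call
): Promise<Turn[]> {
  if (estimateTokens(history) < triggerTokens || history.length <= keepRecent) return history;
  const cut = history.length - keepRecent;
  const recap: Turn = { role: "user", text: `Summary of earlier conversation:\n${await summarize(history.slice(0, cut))}` };
  return [recap, ...history.slice(cut)];
}
```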

Background & Monitor

Long-running processes outside the turn loop

Bash with run_in_background spawns a tracked child process. Its stdout streams via the Monitor tool so the model can subscribe to events (e.g. PR webhooks, dev-server logs) without sleeping.

Bash(run_in_background: true) returns a process handle
Monitor reads new lines as notifications when new output appears
Used for: PR activity subscriptions, dev servers, file watchers
Process cleaned up at session end
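A sketch of the background side: spawn a process, split its stdout into lines, and emit one notification per complete line. `LineNotifier` and `watch` are hypothetical names, and how the real Monitor tool queues notifications for the model is not shown. The splitter keeps an unfinished tail between chunks, since a chunk boundary can fall mid-line.

```typescript
import { spawn } from "node:child_process";

// Turns arbitrary stdout chunks into one callback per complete line.
class LineNotifier {
  private partial = "";
  constructor(private readonly onLine: (line: string) => void) {}
  push(chunk: string): void {
    const parts = (this.partial + chunk).split("\n");
    this.partial = parts.pop() ?? ""; // keep the unfinished tail for the next chunk
    for (const line of parts) this.onLine(line);
  }
  flush(): void {
    if (this.partial) this.onLine(this.partial);
    this.partial = "";
  }
}

// Hypothetical wiring: each new line becomes a notification instead of the model polling by sleeping.
function watch(command: string, notify: (line: string) => void) {
  const child = spawn(command, { shell: true });
  const lines = new LineNotifier(notify);
  child.stdout.setEncoding("utf8");
  child.stdout.on("data", (c: string) => lines.push(c));
  child.on("close", () => lines.flush());
  return child; // the handle a harness would track and kill at session end
}
```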