The Claude Code Harness · v1

The harness, fully decomposed.

A visual decomposition of the runtime loop that turns a user message into a sequence of model calls and tool executions. Context assembly, the inference call, the output fork, hook intercepts, the permission gate, the dispatch fanout into built-in tools / sub-agents / MCP servers, and the result that closes the loop. Hover or click any node to jump to its detail card.

INPUTUser-facing edges of the loop
CONTEXTWhat the model sees
MODELInference
DECISIONBranch & dispatch
HOOKUser-configured shell intercepts
PERMISSIONAuto-allow / prompt / deny
OUTPUTStreamed to the user
CONFIGSettings, memory, compaction
assembled promptno tool callstool_use blockapprovedloop · next turnUSER MESSAGEprompt · slash command · imageCONTEXT ASSEMBLYwhat the model sees this turnSystem promptEnvironment blockCLAUDE.md / AGENTS.mdTool schemas + deferred indexSkills registryConversation historyCLAUDE MODELopus · sonnet · haikuOUTPUT FORKtext or tool_useTEXT REPLYstreamed to user · Stop hook firesTOOL CALLstructured tool_use blockPreToolUse HOOKapprove · rewrite · blockPERMISSION CHECKallow · prompt · denyTOOL DISPATCHexecute approved callBUILT-INReadWriteEditBashGlobGrepTodoWriteWebFetchWebSearchNotebookEditSUB-AGENTSgeneral-purposeExplorePlancode-reviewerstatusline-setupMCP SERVERSmcp__github__*mcp__slack__*stdio transportshttp transportscustom serversPostToolUse HOOKreact · format · testTOOL RESULTtool_result blockSETTINGSsettings.json · perms · hooks · envMEMORYCLAUDE.md · AGENTS.md · ~/.claudeCOMPACTIONsummarize when window fillsBG + MONITORrun_in_background · streamshover any node to focus · click to jump to detail

Lifecycle of a tool-using turn

A single trip around the loop, broken into the steps the harness actually performs.

  1. 01User submits a turn

    Slash command resolves to a skill; UserPromptSubmit hook runs and may append context, modify the prompt, or block with exit code 2.

  2. 02Context assembled

    System prompt + environment + CLAUDE.md + tool schemas + deferred-tool index + skills + history are concatenated into one prompt.

  3. 03Model streams a response

    Tokens emit interleaved text and tool_use blocks. Reasoning content may precede the visible reply when extended thinking is enabled.

  4. 04Output fork

    No tool calls → text streams to the user, Stop hook fires, turn ends. Otherwise every tool_use enters the dispatch pipeline.

  5. 05PreToolUse hook

    Matcher-scoped shell commands receive the call on stdin. They can approve, rewrite the input, or block (exit 2) with stderr fed back to the model.

  6. 06Permission check

    Glob-matched allow/deny rules from settings.json across enterprise / user / project scopes. Unmatched calls prompt the user unless mode bypasses it.

  7. 07Tool dispatch

    Built-in tools run in-process. Sub-agents spawn isolated context windows that return a single summary message. MCP calls cross to external servers.

  8. 08PostToolUse hook + result

    Post hook reacts (format, test, stage). The tool_result block is keyed to its tool_use_id and inserted into the next model turn.

  9. 09Loop closes

    Model is called again with updated context. The cycle continues until a turn emits no tool_use — at which point Stop hooks run.

Subsystems

Each node from the diagram, expanded. Clicking a node above scrolls here.

User Input
Prompt + slash commands + skill invocations

Every turn starts with a user message. A leading slash invokes a registered skill, which the harness loads into context before the model sees the turn.

  • Plain text, images, file references
  • Slash command resolves to a skill name and arguments
  • UserPromptSubmit hook can mutate or block the prompt before context assembly
  • Hook output is concatenated into context as additional user material
UserPromptSubmit → settings.json hooks[]
→ hook stdout appended to user turn
→ exit code 2 blocks the turn
Context Assembly
System prompt, memory, environment, tool schemas, history

Before each model call the harness builds a single context window: identity + behavior rules, project memory, runtime environment, the catalog of currently-loaded tool schemas, prior turns, and any deferred tool definitions resolved via ToolSearch.

  • System prompt — identity, tone, code-style rules, doing-tasks rules
  • Environment block — cwd, platform, model id, git status, today's date
  • Memory files — CLAUDE.md (project) and ~/.claude/CLAUDE.md (user)
  • Tool schemas — JSONSchema for every tool in scope this turn
  • Deferred tools — names only, hydrated on demand via ToolSearch
  • Skills — names + when-to-use blurbs, full body loaded only on /invoke
  • Conversation history — earlier turns + previous tool results
context = [
  systemPrompt,
  environmentBlock,
  memoryFiles,
  toolSchemas,
  deferredToolIndex,
  skillsRegistry,
  ...history,
  currentUserTurn,
]
Claude Model
Opus / Sonnet / Haiku inference with optional extended thinking

The assembled context is sent to a Claude model. The harness picks the model family (Opus for hardest reasoning, Sonnet for default, Haiku for fast/cheap). Extended thinking is opt-in via configuration.

  • Streaming tokens — text and structured tool_use blocks interleaved
  • Reasoning ('thinking') content may precede the visible reply
  • Tool-call blocks emit name + JSON input that must match the tool schema
  • Multiple tool_use blocks in one turn = parallel execution
  • Model never sees raw filesystem — only what tools return
opus     claude-opus-4-7        // hardest reasoning
sonnet   claude-sonnet-4-6      // default
haiku    claude-haiku-4-5-20251001  // fast + cheap
Output Decision
Text reply vs. tool call(s)

Each streamed turn ends either with text (the assistant talking to the user) or with one or more tool_use blocks (the assistant asking the harness to do something). The harness branches accordingly.

  • Text-only turn → flush to UI, fire Stop hook, await next user turn
  • Tool turn → suspend the model, run each tool call to completion
  • Mixed turn → text streamed first, then tool calls executed
  • Parallel tool_use blocks run concurrently; results stitched back in order
Text Reply
Streamed to the user, ends the turn

Plain text the user sees. When the assistant stops without issuing a tool call, the harness fires Stop hooks, persists the turn, and yields back to the user.

  • GitHub-flavored markdown rendered in CommonMark
  • Stop hook can append text or kick the model again
  • SessionEnd hook fires when the container is reclaimed
Tool Call
Structured request the harness must execute

The model emits a tool_use block with a tool name and JSON input. The harness validates the input against the tool's JSONSchema, then enters the dispatch pipeline.

  • name + input_schema validation rejects bad calls before they run
  • Calls can target built-in tools, sub-agents, or MCP servers
  • Bash supports run_in_background → process tracked, output streamed via Monitor
  • Up to N parallel calls per turn — each independent
{
  "type": "tool_use",
  "name": "Edit",
  "input": { "file_path": "...", "old_string": "...", "new_string": "..." }
}
PreToolUse Hook
Intercept tool calls before they execute

Configured shell commands receive the pending tool call on stdin. They can approve it, modify the input, or block it outright. This is where lint-on-write, format-on-write, and policy gates live.

  • Configured per-matcher in settings.json (e.g. matcher: 'Edit|Write')
  • Exit 0 → allow, exit 2 → block with stderr shown to model
  • stdout can rewrite the tool input before dispatch
  • Runs before the permission check
"hooks": {
  "PreToolUse": [{
    "matcher": "Edit|Write",
    "hooks": [{ "type": "command", "command": "npm run lint:check" }]
  }]
}
Permission Check
Auto-allow rules, mode policy, user approval

The harness consults the permission system: rule-matched tools execute silently, others prompt the user. Permission mode (default / acceptEdits / bypass / plan) tunes how aggressive auto-approval is.

  • settings.json permissions.allow + permissions.deny — glob-matched
  • Project, user, and enterprise scopes are merged
  • Plan mode blocks all writes/edits until user exits via ExitPlanMode
  • Denied calls return a tool_error that the model can react to
"permissions": {
  "allow": ["Bash(npm test:*)", "Read(**)"],
  "deny":  ["Bash(rm -rf*)", "Write(/etc/**)"],
  "defaultMode": "acceptEdits"
}
Tool Dispatch
Built-in tools, sub-agents, MCP servers

Approved calls run in three families: built-in tools (file + shell + web), sub-agents spawned via the Agent/Task tool (each gets its own context window), and MCP server tools (external integrations registered at startup).

  • Built-in: Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch, TodoWrite, NotebookEdit
  • Sub-agents: general-purpose, Explore, Plan, code-reviewer, statusline-setup — isolated context, return one message
  • MCP: mcp__github__*, mcp__slack__*, custom servers — stdio or http transport
  • Background processes tracked separately, surfaced via Monitor
  • Each tool result becomes a tool_result block in the next model turn
Agent({
  subagent_type: "Explore",
  description: "find auth flow",
  prompt: "Locate session-cookie issuing code..."
})  // → independent context, summarized back
PostToolUse Hook
React to a completed tool call

Mirror of PreToolUse, but runs after the tool returns. Receives the tool result on stdin. Used for auto-format, auto-test, auto-stage, or to feed extra context back into the model.

  • Same matcher syntax as PreToolUse
  • stdout is appended to the tool_result the model sees
  • Common uses: prettier on Edit, test runner on Write, git status after Bash
Tool Result
Stitched into the next model turn — loop closes

Every tool result is wrapped as a tool_result block keyed to its tool_use_id and inserted into the context for the next inference call. The model sees the result and decides what to do next.

  • tool_use_id pairs result to call
  • Errors arrive as is_error: true — model can self-correct
  • Long output may be truncated; full content stays in cache
  • Loop continues until the model emits a turn with no tool calls
Settings
settings.json across enterprise / user / project scopes

Config layer that shapes everything else: which permissions auto-approve, which env vars to inject, which hooks fire on which events, which MCP servers to spin up.

  • Resolution order: enterprise → user (~/.claude) → project (.claude) → project.local
  • Last write wins per leaf key
  • model, env, permissions, hooks, mcpServers, apiKeyHelper
  • /config and the update-config skill edit these safely
Memory Files
CLAUDE.md project + user files, surfaced every turn

Long-lived instructions the model should remember. Project CLAUDE.md travels with the repo; user CLAUDE.md follows the developer. Both are injected into context near the top of every turn.

  • CLAUDE.md — repo conventions, build commands, code-style rules
  • AGENTS.md — agent-targeted notes (this repo uses one)
  • ~/.claude/CLAUDE.md — personal preferences across all projects
  • Edits via /remember or direct file write — picked up next turn
Compaction
Summarize history when the window is near full

Before the context window overflows, the harness summarizes earlier turns into a compact recap and replaces the raw history with it. The model keeps working without losing the thread.

  • Triggered as token budget approaches the configured limit
  • Recent turns kept verbatim; older turns folded into summary
  • Tool results may be elided if their content is recoverable from later state
  • Transparent to the model — same context shape, smaller footprint
Background & Monitor
Long-running processes outside the turn loop

Bash with run_in_background spawns a tracked child process. Its stdout streams via the Monitor tool so the model can subscribe to events (e.g. PR webhooks, dev-server logs) without sleeping.

  • Bash(run_in_background: true) returns a process handle
  • Monitor reads new lines as notifications when new output appears
  • Used for: PR activity subscriptions, dev servers, file watchers
  • Process cleaned up at session end