Each node from the diagram, expanded. Clicking a node above scrolls here.
User Input
Prompt + slash commands + skill invocations
Every turn starts with a user message. A leading slash invokes a registered skill, which the harness loads into context before the model sees the turn.
- Plain text, images, file references
- Slash command resolves to a skill name and arguments
- UserPromptSubmit hook can mutate or block the prompt before context assembly
- Hook output is concatenated into context as additional user material
UserPromptSubmit → settings.json hooks[]
→ hook stdout appended to user turn
→ exit code 2 blocks the turn
Context Assembly
System prompt, memory, environment, tool schemas, history
Before each model call the harness builds a single context window: identity + behavior rules, project memory, runtime environment, the catalog of currently-loaded tool schemas, prior turns, and any deferred tool definitions resolved via ToolSearch.
- System prompt — identity, tone, code-style rules, doing-tasks rules
- Environment block — cwd, platform, model id, git status, today's date
- Memory files — CLAUDE.md (project) and ~/.claude/CLAUDE.md (user)
- Tool schemas — JSONSchema for every tool in scope this turn
- Deferred tools — names only, hydrated on demand via ToolSearch
- Skills — names + when-to-use blurbs, full body loaded only on /invoke
- Conversation history — earlier turns + previous tool results
context = [
systemPrompt,
environmentBlock,
memoryFiles,
toolSchemas,
deferredToolIndex,
skillsRegistry,
...history,
currentUserTurn,
]
Claude Model
Opus / Sonnet / Haiku inference with optional extended thinking
The assembled context is sent to a Claude model. The harness picks the model family (Opus for hardest reasoning, Sonnet for default, Haiku for fast/cheap). Extended thinking is opt-in via configuration.
- Streaming tokens — text and structured tool_use blocks interleaved
- Reasoning ('thinking') content may precede the visible reply
- Tool-call blocks emit name + JSON input that must match the tool schema
- Multiple tool_use blocks in one turn = parallel execution
- Model never sees raw filesystem — only what tools return
opus claude-opus-4-7 // hardest reasoning
sonnet claude-sonnet-4-6 // default
haiku claude-haiku-4-5-20251001 // fast + cheap
Output Decision
Text reply vs. tool call(s)
Each streamed turn ends either with text (the assistant talking to the user) or with one or more tool_use blocks (the assistant asking the harness to do something). The harness branches accordingly.
- Text-only turn → flush to UI, fire Stop hook, await next user turn
- Tool turn → suspend the model, run each tool call to completion
- Mixed turn → text streamed first, then tool calls executed
- Parallel tool_use blocks run concurrently; results stitched back in order
Text Reply
Streamed to the user, ends the turn
Plain text the user sees. When the assistant stops without issuing a tool call, the harness fires Stop hooks, persists the turn, and yields back to the user.
- GitHub-flavored markdown rendered in CommonMark
- Stop hook can append text or kick the model again
- SessionEnd hook fires when the container is reclaimed
Tool Call
Structured request the harness must execute
The model emits a tool_use block with a tool name and JSON input. The harness validates the input against the tool's JSONSchema, then enters the dispatch pipeline.
- name + input_schema validation rejects bad calls before they run
- Calls can target built-in tools, sub-agents, or MCP servers
- Bash supports run_in_background → process tracked, output streamed via Monitor
- Up to N parallel calls per turn — each independent
{
"type": "tool_use",
"name": "Edit",
"input": { "file_path": "...", "old_string": "...", "new_string": "..." }
}PreToolUse Hook
Intercept tool calls before they execute
Configured shell commands receive the pending tool call on stdin. They can approve it, modify the input, or block it outright. This is where lint-on-write, format-on-write, and policy gates live.
- Configured per-matcher in settings.json (e.g. matcher: 'Edit|Write')
- Exit 0 → allow, exit 2 → block with stderr shown to model
- stdout can rewrite the tool input before dispatch
- Runs before the permission check
"hooks": {
"PreToolUse": [{
"matcher": "Edit|Write",
"hooks": [{ "type": "command", "command": "npm run lint:check" }]
}]
}Permission Check
Auto-allow rules, mode policy, user approval
The harness consults the permission system: rule-matched tools execute silently, others prompt the user. Permission mode (default / acceptEdits / bypass / plan) tunes how aggressive auto-approval is.
- settings.json permissions.allow + permissions.deny — glob-matched
- Project, user, and enterprise scopes are merged
- Plan mode blocks all writes/edits until user exits via ExitPlanMode
- Denied calls return a tool_error that the model can react to
"permissions": {
"allow": ["Bash(npm test:*)", "Read(**)"],
"deny": ["Bash(rm -rf*)", "Write(/etc/**)"],
"defaultMode": "acceptEdits"
}Tool Dispatch
Built-in tools, sub-agents, MCP servers
Approved calls run in three families: built-in tools (file + shell + web), sub-agents spawned via the Agent/Task tool (each gets its own context window), and MCP server tools (external integrations registered at startup).
- Built-in: Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch, TodoWrite, NotebookEdit
- Sub-agents: general-purpose, Explore, Plan, code-reviewer, statusline-setup — isolated context, return one message
- MCP: mcp__github__*, mcp__slack__*, custom servers — stdio or http transport
- Background processes tracked separately, surfaced via Monitor
- Each tool result becomes a tool_result block in the next model turn
Agent({
subagent_type: "Explore",
description: "find auth flow",
prompt: "Locate session-cookie issuing code..."
}) // → independent context, summarized backPostToolUse Hook
React to a completed tool call
Mirror of PreToolUse, but runs after the tool returns. Receives the tool result on stdin. Used for auto-format, auto-test, auto-stage, or to feed extra context back into the model.
- Same matcher syntax as PreToolUse
- stdout is appended to the tool_result the model sees
- Common uses: prettier on Edit, test runner on Write, git status after Bash
Tool Result
Stitched into the next model turn — loop closes
Every tool result is wrapped as a tool_result block keyed to its tool_use_id and inserted into the context for the next inference call. The model sees the result and decides what to do next.
- tool_use_id pairs result to call
- Errors arrive as is_error: true — model can self-correct
- Long output may be truncated; full content stays in cache
- Loop continues until the model emits a turn with no tool calls
Settings
settings.json across enterprise / user / project scopes
Config layer that shapes everything else: which permissions auto-approve, which env vars to inject, which hooks fire on which events, which MCP servers to spin up.
- Resolution order: enterprise → user (~/.claude) → project (.claude) → project.local
- Last write wins per leaf key
- model, env, permissions, hooks, mcpServers, apiKeyHelper
- /config and the update-config skill edit these safely
Memory Files
CLAUDE.md project + user files, surfaced every turn
Long-lived instructions the model should remember. Project CLAUDE.md travels with the repo; user CLAUDE.md follows the developer. Both are injected into context near the top of every turn.
- CLAUDE.md — repo conventions, build commands, code-style rules
- AGENTS.md — agent-targeted notes (this repo uses one)
- ~/.claude/CLAUDE.md — personal preferences across all projects
- Edits via /remember or direct file write — picked up next turn
Compaction
Summarize history when the window is near full
Before the context window overflows, the harness summarizes earlier turns into a compact recap and replaces the raw history with it. The model keeps working without losing the thread.
- Triggered as token budget approaches the configured limit
- Recent turns kept verbatim; older turns folded into summary
- Tool results may be elided if their content is recoverable from later state
- Transparent to the model — same context shape, smaller footprint
Background & Monitor
Long-running processes outside the turn loop
Bash with run_in_background spawns a tracked child process. Its stdout streams via the Monitor tool so the model can subscribe to events (e.g. PR webhooks, dev-server logs) without sleeping.
- Bash(run_in_background: true) returns a process handle
- Monitor reads new lines as notifications when new output appears
- Used for: PR activity subscriptions, dev servers, file watchers
- Process cleaned up at session end