Claude Managed Agents ships the sandbox, session state, tool execution, and event stream as a hosted service. You describe the agent; Anthropic runs it. This post covers four concepts (Agent, Environment, Session, Events), a ten-minute quick start, and patterns for outcomes and multiagent delegation.
If you've built agents on the raw Messages API, you already know the tax: your own tool-call loop, your own sandbox, your own retry logic, your own secrets store. Months of plumbing before the agent does anything interesting. Claude Managed Agents deletes that list.
You describe the agent. Anthropic runs it — secure containers, long-lived sessions, built-in code execution, streaming events. This post is the short version of what it is, when to use it, and how to ship your first one today.
Messages API vs Managed Agents
They solve different problems. Pick based on how much control you actually need.
- Messages API — direct model access. You write the agent loop. Right for custom harnesses or when you need total control.
- Managed Agents — a hosted agent platform. Right for long tasks, async work, and anything you'd rather ship this quarter than next year.
Two deeper reasons to prefer the managed path. First, self-written harnesses bake in assumptions about what the model can't do — those assumptions go stale with every release. Managed Agents updates the harness for you. Second, task horizons are growing fast. Anthropic expects future Claude versions to work for days or weeks on a single task. That demands fault-tolerant infrastructure you probably don't want to maintain.
Four concepts that run the whole system
Learn these four and the rest follows.
- Agent: a versioned config (model, system prompt, tools, MCP servers, skills). Create once, reference by ID. Models: claude-sonnet-4-6, claude-opus-4-6.
- Environment: a container template. Networking rules, preinstalled packages (Python, Node, Go). Each session gets its own isolated container from it.
- Session: a running instance of the agent in an environment. Holds history, filesystem, status. Runs for hours. Secrets live in a vault.
- Events: the message bus. You send user.message; the agent streams text, tool calls, and status via Server-Sent Events.
Requests need the managed-agents-2026-04-01 beta header (the SDK adds it automatically). Outcomes, Multiagent, and Memory are separate research previews; request access to each one individually.
Your first agent in ten minutes
Install
# CLI
brew install anthropics/tap/ant
# SDK
pip install anthropic # Python
npm install @anthropic-ai/sdk # TypeScript
export ANTHROPIC_API_KEY="your-api-key-here"
If you already use Claude Code, there's a faster path: run claude, then /claude-api managed-agents-onboarding. The built-in skill walks you through everything below.
Create the agent
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

agent = client.beta.agents.create(
    name="Coding Assistant",
    model="claude-sonnet-4-6",
    system="You are a helpful coding assistant.",
    tools=[{"type": "agent_toolset_20260401"}],  # the full built-in toolset
)
agent_toolset_20260401 bundles bash, read, write, edit, glob, grep, web_fetch, and web_search. One line, full toolbox.
Create an environment and start a session
env = client.beta.environments.create(
    name="dev-env",
    config={"type": "cloud", "networking": {"type": "unrestricted"}},
)

session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=env.id,
    title="My first session",
)
Send a task and stream the result
# Open the stream before sending so no events are missed.
with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(
        session.id,
        events=[{
            "type": "user.message",
            "content": [{"type": "text",
                         "text": "Generate the first 20 Fibonacci "
                                 "numbers and save to fibonacci.txt"}],
        }],
    )
    for event in stream:
        print(event.type)  # watch progress as events arrive
        if event.type == "session.status_idle":
            break  # the agent has finished and is waiting for input
Under the hood: a container spins up, Claude picks tools, calls execute in the sandbox, results stream back. No Docker config, no tool dispatcher, no recovery logic on your side.
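The loop above stops on idle and prints event types, nothing more. Here is a minimal sketch of how an app might fold the stream into a result, run against simulated events so it works offline. Only user.message and session.status_idle come from this post; the agent.message and agent.tool_use type names and the text and name fields are assumptions for illustration, so check the event reference for the real shapes.

```python
from types import SimpleNamespace


def consume(stream):
    """Collect text and tool names from an event iterable until the session goes idle."""
    texts, tools = [], []
    for event in stream:
        if event.type == "agent.message":        # assumed event name
            texts.append(event.text)             # assumed field
        elif event.type == "agent.tool_use":     # assumed event name
            tools.append(event.name)             # assumed field
        elif event.type == "session.status_idle":
            break                                # agent finished this turn
        # Unknown event types are ignored so new ones don't crash the loop.
    return "".join(texts), tools


# Simulated stream standing in for client.beta.sessions.events.stream(...)
fake_stream = [
    SimpleNamespace(type="agent.tool_use", name="bash"),
    SimpleNamespace(type="agent.message", text="Saved to fibonacci.txt"),
    SimpleNamespace(type="session.status_idle"),
    SimpleNamespace(type="agent.message", text="never reached"),
]
text, tools = consume(fake_stream)
print(text, tools)
```

Ignoring unknown types is a deliberate choice here: a beta event stream is likely to grow new event kinds, and a strict loop would break on the first one.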
Built-in tools and how to scope them
The toolset gives you eight capabilities by default. You'll usually want to narrow it down for production agents.
Disable specific tools:
{
  "type": "agent_toolset_20260401",
  "configs": [
    {"name": "web_fetch", "enabled": false},
    {"name": "web_search", "enabled": false}
  ]
}
Or flip the default and enable only what you need:
{
  "type": "agent_toolset_20260401",
  "default_config": {"enabled": false},
  "configs": [
    {"name": "bash", "enabled": true},
    {"name": "read", "enabled": true},
    {"name": "write", "enabled": true}
  ]
}
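If you build these configs in code, a small helper keeps typos out of tool names. A sketch: toolset_allowlist is a hypothetical helper, not part of the SDK, and it only produces the allowlist shape shown above.

```python
import json

# The eight built-in tools named in this post.
BUILT_IN_TOOLS = ("bash", "read", "write", "edit", "glob", "grep", "web_fetch", "web_search")


def toolset_allowlist(*enabled):
    """Build an agent_toolset_20260401 config that enables only the named tools."""
    unknown = sorted(set(enabled) - set(BUILT_IN_TOOLS))
    if unknown:
        raise ValueError(f"unknown built-in tools: {unknown}")
    return {
        "type": "agent_toolset_20260401",
        "default_config": {"enabled": False},   # everything off by default
        "configs": [{"name": name, "enabled": True} for name in enabled],
    }


read_only = toolset_allowlist("read", "glob", "grep")
print(json.dumps(read_only, indent=2))
```

The resulting dict can go straight into the tools list when you create the agent.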
Custom tools slot in alongside the built-ins. Four rules that matter more than anything else:
- Describe richly: 3–4 sentences per tool on what it does, when to use it, and its limits. Tool-use quality tracks description quality.
- Bundle related ops into one tool with an action parameter rather than ten near-duplicates.
- Namespace tool names: db_query, storage_read.
- Return stable identifiers, not internal links the agent can't use later.
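A sketch of one tool that follows all four rules. The definition shape (name, description, input_schema as JSON Schema) mirrors Messages API tool definitions; whether Managed Agents custom tools take exactly this shape is an assumption. The storage_file tool, its description text, and the in-memory handler are all hypothetical.

```python
import uuid

# Assumed shape, borrowed from Messages API tool definitions.
STORAGE_TOOL = {
    "name": "storage_file",  # namespaced: storage_*
    "description": (
        "Read, write, or delete a text file in the team's storage bucket. "
        "Use it when the task needs data to persist beyond the current step. "
        "Files are addressed by the file_id returned from a write. "
        "Limits: UTF-8 text only."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["read", "write", "delete"]},
            "file_id": {"type": "string"},
            "content": {"type": "string"},
        },
        "required": ["action"],
    },
}

_FILES = {}  # in-memory stand-in for a real backend


def handle_storage_file(action, file_id=None, content=None):
    """Dispatch on `action`; always answer with a stable file_id, never a URL."""
    if action not in ("read", "write", "delete"):
        return {"error": f"unknown action: {action}"}
    if action == "write":
        file_id = file_id or f"file_{uuid.uuid4().hex[:12]}"
        _FILES[file_id] = content or ""
        return {"file_id": file_id}
    if file_id not in _FILES:
        return {"error": f"unknown file_id: {file_id}"}
    if action == "read":
        return {"file_id": file_id, "content": _FILES[file_id]}
    del _FILES[file_id]  # only "delete" remains
    return {"file_id": file_id, "deleted": True}


created = handle_storage_file("write", content="hello")
print(handle_storage_file("read", file_id=created["file_id"]))
```

One tool with three actions gives the model a single description to read, and the file_id it gets back stays valid for later calls.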
Permissions: always_allow vs always_ask
Two modes, and you can mix them per tool.
- always_allow: the tool runs automatically. Use for trusted internal agents.
- always_ask: the session pauses for your app to approve each call. Use for user-facing agents. MCP tools default to this.
A common shape: reads and web search run automatically, bash is gated. Built-in permissions are one reason Managed Agents feels production-ready sooner than frameworks like LangGraph, CrewAI, or AutoGen, where approval gating is typically something you wire up yourself.
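On the app side, that shape boils down to a lookup table and an approval callback. A sketch in plain Python: the policy table, needs_approval, and review are hypothetical names, and the tool-call dicts are simplified stand-ins rather than real event payloads.

```python
ALWAYS_ALLOW = "always_allow"
ALWAYS_ASK = "always_ask"

# Hypothetical per-tool policy: reads and web search run freely, bash is gated.
POLICY = {
    "read": ALWAYS_ALLOW,
    "glob": ALWAYS_ALLOW,
    "grep": ALWAYS_ALLOW,
    "web_search": ALWAYS_ALLOW,
    "bash": ALWAYS_ASK,
}


def needs_approval(tool_name, policy=POLICY, default=ALWAYS_ASK):
    """Return True when a call should pause for a decision.

    Tools missing from the table fall back to always_ask, the safer default.
    """
    return policy.get(tool_name, default) == ALWAYS_ASK


def review(tool_calls, approve):
    """Split pending calls into (run, blocked) using an approval callback."""
    run, blocked = [], []
    for call in tool_calls:
        if not needs_approval(call["name"]) or approve(call):
            run.append(call)
        else:
            blocked.append(call)
    return run, blocked


calls = [
    {"name": "grep", "input": {"pattern": "TODO"}},
    {"name": "bash", "input": {"command": "rm -rf build"}},
    {"name": "bash", "input": {"command": "ls"}},
]
# Toy approver: reject anything that starts with rm.
run, blocked = review(calls, approve=lambda c: not c["input"]["command"].startswith("rm"))
print([c["name"] for c in run], [c["input"]["command"] for c in blocked])
```

In a real app the approve callback would be a UI prompt or a rules engine; the point is that the gate sits in your code, not in the prompt.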
Usage patterns to steal
- Event-triggered — external service fires the agent. Sentry's Seer detects a bug, the agent writes the patch, opens the PR.
- Scheduled — daily digests, GitHub activity, team task review.
- Fire-and-forget — human files a task in Slack, gets back a table or presentation. Asana AI Teammates.
- Long-horizon — multi-hour research, code migrations, deep analysis. Sessions preserve state across the run.
Outcomes: turn conversations into jobs
Outcomes (research preview) promote a session from chat to job. You supply a rubric. The system spins up a separate grader in its own context — unbiased by the main agent's decisions — and scores each iteration. The agent keeps working until it passes or hits max_iterations.
Rubric rule: concrete and checkable wins.
- Good: "CSV contains a price column with numeric values"
- Bad: "Data looks good"
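A quick test for whether a criterion is concrete enough: could you write it as a function? The good example can be, as this sketch shows. It is only an illustration of checkability; the grader described above is a model reading your rubric, not a script like this, and price_column_is_numeric is a hypothetical name.

```python
import csv
import io


def price_column_is_numeric(csv_text):
    """True if the CSV has a `price` column and every row's value parses as a float."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if "price" not in (reader.fieldnames or []):
        return False
    rows = list(reader)
    if not rows:
        return False  # a header with no data does not satisfy the criterion
    for row in rows:
        try:
            float(row["price"])
        except (TypeError, ValueError):
            return False  # missing, empty, or non-numeric value
    return True


print(price_column_is_numeric("sku,price\nA,1.50\nB,2\n"))
print(price_column_is_numeric("sku,price\nA,cheap\n"))
```

"Data looks good" has no such translation, which is exactly why a grader can't score it consistently.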
# RUBRIC is a plain-text string of checkable criteria that you define beforehand.
client.beta.sessions.events.send(
    session_id=session.id,
    events=[{
        "type": "user.define_outcome",
        "description": "Build a DCF model for Costco in .xlsx",
        "rubric": {"type": "text", "content": RUBRIC},
        "max_iterations": 5,
    }],
)
Outputs land in /mnt/session/outputs/. Pull them via the Files API once the session reports satisfied.
Multiagent delegation
A coordinator agent can call other agents listed in its callable_agents. Each runs in its own thread with isolated context, but they share one container and filesystem. Useful for code review with read-only tools, test generation in a sandboxed agent, or research agents with web access.
One limit: delegation is one level deep. The coordinator calls sub-agents; sub-agents can't chain further. Design accordingly.
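Since the limit is structural, you can catch violations before creating anything. A sketch of a pre-flight check your own code could run; validate_delegation and the name-to-list mapping are hypothetical, and the platform's actual enforcement may differ.

```python
def validate_delegation(agents, coordinator):
    """Raise ValueError if any sub-agent of `coordinator` lists callable agents of its own.

    `agents` maps an agent name to the list of agent names it may call
    (the role callable_agents plays in the agent config).
    """
    for sub in agents.get(coordinator, []):
        if sub not in agents:
            raise ValueError(f"{coordinator!r} references unknown agent {sub!r}")
        if agents[sub]:
            raise ValueError(f"{sub!r} cannot delegate further: {agents[sub]}")
    return True


team = {"lead": ["reviewer", "tester"], "reviewer": [], "tester": []}
print(validate_delegation(team, "lead"))
```

A flat team passes; a chain such as lead to reviewer to tester raises, which tells you to fold the third agent into the coordinator's own list.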
Architecture in one paragraph
Anthropic deliberately split the system into three replaceable pieces: the brain (Claude plus harness), the hands (sandboxes and tools), and the session (an event log). Each is an interface with minimal assumptions about the others. A failure in one doesn't kill the system, execution is isolated from context, and you can plug in new harnesses as they evolve. Prompt caching, compaction, and automatic recovery are all built in.
Cost and limits
Standard API token rates plus $0.08 per active session hour. At that rate, a ten-minute coding session adds roughly 1.3 cents of runtime (0.08 × 10/60) on top of tokens. API rate limits: 60 writes/min for resource creation, 600 reads/min for gets, lists, and streaming. Your org and plan limits stack on top.
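For budgeting, the runtime part is simple arithmetic. A sketch that assumes the charge is prorated linearly by active time; that billing granularity is an assumption, and token costs are not included.

```python
SESSION_HOUR_USD = 0.08  # runtime rate quoted in this post


def runtime_cost_usd(active_seconds, rate_per_hour=SESSION_HOUR_USD):
    """Runtime charge for a session, assuming linear pro-rata billing (an assumption)."""
    if active_seconds < 0:
        raise ValueError("active_seconds must be non-negative")
    return active_seconds / 3600 * rate_per_hour


ten_minutes = runtime_cost_usd(10 * 60)
print(f"10 min: ${ten_minutes:.4f}")                  # 10 min: $0.0133
print(f"8 h:    ${runtime_cost_usd(8 * 3600):.2f}")   # 8 h:    $0.64
```

Even a full working day of active runtime stays under a dollar, so for most workloads tokens will dominate the bill.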
Launch checklist
- Get an API key at console.anthropic.com
- Install the CLI or SDK
- Export ANTHROPIC_API_KEY
- Create an agent (model + prompt + tools)
- Create an environment (container + packages + networking)
- Start a session and send a task
- Handle the event stream in your app
The bottom line
Managed Agents is the shortest path from "we want an agent" to "it's running in production." If your infrastructure is the blocker, switch. If you need a custom harness and total control, stay on Messages. Either way, hand-building a sandbox and event log no longer has to be the default.