Claude Managed Agents: A Practical Playbook

TL;DR

Claude Managed Agents ships the sandbox, session state, tool execution, and event stream as a hosted service. You describe the agent; Anthropic runs it. This playbook covers the four core concepts (Agent, Environment, Session, Events), a ten-minute quick start, and patterns for outcomes and multiagent delegation.

If you've built agents on the raw Messages API, you already know the tax: your own tool-call loop, your own sandbox, your own retry logic, your own secrets store. Months of plumbing before the agent does anything interesting. Claude Managed Agents deletes that list.

You describe the agent. Anthropic runs it — secure containers, long-lived sessions, built-in code execution, streaming events. This post is the short version of what it is, when to use it, and how to ship your first one today.

Messages API vs Managed Agents

They solve different problems. Pick based on how much control you actually need.

Messages API — direct model access. You write the agent loop. Right for custom harnesses or when you need total control.
Managed Agents — a hosted agent platform. Right for long tasks, async work, and anything you'd rather ship this quarter than next year.

Two deeper reasons to prefer the managed path. First, self-written harnesses bake in assumptions about what the model can't do — those assumptions go stale with every release. Managed Agents updates the harness for you. Second, task horizons are growing fast. Anthropic expects future Claude versions to work for days or weeks on a single task. That demands fault-tolerant infrastructure you probably don't want to maintain.

Four concepts that run the whole system

Learn these four and the rest follows.

Agent — a versioned config: model, system prompt, tools, MCP servers, skills. Create once, reference by ID. Models: claude-sonnet-4-6, claude-opus-4-6.
Environment — a container template. Networking rules, preinstalled packages (Python, Node, Go). Each session gets its own isolated container from it.
Session — a running instance of the agent in an environment. Holds history, filesystem, status. Runs for hours. Secrets live in a vault.
Events — the message bus. You send user.message; the agent streams text, tool calls, and status via Server-Sent Events.

Access. Managed Agents is in open beta. All endpoints require the managed-agents-2026-04-01 beta header (the SDK adds it automatically). Outcomes, Multiagent, and Memory are separate research previews; request access to each one individually.

Your first agent in ten minutes

Install

# CLI
brew install anthropics/tap/ant

# SDK
pip install anthropic          # Python
npm install @anthropic-ai/sdk  # TypeScript

export ANTHROPIC_API_KEY="your-api-key-here"

If you already use Claude Code, there's a faster path: run claude, then /claude-api managed-agents-onboarding. The built-in skill walks you through everything below.

Create the agent

from anthropic import Anthropic

client = Anthropic()

agent = client.beta.agents.create(
    name="Coding Assistant",
    model="claude-sonnet-4-6",
    system="You are a helpful coding assistant.",
    tools=[{"type": "agent_toolset_20260401"}],
)

agent_toolset_20260401 bundles bash, read, write, edit, glob, grep, web_fetch, and web_search. One line, full toolbox.

Create an environment and start a session

env = client.beta.environments.create(
    name="dev-env",
    config={"type": "cloud", "networking": {"type": "unrestricted"}},
)

session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=env.id,
    title="My first session",
)

Send a task and stream the result

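# Open the event stream first, then send the task from inside it.
with client.beta.sessions.events.stream(session.id) as stream:
    # The task goes in as a single user.message event.
    client.beta.sessions.events.send(
        session.id,
        events=[{
            "type": "user.message",
            "content": [{"type": "text",
                         "text": "Generate the first 20 Fibonacci "
                                 "numbers and save to fibonacci.txt"}],
        }],
    )
    # Read events until the session reports that it is idle.
    for event in stream:
        if event.type == "session.status_idle":
            break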

Under the hood: a container spins up, Claude picks tools, the tool calls execute in the sandbox, and results stream back. No Docker config, no tool dispatcher, no recovery logic on your side.
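
The loop above only waits for the idle status. In a real app you would usually react to each event as it arrives. Here is a minimal sketch of that dispatch, using plain dicts in place of SDK event objects so it runs offline. Only user.message and session.status_idle come from this post; the agent.message and agent.tool_use names and the field layout are assumptions for illustration.

```python
from typing import Iterable


def collect_transcript(events: Iterable[dict]) -> list[str]:
    """Consume events until the session goes idle and return printable lines.

    The event shapes here are hypothetical stand-ins for what the SDK yields.
    """
    lines: list[str] = []
    for event in events:
        kind = event.get("type")
        if kind == "agent.message":  # assumed name for streamed agent text
            text = "".join(
                block.get("text", "")
                for block in event.get("content", [])
                if block.get("type") == "text"
            )
            lines.append(text)
        elif kind == "agent.tool_use":  # assumed name for a tool call
            lines.append(f"[tool: {event.get('name', 'unknown')}]")
        elif kind == "session.status_idle":  # from the post: the agent is done
            break
        # Any other event type is ignored so new types do not break the loop.
    return lines


# Simulated stream; a real one would come from events.stream(session.id).
fake_stream = [
    {"type": "agent.tool_use", "name": "bash"},
    {"type": "agent.message",
     "content": [{"type": "text", "text": "Saved fibonacci.txt"}]},
    {"type": "session.status_idle"},
    {"type": "agent.message",
     "content": [{"type": "text", "text": "never reached"}]},
]
transcript = collect_transcript(fake_stream)
```

Ignoring unknown event types is a deliberate choice here: a hosted platform can add new ones, and a strict loop would fail on them.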

Built-in tools and how to scope them

The toolset gives you eight capabilities by default. You'll usually want to narrow it down for production agents.

Disable specific tools:

{
  "type": "agent_toolset_20260401",
  "configs": [
    {"name": "web_fetch", "enabled": false},
    {"name": "web_search", "enabled": false}
  ]
}

Or flip the default and enable only what you need:

{
  "type": "agent_toolset_20260401",
  "default_config": {"enabled": false},
  "configs": [
    {"name": "bash", "enabled": true},
    {"name": "read", "enabled": true},
    {"name": "write", "enabled": true}
  ]
}
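
If you define several agents, it can help to generate the allowlist form rather than hand-writing it. The sketch below builds the same structure as the JSON above; allowlist_toolset is a hypothetical helper, not part of the SDK, and the set of tool names is taken from the list earlier in this post.

```python
# The eight built-in tool names listed earlier in this post.
BUILTIN_TOOLS = {
    "bash", "read", "write", "edit", "glob", "grep", "web_fetch", "web_search",
}


def allowlist_toolset(enabled_tools: list[str]) -> dict:
    """Build a toolset config that disables everything except enabled_tools."""
    unknown = sorted(set(enabled_tools) - BUILTIN_TOOLS)
    if unknown:
        # Fail early on typos instead of silently shipping a tool-less agent.
        raise ValueError(f"not in the built-in toolset: {unknown}")
    return {
        "type": "agent_toolset_20260401",
        "default_config": {"enabled": False},
        "configs": [{"name": name, "enabled": True} for name in enabled_tools],
    }


# A read-only agent: it can inspect files but not change them or go online.
READ_ONLY = allowlist_toolset(["read", "glob", "grep"])
```

The result can be passed as one entry in the tools list when creating an agent.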

Custom tools slot in alongside the built-ins. Four rules that matter more than anything else:

Describe richly — 3–4 sentences per tool on what, when, and limits. Tool-use quality tracks description quality.
Bundle related ops into one tool with an action parameter rather than ten near-duplicates.
Namespace — db_query, storage_read.
Return stable identifiers, not internal links the agent can't use later.
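
To make the four rules concrete, here is a sketch of one custom tool that follows them. The definition keys ("type": "custom", "input_schema") are an assumption about the schema, and storage_object, its in-memory store, and its handler are all hypothetical.

```python
STORAGE_TOOL = {
    "type": "custom",          # assumed discriminator for custom tools
    "name": "storage_object",  # rule 3: namespaced with a storage_ prefix
    # Rule 1: a few sentences on what it does, when to use it, and its limits.
    "description": (
        "Read, write, or delete a small text object in shared storage. "
        "Use it when data must outlive the session filesystem. "
        "Keys are plain strings and bodies are text only. "
        "Every call returns the object's key so it can be looked up later."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            # Rule 2: one tool with an action parameter, not three tools.
            "action": {"type": "string", "enum": ["read", "write", "delete"]},
            "key": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["action", "key"],
    },
}

_store: dict[str, str] = {}  # stand-in for a real storage backend


def handle_storage_object(args: dict) -> dict:
    """Run one call. Rule 4: always return the stable key, never a temp URL."""
    action, key = args["action"], args["key"]
    if action == "write":
        _store[key] = args.get("body", "")
        return {"key": key, "status": "written"}
    if action == "read":
        if key not in _store:
            return {"key": key, "status": "not_found"}
        return {"key": key, "status": "ok", "body": _store[key]}
    if action == "delete":
        existed = _store.pop(key, None) is not None
        return {"key": key, "status": "deleted" if existed else "not_found"}
    return {"key": key, "status": f"unknown action: {action}"}
```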

Permissions: always_allow vs always_ask

Two modes, and you can mix them per tool.

always_allow — the tool runs automatically. Use for trusted internal agents.
always_ask — the session pauses for your app to approve each call. Use for user-facing agents. MCP tools default to this.

A common shape: reads and web search run automatically, bash is gated. This is one of the things that makes Managed Agents feel more production-ready than frameworks such as LangGraph, CrewAI, or AutoGen, where approval gating is typically something you wire up yourself.
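
The "reads auto, bash gated" shape might be expressed as a toolset config along these lines. Treat it as a sketch under assumptions: the permission_policy field name and its nesting are guesses modeled on the enabled flag shown earlier, and both helpers are hypothetical. Check the API reference before relying on the exact keys.

```python
ALWAYS_ALLOW = {"type": "always_allow"}
ALWAYS_ASK = {"type": "always_ask"}


def gated_toolset(auto_tools: list[str]) -> dict:
    """Ask by default; auto-run only the tools named in auto_tools."""
    return {
        "type": "agent_toolset_20260401",
        # "permission_policy" is an assumed field name, not confirmed here.
        "default_config": {"permission_policy": ALWAYS_ASK},
        "configs": [
            {"name": name, "permission_policy": ALWAYS_ALLOW}
            for name in auto_tools
        ],
    }


def needs_approval(toolset: dict, tool_name: str) -> bool:
    """Resolve the effective policy for one tool, falling back to the default."""
    for entry in toolset["configs"]:
        if entry["name"] == tool_name:
            return entry["permission_policy"]["type"] == "always_ask"
    return toolset["default_config"]["permission_policy"]["type"] == "always_ask"


# Reads and web search run automatically; everything else, bash included, asks.
toolset = gated_toolset(["read", "glob", "grep", "web_search"])
```

Defaulting to always_ask and listing the exceptions means a newly added tool is gated until someone decides otherwise.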

Usage patterns to steal

Event-triggered — external service fires the agent. Sentry's Seer detects a bug, the agent writes the patch, opens the PR.
Scheduled — daily digests, GitHub activity, team task review.
Fire-and-forget — human files a task in Slack, gets back a table or presentation. Asana AI Teammates.
Long-horizon — multi-hour research, code migrations, deep analysis. Sessions preserve state across the run.

Deploy pattern that works. Keep agent templates as YAML in git. Apply them with the CLI from your deploy pipeline. Run sessions with the SDK at runtime. Config lives in version control; runtime stays lightweight.

Outcomes: turn conversations into jobs

Outcomes (research preview) promote a session from chat to job. You supply a rubric. The system spins up a separate grader in its own context — unbiased by the main agent's decisions — and scores each iteration. The agent keeps working until it passes or hits max_iterations.

Rubric rule: concrete and checkable wins.

Good: "CSV contains a price column with numeric values"
Bad: "Data looks good"

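# RUBRIC is a string you define beforehand, holding concrete, checkable criteria.
client.beta.sessions.events.send(
    session_id=session.id,
    events=[{
        "type": "user.define_outcome",
        "description": "Build a DCF model for Costco in .xlsx",
        "rubric": {"type": "text", "content": RUBRIC},
        "max_iterations": 5,  # stop after five graded attempts
    }],
)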

Outputs land in /mnt/session/outputs/. Pull them via the Files API once the session reports satisfied.
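
The "good" rubric line earlier in this section works because a script could verify it. The sketch below shows what that check would look like. It illustrates what "checkable" means and is not how the grader operates; price_column_is_numeric is a hypothetical helper.

```python
import csv
import io


def price_column_is_numeric(csv_text: str) -> bool:
    """Check the rubric line: CSV contains a price column with numeric values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "price" not in reader.fieldnames:
        return False  # no header row, or no price column
    rows = list(reader)
    if not rows:
        return False  # a header with no data does not satisfy the rubric
    for row in rows:
        try:
            float(row["price"])
        except (TypeError, ValueError):
            return False  # missing or non-numeric price
    return True


good = price_column_is_numeric("item,price\nmilk,1.50\neggs,3\n")
bad = price_column_is_numeric("item,price\nmilk,cheap\n")
```

"Data looks good" has no equivalent function, which is a quick test of whether a rubric line is too vague.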

Multiagent delegation

A coordinator agent can call other agents listed in its callable_agents. Each runs in its own thread with isolated context, but they share one container and filesystem. Useful for code review with read-only tools, test generation in a sandboxed agent, or research agents with web access.

One limit: delegation is one level deep. The coordinator calls sub-agents; sub-agents can't chain further. Design accordingly.
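
Because delegation is one level deep, it can be worth checking your agent graph before you deploy it. This is a minimal sketch: validate_delegation is a hypothetical helper, and the dict simply maps each agent name to the names in its callable_agents.

```python
def validate_delegation(
    coordinator: str, callable_agents: dict[str, list[str]]
) -> list[str]:
    """Return a list of problems; an empty list means the graph is one level deep."""
    problems: list[str] = []
    for sub_agent in callable_agents.get(coordinator, []):
        # A sub-agent with its own callable agents would need a second level.
        if callable_agents.get(sub_agent):
            problems.append(
                f"{sub_agent} is a sub-agent but lists its own callable agents"
            )
    return problems


flat = {"lead": ["reviewer", "tester"]}
nested = {"lead": ["reviewer"], "reviewer": ["tester"]}

flat_problems = validate_delegation("lead", flat)
nested_problems = validate_delegation("lead", nested)
```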

Architecture in one paragraph

Anthropic deliberately split the system into three replaceable pieces: the brain (Claude plus harness), the hands (sandboxes and tools), and the session (an event log). Each is an interface with minimal assumptions about the others. A failure in one doesn't kill the system, execution is isolated from context, and you can plug in new harnesses as they evolve. Prompt caching, compaction, and automatic recovery are all built in.
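
A toy model can make the split easier to picture. Nothing below reflects Anthropic's actual implementation: the class, the function names, and the tool.result event type are all hypothetical. The point is only that when the session is a plain event log, a replacement harness can rebuild its context from the log alone.

```python
from dataclasses import dataclass, field


@dataclass
class SessionLog:
    """The session: an append-only event log and nothing else."""
    events: list[dict] = field(default_factory=list)

    def append(self, type_: str, payload: str) -> None:
        self.events.append({"type": type_, "payload": payload})


def run_tool(name: str, args: dict) -> str:
    """The hands: run one tool call with no knowledge of the conversation."""
    if name == "upper":
        return args["text"].upper()
    raise ValueError(f"unknown tool: {name}")


def rebuild_context(log: SessionLog) -> list[str]:
    """The brain's input, derived only from the log."""
    return [f"{e['type']}: {e['payload']}" for e in log.events]


log = SessionLog()
log.append("user.message", "shout hello")
log.append("tool.result", run_tool("upper", {"text": "hello"}))

# A fresh harness that only has a copy of the log sees the same context.
context_before = rebuild_context(log)
context_after_restart = rebuild_context(SessionLog(events=list(log.events)))
```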

Cost and limits

Standard API token rates plus $0.08 per active session hour. A ten-minute coding session therefore adds about 1.3 cents of session runtime on top of tokens (0.08 × 10 / 60). API rate limits: 60 writes/min for resource creation, 600 reads/min for gets, lists, and streaming. Your org and plan limits stack on top.
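
For budgeting, the session-hour charge is simple arithmetic. The sketch below covers only that runtime component at the $0.08 rate quoted above. Token costs depend on the model and usage and are left out, and session_runtime_cost is a hypothetical helper.

```python
SESSION_RATE_PER_HOUR = 0.08  # USD per active session hour, from this post


def session_runtime_cost(active_minutes: float) -> float:
    """Runtime charge in USD for a session, excluding token costs."""
    return SESSION_RATE_PER_HOUR * active_minutes / 60


ten_minutes = session_runtime_cost(10)     # roughly 0.013 USD
eight_hours = session_runtime_cost(8 * 60)  # roughly 0.64 USD
```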

Launch checklist

Get an API key at console.anthropic.com
Install the CLI or SDK
Export ANTHROPIC_API_KEY
Create an agent (model + prompt + tools)
Create an environment (container + packages + networking)
Start a session and send a task
Handle the event stream in your app

The bottom line

Managed Agents is the shortest path from "we want an agent" to "it's running in production." If your infrastructure is the blocker, switch. If you need a custom harness and total control, stay on Messages. Either way, hand-building a sandbox and event log no longer has to be the default choice.

For the full command reference, see the Claude Code cheatsheet, or browse other deep dives on the blog index.
