Bedsheet User Guide

A progressive tutorial from your first agent to complex multi-agent orchestration.

13 Lessons · Hands-on Examples · Beginner Friendly

Why "Bedsheet"? A Frustrated Developer's Tale

After 25+ years of writing code, I've learned that the best tools are the ones that get out of your way. I still use Vim in a terminal. I define infrastructure as code. I believe clicking through web UIs is for people who enjoy suffering.

So when I tried AWS Bedrock Agents... let's just say the experience was counterintuitive. The web console is cluttered. Click here, then there, configure this dropdown, wait for deployment. Every time I wanted to tweak an agent's behavior, I had to navigate through multiple screens. It felt like fighting the UI instead of building agents.

I wanted to define my agents in code. Version-controlled, reviewable, debuggable code. I wanted to run them locally during development. I wanted to actually see what was happening inside. And when I'm ready for production, I want to deploy to AWS, GCP, or my own servers - my choice, not Amazon's.

So I built Bedsheet. The name? It "covers" the same concepts as Bedrock (agents, action groups, orchestration) while being completely cloud-agnostic. Like a bedsheet that fits any bed regardless of brand, Bedsheet fits any cloud - or no cloud at all.

Plus, agent frameworks shouldn't take themselves too seriously. Life's too short for that.

Planned: Export your Bedsheet agents to AWS Bedrock or Google Cloud Vertex AI with a single command. Define locally, deploy anywhere. (Coming in v0.6)

Lesson 1: Getting Started

Installation

uv pip install bedsheet

Set Your API Key

Bedsheet uses Claude by default. Get your API key from console.anthropic.com:

export ANTHROPIC_API_KEY=your-key-here

Verify Installation

uvx bedsheet demo

This runs a demo showing multi-agent collaboration. If it works, you're ready!

Lesson 2: Your First Agent

Why start simple?

Before adding tools, multi-agent orchestration, and fancy features, let's make sure you understand the core concept: an Agent is just a wrapper around an LLM with a personality (instruction) and a way to track conversations. That's it. Everything else builds on this foundation.

Let's create the simplest possible agent - one that just responds to messages without any tools:

import asyncio
from bedsheet import Agent
from bedsheet.llm import AnthropicClient
from bedsheet.events import CompletionEvent

async def main():
    # Create an agent
    agent = Agent(
        name="Assistant",
        instruction="You are a helpful assistant. Be concise.",
        model_client=AnthropicClient(),
    )

    # Invoke the agent and get events
    async for event in agent.invoke(
        session_id="my-session",
        input_text="What is Python?"
    ):
        if isinstance(event, CompletionEvent):
            print(event.response)

asyncio.run(main())

What's happening

  1. Agent - The core class that talks to an LLM
  2. name - A name for your agent (used in logs and multi-agent setups)
  3. instruction - The system prompt that defines behavior
  4. model_client - The LLM to use (Claude by default)
  5. invoke() - Runs the agent and yields events
  6. session_id - Groups messages into a conversation
  7. CompletionEvent - The final response

Output

Python is a high-level, interpreted programming language known for its readable syntax and versatility. It's widely used for web development, data science, automation, and AI/ML applications.

Lesson 3: Adding Tools

Why tools matter

An LLM by itself can only generate text based on its training data. But what if you want it to check today's weather? Look up a database? Send an email? That's where tools come in. You define Python functions, and the LLM can decide when to call them. This is what transforms a chatbot into an agent that can actually do things in the world.

Agents become powerful when they can use tools. Let's give our agent the ability to check the weather:

import asyncio
from bedsheet import Agent, ActionGroup
from bedsheet.llm import AnthropicClient
from bedsheet.events import CompletionEvent, ToolCallEvent, ToolResultEvent

# Step 1: Create an ActionGroup to hold tools
tools = ActionGroup(name="WeatherTools")

# Step 2: Define a tool using the @action decorator
@tools.action(
    name="get_weather",
    description="Get the current weather for a city"
)
async def get_weather(city: str) -> dict:
    """In a real app, this would call a weather API."""
    weather_data = {
        "New York": {"temp": 72, "condition": "Sunny"},
        "London": {"temp": 58, "condition": "Cloudy"},
        "Tokyo": {"temp": 68, "condition": "Partly cloudy"},
    }
    return weather_data.get(city, {"temp": 70, "condition": "Unknown"})

async def main():
    # Step 3: Create an agent
    agent = Agent(
        name="WeatherBot",
        instruction="You help users check the weather. Use get_weather when asked.",
        model_client=AnthropicClient(),
    )

    # Step 4: Attach the tools to the agent
    agent.add_action_group(tools)

    # Step 5: Invoke
    async for event in agent.invoke("session-1", "What's the weather in Tokyo?"):
        if isinstance(event, ToolCallEvent):
            print(f"[Tool Call] {event.tool_name}({event.tool_input})")
        elif isinstance(event, ToolResultEvent):
            print(f"[Tool Result] {event.result}")
        elif isinstance(event, CompletionEvent):
            print(f"\n{event.response}")

asyncio.run(main())

Output

[Tool Call] get_weather({'city': 'Tokyo'})
[Tool Result] {'temp': 68, 'condition': 'Partly cloudy'}

The weather in Tokyo is currently 68°F and partly cloudy.

What's happening

  1. The LLM sees your message and the available tools
  2. It decides to call get_weather with city="Tokyo"
  3. Bedsheet executes your function and sends the result back
  4. The LLM generates a natural language response

Lesson 4: Understanding Events

Why events instead of just returning a response?

This is where Bedsheet fundamentally differs from "black box" agent frameworks. Instead of waiting for a final answer (and having no idea what's happening inside), you get a stream of events as they happen. Tool call? You see it. Result came back? You see it. Error occurred? You see it immediately. This is essential for debugging ("why did it call that tool?"), building UIs with progress indicators, and understanding agent behavior. No more "it worked... but I have no idea why."

Bedsheet uses an event-driven architecture. Instead of returning a single response, invoke() yields events as things happen. This gives you full visibility into the agent's behavior.

All Event Types

from bedsheet.events import (
    # Single agent events
    TextTokenEvent,      # A token from streaming response
    ToolCallEvent,       # LLM wants to call a tool
    ToolResultEvent,     # Tool execution completed
    CompletionEvent,     # Agent finished with final response
    ErrorEvent,          # Something went wrong
    ThinkingEvent,       # Extended thinking content

    # Multi-agent events
    DelegationEvent,          # Supervisor delegating to agents
    CollaboratorStartEvent,   # A collaborator agent is starting
    CollaboratorEvent,        # Wraps events from collaborators
    CollaboratorCompleteEvent, # A collaborator finished
    RoutingEvent,             # Router picked an agent
)

Event Handling Pattern

async for event in agent.invoke(session_id, user_input):
    match event:
        case ToolCallEvent(tool_name=name, tool_input=args):
            print(f"Calling {name} with {args}")

        case ToolResultEvent(result=result, error=error):
            if error:
                print(f"Tool failed: {error}")
            else:
                print(f"Tool returned: {result}")

        case CompletionEvent(response=text):
            print(f"Final: {text}")

        case ErrorEvent(error=err):
            print(f"Error: {err}")

Why Events?

Events let you see exactly what's happening inside the agent - every tool call, every decision. This is essential for debugging and building UIs that show real-time progress.

Lesson 5: Streaming Responses

Why streaming?

LLMs can take several seconds to generate a full response. Without streaming, your user stares at a blank screen wondering if the app is frozen. With streaming, they see words appearing in real-time - it feels faster and more engaging, even though the total time is the same. It's a small change that dramatically improves the user experience.

By default, you get the complete response in CompletionEvent. But you can stream token-by-token for a better UX:

async for event in agent.invoke(session_id, user_input, stream=True):
    if isinstance(event, TextTokenEvent):
        # Print each word as it arrives
        print(event.token, end="", flush=True)
    elif isinstance(event, CompletionEvent):
        print()  # Newline after streaming completes

Real-World Effect

This is how ChatGPT and Claude show responses word-by-word instead of making you wait for the complete answer. Each token is displayed the moment it arrives from the API.
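If you also need the complete text at the end - say, to log it or store it yourself - you can accumulate the streamed tokens as they arrive. A minimal sketch using only the event types shown above:

from bedsheet.events import TextTokenEvent, CompletionEvent

chunks = []
async for event in agent.invoke(session_id, user_input, stream=True):
    if isinstance(event, TextTokenEvent):
        chunks.append(event.token)              # keep the token...
        print(event.token, end="", flush=True)  # ...while still showing it live
    elif isinstance(event, CompletionEvent):
        print()
        full_response = "".join(chunks)         # complete text, assembled client-side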

Lesson 6: Structured Outputs

Why structured outputs?

LLMs are great at generating text, but sometimes you need predictable, machine-readable data. Maybe you need a JSON object to store in a database, or a specific format for your UI to render. Without constraints, LLMs might add extra text, miss required fields, or format things inconsistently. Structured outputs guarantee the response matches your exact schema - zero chance of malformed JSON.

Bedsheet supports Anthropic's native structured outputs feature. You define a schema, and Claude's constrained decoding ensures 100% compliance:

Option 1: Raw JSON Schema (No Dependencies)

from bedsheet.llm import AnthropicClient, OutputSchema

# Define your schema as a JSON Schema dict
schema = OutputSchema.from_dict({
    "type": "object",
    "properties": {
        "symbol": {"type": "string"},
        "recommendation": {
            "type": "string",
            "enum": ["buy", "sell", "hold"]
        },
        "confidence": {
            "type": "number",
            "minimum": 0,
            "maximum": 1
        },
        "reasoning": {"type": "string"}
    },
    "required": ["symbol", "recommendation", "confidence", "reasoning"]
})

# Use it with the LLM client
client = AnthropicClient()
response = await client.chat(
    messages=[{"role": "user", "content": "Analyze NVDA stock"}],
    system="You are a stock analyst.",
    output_schema=schema,
)

# Access the validated data - guaranteed to match your schema!
print(response.parsed_output)
# {"symbol": "NVDA", "recommendation": "buy", "confidence": 0.85, "reasoning": "..."}

Option 2: Pydantic Models (If You Prefer)

If you're already using Pydantic in your project, you can create schemas from your models:

from pydantic import BaseModel
from bedsheet.llm import OutputSchema

class StockAnalysis(BaseModel):
    symbol: str
    recommendation: str
    confidence: float
    reasoning: str

schema = OutputSchema.from_pydantic(StockAnalysis)

response = await client.chat(
    messages=[{"role": "user", "content": "Analyze AAPL"}],
    system="You are a stock analyst.",
    output_schema=schema,
)

# parsed_output is a dict matching your schema
print(response.parsed_output["recommendation"])  # "hold"

What's happening under the hood

  1. Your schema is sent to Claude's API with the structured-outputs-2025-11-13 beta
  2. Claude uses constrained decoding - it literally cannot generate tokens that would violate your schema
  3. The response is guaranteed to be valid JSON matching your exact structure
  4. parsed_output contains the validated, parsed data

Works WITH Tools!

Unlike some frameworks (looking at you, Google ADK), Bedsheet's structured outputs work alongside tool use. You can have an agent that calls tools AND returns structured data.

Pydantic is Optional

You don't need Pydantic installed. Raw JSON schemas work perfectly fine. Use whichever approach fits your project.

Lesson 7: Conversation Memory

Why memory?

By default, each call to invoke() is stateless - the agent has no idea what you said before. That's fine for one-off questions, but most real applications need context. "What was my last order?" requires knowing who "my" is. Memory stores conversation history so the agent can maintain context across multiple interactions. The session_id groups related messages together - same session means continued conversation, different session means fresh start.

Agents can remember previous messages in a session using Memory:

from bedsheet import Agent
from bedsheet.llm import AnthropicClient
from bedsheet.memory import InMemory
from bedsheet.events import CompletionEvent

agent = Agent(
    name="Assistant",
    instruction="You are helpful.",
    model_client=AnthropicClient(),
    memory=InMemory(),  # Enable conversation memory
)

# First message
async for event in agent.invoke("session-1", "My name is Alice"):
    if isinstance(event, CompletionEvent):
        print(event.response)
# Output: "Nice to meet you, Alice!"

# Second message - agent remembers!
async for event in agent.invoke("session-1", "What's my name?"):
    if isinstance(event, CompletionEvent):
        print(event.response)
# Output: "Your name is Alice."

# Different session - no memory
async for event in agent.invoke("session-2", "What's my name?"):
    if isinstance(event, CompletionEvent):
        print(event.response)
# Output: "I don't know your name yet."

Session IDs Matter!

Same session ID = continued conversation. Different session ID = fresh start.
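In practice you usually derive the session ID from something stable like a user or chat ID, so every user keeps their own history. A small sketch - get_current_user_id here is a placeholder for however your app identifies users:

user_id = get_current_user_id()    # placeholder: your auth/session layer
session_id = f"user-{user_id}"     # same user -> same conversation, other users stay separate

async for event in agent.invoke(session_id, "What did I ask you earlier?"):
    if isinstance(event, CompletionEvent):
        print(event.response)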

Memory Backends

from bedsheet.memory import InMemory, RedisMemory

# Development: Simple in-memory (lost when app restarts)
memory = InMemory()

# Production: Redis (persistent across restarts)
memory = RedisMemory(url="redis://localhost:6379")

Lesson 8: Multiple Tools

Agents can have many tools. The LLM decides which to use:

tools = ActionGroup(name="ResearchTools")

@tools.action(name="search_web", description="Search the web for information")
async def search_web(query: str) -> list:
    return [{"title": "Result 1", "snippet": "..."}]

@tools.action(name="get_page", description="Fetch content from a URL")
async def get_page(url: str) -> str:
    return "<html>...</html>"

@tools.action(name="summarize", description="Summarize long text")
async def summarize(text: str, max_words: int = 100) -> str:
    return "Summary..."

agent = Agent(
    name="Researcher",
    instruction="""You help users research topics.
    1. Search for relevant information
    2. Fetch pages if needed
    3. Summarize findings""",
    model_client=AnthropicClient(),
)
agent.add_action_group(tools)

Parallel Tool Execution

If the LLM requests multiple tools at once, Bedsheet executes them concurrently. Two search queries run at the same time, not one after another!
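You don't have to do anything to enable this - it happens inside the framework. Conceptually it's the standard asyncio pattern of awaiting several coroutines together; a rough sketch (not Bedsheet's actual internals), reusing the search_web tool defined above:

import asyncio

async def demo():
    # Both searches run at the same time; results come back in request order
    results = await asyncio.gather(
        search_web("agent frameworks"),
        search_web("parallel tool use"),
    )
    return results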

Lesson 9: Your First Multi-Agent System

Why multiple agents?

As your system grows, a single agent trying to do everything becomes unwieldy. Its instruction gets huge, it has dozens of tools, and debugging becomes a nightmare. The solution? Specialization. Create focused agents that each do one thing well (translation, research, analysis), then use a Supervisor to coordinate them. This is the same principle behind microservices - small, focused units that are easier to build, test, and maintain.

The real-world parallel

Think of it like a company. The CEO (Supervisor) doesn't personally write code, run marketing campaigns, AND handle accounting. They delegate to specialists (agents) who each have deep expertise in their domain. The CEO's job is to coordinate, synthesize, and make sure the right person handles the right task.

Now let's coordinate multiple agents. A Supervisor can delegate tasks to specialized agents:

import asyncio
from bedsheet import Agent, Supervisor, ActionGroup
from bedsheet.llm import AnthropicClient
from bedsheet.memory import InMemory
from bedsheet.events import (
    CompletionEvent, DelegationEvent,
    CollaboratorStartEvent, CollaboratorCompleteEvent
)

# === Agent 1: Translator ===
translator = Agent(
    name="Translator",
    instruction="You translate text. Respond with just the translation.",
    model_client=AnthropicClient(),
)

# === Agent 2: Summarizer ===
summarizer = Agent(
    name="Summarizer",
    instruction="You summarize text. Respond with just the summary.",
    model_client=AnthropicClient(),
)

# === Supervisor: Coordinator ===
coordinator = Supervisor(
    name="Coordinator",
    instruction="""You coordinate text processing.

    Collaborators:
    - Translator: For translation tasks
    - Summarizer: For summarization tasks

    Delegate to the appropriate agent based on the user's request.""",
    model_client=AnthropicClient(),
    memory=InMemory(),
    collaborators=[translator, summarizer],
    collaboration_mode="supervisor",
)

async def main():
    async for event in coordinator.invoke("s1", "Translate 'Hello' to Spanish"):
        if isinstance(event, DelegationEvent):
            agents = [d["agent_name"] for d in event.delegations]
            print(f"Delegating to: {agents}")

        elif isinstance(event, CollaboratorStartEvent):
            print(f"  [{event.agent_name}] Starting...")

        elif isinstance(event, CollaboratorCompleteEvent):
            print(f"  [{event.agent_name}] Done")

        elif isinstance(event, CompletionEvent):
            print(f"\nFinal: {event.response}")

Output

Delegating to: ['Translator']
  [Translator] Starting...
  [Translator] Done

Final: "Hello" in Spanish is "Hola".

flowchart LR
    User[User Request] --> Coordinator
    Coordinator --> |delegate| Translator
    Coordinator --> |delegate| Summarizer
    Translator --> |response| Coordinator
    Summarizer --> |response| Coordinator
    Coordinator --> |synthesize| Final[Final Response]
    style Coordinator fill:#dbeafe,stroke:#0969da,color:#1f2328
    style Translator fill:#dcfce7,stroke:#1a7f37,color:#1f2328
    style Summarizer fill:#dcfce7,stroke:#1a7f37,color:#1f2328

Lesson 10: Parallel Delegation

Why parallel matters

When you ask for stock analysis, you need both market data AND news sentiment. These are independent - one doesn't need the other's output. Running them sequentially means waiting for one to finish before starting the other. Parallel delegation runs them at the same time, often cutting response time in half. In real applications with multiple API calls, this can turn a 10-second response into a 3-second response. Users notice.

The real power: running multiple agents simultaneously.

# Supervisor with Parallel Delegation
advisor = Supervisor(
    name="InvestmentAdvisor",
    instruction="""You coordinate investment research.

    For stock analysis, delegate to BOTH agents IN PARALLEL:

    delegate(delegations=[
        {"agent_name": "MarketAnalyst", "task": "Analyze price data"},
        {"agent_name": "NewsResearcher", "task": "Find recent news"}
    ])

    Then synthesize their findings.""",
    model_client=AnthropicClient(),
    memory=InMemory(),
    collaborators=[market_analyst, news_researcher],
    collaboration_mode="supervisor",
)

With Parallel Delegation

0s  MarketAnalyst starts
0s  NewsResearcher starts  ← Both start!
2s  MarketAnalyst done
2s  NewsResearcher done
3s  Synthesis complete
─────────────────────────
Total: 3 seconds

Without (Sequential)

0s  MarketAnalyst starts
2s  MarketAnalyst done
2s  NewsResearcher starts  ← Waits!
4s  NewsResearcher done
5s  Synthesis complete
─────────────────────────
Total: 5 seconds

gantt
    title Parallel vs Sequential Delegation
    dateFormat X
    axisFormat %s
    section Parallel
    MarketAnalyst  :a1, 0, 20
    NewsResearcher :a2, 0, 20
    section Sequential
    MarketAnalyst  :b1, 0, 20
    NewsResearcher :b2, 20, 40

Lesson 11: Deep Agent Hierarchies

Why hierarchies?

Just like real organizations have managers of managers, complex AI systems benefit from layered supervision. A Project Manager doesn't directly coordinate 20 individual workers - they work through team leads. Each layer handles coordination at its level, keeping the complexity manageable. The Project Manager says "build feature X", the Dev Lead breaks that into code + review tasks, and individual agents do the actual work. This also means you can reuse team compositions - the same "DevTeam" supervisor can work for different projects.

Supervisors can contain other Supervisors, creating deep hierarchies:

# Level 3: Specialist agents
code_writer = Agent(name="CodeWriter", ...)
code_reviewer = Agent(name="CodeReviewer", ...)
test_writer = Agent(name="TestWriter", ...)

# Level 2: Team leads (Supervisors containing agents)
dev_lead = Supervisor(
    name="DevLead",
    instruction="Coordinate code writing and review.",
    collaborators=[code_writer, code_reviewer],
    ...
)

qa_lead = Supervisor(
    name="QALead",
    instruction="Coordinate testing efforts.",
    collaborators=[test_writer],
    ...
)

# Level 1: Project manager (Supervisor containing supervisors!)
project_manager = Supervisor(
    name="ProjectManager",
    instruction="""Manage software projects.
    - DevLead for implementation
    - QALead for testing""",
    collaborators=[dev_lead, qa_lead],  # Other supervisors!
    ...
)

flowchart TB
    PM[ProjectManager Supervisor]
    PM --> DL[DevLead Supervisor]
    PM --> QA[QALead Supervisor]
    DL --> CW[CodeWriter Agent]
    DL --> CR[CodeReviewer Agent]
    QA --> TW[TestWriter Agent]
    style PM fill:#dbeafe,stroke:#0969da,color:#1f2328
    style DL fill:#e0f2fe,stroke:#0284c7,color:#1f2328
    style QA fill:#e0f2fe,stroke:#0284c7,color:#1f2328
    style CW fill:#dcfce7,stroke:#1a7f37,color:#1f2328
    style CR fill:#dcfce7,stroke:#1a7f37,color:#1f2328
    style TW fill:#dcfce7,stroke:#1a7f37,color:#1f2328

Nested Events

For deep hierarchies, you'll see CollaboratorEvent wrapping CollaboratorEvent. Each level adds a wrapper showing which agent emitted the event.
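A small sketch of drilling down to the innermost event. The exact field names on CollaboratorEvent (agent_name and the wrapped event) are assumptions here - check the event class in your version:

from bedsheet.events import CollaboratorEvent

def unwrap(event, path=()):
    # Assumed fields: .agent_name (who emitted it) and .event (the wrapped inner event)
    if isinstance(event, CollaboratorEvent) and hasattr(event, "event"):
        return unwrap(event.event, path + (event.agent_name,))
    return path, event

path, inner = unwrap(event)
print(" > ".join(path), type(inner).__name__)  # e.g. "DevLead > CodeWriter ToolCallEvent"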

Lesson 12: Router Mode

When to use Router vs Supervisor

Not every multi-agent scenario needs coordination and synthesis. Sometimes you just need to route the user to the right expert. A help desk doesn't need to combine answers from Python and JavaScript experts - it needs to figure out which one can help and hand off the conversation. Router mode is for this: pick one agent and pass through its response directly. It's simpler, faster, and often exactly what you need.

Sometimes you don't need synthesis - just route to the right specialist:

# Router just picks one and hands off
help_desk = Supervisor(
    name="HelpDesk",
    instruction="""Route programming questions to the right expert:
    - Python questions → PythonExpert
    - JavaScript questions → JavaScriptExpert
    - Database questions → DatabaseExpert

    Pick ONE expert that best matches.""",
    collaborators=[python_expert, javascript_expert, database_expert],
    collaboration_mode="router",  # Just route, don't synthesize
    ...
)
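Handling the router's output looks like any other invoke loop; the RoutingEvent tells you which expert was chosen (the agent_name field on RoutingEvent is an assumption, mirroring the other collaborator events):

from bedsheet.events import RoutingEvent, CompletionEvent

async for event in help_desk.invoke("s1", "How do I reverse a list in Python?"):
    if isinstance(event, RoutingEvent):
        print(f"Routed to: {event.agent_name}")  # assumed field: the chosen expert
    elif isinstance(event, CompletionEvent):
        print(event.response)                    # the expert's answer, passed through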

Supervisor Mode (default)

  • Orchestrates multiple agents
  • Synthesizes their responses
  • Good for complex tasks

Router Mode

  • Picks one agent
  • Passes through directly
  • Good for simple triage

Lesson 13: Best Practices

Agent Design

1. Single Responsibility

Each agent should do one thing well:

Good: Focused agents

code_writer = Agent(
    name="CodeWriter",
    instruction="Write code."
)
code_reviewer = Agent(
    name="CodeReviewer",
    instruction="Review code."
)

Bad: Kitchen sink

do_everything = Agent(
    name="DoEverything",
    instruction="Write, review, test, deploy, monitor..."
)

2. Clear Instructions

Good: Specific

You summarize text.
- Under 100 words
- Focus on key points
- Use bullet points

Bad: Vague

Help with text stuff

3. Tool Descriptions Matter

The LLM uses descriptions to decide when to call tools:

Good: Descriptive

@tools.action(
    name="get_weather",
    description="Get current weather for a city. Returns temperature in Fahrenheit and conditions."
)

Bad: Vague

@tools.action(
    name="get_weather",
    description="Weather stuff"
)

Multi-Agent Design

Use Parallel Delegation

When tasks are independent, run them together. Don't make agents wait for each other unnecessarily.

Gate Expensive Operations

Check prerequisites (ethics, permissions) before running expensive research tasks.
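One way to express this is directly in the supervisor's instruction, so the cheap check always runs before the expensive work. A sketch with two hypothetical collaborators, a ComplianceChecker and a DeepResearcher:

research_supervisor = Supervisor(
    name="ResearchSupervisor",
    instruction="""Coordinate research requests.
    1. First delegate to ComplianceChecker to confirm the request is allowed.
    2. Only if it passes, delegate to DeepResearcher (slow and expensive).
    3. If it fails, explain why and stop - do not run the research.""",
    model_client=AnthropicClient(),
    memory=InMemory(),
    collaborators=[compliance_checker, deep_researcher],
    collaboration_mode="supervisor",
)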

Let Supervisors Synthesize

Don't just concatenate agent outputs. Have the supervisor create a unified, coherent response.

Error Handling

Return errors as data instead of raising, so the LLM can see what went wrong and respond accordingly:

@tools.action(name="get_data", description="Fetch data")
async def get_data(id: str) -> dict:
    try:
        # fetch() and NotFoundError stand in for your own data-access layer
        return await fetch(id)
    except NotFoundError:
        return {"error": f"No data found for id: {id}"}
    except Exception as e:
        return {"error": f"Failed to fetch: {str(e)}"}

Production Readiness

# Use Redis for production
from bedsheet.memory import RedisMemory
memory = RedisMemory(url="redis://your-redis:6379")

# Set iteration limits
supervisor = Supervisor(..., max_iterations=10)

# Log events for observability
async for event in agent.invoke(...):
    logger.info(f"Event: {event.type}", extra={"event": event})