An AI agent is not a chatbot with more context. It is an autonomous system that receives a goal, plans the necessary steps, executes actions through external tools, and adapts when something fails. The difference is operational: a chatbot answers questions, an agent completes tasks.
At Cloudstudio we have been designing and deploying AI agents for over a year for clients who need to automate complex processes. In this article we share the architecture we use, the patterns that work, and the mistakes we have learned to avoid.
The fundamental agent loop
Every agent follows a cycle: observe the current state, decide the next action, execute it, and evaluate the result. This loop repeats until the goal is met or it escalates to a human. The key is that each step is a model call with updated context — the agent does not memorize a fixed plan, but re-evaluates at each iteration.
We use Claude as the brain of our agents. Its native tool use capability allows defining tools as functions the model can invoke directly: querying databases, sending emails, updating CRMs, generating documents. The model decides when and how to use each tool based on the task context.
Here is the core agent loop we use in production. It is deceptively simple — the complexity lives in the tools and the system prompt:
```python
import anthropic
import json
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: str
    messages: list = field(default_factory=list)
    steps_taken: int = 0
    total_tokens: int = 0
    total_cost: float = 0.0
    max_steps: int = 25
    max_cost: float = 5.00  # USD budget limit


class Agent:
    def __init__(self, system_prompt: str, tools: list, tool_handlers: dict):
        self.client = anthropic.Anthropic()
        self.system_prompt = system_prompt
        self.tools = tools
        self.tool_handlers = tool_handlers

    def run(self, goal: str) -> AgentState:
        state = AgentState(goal=goal)
        state.messages = [{"role": "user", "content": goal}]

        while state.steps_taken < state.max_steps:
            # Check budget before each step. If exceeded, ask for a final
            # summary and terminate after the next response.
            over_budget = state.total_cost >= state.max_cost
            if over_budget:
                state.messages.append({
                    "role": "user",
                    "content": "Budget limit reached. Summarize what you have accomplished so far."
                })

            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                system=self.system_prompt,
                tools=self.tools,
                messages=state.messages,
            )

            # Track usage
            state.total_tokens += response.usage.input_tokens + response.usage.output_tokens
            state.total_cost += self._calculate_cost(response.usage)
            state.steps_taken += 1

            # If Claude is done (no more tool calls), or the budget ran out, return
            if response.stop_reason == "end_turn" or over_budget:
                state.messages.append({"role": "assistant", "content": response.content})
                return state

            # Process tool calls
            if response.stop_reason == "tool_use":
                state.messages.append({"role": "assistant", "content": response.content})
                tool_results = self._execute_tools(response.content)
                state.messages.append({"role": "user", "content": tool_results})

        return state  # Max steps reached

    def _execute_tools(self, content) -> list:
        results = []
        for block in content:
            if block.type == "tool_use":
                result = self._safe_execute(block.name, block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                })
        return results

    def _safe_execute(self, tool_name: str, params: dict) -> dict:
        handler = self.tool_handlers.get(tool_name)
        if not handler:
            return {"error": f"Unknown tool: {tool_name}"}
        try:
            return handler(**params)
        except Exception as e:
            return {"error": f"{type(e).__name__}: {e}"}

    def _calculate_cost(self, usage) -> float:
        # Sonnet pricing per million tokens
        input_cost = (usage.input_tokens / 1_000_000) * 3.0
        output_cost = (usage.output_tokens / 1_000_000) * 15.0
        return input_cost + output_cost
```
The max_steps and max_cost limits are not optional — they are the difference between a controlled agent and a runaway process that burns through your API budget at 3 AM.
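Before wiring the loop to the real API, it helps to exercise its control flow against a stub. The following is a sketch, not production code: `FakeClient`, its canned responses, and `run_stubbed_loop` (a condensed stand-in for `Agent.run`) are all invented for illustration, mimicking only the `stop_reason` handshake the loop depends on:

```python
import json
from types import SimpleNamespace


class FakeClient:
    """Stands in for the API client in tests: the first call requests a
    tool, the second call ends the turn. All names here are illustrative."""

    def __init__(self):
        self.calls = 0

    def create(self, **kwargs):
        self.calls += 1
        usage = SimpleNamespace(input_tokens=100, output_tokens=50)
        if self.calls == 1:
            block = SimpleNamespace(type="tool_use", id="tu_1",
                                    name="lookup_customer",
                                    input={"email": "a@example.com"})
            return SimpleNamespace(stop_reason="tool_use", content=[block], usage=usage)
        block = SimpleNamespace(type="text", text="Done.")
        return SimpleNamespace(stop_reason="end_turn", content=[block], usage=usage)


def run_stubbed_loop(client, handlers, max_steps=5):
    """A condensed version of the agent loop, run against the fake client."""
    messages = [{"role": "user", "content": "look up a@example.com"}]
    for _ in range(max_steps):
        response = client.create(messages=messages)
        if response.stop_reason == "end_turn":
            return messages, "completed"
        for block in response.content:
            if block.type == "tool_use":
                result = handlers[block.name](**block.input)
                messages.append({"role": "user", "content": [{
                    "type": "tool_result", "tool_use_id": block.id,
                    "content": json.dumps(result)}]})
    return messages, "max_steps"


handlers = {"lookup_customer": lambda email: {"email": email, "plan": "pro"}}
client = FakeClient()
messages, status = run_stubbed_loop(client, handlers)
print(status)  # completed, after one tool round-trip
```

A harness like this lets you test termination conditions (end_turn, max_steps, budget) deterministically, without spending tokens.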
Tool definitions: designing for reliability
The quality of your tools determines the quality of your agent. Each tool must have a precise description, a strict input schema, and predictable error behavior. Here is an example set of tools for a customer support agent:
```python
SUPPORT_TOOLS = [
    {
        "name": "search_knowledge_base",
        "description": "Search the internal knowledge base for articles relevant to a customer issue. Returns the top 5 matching articles with titles and content. Use this FIRST before attempting to answer any technical question.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Natural language search query describing the customer's issue"
                },
                "category": {
                    "type": "string",
                    "enum": ["billing", "technical", "account", "product"],
                    "description": "Category to narrow the search"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "lookup_customer",
        "description": "Look up a customer's account details by email. Returns subscription plan, account status, recent tickets, and billing history. Use this to understand the customer's context.",
        "input_schema": {
            "type": "object",
            "properties": {
                "email": {
                    "type": "string",
                    "description": "Customer email address"
                }
            },
            "required": ["email"]
        }
    },
    {
        "name": "create_ticket",
        "description": "Create a support ticket for issues that require human follow-up. Use this when the issue cannot be resolved automatically or requires elevated permissions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "subject": {"type": "string", "description": "Brief summary of the issue"},
                "body": {"type": "string", "description": "Detailed description including steps taken"},
                "priority": {
                    "type": "string",
                    "enum": ["low", "medium", "high", "urgent"]
                },
                "category": {
                    "type": "string",
                    "enum": ["billing", "technical", "account", "product"]
                }
            },
            "required": ["customer_email", "subject", "body", "priority", "category"]
        }
    },
    {
        "name": "escalate_to_human",
        "description": "Immediately transfer the conversation to a human agent. Use this when: the customer explicitly requests a human, the issue involves refunds over $100, or you are not confident in your resolution.",
        "input_schema": {
            "type": "object",
            "properties": {
                "reason": {"type": "string", "description": "Why the escalation is needed"},
                "summary": {"type": "string", "description": "Summary of the conversation so far"},
                "suggested_team": {
                    "type": "string",
                    "enum": ["billing", "engineering", "account_management"]
                }
            },
            "required": ["reason", "summary"]
        }
    }
]
```
Two design principles we have learned the hard way: first, always include an escalation tool. An agent that cannot escalate will hallucinate solutions when it is stuck. Second, write tool descriptions as directives — "Use this FIRST before..." gives the model a clear decision framework.
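A cheap guardrail on top of these schemas is validating tool inputs before dispatch, so a malformed call comes back to the model as a correctable error instead of crashing a handler. A minimal sketch, hand-rolled rather than using a full JSON Schema library, covering only the `required` and `enum` keywords used in the definitions above:

```python
def validate_tool_input(schema: dict, params: dict) -> list[str]:
    """Check required fields and enum values against a tool's input_schema.
    Covers only the subset of JSON Schema used in the tool definitions."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"Missing required field: {name}")
    for name, value in params.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"Unexpected field: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"Invalid value for {name}: {value!r} (expected one of {spec['enum']})")
    return errors


schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "category": {"type": "string", "enum": ["billing", "technical", "account", "product"]},
    },
    "required": ["query"],
}

print(validate_tool_input(schema, {"query": "refund status", "category": "billing"}))  # []
print(validate_tool_input(schema, {"category": "sales"}))  # two errors: missing query, invalid category
```

Returning the error list as a tool result gives the model a chance to repair the call on the next iteration, which in our experience it usually does.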
Patterns that work in production
Human-in-the-loop. No agent should operate without an escalation mechanism. We define confidence thresholds: if the agent is not sure about its decision, it pauses execution and notifies a human. This is critical in processes with financial or legal impact.
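The confidence threshold can be as simple as a gate in front of each consequential action. A sketch of the idea (the action names and threshold values here are illustrative, not our production configuration):

```python
# Per-action thresholds: the higher the stakes, the more confidence required.
CONFIDENCE_THRESHOLDS = {
    "send_reply": 0.80,       # low stakes: respond automatically
    "apply_credit": 0.95,     # financial impact: require high confidence
    "close_account": 1.01,    # never automatic: always goes to a human
}


def gate_action(action: str, confidence: float) -> str:
    """Return 'execute' or 'escalate' based on the action's threshold.
    Unknown actions escalate by default (fail closed)."""
    threshold = CONFIDENCE_THRESHOLDS.get(action, 1.01)
    return "execute" if confidence >= threshold else "escalate"


print(gate_action("send_reply", 0.90))     # execute
print(gate_action("apply_credit", 0.90))   # escalate
print(gate_action("close_account", 0.99))  # escalate
```

Note that the default for unknown actions is to escalate: failing closed is what keeps a misconfigured tool from silently executing high-stakes operations.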
Error recovery. Agents in production encounter errors constantly: failing APIs, unexpected data, timeouts. We design each tool with automatic retries, fallbacks, and circuit breakers. The agent must be able to diagnose the error and try an alternative route.
Here is our error recovery wrapper that wraps every tool handler:
```python
import time
import random
import logging
from functools import wraps

logger = logging.getLogger(__name__)


def resilient_tool(max_retries=3, fallback=None):
    """Decorator that adds retry logic and circuit breaking to tool handlers.

    Timeouts are enforced by the handlers themselves; this wrapper handles
    retries with exponential backoff and fails fast while the circuit is open.
    """
    def decorator(func):
        _failure_count = {"value": 0}
        _circuit_open_until = {"value": 0.0}

        @wraps(func)
        def wrapper(**kwargs):
            # Circuit breaker: if too many recent failures, fail fast
            if time.time() < _circuit_open_until["value"]:
                if fallback:
                    logger.warning(f"Circuit open for {func.__name__}, using fallback")
                    return fallback(**kwargs)
                return {"error": "Service temporarily unavailable", "retry_after": 60}

            for attempt in range(max_retries):
                try:
                    result = func(**kwargs)
                    _failure_count["value"] = 0  # Reset on success
                    return result
                except TimeoutError:
                    if attempt < max_retries - 1:
                        # Exponential backoff with jitter
                        wait = (2 ** attempt) + random.uniform(0, 1)
                        time.sleep(wait)
                    else:
                        _failure_count["value"] += 1
                        if _failure_count["value"] >= 5:
                            _circuit_open_until["value"] = time.time() + 60
                        return {"error": f"Timeout after {max_retries} retries"}
                except Exception as e:
                    logger.error(f"Tool {func.__name__} failed: {e}")
                    _failure_count["value"] += 1
                    if _failure_count["value"] >= 5:
                        _circuit_open_until["value"] = time.time() + 60
                    return {"error": str(e)}
        return wrapper
    return decorator


@resilient_tool(max_retries=3)
def lookup_customer(email: str) -> dict:
    """Look up customer from CRM."""
    response = crm_client.get(f"/customers?email={email}", timeout=10)
    response.raise_for_status()
    return response.json()
```
The circuit breaker pattern is essential. Without it, a downstream service outage causes your agent to spend all its budget retrying a tool that will never succeed. With the circuit breaker, after 5 consecutive failures the tool fails fast for 60 seconds, giving the agent a clear error signal to work with.
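The breaker's behavior is easy to verify in isolation. Here is a minimal sketch of the same state machine with the decorator plumbing stripped away (the class and its parameters are illustrative, not a drop-in replacement for the wrapper above):

```python
import time


class CircuitBreaker:
    """Opens after `threshold` consecutive failures, then fails fast
    for `cooldown` seconds without calling the wrapped function."""

    def __init__(self, threshold: int = 5, cooldown: float = 60.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.open_until = 0.0

    def call(self, func):
        if time.time() < self.open_until:
            return {"error": "circuit open"}  # fail fast, no call made
        try:
            result = func()
            self.failures = 0  # any success closes the circuit
            return result
        except Exception as e:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open_until = time.time() + self.cooldown
            return {"error": str(e)}


def flaky():
    raise TimeoutError("upstream down")


breaker = CircuitBreaker(threshold=5, cooldown=60)
for _ in range(5):
    breaker.call(flaky)        # five consecutive failures trip the breaker
print(breaker.call(flaky))     # {'error': 'circuit open'}: fails fast
```

The key property to test is the last line: once open, the breaker returns immediately without invoking the failing function at all.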
Observability. Every agent action is logged with timestamps, token costs, duration, and result. Without full observability, debugging an agent in production is impossible. We use structured traces that allow reconstructing each agent decision step by step.
Monitoring and observability setup
You cannot operate an agent you cannot observe. We log every step in a structured format that allows us to reconstruct the full decision chain:
```python
import time
import json
import logging
from contextlib import contextmanager
from dataclasses import dataclass, asdict

logger = logging.getLogger("agent.trace")


@dataclass
class AgentTrace:
    agent_id: str
    session_id: str
    step: int
    action: str  # "llm_call", "tool_call", "tool_result", "error", "escalation"
    tool_name: str | None = None
    input_summary: str | None = None
    output_summary: str | None = None
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    duration_ms: int = 0
    success: bool = True
    error: str | None = None


class AgentTracer:
    def __init__(self, agent_id: str, session_id: str):
        self.agent_id = agent_id
        self.session_id = session_id
        self.step = 0
        self.traces: list[AgentTrace] = []

    @contextmanager
    def trace_step(self, action: str, tool_name: str | None = None):
        self.step += 1
        trace = AgentTrace(
            agent_id=self.agent_id,
            session_id=self.session_id,
            step=self.step,
            action=action,
            tool_name=tool_name,
        )
        start = time.monotonic()
        try:
            yield trace
            trace.success = True
        except Exception as e:
            trace.success = False
            trace.error = str(e)
            raise
        finally:
            trace.duration_ms = int((time.monotonic() - start) * 1000)
            self.traces.append(trace)
            # Emit as structured JSON log
            logger.info(json.dumps(asdict(trace)))

    def summary(self) -> dict:
        return {
            "total_steps": self.step,
            "total_tokens": sum(t.input_tokens + t.output_tokens for t in self.traces),
            "total_cost": sum(t.cost_usd for t in self.traces),
            "total_duration_ms": sum(t.duration_ms for t in self.traces),
            "errors": [t.error for t in self.traces if not t.success],
            "tools_used": [t.tool_name for t in self.traces if t.tool_name],
        }
```
We pipe these JSON logs into our observability stack (Datadog or similar) and build dashboards that show: agent success rate, average steps per task, cost per task, most-used tools, and error frequency by tool. The dashboard has been the single most valuable debugging tool — when an agent starts behaving unexpectedly, the traces tell you exactly where the reasoning went wrong.
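As a sketch of what those dashboard queries compute, here is a small roll-up over the JSONL lines emitted by the tracer (the aggregation function and sample lines are illustrative; the field names match the AgentTrace dataclass above):

```python
import json


def aggregate_traces(jsonl_lines: list[str]) -> dict:
    """Roll per-step trace logs up into session-level dashboard metrics."""
    traces = [json.loads(line) for line in jsonl_lines]
    errors = [t for t in traces if not t["success"]]
    return {
        "steps": len(traces),
        "cost_usd": round(sum(t["cost_usd"] for t in traces), 4),
        "error_rate": len(errors) / len(traces) if traces else 0.0,
        "tools_used": sorted({t["tool_name"] for t in traces if t["tool_name"]}),
    }


# Three sample trace lines, as the tracer would emit them:
lines = [
    '{"success": true, "cost_usd": 0.012, "tool_name": null}',
    '{"success": true, "cost_usd": 0.003, "tool_name": "lookup_customer"}',
    '{"success": false, "cost_usd": 0.0, "tool_name": "create_ticket"}',
]
print(aggregate_traces(lines))
```

The same roll-up, grouped by day or by agent type, is what feeds the success-rate and cost-per-task panels.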
Cost control: budget limits and token tracking
Agents can be expensive because they make multiple LLM calls per task. Without cost controls, a single pathological input can trigger dozens of iterations. Here is how we enforce budgets:
```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class BudgetConfig:
    max_cost_per_session: float = 2.00   # USD
    max_cost_per_step: float = 0.50      # USD
    max_tokens_per_session: int = 100_000
    max_steps: int = 25
    alert_threshold: float = 0.75        # Alert at 75% of budget


class BudgetManager:
    def __init__(self, config: BudgetConfig):
        self.config = config
        self.total_cost = 0.0
        self.total_tokens = 0
        self.steps = 0

    def check_budget(self, estimated_input_tokens: int = 0) -> dict:
        """Check if we can afford another step. Returns status and remaining budget."""
        if self.steps >= self.config.max_steps:
            return {"allowed": False, "reason": "max_steps_reached"}
        if self.total_cost >= self.config.max_cost_per_session:
            return {"allowed": False, "reason": "cost_limit_reached"}
        if self.total_tokens + estimated_input_tokens > self.config.max_tokens_per_session:
            return {"allowed": False, "reason": "token_limit_reached"}

        remaining = self.config.max_cost_per_session - self.total_cost
        if remaining / self.config.max_cost_per_session < (1 - self.config.alert_threshold):
            logger.warning(f"Budget alert: only ${remaining:.2f} remaining")

        return {
            "allowed": True,
            "remaining_cost": remaining,
            "remaining_steps": self.config.max_steps - self.steps,
        }

    def record_usage(self, input_tokens: int, output_tokens: int, model: str):
        self.steps += 1
        self.total_tokens += input_tokens + output_tokens
        self.total_cost += self._price(input_tokens, output_tokens, model)

    def _price(self, input_tokens: int, output_tokens: int, model: str) -> float:
        # (input_rate, output_rate) in USD per million tokens
        pricing = {
            "claude-haiku-4-20250514": (0.80, 4.0),
            "claude-sonnet-4-20250514": (3.0, 15.0),
            "claude-opus-4-20250514": (15.0, 75.0),
        }
        input_rate, output_rate = pricing.get(model, (3.0, 15.0))
        return (input_tokens / 1_000_000 * input_rate) + (output_tokens / 1_000_000 * output_rate)
```
We set different budget configs for different agent types. A simple classification agent gets $0.50 per session. A complex research agent that needs to search multiple sources gets $5.00. These limits catch runaway agents early and give you predictable unit economics.
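The unit economics follow directly from the pricing table. For instance, at Sonnet rates ($3/M input, $15/M output), a 10-step session averaging 8,000 input and 500 output tokens per step works out to:

```python
# Back-of-envelope session cost at Sonnet rates. The per-step token
# counts below are illustrative averages, not measured values.
input_rate, output_rate = 3.0, 15.0  # USD per million tokens
steps, input_tokens, output_tokens = 10, 8_000, 500

cost = steps * (input_tokens / 1_000_000 * input_rate
                + output_tokens / 1_000_000 * output_rate)
print(f"${cost:.3f} per session")  # $0.315
```

In practice input tokens grow with each step as the conversation context accumulates, so real sessions skew more expensive than this flat estimate, which is why the default session budget sits at $2.00 rather than $0.50.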
What we learned the hard way
The biggest mistake is assuming the agent will always make the right decision. In production, edge cases are the norm. A support agent that classifies tickets will work perfectly 95% of the time — but that remaining 5% can generate incorrect responses to important clients.
The solution is not more prompt engineering. It is designing the system so that failures are detectable, reversible, and easy to escalate; limiting the blast radius of each agent action; and continuously measuring decision quality against an evaluation set.
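Measuring decision quality against an evaluation set can start very small: a list of labeled cases and an accuracy check on every prompt change. A sketch of the shape (the case format, threshold, and the `fake_decide` stand-in for a real agent call are all illustrative):

```python
# Labeled cases: input the agent receives, decision we expect it to make.
EVAL_SET = [
    {"input": "I was charged twice this month", "expected": "billing"},
    {"input": "The export button does nothing", "expected": "technical"},
    {"input": "How do I change my email?", "expected": "account"},
]


def evaluate(decide, eval_set, min_accuracy=0.9):
    """Run a decision function over labeled cases and flag regressions."""
    failures = [c for c in eval_set if decide(c["input"]) != c["expected"]]
    accuracy = 1 - len(failures) / len(eval_set)
    return {"accuracy": accuracy, "passed": accuracy >= min_accuracy, "failures": failures}


# A trivial stand-in for the real agent call, keyed on obvious phrases:
def fake_decide(text: str) -> str:
    if "charged" in text:
        return "billing"
    if "button" in text:
        return "technical"
    return "account"


print(evaluate(fake_decide, EVAL_SET))
```

Swap `fake_decide` for a function that actually calls the agent, grow the set with every production incident, and run it in CI: a dropped accuracy number is a far earlier signal than an angry customer.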
Three more lessons from running agents in production for a year:
Test with adversarial inputs. Users will send your agent things you never imagined — empty messages, messages in the wrong language, screenshots when it expects text, or deliberately misleading instructions. Build a test suite of 50+ adversarial cases and run it on every prompt change.
Keep the system prompt under 2,000 tokens. Longer prompts give the model more information, but they also increase latency and cost per step. The agent loop multiplies this cost by 5-25x. We have found that concise, well-structured system prompts outperform verbose ones.
Log every tool call, not just errors. When something goes wrong on step 15 of an agent run, you need the full trace to understand how it got there. Logging only errors gives you the symptom. Logging every step gives you the diagnosis.