Budget & Limits

The limits section controls how much a workflow or individual block is allowed to spend in cost, tokens, and wall-clock time. When a limit is breached, the engine either kills execution immediately or emits a warning and continues, depending on the on_exceed mode.

Add a limits section at the top level of your workflow YAML:

custom/workflows/research-pipeline.yaml
version: "1.0"
id: research-pipeline
kind: workflow
limits:
  cost_cap_usd: 2.50
  token_cap: 100000
  max_duration_seconds: 300
  on_exceed: fail
  warn_at_pct: 0.8
workflow:
  name: research-pipeline
  entry: step_one
blocks:
  step_one:
    type: linear
    soul_ref: analyst
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| cost_cap_usd | float? | None | >= 0.0 | Maximum total LLM cost in USD |
| token_cap | int? | None | >= 1 | Maximum total tokens (prompt + completion) |
| max_duration_seconds | int? | None | 1 to 86400 | Wall-clock timeout for the entire workflow |
| on_exceed | "warn" \| "fail" | "fail" | none | What happens when a cap is breached |
| warn_at_pct | float | 0.8 | 0.0 to 1.0 | Fraction of a cap at which early warning events are emitted (0.8 = 80%) |

All fields are optional. If you omit limits entirely, no budget enforcement is applied.
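The constraints in the table can be read as a small validator. The following is a minimal sketch, not the engine's actual schema: the `Limits` dataclass and its checks are hypothetical, and only the field names, defaults, and ranges are taken from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    """Hypothetical mirror of the workflow-level limits fields."""

    cost_cap_usd: Optional[float] = None
    token_cap: Optional[int] = None
    max_duration_seconds: Optional[int] = None
    on_exceed: str = "fail"
    warn_at_pct: float = 0.8

    def __post_init__(self) -> None:
        # Each check mirrors one row of the Constraints column.
        if self.cost_cap_usd is not None and self.cost_cap_usd < 0.0:
            raise ValueError("cost_cap_usd must be >= 0.0")
        if self.token_cap is not None and self.token_cap < 1:
            raise ValueError("token_cap must be >= 1")
        if self.max_duration_seconds is not None and not (
            1 <= self.max_duration_seconds <= 86400
        ):
            raise ValueError("max_duration_seconds must be between 1 and 86400")
        if self.on_exceed not in ("warn", "fail"):
            raise ValueError('on_exceed must be "warn" or "fail"')
        if not 0.0 <= self.warn_at_pct <= 1.0:
            raise ValueError("warn_at_pct must be between 0.0 and 1.0")
```

Omitted fields stay at `None`, which matches the "no enforcement for that cap" behavior described above.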

Individual blocks can have their own limits section:

custom/workflows/research-pipeline.yaml
blocks:
  expensive_step:
    type: linear
    soul_ref: analyst
    limits:
      cost_cap_usd: 1.00
      token_cap: 50000
      max_duration_seconds: 120
      on_exceed: fail
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| cost_cap_usd | float? | None | >= 0.0 | Maximum LLM cost for this block |
| token_cap | int? | None | >= 1 | Maximum tokens for this block |
| max_duration_seconds | int? | None | 1 to 86400 | Wall-clock timeout for this block |
| on_exceed | "warn" \| "fail" | "fail" | none | What happens when a cap is breached |

The on_exceed field controls what happens when a budget cap is breached:

fail is the default mode. When any cap is exceeded, the engine raises a BudgetKilledException immediately. The exception includes structured metadata:

  • scope: "block" or "workflow"
  • block_id: which block triggered the breach (if block-scoped)
  • limit_kind: "cost_usd", "token_cap", or "timeout"
  • limit_value: the configured cap
  • actual_value: the value that exceeded the cap
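To make the metadata concrete, here is a minimal sketch of an exception carrying those five fields. It is an illustrative stand-in: the constructor signature and the `describe` helper are assumptions, and the engine's real class may be shaped differently.

```python
from typing import Optional


class BudgetKilledException(Exception):
    """Illustrative stand-in; the constructor signature is assumed."""

    def __init__(
        self,
        scope: str,
        limit_kind: str,
        limit_value: float,
        actual_value: float,
        block_id: Optional[str] = None,
    ) -> None:
        self.scope = scope                # "block" or "workflow"
        self.block_id = block_id          # set only for block-scoped breaches
        self.limit_kind = limit_kind      # "cost_usd", "token_cap", or "timeout"
        self.limit_value = limit_value    # the configured cap
        self.actual_value = actual_value  # the value that exceeded the cap
        super().__init__(
            f"{scope} budget exceeded: {limit_kind} {actual_value} > {limit_value}"
        )


def describe(exc: BudgetKilledException) -> str:
    """Hypothetical helper: build a one-line log message from the metadata."""
    where = f"block {exc.block_id}" if exc.scope == "block" else "workflow"
    return (
        f"{where} hit {exc.limit_kind} cap {exc.limit_value} "
        f"(actual {exc.actual_value})"
    )
```

A handler can branch on `scope` and `limit_kind` instead of parsing the message string.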

Error route interaction: If the block that triggered the exception has an error_route configured, the BudgetKilledException is caught by the workflow’s generic exception handler. The block result is written with exit_handle: "error" and error metadata, and execution continues on the error route target block. Only when no error_route exists does the exception propagate and terminate the run with status: failed.

Workflow-level timeouts are the exception: When a workflow-level max_duration_seconds fires, the resulting BudgetKilledException is raised outside the block execution loop. It cannot be caught by any block’s error_route and always terminates the run unconditionally.
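The block-level routing can be pictured as a loop that catches exceptions per block. This is a simplified sketch under stated assumptions: `run_blocks`, the dictionary layout, and the "done" exit handle are hypothetical, and the real engine loop is not shown in this document.

```python
from typing import Callable, Dict, Optional


class BudgetKilledException(Exception):
    """Stand-in for the engine's exception."""


def run_blocks(blocks: Dict[str, dict], entry: str) -> Dict[str, dict]:
    """Run blocks in sequence, following error_route when a block raises.

    Each block is a dict with a "run" callable plus optional "next" and
    "error_route" block ids (hypothetical layout).
    """
    results: Dict[str, dict] = {}
    current: Optional[str] = entry
    while current is not None:
        block = blocks[current]
        run: Callable[[], None] = block["run"]
        try:
            run()
            results[current] = {"exit_handle": "done"}
            current = block.get("next")
        except Exception as exc:  # generic handler: budget kills land here too
            target = block.get("error_route")
            if target is None:
                raise  # no error_route: the exception fails the run
            results[current] = {"exit_handle": "error", "error": repr(exc)}
            current = target
    return results
```

A workflow-level timeout would be raised around the whole `run_blocks` call rather than inside the `try`, which is why no `error_route` can intercept it.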

In warn mode, the engine logs a warning when a cap is exceeded, but execution continues. The run can finish normally even after exceeding a budget cap. Use this for soft budgets where you want visibility but not hard stops.

Warn mode example
limits:
  cost_cap_usd: 1.00
  on_exceed: warn
  warn_at_pct: 0.8

With this configuration, a warning event is emitted at 80% of the cost cap ($0.80), and another when the cap is exceeded ($1.00+), but execution continues.
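The two thresholds can be sketched as a pure function. The function name and event names below are hypothetical, and whether the engine compares with `>=` or `>` at the exact threshold is an assumption.

```python
from typing import List


def budget_events(
    spent_usd: float, cost_cap_usd: float, warn_at_pct: float = 0.8
) -> List[str]:
    """Return the (hypothetical) events a given spend level would trigger."""
    events: List[str] = []
    # Early warning once spend reaches warn_at_pct of the cap.
    if spent_usd >= cost_cap_usd * warn_at_pct:
        events.append("warning_threshold")
    # Second event once the cap itself is exceeded.
    if spent_usd > cost_cap_usd:
        events.append("cap_exceeded")
    return events
```

In warn mode both events are informational; neither stops the run.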

Budget enforcement tracks the active budget session per async task. Every LLM call passes through a single chokepoint where cost and token limits are checked, whether the block runs in the engine process or in an isolated subprocess.

  1. Workflow start: If the workflow has limits, a budget session is created and set as the active budget for the run.

  2. Block start: If a block has limits, a child budget session is created with the workflow session as its parent. The child replaces the active budget for the duration of that block.

  3. LLM call returns: After each LLM call, the session records the cost and tokens. If the session has a parent, costs propagate up the chain automatically.

  4. Cap check: After recording, the engine walks the entire parent chain. If any session (block or workflow) has exceeded its cap with on_exceed: "fail", execution is killed immediately.

  5. Block end: The block’s budget session is removed and the workflow session is restored.
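Steps 2 and 5 (swap in a child session, then restore the workflow session) can be sketched with a context variable, which gives each async task its own view of the active session. Using `contextvars` here is an assumption about the mechanism, and `ACTIVE_BUDGET`, `BudgetSession`, and `block_budget` are hypothetical names.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator, Optional


class BudgetSession:
    """Minimal session holding only a name and a parent pointer."""

    def __init__(self, name: str, parent: Optional["BudgetSession"] = None) -> None:
        self.name = name
        self.parent = parent


# The active session; each asyncio task sees its own value.
ACTIVE_BUDGET = contextvars.ContextVar("active_budget", default=None)


@contextmanager
def block_budget(block_id: str) -> Iterator[BudgetSession]:
    """Make a child session active for the duration of one block."""
    child = BudgetSession(block_id, parent=ACTIVE_BUDGET.get())
    token = ACTIVE_BUDGET.set(child)  # step 2: child replaces the active budget
    try:
        yield child
    finally:
        ACTIVE_BUDGET.reset(token)    # step 5: previous session is restored
```

Resetting in `finally` restores the workflow session even when the block exits with an exception.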

LLM blocks run in isolated subprocesses that have no direct access to API keys or budget state. Budget enforcement still works because every LLM call from the subprocess is proxied through an IPC channel back to the engine, where a BudgetInterceptor enforces caps:

  • Before each LLM call: The interceptor checks the active budget session. If the budget is exceeded, the call is rejected before the LLM provider is contacted, so no money is spent.
  • After each LLM call: The interceptor accrues the reported cost and tokens to the budget session. Costs propagate up the parent chain automatically.
  • Budget kill propagation: When a budget cap is breached, a BudgetKilledException is serialized back to the subprocess, which writes it into the ResultEnvelope. The engine deserializes and re-raises it in the main process.

Block-level and workflow-level caps both work across the isolation boundary. The subprocess never sees or manipulates budget state directly; the engine owns it entirely.
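The before/after behavior of the interceptor can be sketched as a wrapper around the provider call. This is a simplified in-process illustration that assumes on_exceed: fail and omits the IPC layer; the class shapes and the `(text, cost_usd, tokens)` provider return value are assumptions, not the real BudgetInterceptor interface.

```python
from typing import Callable, Tuple


class BudgetKilledException(Exception):
    """Stand-in for the engine's exception."""


class BudgetSession:
    def __init__(self, cost_cap_usd: float) -> None:
        self.cost_cap_usd = cost_cap_usd
        self.cost_usd = 0.0
        self.tokens = 0

    def exceeded(self) -> bool:
        return self.cost_usd > self.cost_cap_usd


class BudgetInterceptor:
    """Wraps a provider callable returning (text, cost_usd, tokens)."""

    def __init__(
        self,
        session: BudgetSession,
        provider: Callable[[str], Tuple[str, float, int]],
    ) -> None:
        self.session = session
        self.provider = provider

    def call(self, prompt: str) -> str:
        # Before: reject without contacting the provider if already over budget.
        if self.session.exceeded():
            raise BudgetKilledException("budget exceeded; call rejected")
        text, cost_usd, tokens = self.provider(prompt)
        # After: accrue the reported usage, then re-check the cap.
        self.session.cost_usd += cost_usd
        self.session.tokens += tokens
        if self.session.exceeded():
            raise BudgetKilledException("cap breached after accrual")
        return text
```

Because the pre-call check runs in the engine, a subprocess that keeps retrying after a breach never reaches the provider.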

See Process Isolation for the full architecture.

When a block has its own limits, a child budget session is created with the workflow session as its parent. Every cost recorded on the child also increments the parent’s counters recursively up the chain. After recording, the engine walks the entire parent chain, so both the block’s caps and the workflow’s caps are enforced on every LLM call:

Block accrues $0.50, 1000 tokens
→ Block session: cost=$0.50, tokens=1000
→ Parent (workflow) session: cost=$0.50, tokens=1000 (propagated)

This means a workflow with cost_cap_usd: 2.00 will kill the run once the workflow’s total cost exceeds $2.00, even if the individual block has no limit of its own. Sub-flow budgets are not independent: the parent workflow’s caps are always enforced because costs propagate upward through the parent chain.
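Upward propagation and the chain walk can be sketched in a few lines. The method names `accrue` and `check_chain` are hypothetical, the sketch covers only the cost cap, and it assumes every session uses on_exceed: fail.

```python
from typing import Optional


class BudgetKilledException(Exception):
    def __init__(self, scope: str, limit_value: float, actual_value: float) -> None:
        self.scope = scope
        self.limit_value = limit_value
        self.actual_value = actual_value
        super().__init__(f"{scope} cost {actual_value} > {limit_value}")


class BudgetSession:
    def __init__(
        self,
        scope: str,
        cost_cap_usd: Optional[float] = None,
        parent: Optional["BudgetSession"] = None,
    ) -> None:
        self.scope = scope
        self.cost_cap_usd = cost_cap_usd
        self.parent = parent
        self.cost_usd = 0.0
        self.tokens = 0

    def accrue(self, cost_usd: float, tokens: int) -> None:
        """Record usage on this session and on every ancestor."""
        node: Optional[BudgetSession] = self
        while node is not None:
            node.cost_usd += cost_usd
            node.tokens += tokens
            node = node.parent

    def check_chain(self) -> None:
        """Walk from this session to the root; raise on the first breached cap."""
        node: Optional[BudgetSession] = self
        while node is not None:
            if node.cost_cap_usd is not None and node.cost_usd > node.cost_cap_usd:
                raise BudgetKilledException(
                    node.scope, node.cost_cap_usd, node.cost_usd
                )
            node = node.parent
```

A block session with no cap of its own still trips the workflow cap, because the walk always reaches the root session.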

Timeouts work differently from cost and token caps:

  • Workflow timeout: The engine wraps the main execution loop with the configured max_duration_seconds. If the timeout fires, execution is killed immediately with a budget exception (limit_kind="timeout").

  • Block timeout: Each block is individually wrapped with its own max_duration_seconds. Block timeouts are independent of the workflow timeout.
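One way to implement such a wrapper is `asyncio.wait_for`, translating the timeout into a budget exception. This is a sketch under that assumption; the document does not say which primitive the engine uses, and `run_with_timeout` is a hypothetical helper.

```python
import asyncio
from typing import Any, Awaitable


class BudgetKilledException(Exception):
    def __init__(self, scope: str, limit_kind: str, limit_value: float) -> None:
        self.scope = scope
        self.limit_kind = limit_kind
        self.limit_value = limit_value
        super().__init__(f"{scope} exceeded {limit_kind} of {limit_value}s")


async def run_with_timeout(work: Awaitable[Any], seconds: float, scope: str) -> Any:
    """Await `work`, converting a timeout into a budget exception."""
    try:
        return await asyncio.wait_for(work, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the inner work before raising TimeoutError.
        raise BudgetKilledException(scope, "timeout", seconds) from None
```

The same helper could wrap a single block (scope "block") or the whole execution loop (scope "workflow"), which keeps the two timeouts independent of each other.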

When a dispatch block fans out to multiple exit branches, each branch gets an isolated child session created via create_isolated_child(branch_id=exit_id). The child inherits the parent’s exact caps (cost_cap_usd, token_cap, max_duration_seconds, on_exceed) but has no parent pointer, so it accumulates costs independently. This prevents concurrent branches from sharing mutable budget state during execution.

Each child is individually capped at the parent’s full cap value. For example, if the workflow cap is $5.00, each branch is independently allowed up to $5.00, not $5.00 divided by the number of branches. This means a single branch can hit the cap on its own before the others finish.

After all branches complete via asyncio.gather, each child’s totals are reconciled back to the parent session via reconcile_child(), which adds the child’s cost and token totals to the parent. Then the parent session’s caps are checked with check_or_raise().

Parent session: cost_cap=$5.00
├── Branch A (isolated, cap=$5.00): cost=$1.00
├── Branch B (isolated, cap=$5.00): cost=$2.00
├── Reconciliation: parent cost += $1.00 + $2.00 = $3.00
└── Parent cap check: $3.00 ≤ $5.00 → passes
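The fan-out and reconciliation flow can be sketched as follows. The method names create_isolated_child, reconcile_child, and check_or_raise come from the text above, but their bodies and signatures here are assumptions, and the sketch tracks cost only. `run_branch` and `dispatch` are hypothetical helpers.

```python
import asyncio
from typing import Dict


class BudgetKilledException(Exception):
    """Stand-in for the engine's exception."""


class BudgetSession:
    def __init__(self, cost_cap_usd: float) -> None:
        self.cost_cap_usd = cost_cap_usd
        self.cost_usd = 0.0

    def create_isolated_child(self, branch_id: str) -> "BudgetSession":
        # Same cap as the parent, but no parent pointer: nothing propagates.
        child = BudgetSession(self.cost_cap_usd)
        child.branch_id = branch_id
        return child

    def reconcile_child(self, child: "BudgetSession") -> None:
        self.cost_usd += child.cost_usd

    def check_or_raise(self) -> None:
        if self.cost_usd > self.cost_cap_usd:
            raise BudgetKilledException(
                f"cost {self.cost_usd} > cap {self.cost_cap_usd}"
            )


async def run_branch(session: BudgetSession, cost_usd: float) -> None:
    """Simulate one branch spending `cost_usd` against its own session."""
    await asyncio.sleep(0)
    session.cost_usd += cost_usd
    session.check_or_raise()


async def dispatch(parent: BudgetSession, branch_costs: Dict[str, float]) -> None:
    children = [parent.create_isolated_child(branch_id=b) for b in branch_costs]
    await asyncio.gather(
        *(run_branch(c, cost) for c, cost in zip(children, branch_costs.values()))
    )
    # Reconcile only after every branch has finished, then check the parent.
    for child in children:
        parent.reconcile_child(child)
    parent.check_or_raise()
```

Under this model, two branches that each spend $3.00 against a $5.00 cap both pass their own checks, and the breach surfaces only at the parent check after reconciliation.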

custom/workflows/budget-example.yaml
version: "1.0"
id: budget-example
kind: workflow
limits:
  cost_cap_usd: 5.00
  token_cap: 200000
  max_duration_seconds: 600
  on_exceed: fail
  warn_at_pct: 0.8
workflow:
  name: budget-example
  entry: research
  transitions:
    - from: research
      to: summarize
    - from: summarize
blocks:
  research:
    type: linear
    soul_ref: researcher
    limits:
      cost_cap_usd: 3.00
      max_duration_seconds: 300
      on_exceed: fail
  summarize:
    type: linear
    soul_ref: writer
    limits:
      cost_cap_usd: 1.00
      on_exceed: warn

In this example:

  • The workflow will hard-stop at $5.00 total, 200k tokens, or 10 minutes, whichever is reached first.
  • The research block will hard-stop at $3.00 or 5 minutes.
  • The summarize block will warn at $1.00 but continue running.
  • Both blocks’ costs propagate to the workflow total.