Budget & Limits

The limits section controls how much a workflow or individual block is allowed to spend in cost, tokens, and wall-clock time. When a limit is breached, the engine either kills execution immediately or emits a warning and continues, depending on the on_exceed mode.

Workflow-level limits

Add a limits section at the top level of your workflow YAML:

```yaml
version: "1.0"
id: research-pipeline
kind: workflow
limits:
  cost_cap_usd: 2.50
  token_cap: 100000
  max_duration_seconds: 300
  on_exceed: fail
  warn_at_pct: 0.8
workflow:
  name: research-pipeline
  entry: step_one
blocks:
  step_one:
    type: linear
    soul_ref: analyst
```

WorkflowLimitsDef fields

| Field | Type | Default | Constraints | Description |
| --- | --- | --- | --- | --- |
| `cost_cap_usd` | `float?` | `None` | `>= 0.0` | Maximum total LLM cost in USD |
| `token_cap` | `int?` | `None` | `>= 1` | Maximum total tokens (prompt + completion) |
| `max_duration_seconds` | `int?` | `None` | `1 -- 86400` | Wall-clock timeout for the entire workflow, in seconds |
| `on_exceed` | `"warn" \| "fail"` | `"fail"` | n/a | What happens when a cap is breached |
| `warn_at_pct` | `float` | `0.8` | `0.0 -- 1.0` | Fraction of a cap (`0.8` = 80%) at which an early-warning event is emitted |

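All fields are optional, and any cap you omit is not enforced. If you omit the workflow-level limits section entirely, no workflow-level budget enforcement is applied; limits defined on individual blocks are still enforced.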

Per-block limits

Individual blocks can have their own limits section:

```yaml
blocks:
  expensive_step:
    type: linear
    soul_ref: analyst
    limits:
      cost_cap_usd: 1.00
      token_cap: 50000
      max_duration_seconds: 120
      on_exceed: fail
```

BlockLimitsDef fields

| Field | Type | Default | Constraints | Description |
| --- | --- | --- | --- | --- |
| `cost_cap_usd` | `float?` | `None` | `>= 0.0` | Maximum LLM cost for this block |
| `token_cap` | `int?` | `None` | `>= 1` | Maximum tokens for this block |
| `max_duration_seconds` | `int?` | `None` | `1 -- 86400` | Wall-clock timeout for this block |
| `on_exceed` | `"warn" \| "fail"` | `"fail"` | n/a | What happens when a cap is breached |
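As a rough illustration, the two field tables can be modeled as Python dataclasses that enforce the listed constraints when a limits section is constructed. This is a sketch under assumptions: only the class and field names come from the tables; the validation code, the inheritance relationship, and the use of plain dataclasses (rather than whatever schema library the engine actually uses) are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class BlockLimitsDef:
    """Hypothetical model of a per-block limits section (every cap optional)."""

    cost_cap_usd: Optional[float] = None
    token_cap: Optional[int] = None
    max_duration_seconds: Optional[int] = None
    on_exceed: Literal["warn", "fail"] = "fail"

    def __post_init__(self) -> None:
        # Enforce the "Constraints" column; None means "no cap".
        if self.cost_cap_usd is not None and self.cost_cap_usd < 0.0:
            raise ValueError("cost_cap_usd must be >= 0.0")
        if self.token_cap is not None and self.token_cap < 1:
            raise ValueError("token_cap must be >= 1")
        if self.max_duration_seconds is not None and not 1 <= self.max_duration_seconds <= 86400:
            raise ValueError("max_duration_seconds must be in 1..86400")
        if self.on_exceed not in ("warn", "fail"):
            raise ValueError('on_exceed must be "warn" or "fail"')


@dataclass(frozen=True)
class WorkflowLimitsDef(BlockLimitsDef):
    """Hypothetical workflow-level model: the block fields plus warn_at_pct."""

    warn_at_pct: float = 0.8

    def __post_init__(self) -> None:
        super().__post_init__()
        if not 0.0 <= self.warn_at_pct <= 1.0:
            raise ValueError("warn_at_pct must be in 0.0..1.0")
```

Modeling the workflow definition as a subclass reflects that it has every block field plus `warn_at_pct`; the real engine may define the two independently.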

Warn mode vs kill mode

The on_exceed field controls what happens when a budget cap is breached:

Kill mode (`on_exceed: "fail"`)

The default. When any cap is exceeded, the engine raises a BudgetKilledException immediately. The exception includes structured metadata:

- scope: "block" or "workflow"
- block_id: which block triggered the breach (if block-scoped)
- limit_kind: "cost_usd", "token_cap", or "timeout"
- limit_value: the configured cap
- actual_value: the value that exceeded the cap
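A minimal sketch of an exception carrying this metadata is shown below. The attribute names mirror the list above; the constructor signature and the message format are assumptions for illustration, not the engine's actual class.

```python
from typing import Literal, Optional


class BudgetKilledException(Exception):
    """Hypothetical shape of the kill exception and its structured metadata."""

    def __init__(
        self,
        scope: Literal["block", "workflow"],
        limit_kind: Literal["cost_usd", "token_cap", "timeout"],
        limit_value: float,
        actual_value: float,
        block_id: Optional[str] = None,  # set only for block-scoped breaches
    ) -> None:
        self.scope = scope
        self.block_id = block_id
        self.limit_kind = limit_kind
        self.limit_value = limit_value
        self.actual_value = actual_value
        where = f"block {block_id!r}" if block_id is not None else "workflow"
        super().__init__(f"{where} exceeded {limit_kind}: {actual_value} > {limit_value}")
```

Keeping the metadata as attributes (rather than only in the message) lets error-route handlers and run reports inspect which cap was hit without parsing strings.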

Error route interaction: If the block that triggered the exception has an error_route configured, the BudgetKilledException is caught by the workflow’s generic exception handler. The block result is written with exit_handle: "error" and error metadata, and execution continues on the error route target block. Only when no error_route exists does the exception propagate and terminate the run with status: failed.

Workflow-level timeouts are the exception: When a workflow-level max_duration_seconds fires, the resulting BudgetKilledException is raised outside the block execution loop. It cannot be caught by any block’s error_route and always terminates the run unconditionally.
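The sketch below illustrates why the two cases differ, assuming an asyncio-based engine; `run_blocks`, `run_workflow`, `routes`, and `error_routes` are hypothetical names. A breach raised inside a block is an ordinary `Exception` that the loop's generic handler can divert to an error route. The workflow timeout is applied by `asyncio.wait_for` around the entire loop, and the cancellation it injects (`asyncio.CancelledError`, a `BaseException` since Python 3.8) bypasses the loop's `except Exception`, so no error_route ever sees it.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


class BudgetKilledException(Exception):
    """Stand-in for the engine's kill exception (metadata omitted)."""


BlockFn = Callable[[], Awaitable[str]]


async def run_blocks(blocks: Dict[str, BlockFn], routes: Dict[str, Optional[str]],
                     error_routes: Dict[str, str], entry: str) -> Dict[str, str]:
    """Block loop: a failure inside a block follows that block's error_route, if any."""
    results: Dict[str, str] = {}
    current: Optional[str] = entry
    while current is not None:
        try:
            results[current] = await blocks[current]()
            current = routes.get(current)
        except Exception:  # generic handler: also catches BudgetKilledException
            if current not in error_routes:
                raise  # no error_route: the run fails
            results[current] = "error"  # stands in for exit_handle: "error"
            current = error_routes[current]
    return results


async def run_workflow(blocks: Dict[str, BlockFn], routes: Dict[str, Optional[str]],
                       error_routes: Dict[str, str], entry: str,
                       max_duration_seconds: float) -> Dict[str, str]:
    try:
        # The timeout wraps the whole loop. Its CancelledError is a BaseException,
        # so the loop's `except Exception` cannot divert it to an error_route.
        return await asyncio.wait_for(run_blocks(blocks, routes, error_routes, entry),
                                      timeout=max_duration_seconds)
    except asyncio.TimeoutError:
        raise BudgetKilledException("workflow timeout") from None
```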

Warn mode (`on_exceed: "warn"`)

When a cap is exceeded, the engine logs a warning but execution continues. The run can finish normally even after exceeding a budget cap. Use this for soft budgets where you want visibility but not hard stops.

```yaml
limits:
  cost_cap_usd: 1.00
  on_exceed: warn
  warn_at_pct: 0.8
```

With this configuration, a warning event is emitted at 80% of the cost cap ($0.80) and another once the cap is exceeded (above $1.00), but execution continues in both cases.
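A minimal sketch of warn-mode bookkeeping for a single cost cap is shown below. `WarnTracker` and its event names are hypothetical, and the sketch assumes that each event fires at most once and that the early-warning check is inclusive (spend >= cap × warn_at_pct).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WarnTracker:
    """Hypothetical warn-mode bookkeeping for one cost cap: records events, never raises."""

    cost_cap_usd: float
    warn_at_pct: float = 0.8
    spent_usd: float = 0.0
    events: List[str] = field(default_factory=list)

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd
        # Early warning once spend reaches warn_at_pct of the cap (e.g. $0.80 of $1.00).
        if self.spent_usd >= self.cost_cap_usd * self.warn_at_pct and "threshold" not in self.events:
            self.events.append("threshold")
        # Second warning once the cap itself is exceeded; execution still continues.
        if self.spent_usd > self.cost_cap_usd and "exceeded" not in self.events:
            self.events.append("exceeded")
```

Recording $0.50, then $0.35, then $0.30 against a $1.00 cap emits `"threshold"` on the second call ($0.85) and `"exceeded"` on the third ($1.15), and later calls add no further events.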

How enforcement works

Budget enforcement tracks the active budget session per async task. Every LLM call passes through a single chokepoint where cost and token limits are checked --- whether the block runs in the engine process or in an isolated subprocess.

The enforcement chain

1. Workflow start: If the workflow has limits, a budget session is created and set as the active budget for the run.
2. Block start: If a block has limits, a child budget session is created with the workflow session as its parent. The child replaces the active budget for the duration of that block.
3. LLM call returns: After each LLM call, the session records the cost and tokens. If the session has a parent, costs propagate up the chain automatically.
4. Cap check: After recording, the engine walks the entire parent chain. If any session (block or workflow) has exceeded its cap with on_exceed: "fail", execution is killed immediately.
5. Block end: The block’s budget session is removed and the workflow session is restored.
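The chain above can be sketched with Python's `contextvars`, which give each asyncio task its own view of the active budget. Everything here (`BudgetSession`, `ACTIVE_BUDGET`, `block_budget`, `on_llm_result`) is a hypothetical illustration of the five steps, not the engine's code. Only cost and token caps are modeled; timeouts are omitted.

```python
import contextlib
import contextvars
from dataclasses import dataclass
from typing import Iterator, Optional


class BudgetKilledException(Exception):
    pass


@dataclass
class BudgetSession:
    """Hypothetical budget session with an optional parent (cost and token caps only)."""

    cost_cap_usd: Optional[float] = None
    token_cap: Optional[int] = None
    on_exceed: str = "fail"
    parent: Optional["BudgetSession"] = None
    cost_usd: float = 0.0
    tokens: int = 0

    def record(self, cost_usd: float, tokens: int) -> None:
        # Step 3: accrue here, then propagate to every ancestor.
        self.cost_usd += cost_usd
        self.tokens += tokens
        if self.parent is not None:
            self.parent.record(cost_usd, tokens)

    def over_cap(self) -> bool:
        return ((self.cost_cap_usd is not None and self.cost_usd > self.cost_cap_usd)
                or (self.token_cap is not None and self.tokens > self.token_cap))

    def check_chain(self) -> None:
        # Step 4: walk this session and all parents; any "fail" breach kills.
        node: Optional[BudgetSession] = self
        while node is not None:
            if node.on_exceed == "fail" and node.over_cap():
                raise BudgetKilledException(f"cap exceeded: ${node.cost_usd:.2f}, {node.tokens} tokens")
            node = node.parent


# Step 1: the engine sets the workflow session here; each asyncio task sees its own copy.
ACTIVE_BUDGET: "contextvars.ContextVar[Optional[BudgetSession]]" = contextvars.ContextVar(
    "active_budget", default=None)


@contextlib.contextmanager
def block_budget(limits: Optional[dict]) -> Iterator[Optional[BudgetSession]]:
    """Steps 2 and 5: install a child session for the block, then restore the previous one."""
    parent = ACTIVE_BUDGET.get()
    if limits is None:  # block without limits: keep using the workflow session
        yield parent
        return
    child = BudgetSession(parent=parent, **limits)
    token = ACTIVE_BUDGET.set(child)
    try:
        yield child
    finally:
        ACTIVE_BUDGET.reset(token)


def on_llm_result(cost_usd: float, tokens: int) -> None:
    """The single chokepoint every LLM call passes through after it returns."""
    session = ACTIVE_BUDGET.get()
    if session is not None:
        session.record(cost_usd, tokens)
        session.check_chain()
```

Because a block without limits simply keeps the workflow session active, its costs still land on the workflow counters and can trip the workflow cap.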

Enforcement across process isolation

LLM blocks run in isolated subprocesses that have no direct access to API keys or budget state. Budget enforcement still works because every LLM call from the subprocess is proxied through an IPC channel back to the engine, where a BudgetInterceptor enforces caps:

- Before each LLM call: The interceptor checks the active budget session. If the budget is exceeded, the call is rejected before the LLM provider is contacted --- no money spent.
- After each LLM call: The interceptor accrues the reported cost and tokens to the budget session. Costs propagate up the parent chain automatically.
- Budget kill propagation: When a budget cap is breached, a BudgetKilledException is serialized back to the subprocess, which writes it into the ResultEnvelope. The engine deserializes and re-raises it in the main process.
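The proxy flow can be sketched as follows, with JSON standing in for the real IPC encoding and a plain dict standing in for the ResultEnvelope. `BudgetInterceptor` and `ResultEnvelope` are named in the text above, but the method names, payload shapes, and the helper functions `subprocess_envelope` and `engine_unwrap` are assumptions; only a single cost cap is modeled.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


class BudgetKilledException(Exception):
    def __init__(self, limit_kind: str, limit_value: float, actual_value: float) -> None:
        super().__init__(f"{limit_kind}: {actual_value} > {limit_value}")
        self.limit_kind = limit_kind
        self.limit_value = limit_value
        self.actual_value = actual_value

    def to_payload(self) -> Dict[str, Any]:
        return {"limit_kind": self.limit_kind, "limit_value": self.limit_value,
                "actual_value": self.actual_value}

    @classmethod
    def from_payload(cls, payload: Dict[str, Any]) -> "BudgetKilledException":
        return cls(payload["limit_kind"], payload["limit_value"], payload["actual_value"])


@dataclass
class Session:
    cost_cap_usd: float
    cost_usd: float = 0.0


class BudgetInterceptor:
    """Engine side: every LLM request proxied from a subprocess goes through handle()."""

    def __init__(self, session: Session,
                 call_provider: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self.session = session
        self.call_provider = call_provider  # only the engine holds provider credentials

    def _kill(self) -> Dict[str, Any]:
        s = self.session
        return {"budget_killed": BudgetKilledException("cost_usd", s.cost_cap_usd, s.cost_usd).to_payload()}

    def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        if self.session.cost_usd > self.session.cost_cap_usd:
            return self._kill()  # rejected before the provider is contacted
        response = self.call_provider(request)
        self.session.cost_usd += response["cost_usd"]  # accrue the reported cost
        if self.session.cost_usd > self.session.cost_cap_usd:
            return self._kill()
        return response


def subprocess_envelope(reply: Dict[str, Any]) -> str:
    """Subprocess side: write the reply (or the serialized kill) into an envelope."""
    return json.dumps({"output": reply.get("text"), "budget_killed": reply.get("budget_killed")})


def engine_unwrap(envelope: str) -> Optional[str]:
    """Engine side: deserialize the envelope and re-raise a kill in the main process."""
    data = json.loads(envelope)
    if data["budget_killed"] is not None:
        raise BudgetKilledException.from_payload(data["budget_killed"])
    return data["output"]
```

In this sketch the subprocess only ever sees opaque payloads; the session object and the provider callable live entirely on the engine side.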

Block-level and workflow-level caps both work across the isolation boundary. The subprocess never sees or manipulates budget state directly --- the engine owns it entirely.

See Process Isolation for the full architecture.

Parent propagation

When a block has its own limits, a child budget session is created with the workflow session as its parent. Every cost recorded on the child also increments the parent’s counters recursively up the chain. After recording, the engine walks the entire parent chain, so both the block’s caps and the workflow’s caps are enforced on every LLM call:

```text
Block accrues $0.50, 1000 tokens
  → Block session: cost=$0.50, tokens=1000
  → Parent (workflow) session: cost=$0.50, tokens=1000 (propagated)
```

This means a workflow with cost_cap_usd: 2.00 and on_exceed: fail kills the run as soon as the workflow’s total cost exceeds $2.00, even if the individual block has no limit. Sub-flow budgets are not independent: because costs propagate upward through the parent chain, the parent workflow’s caps are always enforced.

Timeout enforcement

Timeouts work differently from cost and token caps:

- Workflow timeout: The engine wraps the main execution loop with the configured max_duration_seconds. If the timeout fires, execution is killed immediately with a BudgetKilledException (limit_kind="timeout").
- Block timeout: Each block is individually wrapped with its own max_duration_seconds. Block timeouts are independent of the workflow timeout: the workflow deadline keeps counting while a block runs, so a block can be stopped by either deadline, whichever expires first.
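Both timeouts can be sketched with `asyncio.wait_for`, assuming an asyncio-based engine; `run_block` and `run_workflow` are hypothetical names. The workflow-level `wait_for` wraps the whole loop while each block gets its own, so the two deadlines run independently and whichever expires first fires.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


class BudgetKilledException(Exception):
    def __init__(self, scope: str, limit_value: float, block_id: Optional[str] = None) -> None:
        super().__init__(f"{scope} timeout after {limit_value}s")
        self.scope = scope
        self.block_id = block_id
        self.limit_kind = "timeout"
        self.limit_value = limit_value


async def run_block(block_id: str, work: Awaitable[T], max_duration_seconds: Optional[float]) -> T:
    """Each block gets its own deadline, independent of the workflow's."""
    if max_duration_seconds is None:
        return await work
    try:
        return await asyncio.wait_for(work, timeout=max_duration_seconds)
    except asyncio.TimeoutError:
        raise BudgetKilledException("block", max_duration_seconds, block_id) from None


async def run_workflow(main_loop: Awaitable[T], max_duration_seconds: Optional[float]) -> T:
    """The workflow deadline wraps the entire execution loop."""
    if max_duration_seconds is None:
        return await main_loop
    try:
        return await asyncio.wait_for(main_loop, timeout=max_duration_seconds)
    except asyncio.TimeoutError:
        raise BudgetKilledException("workflow", max_duration_seconds) from None
```

For example, two blocks that each take 0.1 s under a 1 s block timeout never trip their own deadline, yet a 0.15 s workflow timeout still kills the run partway through the second block.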

Dispatch branch isolation

When a dispatch block fans out to multiple exit branches, each branch gets an isolated child session created via create_isolated_child(branch_id=exit_id). The child inherits the parent’s exact caps (cost_cap_usd, token_cap, max_duration_seconds, on_exceed) but has no parent pointer --- it accumulates costs independently. This prevents concurrent branches from sharing mutable budget state during execution.

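Each child is individually capped at the parent’s full cap value. For example, if the workflow cap is $5.00, each branch may spend up to $5.00 on its own, not $5.00 divided by the number of branches, so a single branch can hit the cap before the others finish. Conversely, the branches together can exceed the parent’s cap without any one of them tripping its own check; that overspend is caught only by the parent cap check after reconciliation.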

After all branches complete via asyncio.gather, each child’s totals are reconciled back to the parent session via reconcile_child(), which adds the child’s cost and token totals to the parent. Then the parent session’s caps are checked with check_or_raise().

```text
Parent session: cost_cap=$5.00
  ├── Branch A (isolated, cap=$5.00): cost=$1.00
  ├── Branch B (isolated, cap=$5.00): cost=$2.00
  ├── Reconciliation: parent cost += $1.00 + $2.00 = $3.00
  └── Parent cap check: $3.00 ≤ $5.00 → passes
```
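The isolate-then-reconcile pattern might look roughly like this. `create_isolated_child`, `reconcile_child`, and `check_or_raise` are the names used above, but their bodies, the `BudgetSession` fields, and the `dispatch` helper (which fakes each branch's spend instead of making LLM calls) are assumptions for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Optional


class BudgetKilledException(Exception):
    pass


@dataclass
class BudgetSession:
    cost_cap_usd: Optional[float] = None
    token_cap: Optional[int] = None
    max_duration_seconds: Optional[int] = None
    on_exceed: str = "fail"
    parent: Optional["BudgetSession"] = None
    branch_id: Optional[str] = None
    cost_usd: float = 0.0
    tokens: int = 0

    def create_isolated_child(self, branch_id: str) -> "BudgetSession":
        # Same caps as the parent, zeroed counters, and no parent pointer.
        return BudgetSession(self.cost_cap_usd, self.token_cap, self.max_duration_seconds,
                             self.on_exceed, parent=None, branch_id=branch_id)

    def reconcile_child(self, child: "BudgetSession") -> None:
        self.cost_usd += child.cost_usd
        self.tokens += child.tokens

    def check_or_raise(self) -> None:
        over = ((self.cost_cap_usd is not None and self.cost_usd > self.cost_cap_usd)
                or (self.token_cap is not None and self.tokens > self.token_cap))
        if over and self.on_exceed == "fail":
            raise BudgetKilledException(f"cost=${self.cost_usd:.2f}, tokens={self.tokens}")


async def dispatch(parent: BudgetSession, branch_costs: Dict[str, float]) -> None:
    """Hypothetical fan-out: one isolated child per exit branch, reconciled after gather."""
    children = {exit_id: parent.create_isolated_child(branch_id=exit_id) for exit_id in branch_costs}

    async def run_branch(exit_id: str) -> None:
        child = children[exit_id]
        await asyncio.sleep(0)  # stand-in for the branch's LLM calls
        child.cost_usd += branch_costs[exit_id]
        child.check_or_raise()  # each branch is checked against the parent's full cap

    await asyncio.gather(*(run_branch(exit_id) for exit_id in children))
    for child in children.values():
        parent.reconcile_child(child)
    parent.check_or_raise()
```

With branch costs of $3.00 each against a $5.00 cap, neither branch trips its own check, yet `check_or_raise()` on the parent fails after reconciliation brings the total to $6.00.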

Complete example

```yaml
version: "1.0"
id: budget-example
kind: workflow
limits:
  cost_cap_usd: 5.00
  token_cap: 200000
  max_duration_seconds: 600
  on_exceed: fail
  warn_at_pct: 0.8
workflow:
  name: budget-example
  entry: research
  transitions:
    - from: research
      to: summarize
    - from: summarize
blocks:
  research:
    type: linear
    soul_ref: researcher
    limits:
      cost_cap_usd: 3.00
      max_duration_seconds: 300
      on_exceed: fail
  summarize:
    type: linear
    soul_ref: writer
    limits:
      cost_cap_usd: 1.00
      on_exceed: warn
```

In this example:

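- The workflow hard-stops at $5.00 total, 200k tokens, or 10 minutes, whichever comes first.
- The research block hard-stops at $3.00 or 5 minutes.
- The summarize block logs a warning once it exceeds $1.00 but keeps running.
- Both blocks’ costs propagate to the workflow total, so anything summarize spends beyond its own $1.00 soft cap still counts toward, and can trigger, the workflow’s $5.00 hard stop.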
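To see how the caps in this example interact, here is a toy replay of cost accrual. It is a hypothetical simulation, not engine code: it models only the cost caps (not tokens, timeouts, or warn_at_pct early warnings) and assumes a cap is breached when spend goes strictly above it.

```python
from typing import Dict, List, Tuple

Cap = Tuple[str, float, str]  # (session name, cost_cap_usd, on_exceed)

WORKFLOW: Cap = ("workflow", 5.00, "fail")
BLOCK_CAPS: Dict[str, Cap] = {
    "research": ("research", 3.00, "fail"),
    "summarize": ("summarize", 1.00, "warn"),
}


def replay(calls: List[Tuple[str, float]]) -> Tuple[str, List[str]]:
    """Replay (block_id, cost_usd) LLM calls; return the final status and warned sessions."""
    totals: Dict[str, float] = {"workflow": 0.0, "research": 0.0, "summarize": 0.0}
    warned: List[str] = []
    for block_id, cost in calls:
        chain = [BLOCK_CAPS[block_id], WORKFLOW]  # block session first, then its parent
        for name, _, _ in chain:
            totals[name] += cost  # every cost also lands on the workflow total
        for name, cap, mode in chain:
            if totals[name] > cap:
                if mode == "fail":
                    return f"failed: {name} cap ${cap:.2f}", warned
                if name not in warned:
                    warned.append(name)  # warn mode: note it and keep going
    return "completed", warned
```

Replaying research at $2.80 and summarize at $2.50 records a summarize warning and then fails on the workflow cap ($5.30 total), even though summarize itself is in warn mode.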