
Budget & Limits

The limits section controls how much a workflow or individual block is allowed to spend in cost, tokens, and wall-clock time. When a limit is breached, the engine either kills execution immediately or emits a warning and continues, depending on the on_exceed mode.

Add a limits section at the top level of your workflow YAML:

custom/workflows/research-pipeline.yaml
```yaml
version: "1.0"
limits:
  cost_cap_usd: 2.50
  token_cap: 100000
  max_duration_seconds: 300
  on_exceed: fail
  warn_at_pct: 0.8
workflow:
  name: research-pipeline
  entry: step_one
blocks:
  step_one:
    type: linear
    soul_ref: analyst
```
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| cost_cap_usd | float? | None | >= 0.0 | Maximum total LLM cost in USD |
| token_cap | int? | None | >= 1 | Maximum total tokens (prompt + completion) |
| max_duration_seconds | int? | None | 1 to 86400 | Wall-clock timeout for the entire workflow, in seconds |
| on_exceed | "warn" \| "fail" | "fail" | - | What happens when a cap is breached |
| warn_at_pct | float | 0.8 | 0.0 to 1.0 | Fraction of a cap (0.8 = 80%) at which an early warning event is emitted |
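To make the table's defaults and constraints concrete, here is a minimal sketch that models the top-level limits section as a Python dataclass. WorkflowLimits is a hypothetical name chosen for illustration; it is not the engine's own class, and the engine may validate these fields differently.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class WorkflowLimits:
    """Hypothetical model of the top-level `limits` section (not the engine's own class)."""

    cost_cap_usd: Optional[float] = None
    token_cap: Optional[int] = None
    max_duration_seconds: Optional[int] = None
    on_exceed: Literal["warn", "fail"] = "fail"
    warn_at_pct: float = 0.8

    def __post_init__(self) -> None:
        # Enforce the constraints listed in the limits table.
        if self.cost_cap_usd is not None and self.cost_cap_usd < 0.0:
            raise ValueError("cost_cap_usd must be >= 0.0")
        if self.token_cap is not None and self.token_cap < 1:
            raise ValueError("token_cap must be >= 1")
        if self.max_duration_seconds is not None and not 1 <= self.max_duration_seconds <= 86400:
            raise ValueError("max_duration_seconds must be in 1..86400")
        if self.on_exceed not in ("warn", "fail"):
            raise ValueError('on_exceed must be "warn" or "fail"')
        if not 0.0 <= self.warn_at_pct <= 1.0:
            raise ValueError("warn_at_pct must be in 0.0..1.0")


# The research-pipeline example: unspecified fields fall back to their defaults.
limits = WorkflowLimits(cost_cap_usd=2.50, token_cap=100_000, max_duration_seconds=300)
```

Omitting a field leaves it at None (no cap), which matches the rule that an absent limit means no enforcement for that dimension.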

All fields are optional. If you omit limits entirely, no budget enforcement is applied.

Individual blocks can have their own limits section:

custom/workflows/research-pipeline.yaml
```yaml
blocks:
  expensive_step:
    type: linear
    soul_ref: analyst
    limits:
      cost_cap_usd: 1.00
      token_cap: 50000
      max_duration_seconds: 120
      on_exceed: fail
```
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| cost_cap_usd | float? | None | >= 0.0 | Maximum LLM cost for this block |
| token_cap | int? | None | >= 1 | Maximum tokens for this block |
| max_duration_seconds | int? | None | 1 to 86400 | Wall-clock timeout for this block, in seconds |
| on_exceed | "warn" \| "fail" | "fail" | - | What happens when a cap is breached |

The on_exceed field controls what happens when a budget cap is breached:

on_exceed: fail (default)

When any cap is exceeded, the engine immediately raises a BudgetKilledException. The exception includes structured metadata:

  • scope --- "block" or "workflow"
  • block_id --- which block triggered the breach (if block-scoped)
  • limit_kind --- "cost_usd", "token_cap", or "timeout"
  • limit_value --- the configured cap
  • actual_value --- the value that exceeded the cap
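The metadata fields listed above can be pictured as attributes on the exception. The sketch below is an illustration only: the field names come from this page, but the constructor signature and message format are assumptions, not the engine's actual definition.

```python
from typing import Literal, Optional


class BudgetKilledException(Exception):
    """Hypothetical sketch carrying the documented metadata fields."""

    def __init__(
        self,
        scope: Literal["block", "workflow"],
        limit_kind: Literal["cost_usd", "token_cap", "timeout"],
        limit_value: float,
        actual_value: float,
        block_id: Optional[str] = None,
    ) -> None:
        where = f"block {block_id!r}" if scope == "block" else "workflow"
        super().__init__(f"{where} exceeded {limit_kind}: {actual_value} > {limit_value}")
        self.scope = scope
        self.block_id = block_id  # only set for block-scoped breaches
        self.limit_kind = limit_kind
        self.limit_value = limit_value
        self.actual_value = actual_value


# Callers can branch on the metadata instead of parsing the message.
try:
    raise BudgetKilledException("block", "cost_usd", 1.00, 1.07, block_id="expensive_step")
except BudgetKilledException as exc:
    summary = (exc.scope, exc.block_id, exc.limit_kind)
```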

Error route interaction: If the block that triggered the exception has an error_route configured, the BudgetKilledException is caught by the workflow’s generic exception handler. The block result is written with exit_handle: "error" and error metadata, and execution continues on the error route target block. Only when no error_route exists does the exception propagate and terminate the run with status: failed.

Workflow-level timeouts are the exception: When a workflow-level max_duration_seconds fires, the resulting BudgetKilledException is raised outside the block execution loop. No block’s error_route can catch it, so it always terminates the run.
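The error-route behavior can be sketched as a generic exception handler around a single block. Block, BlockResult, run_block, and the "done" success handle are hypothetical names for illustration; only exit_handle: "error", error_route, and BudgetKilledException come from this page. A workflow-level timeout would be raised outside a function like this, so this handler never sees it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class BudgetKilledException(Exception):
    """Simplified stand-in for the engine's exception."""


@dataclass
class Block:
    run: Callable[[], str]
    error_route: Optional[str] = None  # target block id, if configured


@dataclass
class BlockResult:
    exit_handle: str
    error: Optional[str] = None


def run_block(block_id: str, block: Block, results: Dict[str, BlockResult]) -> Optional[str]:
    """Return the error_route target if the block failed and was rerouted, else None."""
    try:
        block.run()
    except Exception as exc:  # generic handler, so it also catches BudgetKilledException
        if block.error_route is None:
            raise  # no error_route: propagate and end the run with status "failed"
        results[block_id] = BlockResult(exit_handle="error", error=repr(exc))
        return block.error_route  # continue on the error route target
    results[block_id] = BlockResult(exit_handle="done")  # hypothetical success handle
    return None
```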

on_exceed: warn

When a cap is exceeded, the engine logs a warning and execution continues, so the run can finish normally even after exceeding a budget cap. Use this for soft budgets where you want visibility but not hard stops.

Warn mode example
```yaml
limits:
  cost_cap_usd: 1.00
  on_exceed: warn
  warn_at_pct: 0.8
```

With this configuration, a warning event is emitted at 80% of the cost cap ($0.80), and another when the cap is exceeded ($1.00+), but execution continues.
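The threshold arithmetic for warn mode is cap × warn_at_pct, here 1.00 × 0.8 = $0.80. The sketch below shows one plausible way to emit each warning only once as spend grows. WarnTracker is hypothetical, and the exact comparison rules (>= at the early threshold, > at the cap) are assumptions rather than documented engine behavior.

```python
class WarnTracker:
    """Hypothetical sketch: emits each warning once as spend crosses the thresholds."""

    def __init__(self, cap: float, warn_at_pct: float = 0.8) -> None:
        self.cap = cap
        self.warn_at = cap * warn_at_pct  # 1.00 * 0.8 -> 0.80
        self.events: list = []
        self._warned_pct = False
        self._warned_cap = False

    def observe(self, spent: float) -> None:
        # Assumption: the early warning fires at or above the threshold.
        if not self._warned_pct and spent >= self.warn_at:
            self._warned_pct = True
            self.events.append(f"approaching cap: {spent:.2f} >= {self.warn_at:.2f}")
        # Assumption: the cap warning fires once spend strictly exceeds the cap.
        if not self._warned_cap and spent > self.cap:
            self._warned_cap = True
            self.events.append(f"cap exceeded: {spent:.2f} > {self.cap:.2f}")
        # Warn mode: nothing is raised, so execution continues.


tracker = WarnTracker(cap=1.00, warn_at_pct=0.8)
for spent in (0.30, 0.85, 0.95, 1.10, 1.40):
    tracker.observe(spent)
```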

Budget enforcement uses Python contextvars to track the active budget session per asyncio task. The enforcement point is inside LiteLLMClient.achat() --- the single chokepoint for all LLM calls.

  1. Workflow start: If the workflow has limits, a BudgetSession is created and set as the active budget via the _active_budget ContextVar.

  2. Block start: If a block has limits, a child BudgetSession is created with the workflow session as its parent. The child session replaces the active budget for the duration of that block.

  3. LLM call returns: After each achat() call, the active session’s accrue() method adds that call’s cost and tokens. If the session has a parent, the amounts propagate up the chain automatically.

  4. Cap check: After accrual, check_or_raise() walks the entire parent chain. If any session (block or workflow) has exceeded its cap with on_exceed: "fail", a BudgetKilledException is raised.

  5. Block end: The block’s budget session is removed and the workflow session is restored.

When a block has its own limits, the BudgetSession is created with the workflow session as parent. Every accrue() call on the child also increments the parent’s counters recursively up the chain. After accrual, check_or_raise() walks the entire parent chain, so both the block’s caps and the workflow’s caps are enforced on every LLM call:

Block accrues $0.50, 1000 tokens
→ Block session: cost=$0.50, tokens=1000
→ Parent (workflow) session: cost=$0.50, tokens=1000 (propagated)

This means that a workflow with cost_cap_usd: 2.00 and on_exceed: fail will kill the run as soon as the workflow’s total cost exceeds $2.00, even if no individual block has a limit. Sub-flow budgets are not independent: because costs propagate upward through the parent chain, the parent workflow’s caps are always enforced.
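The following sketch walks through that scenario with a stripped-down, cost-only session. Session and accrue_and_check are hypothetical simplifications of BudgetSession, accrue(), and check_or_raise(). An uncapped block spends directly against the workflow session, and a block with a generous $3.00 cap is still stopped by the $2.00 workflow cap.

```python
from typing import Optional


class BudgetKilledException(Exception):
    pass


class Session:
    """Minimal cost-only session with a parent pointer (hypothetical sketch)."""

    def __init__(self, cap: Optional[float] = None, parent: Optional["Session"] = None) -> None:
        self.cap = cap
        self.parent = parent
        self.cost = 0.0

    def accrue_and_check(self, cost: float) -> None:
        node = self
        while node is not None:  # propagate the cost up the chain
            node.cost += cost
            node = node.parent
        node = self
        while node is not None:  # then enforce every cap on the chain
            if node.cap is not None and node.cost > node.cap:
                raise BudgetKilledException(f"cost {node.cost:.2f} > cap {node.cap:.2f}")
            node = node.parent


workflow = Session(cap=2.00)

# A block without limits gets no session of its own; it spends on the workflow directly.
workflow.accrue_and_check(1.20)

# A block with a generous $3.00 cap still cannot outspend the workflow cap.
research = Session(cap=3.00, parent=workflow)
killed_by = None
try:
    research.accrue_and_check(1.00)  # block: 1.00 <= 3.00, workflow: 2.20 > 2.00
except BudgetKilledException as exc:
    killed_by = str(exc)
```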

Timeouts work differently from cost and token caps:

  • Workflow timeout: Workflow.run() wraps the main execution loop in asyncio.wait_for(timeout=max_duration_seconds). If the timeout fires, a BudgetKilledException is raised with limit_kind="timeout".

  • Block timeout: execute_block() wraps the individual block dispatch in asyncio.wait_for(timeout=max_duration_seconds). Block timeouts are independent of the workflow timeout.
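Both timeouts rely on asyncio.wait_for. The sketch below shows a block-level timeout nested inside a workflow-level one and the translation of the timeout into a BudgetKilledException with limit_kind="timeout". with_timeout, slow_block, and workflow_loop are hypothetical helpers, and the exception constructor is a simplified assumption.

```python
import asyncio
from typing import Awaitable, Optional


class BudgetKilledException(Exception):
    def __init__(self, scope: str, limit_kind: str, limit_value: float) -> None:
        super().__init__(f"{scope} {limit_kind} after {limit_value}s")
        self.scope = scope
        self.limit_kind = limit_kind
        self.limit_value = limit_value


async def with_timeout(coro: Awaitable, scope: str, max_duration_seconds: Optional[float]):
    """Run `coro` under an optional wall-clock limit, translating the timeout."""
    if max_duration_seconds is None:
        return await coro
    try:
        return await asyncio.wait_for(coro, timeout=max_duration_seconds)
    except asyncio.TimeoutError:
        raise BudgetKilledException(scope, "timeout", max_duration_seconds) from None


async def slow_block() -> str:
    await asyncio.sleep(1)  # stand-in for a long-running block
    return "done"


async def workflow_loop() -> str:
    # The block timeout is applied independently of the workflow timeout.
    return await with_timeout(slow_block(), "block", 0.05)


async def main() -> str:
    return await with_timeout(workflow_loop(), "workflow", 5)
```

Here the 0.05-second block timeout fires first, so the exception that reaches the caller has scope "block" even though the workflow limit is still running.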

When a dispatch block fans out to multiple exit branches, each branch gets an isolated child session created via create_isolated_child(branch_id=exit_id). The child inherits the parent’s exact caps (cost_cap_usd, token_cap, max_duration_seconds, on_exceed) but has no parent pointer, so it accumulates costs independently. This prevents concurrent branches from sharing mutable budget state during execution.

Each child is individually capped at the parent’s full cap value. For example, if the workflow cap is $5.00, each branch is independently allowed up to $5.00, not $5.00 divided by the number of branches. A single branch can therefore hit the cap on its own before the others finish, and the branches’ combined spend can exceed the parent cap; that overage is only detected when the parent’s caps are checked after reconciliation.

After all branches complete via asyncio.gather, each child’s totals are reconciled back to the parent session via reconcile_child(), which adds the child’s cost and token totals to the parent. Then the parent session’s caps are checked with check_or_raise().

Parent session: cost_cap=$5.00
├── Branch A (isolated, cap=$5.00): cost=$1.00
├── Branch B (isolated, cap=$5.00): cost=$2.00
├── Reconciliation: parent cost += $1.00 + $2.00 = $3.00
└── Parent cap check: $3.00 ≤ $5.00 → passes
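The fan-out and reconciliation flow can be sketched with asyncio.gather. create_isolated_child(), reconcile_child(), and check_or_raise() are names from this page, but the bodies below are cost-only simplifications; run_branch and dispatch are hypothetical helpers. The second usage case shows two branches that each stay under $5.00 but together exceed it, which is caught only by the parent check after reconciliation.

```python
import asyncio
from typing import Dict, Optional


class BudgetKilledException(Exception):
    pass


class BudgetSession:
    """Cost-only sketch; token tracking is omitted for brevity."""

    def __init__(self, cost_cap_usd: Optional[float] = None, branch_id: Optional[str] = None) -> None:
        self.cost_cap_usd = cost_cap_usd
        self.branch_id = branch_id
        self.cost_usd = 0.0

    def create_isolated_child(self, branch_id: str) -> "BudgetSession":
        # Same cap as the parent, but no parent pointer: accrual stays local.
        return BudgetSession(cost_cap_usd=self.cost_cap_usd, branch_id=branch_id)

    def check_or_raise(self) -> None:
        if self.cost_cap_usd is not None and self.cost_usd > self.cost_cap_usd:
            who = self.branch_id or "parent"
            raise BudgetKilledException(f"{who}: {self.cost_usd:.2f} > {self.cost_cap_usd:.2f}")

    def reconcile_child(self, child: "BudgetSession") -> None:
        self.cost_usd += child.cost_usd  # fold the branch total into this session


async def run_branch(parent: BudgetSession, exit_id: str, spend: float) -> BudgetSession:
    child = parent.create_isolated_child(branch_id=exit_id)
    await asyncio.sleep(0)  # stand-in for the branch's LLM calls
    child.cost_usd += spend
    child.check_or_raise()  # each branch is checked against the full parent cap
    return child


async def dispatch(parent: BudgetSession, spends: Dict[str, float]) -> None:
    children = await asyncio.gather(*(run_branch(parent, e, s) for e, s in spends.items()))
    for child in children:
        parent.reconcile_child(child)
    parent.check_or_raise()  # only now is the combined spend compared with the parent cap


ok = BudgetSession(cost_cap_usd=5.00)
asyncio.run(dispatch(ok, {"A": 1.00, "B": 2.00}))  # parent total: 3.00

over = BudgetSession(cost_cap_usd=5.00)
try:
    asyncio.run(dispatch(over, {"A": 3.00, "B": 4.00}))  # each branch passes; parent: 7.00
except BudgetKilledException as exc:
    over_msg = str(exc)
```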
custom/workflows/budget-example.yaml
```yaml
version: "1.0"
limits:
  cost_cap_usd: 5.00
  token_cap: 200000
  max_duration_seconds: 600
  on_exceed: fail
  warn_at_pct: 0.8
workflow:
  name: budget-example
  entry: research
  transitions:
    - from: research
      to: summarize
    - from: summarize
blocks:
  research:
    type: linear
    soul_ref: researcher
    limits:
      cost_cap_usd: 3.00
      max_duration_seconds: 300
      on_exceed: fail
  summarize:
    type: linear
    soul_ref: writer
    limits:
      cost_cap_usd: 1.00
      on_exceed: warn
```

In this example:

  • The workflow will hard-stop at $5.00 total cost, 200k tokens, or 10 minutes, whichever comes first.
  • The research block will hard-stop at $3.00 or 5 minutes, whichever comes first.
  • The summarize block will warn at $1.00 but continue running.
  • Both blocks’ costs propagate to the workflow total.