Budget & Limits
The limits section controls how much a workflow or individual block is allowed to spend in cost, tokens, and wall-clock time. When a limit is breached, the engine either kills execution immediately or emits a warning and continues, depending on the on_exceed mode.
Workflow-level limits
Add a limits section at the top level of your workflow YAML:
```yaml
version: "1.0"
id: research-pipeline
kind: workflow
limits:
  cost_cap_usd: 2.50
  token_cap: 100000
  max_duration_seconds: 300
  on_exceed: fail
  warn_at_pct: 0.8
workflow:
  name: research-pipeline
  entry: step_one
blocks:
  step_one:
    type: linear
    soul_ref: analyst
```

WorkflowLimitsDef fields
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| cost_cap_usd | float? | None | >= 0.0 | Maximum total LLM cost in USD |
| token_cap | int? | None | >= 1 | Maximum total tokens (prompt + completion) |
| max_duration_seconds | int? | None | 1 to 86400 | Wall-clock timeout for the entire workflow, in seconds |
| on_exceed | "warn" or "fail" | "fail" | n/a | What happens when a cap is breached |
| warn_at_pct | float | 0.8 | 0.0 to 1.0 | Fraction of the cap (0.8 = 80%) at which an early warning event is emitted |
All fields are optional. If you omit the limits section entirely, no workflow-level budget is enforced; individual blocks can still declare their own limits.
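To make the constraints in the table concrete, here is a minimal Python sketch of how WorkflowLimitsDef could validate them. It assumes a plain dataclass and inclusive bounds; the engine's actual schema class, and whatever validation library it uses, may differ.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class WorkflowLimitsDef:
    """Hypothetical mirror of the WorkflowLimitsDef table."""

    cost_cap_usd: Optional[float] = None
    token_cap: Optional[int] = None
    max_duration_seconds: Optional[int] = None
    on_exceed: Literal["warn", "fail"] = "fail"
    warn_at_pct: float = 0.8

    def __post_init__(self) -> None:
        # Each check mirrors one entry in the Constraints column.
        if self.cost_cap_usd is not None and self.cost_cap_usd < 0.0:
            raise ValueError("cost_cap_usd must be >= 0.0")
        if self.token_cap is not None and self.token_cap < 1:
            raise ValueError("token_cap must be >= 1")
        if self.max_duration_seconds is not None and not (
                1 <= self.max_duration_seconds <= 86400):
            raise ValueError("max_duration_seconds must be in 1..86400")
        if self.on_exceed not in ("warn", "fail"):
            raise ValueError('on_exceed must be "warn" or "fail"')
        if not 0.0 <= self.warn_at_pct <= 1.0:
            raise ValueError("warn_at_pct must be in 0.0..1.0")


# The research-pipeline limits from the YAML example.
limits = WorkflowLimitsDef(cost_cap_usd=2.50, token_cap=100000,
                           max_duration_seconds=300, on_exceed="fail",
                           warn_at_pct=0.8)
no_limits = WorkflowLimitsDef()  # every cap is None: nothing is enforced
```

Freezing the dataclass keeps a parsed limits object from being mutated after validation, which would otherwise let a value slip past the checks in `__post_init__`.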
Per-block limits
Individual blocks can have their own limits section:
```yaml
blocks:
  expensive_step:
    type: linear
    soul_ref: analyst
    limits:
      cost_cap_usd: 1.00
      token_cap: 50000
      max_duration_seconds: 120
      on_exceed: fail
```

BlockLimitsDef fields
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| cost_cap_usd | float? | None | >= 0.0 | Maximum LLM cost for this block, in USD |
| token_cap | int? | None | >= 1 | Maximum tokens for this block |
| max_duration_seconds | int? | None | 1 to 86400 | Wall-clock timeout for this block, in seconds |
| on_exceed | "warn" or "fail" | "fail" | n/a | What happens when a cap is breached |
Warn mode vs kill mode
The on_exceed field controls what happens when a budget cap is breached:
Kill mode (on_exceed: "fail")
The default. When any cap is exceeded, the engine raises a BudgetKilledException immediately. The exception includes structured metadata:
- scope: "block" or "workflow"
- block_id: which block triggered the breach (if block-scoped)
- limit_kind: "cost_usd", "token_cap", or "timeout"
- limit_value: the configured cap
- actual_value: the value that exceeded the cap
Error route interaction: If the block that triggered the exception has an error_route configured, the BudgetKilledException is caught by the workflow’s generic exception handler. The block result is written with exit_handle: "error" plus error metadata, and execution continues at the error_route target block. Only when no error_route exists does the exception propagate and terminate the run with status: failed.
Workflow-level timeouts are handled differently: when a workflow-level max_duration_seconds fires, the resulting BudgetKilledException is raised outside the block execution loop. It cannot be caught by any block’s error_route and always terminates the run.
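The kill path and its error_route interaction can be sketched as follows. BudgetKilledException here is a stand-in carrying the five metadata fields listed above. route_after_kill, the error_routes mapping, and the result-dict layout are hypothetical names chosen for illustration, not the engine's API.

```python
from typing import Dict, Optional


class BudgetKilledException(Exception):
    """Hypothetical stand-in carrying the documented metadata fields."""

    def __init__(self, scope: str, block_id: Optional[str], limit_kind: str,
                 limit_value: float, actual_value: float) -> None:
        super().__init__(f"{scope} {limit_kind} cap {limit_value} exceeded: {actual_value}")
        self.scope = scope              # "block" or "workflow"
        self.block_id = block_id        # set only for block-scoped breaches
        self.limit_kind = limit_kind    # "cost_usd", "token_cap", or "timeout"
        self.limit_value = limit_value  # the configured cap
        self.actual_value = actual_value


def route_after_kill(running_block: str, exc: BudgetKilledException,
                     error_routes: Dict[str, str], results: Dict[str, dict]) -> str:
    """Return the error_route target of running_block, or re-raise exc."""
    target = error_routes.get(running_block)
    if target is None:
        # No error_route: the exception propagates and the run fails.
        raise exc
    # Record an error result for the block, then continue at the target.
    results[running_block] = {
        "exit_handle": "error",
        "error": {
            "scope": exc.scope,
            "block_id": exc.block_id,
            "limit_kind": exc.limit_kind,
            "limit_value": exc.limit_value,
            "actual_value": exc.actual_value,
        },
    }
    return target
```

A workflow-level timeout would never reach a handler like this one, because it is raised outside the block execution loop.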
Warn mode (on_exceed: "warn")
When a cap is exceeded, the engine logs a warning but execution continues. The run can finish normally even after exceeding a budget cap. Use this for soft budgets where you want visibility but not hard stops.
```yaml
limits:
  cost_cap_usd: 1.00
  on_exceed: warn
  warn_at_pct: 0.8
```

With this configuration, a warning event is emitted when cost reaches 80% of the cap ($0.80) and another when it exceeds the cap ($1.00), but execution continues in both cases.
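A minimal sketch of the warn-mode timeline, replaying a sequence of per-call costs. It assumes each threshold fires once, on the first call whose running total crosses it: reaching 80% triggers the early warning, and strictly exceeding the cap triggers the breach warning. warn_mode_events and the event names are hypothetical.

```python
from typing import List, Tuple


def warn_mode_events(costs: List[float], cap: float,
                     warn_at_pct: float) -> List[Tuple[str, float]]:
    """Replay per-call costs under on_exceed: warn and return the events emitted."""
    events = []
    total = 0.0
    warned = exceeded = False
    for cost in costs:
        total += cost
        # Early warning: first time the running total reaches warn_at_pct of the cap.
        if not warned and total >= cap * warn_at_pct:
            warned = True
            events.append(("threshold_warning", round(total, 2)))
        # Breach: first time the running total strictly exceeds the cap.
        if not exceeded and total > cap:
            exceeded = True
            events.append(("cap_exceeded", round(total, 2)))
        # Warn mode never raises, so the loop (execution) continues.
    return events


print(warn_mode_events([0.30, 0.30, 0.30, 0.30], cap=1.00, warn_at_pct=0.8))
# [('threshold_warning', 0.9), ('cap_exceeded', 1.2)]
```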
How enforcement works
Budget enforcement tracks the active budget session per async task. Every LLM call passes through a single chokepoint where cost and token limits are checked, whether the block runs in the engine process or in an isolated subprocess.
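Per-task tracking like this is commonly built on Python's contextvars module, because asyncio gives every new task its own copy of the current context. The sketch below assumes that mechanism (the text does not say how the engine implements it) and uses plain strings in place of real budget sessions. active_budget and run_block are hypothetical names.

```python
import asyncio
import contextvars

# Hypothetical context variable holding the active budget session (a string
# stands in for a real session object). Default: no active budget.
active_budget = contextvars.ContextVar("active_budget", default=None)


async def run_block(session_name):
    # This set() only affects the current task's copy of the context.
    active_budget.set(session_name)
    await asyncio.sleep(0)  # yield so sibling tasks interleave
    return active_budget.get()  # still this task's own session


async def main():
    active_budget.set("workflow")
    # gather() wraps each coroutine in a Task with its own context copy.
    a, b = await asyncio.gather(run_block("block-a"), run_block("block-b"))
    return a, b, active_budget.get()


result = asyncio.run(main())
print(result)  # ('block-a', 'block-b', 'workflow')
```

Because each task writes to its own copy, concurrent blocks cannot overwrite each other's active session, and the workflow-level value seen by the parent task is unchanged after they finish.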
The enforcement chain
- Workflow start: If the workflow has limits, a budget session is created and set as the active budget for the run.
- Block start: If a block has limits, a child budget session is created with the workflow session as its parent. The child replaces the active budget for the duration of that block.
- LLM call returns: After each LLM call, the session records the cost and tokens. If the session has a parent, costs propagate up the chain automatically.
- Cap check: After recording, the engine walks the entire parent chain. If any session (block or workflow) has exceeded its cap with on_exceed: "fail", execution is killed immediately.
- Block end: The block’s budget session is removed and the workflow session is restored.
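The five steps of the enforcement chain can be sketched in Python as below. BudgetSession, block_budget, and on_llm_call_returned are hypothetical names. The sketch covers only cost and token caps in fail mode and keeps the active session in a context variable, which is an assumption about the implementation.

```python
import contextvars
from contextlib import contextmanager


class BudgetKilledException(Exception):
    pass


class BudgetSession:
    """Hypothetical session: caps, running totals, and an optional parent."""

    def __init__(self, name, cost_cap_usd=None, token_cap=None,
                 on_exceed="fail", parent=None):
        self.name = name
        self.cost_cap_usd = cost_cap_usd
        self.token_cap = token_cap
        self.on_exceed = on_exceed
        self.parent = parent
        self.cost_usd = 0.0
        self.tokens = 0

    def record(self, cost_usd, tokens):
        # Step 3: accrue on this session and propagate to every ancestor.
        session = self
        while session is not None:
            session.cost_usd += cost_usd
            session.tokens += tokens
            session = session.parent

    def check_chain(self):
        # Step 4: walk the whole chain; any breached "fail" session kills.
        # Warn-mode sessions are simply skipped in this sketch.
        session = self
        while session is not None:
            over_cost = (session.cost_cap_usd is not None
                         and session.cost_usd > session.cost_cap_usd)
            over_tokens = (session.token_cap is not None
                           and session.tokens > session.token_cap)
            if (over_cost or over_tokens) and session.on_exceed == "fail":
                raise BudgetKilledException(f"{session.name} cap exceeded")
            session = session.parent


# Hypothetical holder for the active session of the current async task.
active = contextvars.ContextVar("active_budget", default=None)


@contextmanager
def block_budget(name, **caps):
    # Step 2: child session parented to whatever session is active now.
    child = BudgetSession(name, parent=active.get(), **caps)
    token = active.set(child)
    try:
        yield child
    finally:
        # Step 5: drop the block session and restore the previous one.
        active.reset(token)


def on_llm_call_returned(cost_usd, tokens):
    session = active.get()
    if session is None:
        return  # no limits configured anywhere
    session.record(cost_usd, tokens)
    session.check_chain()


# Usage: a block capped at $3.00 inside a workflow capped at $2.00.
workflow = BudgetSession("workflow", cost_cap_usd=2.00)
active.set(workflow)  # step 1: workflow start
with block_budget("research", cost_cap_usd=3.00) as block:
    on_llm_call_returned(0.50, 1000)  # both sessions: $0.50, 1000 tokens
    try:
        on_llm_call_returned(1.75, 500)  # block $2.25 is fine; workflow is not
    except BudgetKilledException as exc:
        print(exc)  # workflow cap exceeded
```

The block's own $3.00 cap is never reached, yet the second call is still killed, because the propagated total breaches the workflow's $2.00 cap during the chain walk.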
Enforcement across process isolation
LLM blocks run in isolated subprocesses that have no direct access to API keys or budget state. Budget enforcement still works because every LLM call from the subprocess is proxied through an IPC channel back to the engine, where a BudgetInterceptor enforces caps:
- Before each LLM call: The interceptor checks the active budget session. If the budget is already exceeded, the call is rejected before the LLM provider is contacted, so no money is spent.
- After each LLM call: The interceptor accrues the reported cost and tokens to the budget session. Costs propagate up the parent chain automatically.
- Budget kill propagation: When a budget cap is breached, a BudgetKilledException is serialized back to the subprocess, which writes it into the ResultEnvelope. The engine deserializes it and re-raises it in the main process.
Block-level and workflow-level caps both work across the isolation boundary. The subprocess never sees or manipulates budget state directly; the engine owns it entirely.
See Process Isolation for the full architecture.
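The interceptor's three duties can be sketched as follows. Everything here is hypothetical. A single cost cap stands in for the full session chain, the provider is a plain callable that returns the reported cost, and a JSON dict stands in for the ResultEnvelope wire format, which the text does not specify.

```python
import json
from typing import Callable, Optional


class BudgetKilledException(Exception):
    def __init__(self, scope, block_id, limit_kind, limit_value, actual_value):
        super().__init__(f"{limit_kind} cap {limit_value} exceeded: {actual_value}")
        self.meta = {"scope": scope, "block_id": block_id,
                     "limit_kind": limit_kind, "limit_value": limit_value,
                     "actual_value": actual_value}


class BudgetInterceptor:
    """Hypothetical engine-side interceptor with a single cost cap."""

    def __init__(self, cost_cap_usd: float, block_id: Optional[str] = None):
        self.cost_cap_usd = cost_cap_usd
        self.block_id = block_id
        self.cost_usd = 0.0

    def _kill(self) -> BudgetKilledException:
        scope = "block" if self.block_id else "workflow"
        return BudgetKilledException(scope, self.block_id, "cost_usd",
                                     self.cost_cap_usd, self.cost_usd)

    def handle(self, provider: Callable[[], float]) -> None:
        # Before the call: reject if the budget is already exceeded, so the
        # provider is never contacted and no money is spent.
        if self.cost_usd > self.cost_cap_usd:
            raise self._kill()
        cost = provider()  # the proxied LLM call; returns the reported cost
        # After the call: accrue the reported cost, then check the cap.
        self.cost_usd += cost
        if self.cost_usd > self.cost_cap_usd:
            raise self._kill()


def to_envelope(exc: BudgetKilledException) -> str:
    # Subprocess side: write the kill into a JSON stand-in for ResultEnvelope.
    return json.dumps({"status": "budget_killed", "budget": exc.meta})


def reraise_from_envelope(payload: str) -> None:
    # Engine side: rebuild the exception from the envelope and re-raise it.
    envelope = json.loads(payload)
    if envelope.get("status") == "budget_killed":
        raise BudgetKilledException(**envelope["budget"])


# Usage: a $1.00 block cap and a provider that reports $0.60 per call.
calls = []


def fake_provider() -> float:
    calls.append("called")
    return 0.60


interceptor = BudgetInterceptor(cost_cap_usd=1.00, block_id="research")
interceptor.handle(fake_provider)  # total $0.60: allowed
try:
    interceptor.handle(fake_provider)  # total $1.20: breach after accrual
except BudgetKilledException as exc:
    payload = to_envelope(exc)
```

A third call would be rejected by the pre-call check without ever invoking the provider. This is how the engine can refuse to spend money once a cap is gone, even though the subprocess itself holds no budget state.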
Parent propagation
When a block has its own limits, a child budget session is created with the workflow session as its parent. Every cost recorded on the child also increments the parent’s counters recursively up the chain. After recording, the engine walks the entire parent chain, so both the block’s caps and the workflow’s caps are enforced on every LLM call:

```text
Block accrues $0.50, 1000 tokens
  → Block session: cost=$0.50, tokens=1000
  → Parent (workflow) session: cost=$0.50, tokens=1000 (propagated)
```

This means a workflow with cost_cap_usd: 2.00 and on_exceed: fail will kill the run as soon as the workflow’s total cost exceeds $2.00, even if no individual block has a limit of its own. Sub-flow budgets are not independent: the parent workflow’s caps are always enforced because costs propagate upward through the parent chain.
Timeout enforcement
Timeouts work differently from cost and token caps:

- Workflow timeout: The engine wraps the main execution loop with the configured max_duration_seconds. If the timeout fires, execution is killed immediately with a budget exception (limit_kind="timeout").
- Block timeout: Each block is individually wrapped with its own max_duration_seconds. Block timeouts are independent of the workflow timeout.
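One way to express both kinds of timeout is asyncio.wait_for, applied once around the whole loop and once around each block. The sketch below works under that assumption. run_with_timeout and the simplified BudgetKilledException are hypothetical, and the durations are shortened so the example runs quickly.

```python
import asyncio
from typing import Optional


class BudgetKilledException(Exception):
    """Simplified stand-in; only the timeout-related fields are kept."""

    def __init__(self, scope: str, limit_value: float,
                 block_id: Optional[str] = None) -> None:
        super().__init__(f"{scope} timed out after {limit_value}s")
        self.scope = scope
        self.block_id = block_id
        self.limit_kind = "timeout"
        self.limit_value = limit_value


async def run_with_timeout(coro, seconds, scope, block_id=None):
    # Each wrapped unit (the whole loop, or one block) gets its own deadline.
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        raise BudgetKilledException(scope, seconds, block_id) from None


async def slow_block():
    await asyncio.sleep(0.2)  # stand-in for a block that runs too long
    return "done"


async def workflow_loop():
    # Block timeout (0.05 s) is independent of the workflow timeout below.
    return await run_with_timeout(slow_block(), 0.05, "block", "research")


async def main():
    try:
        await run_with_timeout(workflow_loop(), 1.0, "workflow")
    except BudgetKilledException as exc:
        return exc.scope, exc.block_id, exc.limit_kind


print(asyncio.run(main()))  # ('block', 'research', 'timeout')
```

The block's shorter deadline fires first and its exception passes straight through the outer wait_for, so the workflow deadline never comes into play here.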
Dispatch branch isolation
When a dispatch block fans out to multiple exit branches, each branch gets an isolated child session created via create_isolated_child(branch_id=exit_id). The child inherits the parent’s exact caps (cost_cap_usd, token_cap, max_duration_seconds, on_exceed) but has no parent pointer, so it accumulates costs independently. This prevents concurrent branches from sharing mutable budget state during execution.
Each child is individually capped at the parent’s full cap value. For example, if the workflow cap is $5.00, each branch is independently allowed up to $5.00, not $5.00 divided by the number of branches. A single branch can therefore hit the cap on its own before the others finish, and the branches together can spend more than the parent cap; that overrun is only caught by the parent check after reconciliation.
After all branches complete via asyncio.gather, each child’s totals are reconciled back to the parent session via reconcile_child(), which adds the child’s cost and token totals to the parent. Then the parent session’s caps are checked with check_or_raise().
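The branch lifecycle described above can be sketched as follows. create_isolated_child, reconcile_child, and check_or_raise are the method names used in the text, but their bodies here are hypothetical, as are the BudgetSession class and the run_branch and dispatch helpers. The usage exercises only cost caps.

```python
import asyncio


class BudgetKilledException(Exception):
    pass


class BudgetSession:
    """Hypothetical session exposing the three methods named in the text."""

    def __init__(self, cost_cap_usd=None, token_cap=None,
                 max_duration_seconds=None, on_exceed="fail",
                 parent=None, branch_id=None):
        self.cost_cap_usd = cost_cap_usd
        self.token_cap = token_cap
        self.max_duration_seconds = max_duration_seconds  # carried, unused here
        self.on_exceed = on_exceed
        self.parent = parent
        self.branch_id = branch_id
        self.cost_usd = 0.0
        self.tokens = 0

    def create_isolated_child(self, branch_id):
        # Same caps as this session, but parent=None: costs stay local.
        return BudgetSession(self.cost_cap_usd, self.token_cap,
                             self.max_duration_seconds, self.on_exceed,
                             parent=None, branch_id=branch_id)

    def reconcile_child(self, child):
        # Add the child's totals to this session after the branch finishes.
        self.cost_usd += child.cost_usd
        self.tokens += child.tokens

    def check_or_raise(self):
        over_cost = (self.cost_cap_usd is not None
                     and self.cost_usd > self.cost_cap_usd)
        over_tokens = self.token_cap is not None and self.tokens > self.token_cap
        if (over_cost or over_tokens) and self.on_exceed == "fail":
            raise BudgetKilledException(
                f"budget exceeded: cost={self.cost_usd}, tokens={self.tokens}")


async def run_branch(session, spend):
    await asyncio.sleep(0)    # stand-in for the branch's LLM calls
    session.cost_usd += spend
    session.check_or_raise()  # each branch is checked against the full cap


async def dispatch(parent, spends):
    # spends maps a hypothetical exit id to the amount that branch spends.
    children = [parent.create_isolated_child(branch_id=exit_id)
                for exit_id in spends]
    await asyncio.gather(*(run_branch(c, spends[c.branch_id]) for c in children))
    for child in children:
        parent.reconcile_child(child)
    parent.check_or_raise()


parent = BudgetSession(cost_cap_usd=5.00)
asyncio.run(dispatch(parent, {"A": 1.00, "B": 2.00}))
print(parent.cost_usd)  # 3.0
```

With branch spends of $3.00 and $4.00 instead, neither branch breaches $5.00 on its own, yet the reconciled $7.00 fails the parent check. This is the overshoot that per-branch full caps allow.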
```text
Parent session: cost_cap=$5.00
├── Branch A (isolated, cap=$5.00): cost=$1.00
├── Branch B (isolated, cap=$5.00): cost=$2.00
├── Reconciliation: parent cost += $1.00 + $2.00 = $3.00
└── Parent cap check: $3.00 ≤ $5.00 → passes
```

Complete example
```yaml
version: "1.0"
id: budget-example
kind: workflow
limits:
  cost_cap_usd: 5.00
  token_cap: 200000
  max_duration_seconds: 600
  on_exceed: fail
  warn_at_pct: 0.8
workflow:
  name: budget-example
  entry: research
  transitions:
    - from: research
      to: summarize
    - from: summarize
blocks:
  research:
    type: linear
    soul_ref: researcher
    limits:
      cost_cap_usd: 3.00
      max_duration_seconds: 300
      on_exceed: fail
  summarize:
    type: linear
    soul_ref: writer
    limits:
      cost_cap_usd: 1.00
      on_exceed: warn
```

In this example:
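- The workflow hard-stops once total cost exceeds $5.00, total tokens exceed 200,000, or 10 minutes (600 seconds) elapse.
- The research block hard-stops once it exceeds $3.00 or runs for 5 minutes.
- The summarize block emits a warning once it exceeds $1.00 but keeps running.
- Both blocks’ costs propagate to the workflow total, so the summarize block’s spending still counts toward the workflow’s $5.00 hard stop.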