Assertions
Assertions are quality checks that run automatically after a block completes. You define them directly on a block in your workflow YAML, and the engine evaluates each one against the block’s output. Every assertion produces a pass/fail result and a numeric score (0.0 to 1.0).
Where assertions live
Section titled “Where assertions live”Assertions are defined on the assertions field of any block definition. The field accepts a list of assertion config objects:
blocks: analyze: type: llm soul_ref: analyst assertions: - type: contains value: "analysis" - type: cost threshold: 0.05The assertions field is declared on BaseBlockDef as Optional[List[Dict[str, Any]]] and defaults to None. Every block type inherits it.
Assertion config fields
Section titled “Assertion config fields”Each assertion in the list is a dict with these fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
type | str | yes | — | Assertion type identifier (see table below) |
value | any | depends | "" | Comparison value. Required for string assertions, optional for performance assertions |
threshold | float | no | varies | Numeric threshold. Meaning depends on assertion type |
weight | float | no | 1.0 | Weight in the aggregate score calculation |
metric | str | no | None | Named metric label. When set, the score is stored in named_scores under this key |
transform | str | no | None | Pre-process output before evaluation. See Transform Hooks |
Assertion types
Section titled “Assertion types”Runsight ships 15 deterministic assertion types across four categories.
String assertions
Section titled “String assertions”| Type | Value | Behavior |
|---|---|---|
equals | str | Exact string match. Also attempts JSON deep-equal if both sides parse as JSON |
contains | str | Case-sensitive substring check |
icontains | str | Case-insensitive substring check |
contains-all | list[str] | All items must appear as substrings |
contains-any | list[str] | At least one item must appear as a substring |
starts-with | str | String prefix check |
regex | str | Regex search (uses re.search, not full match) |
word-count | int or {min, max} | Exact count or range check on whitespace-split words |
Examples
Section titled “Examples”assertions: # Exact match - type: equals value: "approved"
# Case-insensitive substring - type: icontains value: "conclusion"
# All keywords must appear - type: contains-all value: ["summary", "recommendation", "next steps"]
# Any keyword is acceptable - type: contains-any value: ["approve", "accept", "pass"]
# Regex pattern - type: regex value: "\\d{4}-\\d{2}-\\d{2}"
# Word count range - type: word-count value: min: 50 max: 500Structural assertions
Section titled “Structural assertions”| Type | Value | Behavior |
|---|---|---|
is-json | dict (optional JSON Schema) | Validates output is valid JSON. If value is provided, validates against a JSON Schema |
contains-json | dict (optional JSON Schema) | Finds a JSON substring in the output. Scans for { and [ delimiters. Optional schema validation |
Examples
Section titled “Examples”assertions: # Output must be valid JSON - type: is-json
# Output must contain a JSON object matching a schema - type: contains-json value: type: object required: ["name", "score"] properties: name: type: string score: type: numberPerformance assertions
Section titled “Performance assertions”| Type | Value | Threshold | Behavior |
|---|---|---|---|
cost | — | float (USD) | Passes if cost_usd from the execution context is at or below threshold |
latency | — | float (ms) | Passes if latency_ms from the execution context is at or below threshold |
Performance assertions read from the AssertionContext, not from the block output string. The context is populated with actual execution metrics (cost, latency, tokens) from the RunNode record.
assertions: - type: cost threshold: 0.10 - type: latency threshold: 5000Linguistic assertions
Section titled “Linguistic assertions”| Type | Value | Default threshold | Behavior |
|---|---|---|---|
levenshtein | str (reference text) | 5 | Edit distance between output and reference. Passes if distance <= threshold |
bleu | str (reference text) | 0.5 | BLEU-4 score with smoothing. Passes if score >= threshold |
rouge-n | str (reference text) | 0.75 | ROUGE-1 F-measure. Passes if score >= threshold |
assertions: - type: levenshtein value: "The capital of France is Paris." threshold: 10 - type: bleu value: "Machine learning models process data to find patterns." threshold: 0.3Negation with not- prefix
Section titled “Negation with not- prefix”Any assertion type can be negated by prefixing it with not-. The engine inverts both the pass/fail boolean and the score (1.0 - original):
assertions: - type: not-contains value: "error" - type: not-contains-jsonWeighted scoring
Section titled “Weighted scoring”When a block has multiple assertions, the aggregate score is a weighted average. Each assertion’s weight (default 1.0) determines its contribution:
assertions: - type: contains value: "recommendation" weight: 2.0 - type: word-count value: min: 100 weight: 1.0 - type: cost threshold: 0.05 weight: 0.5The AssertionsResult class accumulates weighted results. Its aggregate_score property returns the weighted average: total_score / total_weight. The passed() method without a threshold returns True only if every individual assertion passed.
Assertion chaining
Section titled “Assertion chaining”A block can have any number of assertions. Each assertion runs independently — they do not share state or depend on each other’s results. The engine runs all of them concurrently via asyncio.gather, so a failure in one assertion never prevents the others from completing.
This is especially useful with transform hooks. Each assertion can target a different field in the block output using its own json_path transform:
blocks: process_order: type: llm soul_ref: processor assertions: - type: equals value: "completed" transform: "json_path:$.status" - type: contains value: "success" transform: "json_path:$.message" - type: cost threshold: 0.02In this example, the first assertion extracts $.status and checks for an exact match, the second extracts $.message and checks for a substring, and the third checks execution cost without any transform. All three run in parallel.
If a transform fails on one assertion (e.g., the output is not valid JSON, or the path does not exist), that assertion returns passed=False with a descriptive reason. The other assertions still run normally and produce their own results. The aggregate score and pass/fail are then computed across all of them using the standard weighted scoring rules.
How assertions fire during execution
Section titled “How assertions fire during execution”When a workflow runs via the API, the engine wires assertions automatically:
- The
ExecutionService._build_assertion_configsmethod readsassertionsfrom each block in the parsed workflow - An
EvalObserveris created with the assertion configs and attached to the workflow run - After each block completes,
EvalObserver.on_block_completefires:- Extracts the block output from
WorkflowState - Builds an
AssertionContextwith output, cost, latency, soul info, and tokens - Runs all assertion configs for that block
- Persists
eval_score,eval_passed, andeval_resultson theRunNoderecord - Emits a
node_eval_completeSSE event with scores, pass/fail, and baseline delta
- Extracts the block output from
The RunNode entity stores three eval fields:
| Field | Type | Description |
|---|---|---|
eval_score | Optional[float] | Weighted average score across all assertions (0.0 to 1.0) |
eval_passed | Optional[bool] | True if all individual assertions passed |
eval_results | Optional[Dict] | Detailed results with per-assertion type, passed, score, and reason |
These fields are None when a block has no assertions configured.