Assertions
Assertions are quality checks that run after a block completes. You define them directly in workflow YAML, and each assertion produces a pass/fail result plus a numeric score from 0.0 to 1.0.
Runsight ships 15 built-in deterministic assertion types. You can also add your own scanner-discovered Python assertions under custom/assertions/. See Custom Assertions for the custom plugin workflow.
Where assertions live
Assertions are defined on the assertions field of any block definition. The field accepts a list of assertion config objects:

    version: "1.0"
    id: research
    kind: workflow
    blocks:
      analyze:
        type: code
        code: |
          def main(data):
              return "analysis ready"
        assertions:
          - type: contains
            value: "analysis"
          - type: cost
            threshold: 0.05
    workflow:
      name: assertions_demo
      entry: analyze
      transitions:
        - from: analyze
          to: null

The `assertions` field is declared on `BaseBlockDef` as `Optional[List[Dict[str, Any]]]` and defaults to `None`.
Assertion config fields
Each assertion in the list is a dict with these fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `type` | str | yes | — | Assertion type identifier. For custom assertions, use `custom:<id>` |
| `value` | any | depends | `""` | Comparison value. Required for most string and linguistic assertions |
| `threshold` | float | no | varies | Numeric threshold. Meaning depends on assertion type |
| `config` | any | no | `None` | Per-assertion config payload. Most built-ins ignore it (`equals` reads `mode` from it). Custom assertions receive it as `context["config"]` |
| `weight` | float | no | `1.0` | Weight in the aggregate score calculation |
| `metric` | str | no | `None` | Named metric label. When set, the score is stored in `named_scores` under this key |
| `transform` | str | no | `None` | Pre-process output before evaluation. See Transform Hooks |
Built-in assertion types
Runsight ships 15 deterministic assertion types across four categories.
String assertions
| Type | Value | Behavior |
|---|---|---|
| `equals` | str | Exact string match. Use `config: {mode: json}` to opt into JSON deep-equal |
| `contains` | str | Case-sensitive substring check |
| `icontains` | str | Case-insensitive substring check |
| `contains-all` | list[str] | All items must appear as substrings |
| `contains-any` | list[str] | At least one item must appear as a substring |
| `starts-with` | str | String prefix check |
| `regex` | str | Regex search (uses `re.search`, not full match) |
| `word-count` | int or `{min, max}` | Exact count or range check on whitespace-split words |
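The behaviors in the table can be sketched in plain Python. This is an illustrative sketch only, not Runsight's implementation: the function names are hypothetical, and results are reduced to a boolean instead of a pass/fail plus score.

```python
import re


def check_contains(output: str, value: str) -> bool:
    # Case-sensitive substring check.
    return value in output


def check_icontains(output: str, value: str) -> bool:
    # Case-insensitive substring check.
    return value.lower() in output.lower()


def check_regex(output: str, pattern: str) -> bool:
    # re.search matches anywhere in the output, not only the full string.
    return re.search(pattern, output) is not None


def check_word_count(output: str, value) -> bool:
    # value is either an exact int or a dict with optional "min"/"max" keys.
    count = len(output.split())
    if isinstance(value, dict):
        lo = value.get("min", 0)
        hi = value.get("max", float("inf"))
        return lo <= count <= hi
    return count == value
```

Note the difference between `contains` and `icontains`: the output `"Analysis ready"` fails `contains` with value `"analysis"` but passes `icontains`.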
Examples

    assertions:
      # Exact match
      - type: equals
        value: "approved"

      # Case-insensitive substring
      - type: icontains
        value: "conclusion"

      # All keywords must appear
      - type: contains-all
        value: ["summary", "recommendation", "next steps"]

      # Any keyword is acceptable
      - type: contains-any
        value: ["approve", "accept", "pass"]

      # Regex pattern
      - type: regex
        value: "\\d{4}-\\d{2}-\\d{2}"

      # Word count range
      - type: word-count
        value:
          min: 50
          max: 500

Structural assertions
| Type | Value | Behavior |
|---|---|---|
| `is-json` | dict (optional JSON Schema) | Validates that the output is valid JSON. If `value` is provided, validates against a JSON Schema |
| `contains-json` | dict (optional JSON Schema) | Finds a JSON substring in the output. Scans for `{` and `[` delimiters. Optional schema validation |
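One way to picture the `contains-json` delimiter scan is the sketch below. It assumes a straightforward left-to-right scan using the standard library's `json.JSONDecoder.raw_decode`; the helper name `find_json` is hypothetical, Runsight's actual scanning strategy may differ, and schema validation is left out.

```python
import json


def find_json(output: str):
    """Return the first JSON object or array embedded in output, or None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(output):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            parsed, _end = decoder.raw_decode(output, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this delimiter; keep scanning
        return parsed
    return None
```

With this approach, surrounding prose such as `Result: {...} done` does not prevent the embedded object from being found.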
Examples

    assertions:
      # Output must be valid JSON
      - type: is-json

      # Output must contain a JSON object matching a schema
      - type: contains-json
        value:
          type: object
          required: ["name", "score"]
          properties:
            name:
              type: string
            score:
              type: number

Performance assertions
| Type | Value | Threshold | Behavior |
|---|---|---|---|
| `cost` | — | float (USD) | Passes if `cost_usd` from the execution context is at or below threshold |
| `latency` | — | float (ms) | Passes if `latency_ms` from the execution context is at or below threshold |
Performance assertions read from the AssertionContext, not from the block output string. In live API runs that context is populated with run metrics such as cost, latency, and tokens; offline eval uses zeroed metric fields.
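A minimal sketch of these threshold checks follows. The `Context` dataclass is a hypothetical stand-in for `AssertionContext` that carries only the two fields named in the table; the real class has more fields, and its exact shape is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Context:
    # Hypothetical stand-in for AssertionContext; defaults mimic the
    # zeroed metric fields used by offline eval.
    cost_usd: float = 0.0
    latency_ms: float = 0.0


def cost_passes(ctx: Context, threshold: float) -> bool:
    # Passes when the recorded cost is at or below the threshold.
    return ctx.cost_usd <= threshold


def latency_passes(ctx: Context, threshold: float) -> bool:
    # Passes when the recorded latency is at or below the threshold.
    return ctx.latency_ms <= threshold
```

Because offline eval uses zeroed metrics, checks of this form pass for any non-negative threshold in that context, so they are only informative on live API runs.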

    assertions:
      - type: cost
        threshold: 0.10
      - type: latency
        threshold: 5000

Linguistic assertions
| Type | Value | Default threshold | Behavior |
|---|---|---|---|
| `levenshtein` | str (reference text) | 5 | Edit distance between output and reference. Passes if distance <= threshold |
| `bleu` | str (reference text) | 0.5 | BLEU-4 score with smoothing. Passes if score >= threshold |
| `rouge-n` | str (reference text) | 0.75 | ROUGE-1 F-measure. Passes if score >= threshold |
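To make the `levenshtein` row concrete, here is a textbook dynamic-programming edit distance with the pass rule from the table. It is a sketch under the assumption that distance is counted per character with unit costs; Runsight may rely on a library implementation instead, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def levenshtein_passes(output: str, reference: str, threshold: float = 5) -> bool:
    # Lower distance is better, so the check is distance <= threshold.
    return levenshtein(output, reference) <= threshold
```

Unlike `bleu` and `rouge-n`, where a higher score is better, `levenshtein` passes when the value is at or below the threshold.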

    assertions:
      - type: levenshtein
        value: "The capital of France is Paris."
        threshold: 10
      - type: bleu
        value: "Machine learning models process data to find patterns."
        threshold: 0.3

Custom assertions
Beyond the 15 built-in types, you can create your own assertions in Python. The contract is promptfoo-compatible, so existing promptfoo assertion functions work with minimal changes. Custom assertions are discovered from manifest files under custom/assertions/*.yaml and are referenced by embedded assertion id:

    assertions:
      - type: custom:tone_check
        config:
          prefix: calm

Important details:
- The runtime key is always `custom:<id>`.
- The manifest `name` is display-only.
- Custom assertions can be used alongside built-in assertions in the same list.
- Custom assertions support the same `not-` negation prefix as built-in assertions.
See Custom Assertions for the manifest format, Python contract, params schema validation, context keys, and migration guidance.
Negation with not- prefix
Any assertion type can be negated by prefixing it with not-. The engine inverts both the pass/fail boolean and the score (1.0 - original):

    assertions:
      - type: not-contains
        value: "error"
      - type: not-contains-json
      - type: not-custom:blocked_word
        config:
          blocked: storm

Negation works for both built-in and custom assertion types.
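The inversion rule can be sketched as follows. The `Result` dataclass and both function names are hypothetical and exist only for illustration; the sketch assumes the prefix is stripped once from the front of the type string.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical result shape: a pass/fail flag plus a 0.0-1.0 score.
    passed: bool
    score: float


def split_negation(assertion_type: str) -> tuple[str, bool]:
    """Return (base_type, negated) for a type such as 'not-contains'."""
    if assertion_type.startswith("not-"):
        return assertion_type[len("not-"):], True
    return assertion_type, False


def negate(result: Result) -> Result:
    # Invert both the boolean and the score (1.0 - original).
    return Result(passed=not result.passed, score=1.0 - result.score)
```

For example, `not-custom:blocked_word` resolves to the base type `custom:blocked_word`, and a base result of passed with score 1.0 becomes failed with score 0.0.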
Weighted scoring
When a block has multiple assertions, the aggregate score is a weighted average. Each assertion’s weight (default 1.0) determines its contribution:

    assertions:
      - type: contains
        value: "recommendation"
        weight: 2.0
      - type: word-count
        value:
          min: 100
        weight: 1.0
      - type: cost
        threshold: 0.05
        weight: 0.5

The `AssertionsResult` class accumulates weighted results. Its `aggregate_score` property returns the weighted average: `total_score / total_weight`. The `passed()` method without a threshold returns `True` only if every individual assertion passed.
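As a worked example of the weighted average, the sketch below recomputes the aggregate for the three assertions above. It is not the `AssertionsResult` class itself: the function name is hypothetical, and returning 0.0 for a zero total weight is an assumption.

```python
def aggregate_score(results: list[tuple[float, float]]) -> float:
    """results holds (score, weight) pairs; returns the weighted average."""
    total_weight = sum(weight for _, weight in results)
    if total_weight == 0:
        return 0.0  # assumption: an empty or zero-weight list scores 0.0
    total_score = sum(score * weight for score, weight in results)
    return total_score / total_weight


# contains passes (weight 2.0), word-count fails (weight 1.0),
# cost passes (weight 0.5): (2.0 + 0.0 + 0.5) / 3.5
example = aggregate_score([(1.0, 2.0), (0.0, 1.0), (1.0, 0.5)])
```

Here the aggregate is 2.5 / 3.5, roughly 0.71, even though one assertion failed, which is why the all-must-pass result of `passed()` is tracked separately from the score.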
Execution model
Assertions do not share state with each other. Each configured assertion is evaluated independently against the same block result.
Runsight evaluates assertions in two contexts:
- Offline eval — assertions run concurrently for each block
- Live API runs — assertions run sequentially after each block completes
In both contexts:
- Every configured assertion produces its own result
- A transform failure on one assertion does not prevent the rest from running
- Aggregate scoring follows the same weighted scoring rules
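The two scheduling modes can be contrasted with a small sketch. It assumes an `asyncio`-style evaluator purely for illustration; how Runsight actually schedules assertions is not stated here, and `evaluate` is a hypothetical stand-in that only handles a substring check.

```python
import asyncio


async def evaluate(assertion: dict, output: str) -> dict:
    # Hypothetical single-assertion evaluation. Any error becomes a failed
    # result so that it cannot stop the remaining assertions.
    try:
        passed = assertion["value"] in output
        return {"type": assertion["type"], "passed": passed}
    except Exception as exc:
        return {"type": assertion["type"], "passed": False, "reason": repr(exc)}


async def run_concurrently(assertions: list[dict], output: str) -> list[dict]:
    # Offline-eval style: all assertions for a block are awaited together.
    return await asyncio.gather(*(evaluate(a, output) for a in assertions))


async def run_sequentially(assertions: list[dict], output: str) -> list[dict]:
    # Live-run style: one assertion after another.
    return [await evaluate(a, output) for a in assertions]
```

Because the assertions share no state, both functions return the same per-assertion results in the same order; only the scheduling differs.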
Assertion chaining
Because each assertion is evaluated independently, several assertions can check different parts of the same output. This is especially useful with transform hooks, because each assertion can target a different field in the block output using its own json_path transform:

    assertions:
      - type: equals
        value: "completed"
        transform: "json_path:$.status"
      - type: contains
        value: "success"
        transform: "json_path:$.message"
      - type: cost
        threshold: 0.02

In this example, the first assertion extracts `$.status` and checks for an exact match, the second extracts `$.message` and checks for a substring, and the third checks execution cost without any transform.
If a transform fails on one assertion (e.g., the output is not valid JSON, or the path does not exist), that assertion returns passed=False with a descriptive reason. The other assertions still run normally and produce their own results. The aggregate score and pass/fail are then computed across all of them using the standard weighted scoring rules.
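The failure handling described above might look like the sketch below. It is deliberately simplified: only dotted paths such as `$.status` over JSON objects are handled, the function names are hypothetical, and the wording of the failure reason is an assumption.

```python
import json


def apply_json_path(output: str, path: str):
    """Resolve a simple dotted path such as '$.status' against JSON output."""
    data = json.loads(output)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    for key in path.removeprefix("$.").split("."):
        data = data[key]  # raises KeyError or TypeError if the path does not exist
    return data


def evaluate_equals(output: str, expected: str, path: str) -> dict:
    try:
        target = apply_json_path(output, path)
    except (ValueError, KeyError, TypeError) as exc:
        # A transform failure fails only this assertion, with a reason attached.
        return {"passed": False, "reason": f"transform failed: {exc!r}"}
    return {"passed": target == expected, "reason": ""}
```

Each assertion runs its own transform, so a missing `$.status` fails only the assertion that asked for it; an assertion on `$.message` or on cost is unaffected.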
How assertions fire during execution
When a workflow runs via the API, assertions fire automatically; no extra configuration is needed beyond defining them on your blocks:
- The engine reads each block’s `assertions` list from the workflow YAML
- After a block completes, the engine evaluates all configured assertions against the block output
- Each assertion receives the block’s output text plus execution context (cost, latency, soul info, tokens)
- Results are persisted on the run record and pushed to the UI via SSE
The RunNode entity stores three eval fields:
| Field | Type | Description |
|---|---|---|
| `eval_score` | Optional[float] | Weighted average score across all assertions on the block |
| `eval_passed` | Optional[bool] | True when every individual assertion passed |
| `eval_results` | Optional[Dict] | Per-assertion results including pass/fail, score, reason, and handler type when the handler sets one |
These fields are None when a block has no assertions configured.