Assertion Reference
Assertions validate block outputs against expected values. They are used in the assertions field on any block and in the eval section’s expected entries. Each assertion is a dict with at minimum a type field.
Assertion config fields
Section titled “Assertion config fields”Every assertion config supports these fields:
| Field | Type | Default | Description |
|---|---|---|---|
type | str | — | Assertion type name (see sections below). Prefix with not- to negate. |
value | Any | "" | Comparison value (type-specific) |
threshold | float | varies | Pass threshold (type-specific) |
weight | float | 1.0 | Weight for aggregated scoring |
metric | str | none | Named score key for tracking |
transform | str | none | Pre-processing transform (see Transform hooks) |
String assertions
Section titled “String assertions”equals
Section titled “equals”Exact string match or JSON deep-equal. Tries JSON parsing first; falls back to string comparison.
| Parameter | Type | Description |
|---|---|---|
value | Any | Expected value (string or JSON) |
- type: equals value: "expected output"contains
Section titled “contains”Case-sensitive substring check.
| Parameter | Type | Description |
|---|---|---|
value | str | Substring to find |
- type: contains value: "machine learning"icontains
Section titled “icontains”Case-insensitive substring check.
| Parameter | Type | Description |
|---|---|---|
value | str | Substring to find (case-insensitive) |
- type: icontains value: "Machine Learning"contains-all
Section titled “contains-all”All items in the value list must be present as substrings.
| Parameter | Type | Description |
|---|---|---|
value | List[str] | List of substrings that must all be present |
- type: contains-all value: ["introduction", "methodology", "conclusion"]contains-any
Section titled “contains-any”At least one item in the value list must be present as a substring.
| Parameter | Type | Description |
|---|---|---|
value | List[str] | List of candidate substrings |
- type: contains-any value: ["approved", "accepted", "passed"]starts-with
Section titled “starts-with”String prefix check.
| Parameter | Type | Description |
|---|---|---|
value | str | Expected prefix |
- type: starts-with value: "Summary:"Regex search match (uses re.search, not full match).
| Parameter | Type | Description |
|---|---|---|
value | str | Regular expression pattern |
- type: regex value: "\\d{4}-\\d{2}-\\d{2}"word-count
Section titled “word-count”Word count check. Supports exact count or min/max range.
| Parameter | Type | Description |
|---|---|---|
value | int or {min, max} | Exact word count, or a dict with optional min and max bounds |
# Exact count- type: word-count value: 100
# Range- type: word-count value: min: 50 max: 200Structural assertions
Section titled “Structural assertions”is-json
Section titled “is-json”Validates that the output is valid JSON. Optionally validates against a JSON Schema.
| Parameter | Type | Description |
|---|---|---|
value | Dict (JSON Schema) or null | Optional JSON Schema to validate against |
# Just check valid JSON- type: is-json
# With schema validation- type: is-json value: type: object required: ["title", "body"] properties: title: type: string body: type: stringcontains-json
Section titled “contains-json”Finds a valid JSON substring in the output. Scans for { and [ delimiters and attempts to parse. Optionally validates the extracted JSON against a JSON Schema.
| Parameter | Type | Description |
|---|---|---|
value | Dict (JSON Schema) or null | Optional JSON Schema to validate extracted JSON against |
- type: contains-json value: type: object required: ["status"]Linguistic assertions
Section titled “Linguistic assertions”levenshtein
Section titled “levenshtein”Edit distance between the output and a reference string must be at or below the threshold.
| Parameter | Type | Default | Description |
|---|---|---|---|
value | str | "" | Reference string to compare against |
threshold | float | 5 | Maximum allowed edit distance |
- type: levenshtein value: "expected text" threshold: 10BLEU-4 score against a reference string must be at or above the threshold. Uses smoothing method 1 (add-one).
| Parameter | Type | Default | Description |
|---|---|---|---|
value | str | "" | Reference text |
threshold | float | 0.5 | Minimum BLEU score |
- type: bleu value: "The expected reference text for comparison." threshold: 0.4rouge-n
Section titled “rouge-n”ROUGE-1 F-measure against a reference string must be at or above the threshold. Uses the rouge-score library.
| Parameter | Type | Default | Description |
|---|---|---|---|
value | str | "" | Reference text |
threshold | float | 0.75 | Minimum ROUGE-N score |
- type: rouge-n value: "The reference summary text." threshold: 0.6Performance assertions
Section titled “Performance assertions”Checks that the block’s cost_usd is within the threshold. Reads from AssertionContext.cost_usd, not from the output string.
| Parameter | Type | Default | Description |
|---|---|---|---|
threshold | float | 0.0 | Maximum cost in USD |
- type: cost threshold: 0.50latency
Section titled “latency”Checks that the block’s latency_ms is within the threshold. Reads from AssertionContext.latency_ms, not from the output string.
| Parameter | Type | Default | Description |
|---|---|---|---|
threshold | float | 0.0 | Maximum latency in milliseconds |
- type: latency threshold: 5000Negation prefix
Section titled “Negation prefix”Any assertion type can be negated by prefixing with not-. The result is inverted: passed becomes not passed, and score becomes 1.0 - score.
- type: not-contains value: "error"
- type: not-is-jsonTransform hooks
Section titled “Transform hooks”Transforms pre-process the block output before the assertion evaluates it. Specified via the transform field on any assertion config.
json_path
Section titled “json_path”Extracts a value from JSON output using JSONPath syntax (via the jsonpath-ng library). The extracted value is converted to a string and passed to the assertion.
Format: json_path:<expression>
- type: contains value: "active" transform: "json_path:$.status"
- type: equals value: "42" transform: "json_path:$.data.count"If the output is not valid JSON or the path is not found, the assertion fails with a descriptive error before the actual assertion runs.
Scoring and aggregation
Section titled “Scoring and aggregation”Assertions are scored and aggregated using weighted averages:
- Each assertion produces a
GradingResultwithpassed(bool) andscore(0.0—1.0). - Weights default to
1.0and are used for the weighted average (aggregate_score). - In eval mode, the suite passes if
aggregate_score >= threshold(default threshold:1.0). - Without a threshold, all individual assertions must pass.
Using in eval cases
Section titled “Using in eval cases”eval: threshold: 0.8 cases: - id: test_summary fixtures: summarize: "Machine learning is a subset of AI that enables systems to learn." expected: summarize: - type: contains value: "machine learning" - type: word-count value: min: 5 max: 50 - type: not-contains value: "error"Using on blocks
Section titled “Using on blocks”blocks: research: type: linear soul_ref: researcher assertions: - type: is-json - type: word-count value: min: 100