Skip to content

Assertion Reference

Assertions validate block outputs against expected values. They are used in the assertions field on any block and in the eval section’s expected entries. Each assertion is a dict with at minimum a type field.

Every assertion config supports these fields:

FieldTypeDefaultDescription
typestrAssertion type name (see sections below). Prefix with not- to negate.
valueAny""Comparison value (type-specific)
thresholdfloatvariesPass threshold (type-specific)
weightfloat1.0Weight for aggregated scoring
metricstrnoneNamed score key for tracking
transformstrnonePre-processing transform (see Transform hooks)

Exact string match or JSON deep-equal. Tries JSON parsing first; falls back to string comparison.

ParameterTypeDescription
valueAnyExpected value (string or JSON)
- type: equals
value: "expected output"

Case-sensitive substring check.

ParameterTypeDescription
valuestrSubstring to find
- type: contains
value: "machine learning"

Case-insensitive substring check.

ParameterTypeDescription
valuestrSubstring to find (case-insensitive)
- type: icontains
value: "Machine Learning"

All items in the value list must be present as substrings.

ParameterTypeDescription
valueList[str]List of substrings that must all be present
- type: contains-all
value: ["introduction", "methodology", "conclusion"]

At least one item in the value list must be present as a substring.

ParameterTypeDescription
valueList[str]List of candidate substrings
- type: contains-any
value: ["approved", "accepted", "passed"]

String prefix check.

ParameterTypeDescription
valuestrExpected prefix
- type: starts-with
value: "Summary:"

Regex search match (uses re.search, not full match).

ParameterTypeDescription
valuestrRegular expression pattern
- type: regex
value: "\\d{4}-\\d{2}-\\d{2}"

Word count check. Supports exact count or min/max range.

ParameterTypeDescription
valueint or {min, max}Exact word count, or a dict with optional min and max bounds
# Exact count
- type: word-count
value: 100
# Range
- type: word-count
value:
min: 50
max: 200

Validates that the output is valid JSON. Optionally validates against a JSON Schema.

ParameterTypeDescription
valueDict (JSON Schema) or nullOptional JSON Schema to validate against
# Just check valid JSON
- type: is-json
# With schema validation
- type: is-json
value:
type: object
required: ["title", "body"]
properties:
title:
type: string
body:
type: string

Finds a valid JSON substring in the output. Scans for { and [ delimiters and attempts to parse. Optionally validates the extracted JSON against a JSON Schema.

ParameterTypeDescription
valueDict (JSON Schema) or nullOptional JSON Schema to validate extracted JSON against
- type: contains-json
value:
type: object
required: ["status"]

Edit distance between the output and a reference string must be at or below the threshold.

ParameterTypeDefaultDescription
valuestr""Reference string to compare against
thresholdfloat5Maximum allowed edit distance
- type: levenshtein
value: "expected text"
threshold: 10

BLEU-4 score against a reference string must be at or above the threshold. Uses smoothing method 1 (add-one).

ParameterTypeDefaultDescription
valuestr""Reference text
thresholdfloat0.5Minimum BLEU score
- type: bleu
value: "The expected reference text for comparison."
threshold: 0.4

ROUGE-1 F-measure against a reference string must be at or above the threshold. Uses the rouge-score library.

ParameterTypeDefaultDescription
valuestr""Reference text
thresholdfloat0.75Minimum ROUGE-N score
- type: rouge-n
value: "The reference summary text."
threshold: 0.6

Checks that the block’s cost_usd is within the threshold. Reads from AssertionContext.cost_usd, not from the output string.

ParameterTypeDefaultDescription
thresholdfloat0.0Maximum cost in USD
- type: cost
threshold: 0.50

Checks that the block’s latency_ms is within the threshold. Reads from AssertionContext.latency_ms, not from the output string.

ParameterTypeDefaultDescription
thresholdfloat0.0Maximum latency in milliseconds
- type: latency
threshold: 5000

Any assertion type can be negated by prefixing with not-. The result is inverted: passed becomes not passed, and score becomes 1.0 - score.

- type: not-contains
value: "error"
- type: not-is-json

Transforms pre-process the block output before the assertion evaluates it. Specified via the transform field on any assertion config.

Extracts a value from JSON output using JSONPath syntax (via the jsonpath-ng library). The extracted value is converted to a string and passed to the assertion.

Format: json_path:<expression>

- type: contains
value: "active"
transform: "json_path:$.status"
- type: equals
value: "42"
transform: "json_path:$.data.count"

If the output is not valid JSON or the path is not found, the assertion fails with a descriptive error before the actual assertion runs.


Assertions are scored and aggregated using weighted averages:

  • Each assertion produces a GradingResult with passed (bool) and score (0.0—1.0).
  • Weights default to 1.0 and are used for the weighted average (aggregate_score).
  • In eval mode, the suite passes if aggregate_score >= threshold (default threshold: 1.0).
  • Without a threshold, all individual assertions must pass.
eval section with assertions
eval:
threshold: 0.8
cases:
- id: test_summary
fixtures:
summarize: "Machine learning is a subset of AI that enables systems to learn."
expected:
summarize:
- type: contains
value: "machine learning"
- type: word-count
value:
min: 5
max: 50
- type: not-contains
value: "error"
block-level assertions
blocks:
research:
type: linear
soul_ref: researcher
assertions:
- type: is-json
- type: word-count
value:
min: 100