Assertions

Assertions are quality checks that run after a block completes. You define them directly in workflow YAML, and each assertion produces a pass/fail result plus a numeric score from 0.0 to 1.0.

Runsight ships 15 built-in deterministic assertion types. You can also add your own scanner-discovered Python assertions under custom/assertions/. See Custom Assertions for the custom plugin workflow.

Assertions are defined on the assertions field of any block definition. The field accepts a list of assertion config objects:

custom/workflows/research.yaml
version: "1.0"
id: research
kind: workflow
blocks:
  analyze:
    type: code
    code: |
      def main(data):
          return "analysis ready"
    assertions:
      - type: contains
        value: "analysis"
      - type: cost
        threshold: 0.05
workflow:
  name: assertions_demo
  entry: analyze
  transitions:
    - from: analyze
      to: null

The assertions field is declared on BaseBlockDef as Optional[List[Dict[str, Any]]] and defaults to None.

Each assertion in the list is a dict with these fields:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | str | yes | | Assertion type identifier. For custom assertions, use custom:<id> |
| value | any | depends | "" | Comparison value. Required for most string and linguistic assertions |
| threshold | float | no | varies | Numeric threshold. Meaning depends on assertion type |
| config | any | no | None | Per-assertion config payload. Built-ins ignore it. Custom assertions receive it as context["config"] |
| weight | float | no | 1.0 | Weight in the aggregate score calculation |
| metric | str | no | None | Named metric label. When set, the score is stored in named_scores under this key |
| transform | str | no | None | Pre-process output before evaluation. See Transform Hooks |

The 15 built-in deterministic assertion types fall into four categories: string, structural, performance, and linguistic.

| Type | Value | Behavior |
|---|---|---|
| equals | str | Exact string match. Use config: {mode: json} to opt into JSON deep-equal |
| contains | str | Case-sensitive substring check |
| icontains | str | Case-insensitive substring check |
| contains-all | list[str] | All items must appear as substrings |
| contains-any | list[str] | At least one item must appear as a substring |
| starts-with | str | String prefix check |
| regex | str | Regex search (uses re.search, not full match) |
| word-count | int or {min, max} | Exact count or range check on whitespace-split words |

String assertion examples
assertions:
  # Exact match
  - type: equals
    value: "approved"
  # Case-insensitive substring
  - type: icontains
    value: "conclusion"
  # All keywords must appear
  - type: contains-all
    value: ["summary", "recommendation", "next steps"]
  # Any keyword is acceptable
  - type: contains-any
    value: ["approve", "accept", "pass"]
  # Regex pattern
  - type: regex
    value: "\\d{4}-\\d{2}-\\d{2}"
  # Word count range
  - type: word-count
    value:
      min: 50
      max: 500
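The difference between search and full-match semantics for regex, and the whitespace-based counting behind word-count, can be illustrated with Python's standard library. This is only a sketch: the helper names regex_passes and word_count_passes are hypothetical and are not Runsight's actual handlers.

```python
import re
from typing import Optional

DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"


def regex_passes(output: str, pattern: str) -> bool:
    """Hypothetical helper: search semantics, so the pattern may match anywhere."""
    return re.search(pattern, output) is not None


def word_count_passes(
    output: str, minimum: Optional[int] = None, maximum: Optional[int] = None
) -> bool:
    """Hypothetical helper: count whitespace-split words and check an optional range."""
    count = len(output.split())
    if minimum is not None and count < minimum:
        return False
    if maximum is not None and count > maximum:
        return False
    return True


text = "Report generated on 2024-05-01 for review."

# The date appears mid-string, so a search succeeds.
print(regex_passes(text, DATE_PATTERN))  # True

# A full match would require the whole string to be a date.
print(re.fullmatch(DATE_PATTERN, text) is not None)  # False

# The text splits into 6 words, which is inside the 5..10 range.
print(word_count_passes(text, minimum=5, maximum=10))  # True
```

To require the whole output to match under search semantics, anchor the pattern explicitly, for example with ^ and $.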
| Type | Value | Behavior |
|---|---|---|
| is-json | dict (optional JSON Schema) | Validates output is valid JSON. If value is provided, validates against a JSON Schema |
| contains-json | dict (optional JSON Schema) | Finds a JSON substring in the output. Scans for { and [ delimiters. Optional schema validation |

Structural assertion examples
assertions:
  # Output must be valid JSON
  - type: is-json
  # Output must contain a JSON object matching a schema
  - type: contains-json
    value:
      type: object
      required: ["name", "score"]
      properties:
        name:
          type: string
        score:
          type: number
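One way to picture the delimiter scan described for contains-json is to try decoding JSON at every { or [ position. The sketch below uses only the standard json module; find_json_substring is a hypothetical name, the approach is an assumption about how such a scan could work rather than Runsight's implementation, and schema validation is left out.

```python
import json
from typing import Any, Optional


def find_json_substring(output: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in output, or None.

    Assumption: candidates start at a '{' or '[' character. This is an
    illustration only, not Runsight's handler.
    """
    decoder = json.JSONDecoder()
    for index, char in enumerate(output):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at index and
            # ignores whatever text follows it.
            value, _end = decoder.raw_decode(output, index)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
        return value
    return None


reply = 'Here is the result: {"name": "demo", "score": 0.9} Let me know.'
print(find_json_substring(reply))
print(find_json_substring("no structured data here"))
```

The first call prints the decoded dict and the second prints None, because the plain sentence contains no { or [ at all.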
| Type | Value | Threshold | Behavior |
|---|---|---|---|
| cost | | float (USD) | Passes if cost_usd from the execution context is at or below threshold |
| latency | | float (ms) | Passes if latency_ms from the execution context is at or below threshold |

Performance assertions read from the AssertionContext, not from the block output string. In live API runs that context is populated with run metrics such as cost, latency, and tokens; offline eval uses zeroed metric fields.

Performance assertion examples
assertions:
  - type: cost
    threshold: 0.10
  - type: latency
    threshold: 5000
| Type | Value | Default threshold | Behavior |
|---|---|---|---|
| levenshtein | str (reference text) | 5 | Edit distance between output and reference. Passes if distance <= threshold |
| bleu | str (reference text) | 0.5 | BLEU-4 score with smoothing. Passes if score >= threshold |
| rouge-n | str (reference text) | 0.75 | ROUGE-1 F-measure. Passes if score >= threshold |

Linguistic assertion examples
assertions:
  - type: levenshtein
    value: "The capital of France is Paris."
    threshold: 10
  - type: bleu
    value: "Machine learning models process data to find patterns."
    threshold: 0.3
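To make the levenshtein pass condition concrete, here is a minimal sketch of a character-level edit distance with the distance <= threshold check. The function names are hypothetical, and character-level comparison is an assumption; Runsight's handler may normalize or compute the distance differently.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def levenshtein_passes(output: str, reference: str, threshold: int = 5) -> bool:
    """Hypothetical check: pass when the distance is at or below the threshold."""
    return levenshtein_distance(output, reference) <= threshold


reference = "The capital of France is Paris."
output = "The capital of France is Paris"  # missing the final period

print(levenshtein_distance(output, reference))  # 1
print(levenshtein_passes(output, reference, threshold=10))  # True
```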

Beyond the 15 built-in types, you can create your own assertions in Python. The contract is promptfoo-compatible, so existing promptfoo assertion functions work with minimal changes. Custom assertions are discovered from manifest files under custom/assertions/*.yaml and are referenced by their embedded assertion id:

Custom assertion usage
assertions:
  - type: custom:tone_check
    config:
      prefix: calm

Important details:

  • The runtime key is always custom:<id>.
  • The manifest name is display-only.
  • Custom assertions can be used alongside built-in assertions in the same list.
  • Custom assertions support the same not- negation prefix as built-in assertions.

See Custom Assertions for the manifest format, Python contract, params schema validation, context keys, and migration guidance.
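As a rough illustration of what a promptfoo-style assertion function can look like, the sketch below assumes promptfoo's Python convention: a get_assert(output, context) entry point that returns a dict with pass, score, and reason keys. The tone check itself is hypothetical, and the Custom Assertions page remains the authority on the actual contract and context keys.

```python
from typing import Any, Dict


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical tone check: pass when the output starts with a configured prefix.

    Assumptions: the entry point name and the pass/score/reason result keys
    follow promptfoo's Python assertion convention; the per-assertion config
    arrives as context["config"].
    """
    config = context.get("config") or {}
    prefix = str(config.get("prefix", ""))
    passed = output.strip().lower().startswith(prefix.lower())
    verb = "starts" if passed else "does not start"
    return {
        "pass": passed,
        "score": 1.0 if passed else 0.0,
        "reason": f"output {verb} with {prefix!r}",
    }


# Mirrors the YAML above: config: {prefix: calm}
result = get_assert("Calm and steady progress today.", {"config": {"prefix": "calm"}})
print(result["pass"], result["score"])  # True 1.0
```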

Any assertion type can be negated by prefixing it with not-. The engine inverts both the pass/fail boolean and the score (1.0 - original):

Negated assertion
assertions:
  - type: not-contains
    value: "error"
  - type: not-contains-json
  - type: not-custom:blocked_word
    config:
      blocked: storm

Negation works for both built-in and custom assertion types.
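The inversion rule can be sketched in a few lines. The Result class and the evaluate dispatcher below are hypothetical stand-ins that handle only contains; they show the described behavior (flip the boolean, replace the score with 1.0 minus the original), not the engine's code.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical result shape: a pass/fail flag plus a 0.0-1.0 score."""
    passed: bool
    score: float


def contains(output: str, value: str) -> Result:
    found = value in output
    return Result(passed=found, score=1.0 if found else 0.0)


def negate(result: Result) -> Result:
    """Invert both the boolean and the score, as described for the not- prefix."""
    return Result(passed=not result.passed, score=1.0 - result.score)


def evaluate(assertion_type: str, output: str, value: str) -> Result:
    """Strip a leading 'not-' and invert the underlying result.

    Only 'contains' is handled in this sketch.
    """
    negated = assertion_type.startswith("not-")
    base_type = assertion_type[len("not-"):] if negated else assertion_type
    if base_type != "contains":
        raise ValueError(f"unsupported assertion type in this sketch: {base_type}")
    result = contains(output, value)
    return negate(result) if negated else result


print(evaluate("contains", "all good", "error"))      # Result(passed=False, score=0.0)
print(evaluate("not-contains", "all good", "error"))  # Result(passed=True, score=1.0)
```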

When a block has multiple assertions, the aggregate score is a weighted average. Each assertion’s weight (default 1.0) determines its contribution:

Weighted assertions
assertions:
  - type: contains
    value: "recommendation"
    weight: 2.0
  - type: word-count
    value:
      min: 100
    weight: 1.0
  - type: cost
    threshold: 0.05
    weight: 0.5

The AssertionsResult class accumulates weighted results. Its aggregate_score property returns the weighted average: total_score / total_weight. The passed() method without a threshold returns True only if every individual assertion passed.
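The arithmetic is easy to check by hand. The sketch below is a hypothetical stand-in for the accumulation described above (it is not the AssertionsResult source), and it assumes total_score is the sum of score times weight. It uses the three weights from the example and supposes the word-count assertion fails while the other two pass.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AggregateSketch:
    """Hypothetical accumulator of (passed, score, weight) entries."""
    entries: List[Tuple[bool, float, float]] = field(default_factory=list)

    def add(self, passed: bool, score: float, weight: float = 1.0) -> None:
        self.entries.append((passed, score, weight))

    @property
    def aggregate_score(self) -> float:
        total_weight = sum(weight for _, _, weight in self.entries)
        if total_weight == 0:
            return 0.0  # assumption: no weighted entries means a score of 0.0
        total_score = sum(score * weight for _, score, weight in self.entries)
        return total_score / total_weight

    def passed(self) -> bool:
        """Without a threshold: True only if every individual assertion passed."""
        return all(ok for ok, _, _ in self.entries)


summary = AggregateSketch()
summary.add(passed=True, score=1.0, weight=2.0)   # contains "recommendation"
summary.add(passed=False, score=0.0, weight=1.0)  # word-count below the minimum
summary.add(passed=True, score=1.0, weight=0.5)   # cost within threshold

# (1.0*2.0 + 0.0*1.0 + 1.0*0.5) / (2.0 + 1.0 + 0.5) = 2.5 / 3.5
print(round(summary.aggregate_score, 3))  # 0.714
print(summary.passed())                   # False
```

A single failing assertion therefore lowers the aggregate score in proportion to its weight, while passed() without a threshold still reports failure.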

Assertions do not share state with each other. Each configured assertion is evaluated independently against the same block result.

Runsight evaluates assertions in two contexts:

  • Offline eval — assertions run concurrently for each block
  • Live API runs — assertions run sequentially after each block completes

In both contexts:

  • Every configured assertion produces its own result
  • A transform failure on one assertion does not prevent the rest from running
  • Aggregate scoring follows the same weighted scoring rules

Independent evaluation is especially useful with transform hooks. Each assertion can target a different field in the block output using its own json_path transform:

Multiple assertions with different transforms
assertions:
  - type: equals
    value: "completed"
    transform: "json_path:$.status"
  - type: contains
    value: "success"
    transform: "json_path:$.message"
  - type: cost
    threshold: 0.02

In this example, the first assertion extracts $.status and checks for an exact match, the second extracts $.message and checks for a substring, and the third checks execution cost without any transform.

If a transform fails on one assertion (e.g., the output is not valid JSON, or the path does not exist), that assertion returns passed=False with a descriptive reason. The other assertions still run normally and produce their own results. The aggregate score and pass/fail are then computed across all of them using the standard weighted scoring rules.
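The isolation behavior can be sketched as a loop that catches transform errors per assertion. Everything here is hypothetical: apply_json_path supports only simple dotted paths such as $.status, and run_assertions handles only equals and contains. It illustrates the described behavior under those assumptions and is not the engine's implementation.

```python
import json
from typing import Any, Dict, List


def apply_json_path(output: str, path: str) -> Any:
    """Resolve a simple '$.a.b' path against JSON output.

    Assumption: only dotted object keys are supported in this sketch.
    """
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    current = json.loads(output)  # raises a ValueError subclass on invalid JSON
    for key in path[2:].split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"path {path} not found")
        current = current[key]
    return current


def run_assertions(output: str, assertions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    results = []
    for assertion in assertions:
        target = output
        transform = assertion.get("transform")
        try:
            if transform and transform.startswith("json_path:"):
                target = str(apply_json_path(output, transform[len("json_path:"):]))
        except (ValueError, KeyError) as exc:
            # A failed transform fails this assertion only; the loop continues.
            results.append(
                {"type": assertion["type"], "passed": False, "reason": f"transform failed: {exc}"}
            )
            continue
        if assertion["type"] == "equals":
            passed = target == assertion["value"]
        elif assertion["type"] == "contains":
            passed = assertion["value"] in target
        else:
            raise ValueError(f"unsupported type in this sketch: {assertion['type']}")
        results.append({"type": assertion["type"], "passed": passed, "reason": ""})
    return results


checks = [
    {"type": "equals", "value": "completed", "transform": "json_path:$.status"},
    {"type": "contains", "value": "success", "transform": "json_path:$.message"},
]
# The output has no "message" key, so only the second transform fails.
results = run_assertions('{"status": "completed"}', checks)
for item in results:
    print(item["type"], item["passed"], item["reason"])
```

The first assertion still passes even though the second one's transform could not resolve $.message.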

When a workflow runs via the API, assertions fire automatically. No extra configuration is needed beyond defining them on your blocks:

  1. The engine reads each block’s assertions list from the workflow YAML
  2. After a block completes, the engine evaluates all configured assertions against the block output
  3. Each assertion receives the block’s output text plus execution context (cost, latency, soul info, tokens)
  4. Results are persisted on the run record and pushed to the UI via SSE

The RunNode entity stores three eval fields:

| Field | Type | Description |
|---|---|---|
| eval_score | Optional[float] | Weighted average score across all assertions on the block |
| eval_passed | Optional[bool] | True when every individual assertion passed |
| eval_results | Optional[Dict] | Per-assertion results including pass/fail, score, reason, and handler type when the handler sets one |

These fields are None when a block has no assertions configured.