Custom Assertions

Custom assertions let you add workspace-local checks alongside Runsight’s 15 built-in assertions. Add a YAML manifest, drop your Python file next to it, and reference it as custom:<id> in your workflow.

Runsight discovers custom assertions from custom/assertions/*.yaml, registers each one under custom:{id}, and runs them in both offline evals and live API workflow runs. The embedded id must match the YAML filename stem.

For built-in assertion types and shared assertion config fields, see Assertions.

This example creates a custom assertion named tone_check, then uses it in an offline eval case with a built-in assertion alongside it.

custom/assertions/tone_check.yaml
version: "1.0"
id: tone_check
kind: assertion
name: "Tone Check"
description: "Passes when output starts with a configured prefix."
returns: "grading_result"
source: "tone_check.py"
params:
  type: object
  properties:
    prefix:
      type: string
  required: ["prefix"]

custom/assertions/tone_check.py
def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }

custom/workflows/custom-assertions-demo.yaml
version: "1.0"
id: custom-assertions-demo
kind: workflow
config:
  model_name: gpt-4o
blocks:
  analyze:
    type: code
    code: |
      def main(data):
          return "unused in fixture mode"
workflow:
  name: custom_assertions_demo
  entry: analyze
  transitions:
    - from: analyze
      to: null
eval:
  threshold: 0.5
  cases:
    - id: tone_case
      fixtures:
        analyze: "calm response"
      expected:
        analyze:
          - type: custom:tone_check
            config:
              prefix: calm
          - type: contains
            value: "response"

What this does:

  • The assertion’s canonical ID is tone_check because the manifest file is tone_check.yaml.
  • The runtime type is custom:tone_check.
  • The config object is validated against params before the plugin runs.
  • The same custom assertion can also be used under a block’s normal assertions: list during API workflow runs.
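
As a quick sanity check, the plugin can be called directly in a Python shell before it is wired into a workflow. The sketch below repeats get_assert from tone_check.py and passes a hand-built context dict that contains only the config key. That dict is an assumption for local testing, not the full context Runsight supplies at run time.

```python
def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }


# Hand-built context for local testing only (assumption: just the config key).
result = get_assert("calm response", {"config": {"prefix": "calm"}})
print(result)
# {'pass': True, 'score': 0.9, 'reason': 'prefix=calm'}
```

The fixture string "calm response" starts with the configured prefix, so the custom assertion passes with the fixed score of 0.9.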

Each custom assertion is defined by a YAML manifest in custom/assertions/<id>.yaml.

custom/assertions/example.yaml
version: "1.0"
id: example
kind: assertion
name: "Example Assertion"
description: "Checks something about the block output."
returns: "bool"
source: "example.py"
params:
  type: object
  properties:
    enabled:
      type: boolean

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| version | string | yes | Manifest version string |
| id | string | yes | Embedded assertion id. Must match the filename stem |
| kind | "assertion" | yes | Entity kind |
| name | string | yes | Display name for humans. This is not the runtime ID |
| description | string | yes | Short description of the assertion |
| returns | "bool" or "grading_result" | yes | Declares the plugin return contract |
| source | string | yes | Python file to load, relative to the manifest file |
| params | JSON Schema object | no | Schema used to validate the assertion’s config before plugin execution |

Important details:

  • The canonical runtime ID is the embedded id, not name.
  • custom/assertions/tone_check.yaml always registers as custom:tone_check.
  • Extra top-level manifest fields are rejected.
  • Built-in assertion name collisions are rejected at scan time.
  • The source file must exist relative to the manifest file.

A custom assertion source file must define exactly this function:

custom/assertions/example.py
def get_assert(output, context):
    return True

Rules:

  • The function name must be get_assert.
  • The parameter list must be exactly (output, context).
  • The function must be synchronous.
  • Runsight validates the function contract before registration.
  • The plugin runs in a separate subprocess with a minimal environment.
  • Plugin execution times out after 30 seconds.
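
The first three rules can be checked locally before a plugin is dropped into custom/assertions/. The following is a minimal sketch using the standard inspect module. check_get_assert is a hypothetical helper, not part of Runsight, and Runsight's own validation may be stricter or report different messages.

```python
import inspect


def check_get_assert(module_namespace):
    """Return a list of contract violations; an empty list means none were found."""
    func = module_namespace.get("get_assert")
    if not callable(func):
        return ["source file does not define get_assert"]
    problems = []
    if inspect.iscoroutinefunction(func):
        problems.append("get_assert must be synchronous")
    params = list(inspect.signature(func).parameters)
    if params != ["output", "context"]:
        problems.append(f"expected (output, context), got ({', '.join(params)})")
    return problems


def get_assert(output, context):
    return True


async def bad_async(output, context):
    return True


print(check_get_assert({"get_assert": get_assert}))  # []
print(check_get_assert({"get_assert": bad_async}))   # ['get_assert must be synchronous']
print(check_get_assert({"get_assert": lambda text: True}))
# ['expected (output, context), got (text)']
```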

The plugin receives:

  • output: the block output string being checked
  • context: a plain Python dict with assertion metadata and per-assertion config

Runsight does not forward API keys into the plugin subprocess environment.
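
To illustrate what a minimal environment means for plugin code, the sketch below runs a snippet in a child process that inherits only PATH and is bounded by a timeout. This shows the general technique and is not Runsight's implementation. run_in_minimal_env, the choice to keep only PATH, and the EXAMPLE_API_KEY variable are all assumptions made for the example, which targets a POSIX-like system.

```python
import os
import subprocess
import sys


def run_in_minimal_env(code, timeout_seconds=30):
    """Run a Python snippet in a child process that inherits only PATH."""
    minimal_env = {"PATH": os.environ.get("PATH", "")}
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=minimal_env,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired when exceeded
    )
    return completed.stdout.strip()


os.environ["EXAMPLE_API_KEY"] = "secret"  # set in the parent process only
seen_by_child = run_in_minimal_env(
    "import os; print(os.environ.get('EXAMPLE_API_KEY'))"
)
print(seen_by_child)  # None
```

In practice, a plugin should not rely on credentials or other environment variables from the parent process being available.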

The manifest returns field controls how Runsight interprets the plugin result.

Use returns: "bool" when the assertion is a simple pass/fail check.

custom/assertions/contains_calm.py
def get_assert(output, context):
    return "calm" in output

True becomes a passing result with score 1.0. False becomes a failing result with score 0.0.

Use returns: "grading_result" when you need to control the score or reason.

custom/assertions/tone_check.py
def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }

Accepted fields:

| Field | Required | Notes |
| --- | --- | --- |
| passed or pass_ or pass | yes | Runsight accepts these aliases in that precedence |
| score | yes | Must be numeric and between 0.0 and 1.0 |
| reason | no | Optional. Non-string values are coerced with str() |

Notes:

  • score may be an int or float; Runsight converts it to float.
  • If the returned shape does not match the declared contract, the assertion fails with a runtime error message instead of crashing the run.
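
The two return contracts can be summarised as one normalisation step. The sketch below is an illustration of the rules in this section, not Runsight's code. normalize_result is a hypothetical name, and the empty-string default for a missing reason and the bool() coercion of the pass value are assumptions.

```python
def normalize_result(raw, returns):
    """Map a plugin return value to (passed, score, reason) or raise ValueError."""
    if returns == "bool":
        if not isinstance(raw, bool):
            raise ValueError(f"declares returns: bool but got {type(raw).__name__!r}")
        return raw, 1.0 if raw else 0.0, ""
    if not isinstance(raw, dict):
        raise ValueError(f"declares returns: grading_result but got {type(raw).__name__!r}")
    # Alias precedence from the table: passed, then pass_, then pass.
    for key in ("passed", "pass_", "pass"):
        if key in raw:
            passed = bool(raw[key])  # assumption: truthiness decides pass/fail
            break
    else:
        raise ValueError("grading_result is missing a pass field")
    score = raw.get("score")
    if not isinstance(score, (int, float)):
        raise ValueError("score must be numeric")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    reason = raw.get("reason", "")  # assumption: a missing reason becomes ""
    return passed, float(score), reason if isinstance(reason, str) else str(reason)


print(normalize_result(True, "bool"))
# (True, 1.0, '')
print(normalize_result({"pass": False, "passed": True, "score": 1, "reason": 42}, "grading_result"))
# (True, 1.0, '42')
```

In the second call, passed wins over pass because of the alias precedence, the integer score is converted to a float, and the non-string reason is coerced with str().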

Each assertion entry in workflow YAML can include a config field:

Block assertion config
assertions:
  - type: custom:tone_check
    config:
      prefix: calm

For custom assertions, Runsight passes that value through two stages:

  1. If the manifest defines params, Runsight validates config against that JSON Schema.
  2. If validation succeeds, the exact value is exposed to the plugin as context["config"].

If the manifest does not define params, Runsight skips config validation.

custom/assertions/budget_guard.yaml
version: "1.0"
id: budget_guard
kind: assertion
name: "Budget Guard"
description: "Requires a numeric budget."
returns: "bool"
source: "budget_guard.py"
params:
  type: object
  properties:
    budget:
      type: number
  required: ["budget"]

custom/assertions/budget_guard.py
def get_assert(output, context):
    return True

Workflow usage
assertions:
  - type: custom:budget_guard
    config:
      budget: 0.05

If config is invalid, Runsight returns a failing result whose reason starts with Config validation failed: and skips plugin execution.
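
The budget_guard.py stub above always returns True. A plugin that actually uses the validated config might look like the sketch below, which compares the cost_usd context key against the configured budget. The comparison logic is a hypothetical illustration of reading context["config"]; it is not taken from Runsight.

```python
def get_assert(output, context):
    # config has already passed the params schema, so budget is a number here.
    budget = context["config"]["budget"]
    # cost_usd is likely 0 in offline eval, so the check passes trivially there.
    cost = context.get("cost_usd", 0.0)
    return cost <= budget


# Hand-built context dicts for local testing only.
live_like = {"config": {"budget": 0.05}, "cost_usd": 0.08}
offline_like = {"config": {"budget": 0.05}, "cost_usd": 0.0}
print(get_assert("any output", live_like))     # False
print(get_assert("any output", offline_like))  # True
```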

Custom assertions use the same outer assertion config object as built-ins, so these features still apply:

  • weight
  • metric
  • transform
  • not- negation, for example not-custom:blocked_word

Custom assertions receive a plain dict, not an AssertionContext object.

| Key | Type | Description |
| --- | --- | --- |
| vars | dict | Workflow variables for the assertion context |
| config | any | The per-assertion config value from workflow YAML |
| prompt | string | Prompt text in the current assertion context |
| prompt_hash | string | Prompt hash for the current run |
| soul_id | string | Soul ID for the block being evaluated |
| soul_version | string | Soul version hash or identifier |
| block_id | string | Block ID |
| block_type | string | Block type |
| cost_usd | float | Execution cost in USD |
| total_tokens | int | Total tokens used |
| latency_ms | float | Block latency in milliseconds |
| run_id | string | Run ID |
| workflow_id | string | Workflow identifier from the current assertion context. In live API runs this is currently the workflow name; offline eval uses an empty string |

Notes:

  • The key is vars, not variables.
  • The key is config, even when the config value is None.
  • Offline eval populates most context fields with empty strings or zeros.
  • Live API execution fills prompt, soul, cost, token, latency, run, and workflow fields from the run context.
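
For local testing, it can help to build a context dict that has every documented key with offline-style defaults. The helper below is a sketch under that assumption. make_offline_context is a hypothetical name, and the exact values Runsight supplies in offline eval may differ for some keys, since only most fields are documented as empty or zero.

```python
def make_offline_context(config=None, variables=None):
    """Build a context dict with the documented keys and offline-style defaults."""
    return {
        "vars": dict(variables or {}),
        "config": config,
        "prompt": "",
        "prompt_hash": "",
        "soul_id": "",
        "soul_version": "",
        "block_id": "",
        "block_type": "",
        "cost_usd": 0.0,
        "total_tokens": 0,
        "latency_ms": 0.0,
        "run_id": "",
        "workflow_id": "",
    }


def get_assert(output, context):
    # `or {}` guards against config being None, which the notes above allow.
    config = context.get("config") or {}
    return output.startswith(config.get("prefix", ""))


print(get_assert("calm response", make_offline_context({"prefix": "calm"})))  # True
print(get_assert("calm response", make_offline_context()))                    # True
```

The second call passes because a missing config falls back to an empty prefix, and every string starts with the empty string.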

Runsight’s custom assertion contract is intentionally close to promptfoo-style Python assertions.

A promptfoo-style function body like this works unchanged:

custom/assertions/tone_check.py
def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }

The main Runsight-specific additions are:

  1. Put the code in custom/assertions/<id>.py
  2. Add a matching manifest in custom/assertions/<id>.yaml
  3. Reference it in workflow YAML as custom:<id>

Example:

Workflow assertion entry
assertions:
  - type: custom:tone_check
    config:
      prefix: calm

Remember that tone_check comes from the filename, not the manifest name.

Current custom assertion support is intentionally narrow:

  • Discovery only looks for YAML manifests in custom/assertions/*.yaml.
  • The Python contract is only def get_assert(output, context).
  • Supported return contracts are only bool and grading_result.
  • Manifest fields are fixed to version, id, kind, name, description, returns, source, and optional params.
  • Plugins run in a separate subprocess with a minimal environment and a 30 second timeout.
  • API keys are not forwarded into that subprocess environment.
  • Offline eval auto-discovers custom assertions only from workflow file paths, not raw YAML strings.
  • Simple custom assertions (return bool or grading_result) run in a minimal subprocess with no API keys and no IPC access.

For assertions that need to call an LLM to grade output (e.g., rubric-based evaluation, factual consistency checks), use the llm_judge assertion type. These assertions run through the same process isolation path as regular LLM blocks: the judge LLM call is proxied through the IPC channel with full budget enforcement and observability.

llm_judge assertion example
assertions:
  - type: llm_judge
    config:
      model: claude-haiku
      rubric: "Grade the output for factual accuracy and completeness."

Key differences from simple custom assertions:

  • The llm_judge assertion runs inside an isolated subprocess with IPC access (not _minimal_subprocess_env)
  • The judge’s LLM call goes through the BudgetInterceptor, so assertion costs count toward the block’s and workflow’s budget
  • A judge_soul is constructed from the assertion config and used to call the LLM
  • The GradingResult includes assertion_type: "llm_judge" and metadata.judge_model for traceability
  • Simple get_assert() custom plugins still use the minimal subprocess with no IPC; only llm_judge type assertions use the full isolation path

These are the most common failure modes you will see.

These happen before the assertion is registered:

  • Missing or invalid required manifest fields
  • Unsupported extra manifest fields
  • Invalid returns value
  • Built-in ID collision such as contains.yaml
  • Invalid Python signature such as anything other than def get_assert(output, context)

These return failing results instead of crashing the run:

  • Config validation failed: ...
  • Custom assertion 'name' failed: plugin exploded
  • Custom assertion 'name' declares returns: bool but get_assert returned 'dict'
  • custom assertion plugin timed out after 30s

In both offline eval and live API execution:

  • The assertion result is recorded as failed
  • The run continues
  • Other assertions on the same block still produce their own results
  • Live API runs still persist eval_score, eval_passed, and eval_results, and still emit node_eval_complete SSE events
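
The "failing result instead of a crash" behaviour described above can be pictured as a wrapper around the plugin call. The sketch below is an in-process simplification for illustration only. Runsight runs plugins in a subprocess, and run_safely and the shape of the returned dict are assumptions.

```python
def run_safely(assertion_id, plugin, output, context):
    """Call a plugin and turn any exception into a failing result dict."""
    try:
        return {"passed": bool(plugin(output, context)), "reason": ""}
    except Exception as exc:  # broad on purpose: no plugin error should escape
        return {
            "passed": False,
            "reason": f"Custom assertion '{assertion_id}' failed: {exc}",
        }


def exploding_plugin(output, context):
    raise RuntimeError("plugin exploded")


result = run_safely("name", exploding_plugin, "some output", {})
print(result["reason"])  # Custom assertion 'name' failed: plugin exploded
```

Because the exception is captured in the result, the caller can go on to evaluate the remaining assertions on the same block.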