Custom Assertions

Custom assertions let you add workspace-local checks alongside Runsight’s 15 built-in assertions. Add a YAML manifest, drop your Python file next to it, and reference it as custom:<id> in your workflow.

Runsight discovers custom assertions from custom/assertions/*.yaml, registers each one under custom:{id}, and runs them in both offline evals and live API workflow runs. The embedded id must match the YAML filename stem.
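As a rough illustration of the naming rule above, the sketch below maps manifest files to their runtime types. The helper name discover_custom_assertion_types is hypothetical and is not part of Runsight. It relies only on the filename stem (which must equal the embedded id) and does not parse or validate the manifests.

```python
from pathlib import Path


def discover_custom_assertion_types(workspace_root):
    """Map each manifest in custom/assertions/*.yaml to its runtime type.

    Hypothetical helper for illustration only: the real scanner also
    parses and validates each manifest, which this sketch skips.
    """
    manifests_dir = Path(workspace_root) / "custom" / "assertions"
    # sorted() keeps the result deterministic across platforms.
    return {
        f"custom:{path.stem}": path
        for path in sorted(manifests_dir.glob("*.yaml"))
    }
```

Python source files in the same directory are ignored by the glob, so only the YAML manifests produce entries.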

For built-in assertion types and shared assertion config fields, see Assertions.

Quick Start

This example creates a custom assertion named tone_check, then uses it in an offline eval case with a built-in assertion alongside it.

version: "1.0"
id: tone_check
kind: assertion
name: "Tone Check"
description: "Passes when output starts with a configured prefix."
returns: "grading_result"
source: "tone_check.py"
params:
  type: object
  properties:
    prefix:
      type: string
  required: ["prefix"]

def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }

version: "1.0"
id: custom-assertions-demo
kind: workflow
config:
  model_name: gpt-4o
blocks:
  analyze:
    type: code
    code: |
      def main(data):
          return "unused in fixture mode"
workflow:
  name: custom_assertions_demo
  entry: analyze
  transitions:
    - from: analyze
      to: null
eval:
  threshold: 0.5
  cases:
    - id: tone_case
      fixtures:
        analyze: "calm response"
      expected:
        analyze:
          - type: custom:tone_check
            config:
              prefix: calm
          - type: contains
            value: "response"

What this does:

The assertion’s canonical ID is tone_check because the manifest declares id: tone_check, which matches the manifest filename tone_check.yaml.
The runtime type is custom:tone_check.
The config object is validated against params before the plugin runs.
The same custom assertion can also be used under a block’s normal assertions: list during API workflow runs.
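Because the plugin is a plain function that takes a string and a dict, it can be sanity-checked locally before it is wired into a workflow. The snippet below is a minimal sketch that calls the Quick Start function directly with a hand-built context. It bypasses Runsight entirely, so schema validation and subprocess isolation are not exercised.

```python
def get_assert(output, context):
    # Same body as tone_check.py in the Quick Start above.
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }


# Hand-built context: this plugin only reads the "config" key.
context = {"config": {"prefix": "calm"}}

passing = get_assert("calm response", context)
failing = get_assert("angry response", context)
print(passing)
print(failing)
```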

YAML Manifest Reference

Each custom assertion is defined by a YAML manifest in custom/assertions/<id>.yaml.

version: "1.0"
id: example
kind: assertion
name: "Example Assertion"
description: "Checks something about the block output."
returns: "bool"
source: "example.py"
params:
  type: object
  properties:
    enabled:
      type: boolean

Field	Type	Required	Description
`version`	`string`	yes	Manifest version string
`id`	`string`	yes	Embedded assertion id. Must match the filename stem
`kind`	`"assertion"`	yes	Entity kind
`name`	`string`	yes	Display name for humans. This is not the runtime ID
`description`	`string`	yes	Short description of the assertion
`returns`	`"bool"` or `"grading_result"`	yes	Declares the plugin return contract
`source`	`string`	yes	Python file to load, relative to the manifest file
`params`	JSON Schema object	no	Schema used to validate the assertion’s `config` before plugin execution

Important details:

The canonical runtime ID is the embedded id, not name.
custom/assertions/tone_check.yaml always registers as custom:tone_check.
Extra top-level manifest fields are rejected.
Collisions between a manifest id and a built-in assertion type (for example contains) are rejected at scan time.
The source file must exist relative to the manifest file.
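The manifest rules above can be approximated with a small pre-flight check. This is a sketch under stated assumptions: check_manifest and its problem messages are hypothetical, the manifest is passed in as an already-parsed dict, and the built-in collision and source-file checks are left out.

```python
REQUIRED_FIELDS = {"version", "id", "kind", "name", "description", "returns", "source"}
OPTIONAL_FIELDS = {"params"}
ALLOWED_RETURNS = {"bool", "grading_result"}


def check_manifest(manifest, filename_stem):
    """Return a list of problems found in an already-parsed manifest dict.

    Hypothetical helper: the messages are illustrative, not Runsight's own.
    """
    problems = []
    keys = set(manifest)
    for field in sorted(REQUIRED_FIELDS - keys):
        problems.append(f"missing required field: {field}")
    for field in sorted(keys - REQUIRED_FIELDS - OPTIONAL_FIELDS):
        problems.append(f"unsupported extra field: {field}")
    if "id" in manifest and manifest["id"] != filename_stem:
        problems.append("id does not match filename stem")
    if "kind" in manifest and manifest["kind"] != "assertion":
        problems.append("kind must be 'assertion'")
    if "returns" in manifest and manifest["returns"] not in ALLOWED_RETURNS:
        problems.append("returns must be 'bool' or 'grading_result'")
    return problems
```

An empty list means the manifest passed these particular checks. It does not guarantee that Runsight will accept the manifest.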

Python Contract

A custom assertion source file must define exactly this function:

def get_assert(output, context):
    return True

Rules:

The function name must be get_assert.
The parameter list must be exactly (output, context).
The function must be synchronous.
Runsight validates the function contract before registration.
The plugin runs in a separate subprocess with a minimal environment.
Plugin execution times out after 30 seconds.
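The first three rules can be checked ahead of time with the standard inspect module. The function below is a hypothetical sketch of such a check, not Runsight's actual validator. It compares parameter names only and does not inspect defaults or annotations.

```python
import inspect


def check_contract(namespace):
    """Return an error string, or None if get_assert looks valid.

    `namespace` is a dict of names defined by the plugin source file.
    Hypothetical helper for illustration only.
    """
    fn = namespace.get("get_assert")
    if not callable(fn):
        return "get_assert is not defined"
    if inspect.iscoroutinefunction(fn):
        return "get_assert must be synchronous"
    params = list(inspect.signature(fn).parameters)
    if params != ["output", "context"]:
        return "get_assert must take exactly (output, context)"
    return None
```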

The plugin receives:

output: the block output string being checked
context: a plain Python dict with assertion metadata and per-assertion config

Runsight does not forward API keys into the plugin subprocess environment.
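To make the isolation model concrete, here is a minimal sketch of running a plugin in a child process with a stripped-down environment and a timeout. Everything in it is an assumption for illustration: run_plugin, minimal_env, the JSON-over-stdin protocol, and the choice of which environment variables to keep are not taken from Runsight's implementation.

```python
import json
import os
import subprocess
import sys

TIMEOUT_SECONDS = 30  # the timeout stated in this document

# Hypothetical runner: loads the plugin file, calls get_assert, prints JSON.
_RUNNER = """
import importlib.util, json, sys
payload = json.load(sys.stdin)
spec = importlib.util.spec_from_file_location("plugin", payload["source"])
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
json.dump(module.get_assert(payload["output"], payload["context"]), sys.stdout)
"""


def minimal_env():
    # Assumption: keep only what the interpreter needs to start.
    # API keys and other variables from the parent are dropped.
    keep = ("PATH", "SYSTEMROOT")
    return {k: os.environ[k] for k in keep if k in os.environ}


def run_plugin(source_path, output, context, timeout=TIMEOUT_SECONDS):
    payload = json.dumps({"source": source_path, "output": output, "context": context})
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _RUNNER],
            input=payload, capture_output=True, text=True,
            env=minimal_env(), timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        reason = f"custom assertion plugin timed out after {timeout}s"
        return {"pass": False, "score": 0.0, "reason": reason}
    if proc.returncode != 0:
        return {"pass": False, "score": 0.0, "reason": proc.stderr.strip()}
    return json.loads(proc.stdout)
```

A practical consequence of this model is that a plugin cannot rely on environment variables set in the parent process, and anything it needs must arrive through output or context.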

Return Types

The manifest returns field controls how Runsight interprets the plugin result.

`bool`

Use returns: "bool" when the assertion is a simple pass/fail check.

def get_assert(output, context):
    return "calm" in output

True becomes a passing result with score 1.0. False becomes a failing result with score 0.0.

`grading_result`

Use returns: "grading_result" when you need to control the score or reason.

def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }

Accepted fields:

Field	Required	Notes
`passed` or `pass_` or `pass`	yes	Runsight accepts these aliases, checked in that order of precedence
`score`	yes	Must be numeric and between `0.0` and `1.0`
`reason`	no	Optional. Non-string values are coerced with `str()`

Notes:

score may be an int or float; Runsight converts it to float.
If the returned shape does not match the declared contract, the assertion fails with a runtime error message instead of crashing the run.
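The rules for both return contracts can be summarized as one normalization step. The sketch below is an approximation written from the description above. normalize_result is a hypothetical name, the output dict shape is illustrative, and the empty-string default for a missing reason is an assumption.

```python
def _failed(reason):
    return {"passed": False, "score": 0.0, "reason": reason}


def normalize_result(declared, value):
    """Turn a raw plugin return value into a uniform result dict (sketch)."""
    if declared == "bool":
        if not isinstance(value, bool):
            kind = type(value).__name__
            return _failed(f"declares returns: bool but get_assert returned {kind!r}")
        return {"passed": value, "score": 1.0 if value else 0.0, "reason": ""}

    # declared == "grading_result"
    if not isinstance(value, dict):
        return _failed("grading_result must be a dict")
    # Aliases are checked in the documented order of precedence.
    for key in ("passed", "pass_", "pass"):
        if key in value:
            passed = value[key]
            break
    else:
        return _failed("grading_result is missing a pass field")
    score = value.get("score")
    if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        return _failed("score must be numeric and between 0.0 and 1.0")
    reason = value.get("reason", "")  # assumption: empty string when omitted
    if not isinstance(reason, str):
        reason = str(reason)
    return {"passed": bool(passed), "score": float(score), "reason": reason}
```

Note that a contract mismatch produces a failing result rather than an exception, which mirrors the behavior described in the notes.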

Config & Params

Each assertion entry in workflow YAML can include a config field:

assertions:
  - type: custom:tone_check
    config:
      prefix: calm

For custom assertions, Runsight passes that value through two stages:

If the manifest defines params, Runsight validates config against that JSON Schema.
If validation succeeds, the exact value is exposed to the plugin as context["config"].

If the manifest does not define params, Runsight skips config validation.

Schema-validated config

version: "1.0"
id: budget_guard
kind: assertion
name: "Budget Guard"
description: "Requires a numeric budget."
returns: "bool"
source: "budget_guard.py"
params:
  type: object
  properties:
    budget:
      type: number
  required: ["budget"]

def get_assert(output, context):
    return True

assertions:
  - type: custom:budget_guard
    config:
      budget: 0.05

If config is invalid, Runsight returns a failing result whose reason starts with Config validation failed: and skips plugin execution.
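The validate-then-run flow can be sketched as follows. To stay dependency-free, this example uses a tiny stand-in validator that only understands required and primitive type keywords on object schemas. It is not a JSON Schema implementation, and the names validate_config and check_then_run are hypothetical. Only the Config validation failed: prefix comes from this document.

```python
_JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def validate_config(config, params):
    """Tiny stand-in for a JSON Schema validator; returns an error or None."""
    if params is None:
        return None  # no params schema: validation is skipped
    if not isinstance(config, dict):
        return "config must be an object"
    for key in params.get("required", []):
        if key not in config:
            return f"'{key}' is a required property"
    for key, schema in params.get("properties", {}).items():
        if key not in config:
            continue
        expected = _JSON_TYPES.get(schema.get("type"))
        value = config[key]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        bool_mismatch = isinstance(value, bool) and schema.get("type") != "boolean"
        if expected and (bool_mismatch or not isinstance(value, expected)):
            return f"'{key}' must be of type {schema['type']}"
    return None


def check_then_run(config, params, plugin, output):
    error = validate_config(config, params)
    if error:
        # The plugin is never called when validation fails.
        return {"pass": False, "score": 0.0, "reason": f"Config validation failed: {error}"}
    return plugin(output, {"config": config})
```

With the budget_guard schema above, a config of budget: "cheap" would fail here before the plugin runs, while budget: 0.05 would reach get_assert unchanged.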

Generic assertion features still work

Custom assertions use the same outer assertion config object as built-ins, so these features still apply:

weight
metric
transform
not- negation, for example not-custom:blocked_word
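As a small illustration of the not- prefix, the sketch below splits a type string into a negation flag and the underlying type, then flips a pass value. Both helpers are hypothetical. How Runsight adjusts the score of a negated assertion is not covered here.

```python
def split_assertion_type(type_name):
    """Split 'not-custom:blocked_word' into (True, 'custom:blocked_word')."""
    negated = type_name.startswith("not-")
    base = type_name[len("not-"):] if negated else type_name
    return negated, base


def apply_negation(negated, passed):
    # A negated assertion passes exactly when the underlying check fails.
    return (not passed) if negated else passed
```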

Context Dict Reference

Custom assertions receive a plain dict, not an AssertionContext object.

Key	Type	Description
`vars`	`dict`	Workflow variables for the assertion context
`config`	`any`	The per-assertion `config` value from workflow YAML
`prompt`	`string`	Prompt text in the current assertion context
`prompt_hash`	`string`	Prompt hash for the current run
`soul_id`	`string`	Soul ID for the block being evaluated
`soul_version`	`string`	Soul version hash or identifier
`block_id`	`string`	Block ID
`block_type`	`string`	Block type
`cost_usd`	`float`	Execution cost in USD
`total_tokens`	`int`	Total tokens used
`latency_ms`	`float`	Block latency in milliseconds
`run_id`	`string`	Run ID
`workflow_id`	`string`	Workflow identifier from the current assertion context. In live API runs this is currently the workflow name; offline eval uses an empty string

Notes:

The key is vars, not variables.
The key is config, even when the config value is None.
Offline eval populates most context fields with empty strings or zeros.
Live API execution fills prompt, soul, cost, token, latency, run, and workflow fields from the run context.
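When testing a plugin locally, it helps to build a context dict that resembles what offline eval provides. The helper below is a hypothetical sketch that uses the keys from the table above with empty-string and zero defaults. Exactly which fields offline eval leaves empty is an assumption. The example plugin also shows one way to read config safely when its value is None.

```python
def make_offline_context(config=None, variables=None):
    """Build a context dict shaped like the table above (sketch)."""
    return {
        "vars": dict(variables or {}),
        "config": config,
        "prompt": "",
        "prompt_hash": "",
        "soul_id": "",
        "soul_version": "",
        "block_id": "",
        "block_type": "",
        "cost_usd": 0.0,
        "total_tokens": 0,
        "latency_ms": 0.0,
        "run_id": "",
        "workflow_id": "",
    }


def get_assert(output, context):
    # "config" is present even when its value is None, so a .get() default
    # alone would not help; fall back explicitly instead.
    config = context.get("config") or {}
    budget = config.get("budget", 0.05)
    return context["cost_usd"] <= budget
```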

Migrating from Promptfoo

Runsight’s custom assertion contract is intentionally close to promptfoo-style Python assertions.

A promptfoo-style function body like this works unchanged:

def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }

The main Runsight-specific additions are:

Put the code in custom/assertions/<id>.py
Add a matching manifest in custom/assertions/<id>.yaml
Reference it in workflow YAML as custom:<id>

Example:

assertions:
  - type: custom:tone_check
    config:
      prefix: calm

Remember that tone_check comes from the manifest id, which must match the filename stem, and not from the manifest name field.

Limitations

Current custom assertion support is intentionally narrow:

Discovery only looks for YAML manifests in custom/assertions/*.yaml.
The Python contract is only def get_assert(output, context).
Supported return contracts are only bool and grading_result.
Manifest fields are fixed to version, id, kind, name, description, returns, source, and optional params.
Plugins run in a separate subprocess with a minimal environment and a 30 second timeout.
API keys are not forwarded into that subprocess environment.
Offline eval auto-discovers custom assertions only from workflow file paths, not raw YAML strings.
Simple custom assertions (those that return bool or grading_result) also have no IPC access from that subprocess.

LLM-Graded Assertions (`llm_judge`)

For assertions that need to call an LLM to grade output (for example, rubric-based evaluation or factual consistency checks), use the llm_judge assertion type. These assertions run through the same process isolation path as regular LLM blocks: the judge LLM call is proxied through the IPC channel with full budget enforcement and observability.

assertions:
  - type: llm_judge
    config:
      model: claude-haiku
      rubric: "Grade the output for factual accuracy and completeness."

Key differences from simple custom assertions:

The llm_judge assertion runs inside an isolated subprocess with IPC access (not _minimal_subprocess_env).
The judge’s LLM call goes through the BudgetInterceptor, so assertion costs count toward the block’s and workflow’s budget.
A judge_soul is constructed from the assertion config and used to call the LLM.
The GradingResult includes assertion_type: "llm_judge" and metadata.judge_model for traceability.
Simple get_assert() custom plugins still use the minimal subprocess with no IPC. Only llm_judge assertions use the full isolation path.

Error Messages

These are the most common failure modes you will see.

Manifest and discovery errors

These happen before the assertion is registered:

Missing or invalid required manifest fields
Unsupported extra manifest fields
Invalid returns value
Built-in ID collision such as contains.yaml
Invalid Python signature such as anything other than def get_assert(output, context)

Runtime assertion errors

These return failing results instead of crashing the run:

Config validation failed: ...
Custom assertion 'name' failed: plugin exploded
Custom assertion 'name' declares returns: bool but get_assert returned 'dict'
custom assertion plugin timed out after 30s

What happens on failure

In both offline eval and live API execution:

The assertion result is recorded as failed
The run continues
Other assertions on the same block still produce their own results
Live API runs still persist eval_score, eval_passed, and eval_results, and still emit node_eval_complete SSE events
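The failure behavior described above amounts to catching errors per assertion so that one broken plugin does not affect the rest. The loop below is a minimal in-process sketch of that idea. run_block_assertions is a hypothetical name, and the failing result's score of 0.0 is an assumption. Only the message format follows the runtime errors listed earlier.

```python
def run_block_assertions(assertions, output, context):
    """Run (assertion_id, plugin) pairs; a failure never stops the loop."""
    results = []
    for assertion_id, plugin in assertions:
        try:
            result = plugin(output, context)
        except Exception as exc:
            # Record a failed result and keep going with the next assertion.
            result = {
                "pass": False,
                "score": 0.0,
                "reason": f"Custom assertion '{assertion_id}' failed: {exc}",
            }
        results.append((assertion_id, result))
    return results
```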