Custom Assertions
Custom assertions let you add workspace-local checks alongside Runsight’s 15 built-in assertions. Add a YAML manifest, drop your Python file next to it, and reference it as custom:<id> in your workflow.
Runsight discovers custom assertions from custom/assertions/*.yaml, registers each one under custom:{id}, and runs them in both offline evals and live API workflow runs. The embedded id must match the YAML filename stem.
For built-in assertion types and shared assertion config fields, see Assertions.
Quick Start
This example creates a custom assertion named `tone_check`, then uses it in an offline eval case with a built-in assertion alongside it.

The manifest, `custom/assertions/tone_check.yaml`:

```yaml
version: "1.0"
id: tone_check
kind: assertion
name: "Tone Check"
description: "Passes when output starts with a configured prefix."
returns: "grading_result"
source: "tone_check.py"
params:
  type: object
  properties:
    prefix:
      type: string
  required: ["prefix"]
```

The plugin, `tone_check.py`, next to the manifest:

```python
def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }
```

The workflow that uses it:

```yaml
version: "1.0"
id: custom-assertions-demo
kind: workflow
config:
  model_name: gpt-4o
blocks:
  analyze:
    type: code
    code: |
      def main(data):
          return "unused in fixture mode"
workflow:
  name: custom_assertions_demo
  entry: analyze
  transitions:
    - from: analyze
      to: null
eval:
  threshold: 0.5
  cases:
    - id: tone_case
      fixtures:
        analyze: "calm response"
      expected:
        analyze:
          - type: custom:tone_check
            config:
              prefix: calm
          - type: contains
            value: "response"
```

What this does:
- The assertion’s canonical ID is `tone_check` because the manifest file is `tone_check.yaml`.
- The runtime type is `custom:tone_check`.
- The `config` object is validated against `params` before the plugin runs.
- The same custom assertion can also be used under a block’s normal `assertions:` list during API workflow runs.
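Because a plugin is an ordinary Python function, it can be exercised directly before it is wired into a workflow. The sketch below repeats the `tone_check` plugin from the Quick Start example and calls it with a hand-built context. The context here contains only the `config` key, which is an assumption made for local testing and not the full dict Runsight passes at runtime.

```python
# Local smoke test for the tone_check plugin (not part of Runsight itself).
def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }


# Hand-built context: only the "config" key is filled in for this test.
context = {"config": {"prefix": "calm"}}

passing = get_assert("calm response", context)
failing = get_assert("angry response", context)

print(passing["pass"], passing["reason"])
print(failing["pass"])
```

A check like this only covers the function body. Manifest validation and `params` schema validation still happen inside Runsight.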
YAML Manifest Reference
Each custom assertion is defined by a YAML manifest in `custom/assertions/<id>.yaml`.

```yaml
version: "1.0"
id: example
kind: assertion
name: "Example Assertion"
description: "Checks something about the block output."
returns: "bool"
source: "example.py"
params:
  type: object
  properties:
    enabled:
      type: boolean
```

| Field | Type | Required | Description |
|---|---|---|---|
| `version` | string | yes | Manifest version string |
| `id` | string | yes | Embedded assertion id. Must match the filename stem |
| `kind` | `"assertion"` | yes | Entity kind |
| `name` | string | yes | Display name for humans. This is not the runtime ID |
| `description` | string | yes | Short description of the assertion |
| `returns` | `"bool"` or `"grading_result"` | yes | Declares the plugin return contract |
| `source` | string | yes | Python file to load, relative to the manifest file |
| `params` | JSON Schema object | no | Schema used to validate the assertion’s config before plugin execution |
Important details:
- The canonical runtime ID is the embedded `id`, not `name`.
- `custom/assertions/tone_check.yaml` always registers as `custom:tone_check`.
- Extra top-level manifest fields are rejected.
- Built-in assertion name collisions are rejected at scan time.
- The source file must exist relative to the manifest file.
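The manifest rules above can be approximated with a small pre-flight check. This is a minimal sketch that works on an already-parsed manifest dict; `check_manifest` is a hypothetical helper written for illustration and is not Runsight's scanner, which may apply further checks such as built-in name collisions and source file existence.

```python
from pathlib import Path

# Field names taken from the manifest reference table.
REQUIRED_FIELDS = {"version", "id", "kind", "name", "description", "returns", "source"}
OPTIONAL_FIELDS = {"params"}
ALLOWED_RETURNS = {"bool", "grading_result"}


def check_manifest(manifest: dict, manifest_path: str) -> list[str]:
    """Return a list of problems; an empty list means no rule was violated.

    Hypothetical helper: mirrors the documented rules, not Runsight's code.
    """
    problems = []
    missing = REQUIRED_FIELDS - set(manifest)
    extra = set(manifest) - REQUIRED_FIELDS - OPTIONAL_FIELDS
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if extra:
        problems.append(f"unsupported fields: {sorted(extra)}")
    if manifest.get("kind") != "assertion":
        problems.append("kind must be 'assertion'")
    if manifest.get("returns") not in ALLOWED_RETURNS:
        problems.append("returns must be 'bool' or 'grading_result'")
    # The embedded id must match the filename stem (tone_check.yaml -> tone_check).
    if manifest.get("id") != Path(manifest_path).stem:
        problems.append("id does not match the filename stem")
    return problems


manifest = {
    "version": "1.0",
    "id": "tone_check",
    "kind": "assertion",
    "name": "Tone Check",
    "description": "Passes when output starts with a configured prefix.",
    "returns": "grading_result",
    "source": "tone_check.py",
}
print(check_manifest(manifest, "custom/assertions/tone_check.yaml"))
```

Collecting every problem in a list, instead of stopping at the first one, makes it easier to fix a manifest in one pass.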
Python Contract
A custom assertion source file must define exactly this function:

```python
def get_assert(output, context):
    return True
```

Rules:

- The function name must be `get_assert`.
- The parameter list must be exactly `(output, context)`.
- The function must be synchronous.
- Runsight validates the function contract before registration.
- The plugin runs in a separate subprocess with a minimal environment.
- Plugin execution times out after 30 seconds.
The plugin receives:
- `output`: the block output string being checked
- `context`: a plain Python dict with assertion metadata and per-assertion config
Runsight does not forward API keys into the plugin subprocess environment.
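The function contract can be checked locally with the standard `inspect` module before a plugin is dropped into the workspace. The sketch below is one way to express the three rules (name, parameter list, synchronous); `check_contract` is a hypothetical helper, and Runsight's own validation may be implemented differently.

```python
import inspect


def check_contract(func) -> list[str]:
    """Collect contract violations for a candidate plugin function.

    Hypothetical helper; Runsight's own validation may differ.
    """
    problems = []
    if func.__name__ != "get_assert":
        problems.append("function must be named get_assert")
    # Parameter names in declaration order, e.g. ["output", "context"].
    params = list(inspect.signature(func).parameters)
    if params != ["output", "context"]:
        problems.append("parameters must be exactly (output, context)")
    if inspect.iscoroutinefunction(func):
        problems.append("function must be synchronous")
    return problems


def get_assert(output, context):
    return True


async def get_assert_async(output, context):
    return True


print(check_contract(get_assert))
print(check_contract(get_assert_async))
```

The second call reports two problems: the function has the wrong name and it is a coroutine function.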
Return Types
The manifest `returns` field controls how Runsight interprets the plugin result.
Use `returns: "bool"` when the assertion is a simple pass/fail check.

```python
def get_assert(output, context):
    return "calm" in output
```

`True` becomes a passing result with score 1.0. `False` becomes a failing result with score 0.0.
grading_result
Use `returns: "grading_result"` when you need to control the score or reason.

```python
def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }
```

Accepted fields:

| Field | Required | Notes |
|---|---|---|
| `passed` or `pass_` or `pass` | yes | Runsight accepts these aliases in that precedence |
| `score` | yes | Must be numeric and between 0.0 and 1.0 |
| `reason` | no | Optional. Non-string values are coerced with `str()` |
Notes:
- `score` may be an `int` or `float`; Runsight converts it to `float`.
- If the returned shape does not match the declared contract, the assertion fails with a runtime error message instead of crashing the run.
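The interpretation rules for both return contracts can be summarized in code. The sketch below is an illustration only: `normalize_result` is a hypothetical function, the empty default for a missing `reason` is an assumption, and the exact error wording inside Runsight may differ.

```python
def normalize_result(raw, returns):
    """Convert a plugin return value into a (passed, score, reason) tuple.

    Hypothetical helper illustrating the documented rules. Raises TypeError
    or ValueError when the value does not match the declared contract.
    """
    if returns == "bool":
        if not isinstance(raw, bool):
            raise TypeError(
                f"declares returns: bool but get_assert returned '{type(raw).__name__}'"
            )
        # True -> score 1.0, False -> score 0.0.
        return raw, (1.0 if raw else 0.0), ""

    if not isinstance(raw, dict):
        raise TypeError("grading_result must be a dict")
    # Alias precedence: passed, then pass_, then pass.
    for key in ("passed", "pass_", "pass"):
        if key in raw:
            passed = bool(raw[key])
            break
    else:
        raise ValueError("grading_result needs one of: passed, pass_, pass")

    score = raw.get("score")
    if not isinstance(score, (int, float)):
        raise ValueError("score must be numeric")
    score = float(score)  # ints are accepted and converted to float
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")

    # Assumption: a missing reason becomes an empty string.
    return passed, score, str(raw.get("reason", ""))


print(normalize_result(True, "bool"))
print(normalize_result({"pass": True, "score": 1, "reason": 42}, "grading_result"))
```

In a real runner the raised errors would be caught and turned into a failing assertion result, matching the note that a contract mismatch does not crash the run.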
Config & Params
Each assertion entry in workflow YAML can include a `config` field:

```yaml
assertions:
  - type: custom:tone_check
    config:
      prefix: calm
```

For custom assertions, Runsight passes that value through two stages:

- If the manifest defines `params`, Runsight validates `config` against that JSON Schema.
- If validation succeeds, the exact value is exposed to the plugin as `context["config"]`.
If the manifest does not define params, Runsight skips config validation.
Schema-validated config
The manifest:

```yaml
version: "1.0"
id: budget_guard
kind: assertion
name: "Budget Guard"
description: "Requires a numeric budget."
returns: "bool"
source: "budget_guard.py"
params:
  type: object
  properties:
    budget:
      type: number
  required: ["budget"]
```

The plugin:

```python
def get_assert(output, context):
    return True
```

The workflow entry:

```yaml
assertions:
  - type: custom:budget_guard
    config:
      budget: 0.05
```

If `config` is invalid, Runsight returns a failing result whose reason starts with `Config validation failed:` and skips plugin execution.
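The `budget_guard` plugin body shown in this section is a placeholder that always passes. One plausible body, sketched below, compares the block's `cost_usd` from the context against the configured `budget`. Treating `budget` as a USD ceiling is an assumption made for illustration; the source does not say what the value means.

```python
def get_assert(output, context):
    # config was validated against params before the plugin runs, so "budget"
    # is expected to be present and numeric here.
    budget = context["config"]["budget"]
    # Default to 0.0 so a hand-built context without cost_usd still works.
    cost = context.get("cost_usd", 0.0)
    # Assumption: budget is a ceiling in USD for the block's execution cost.
    return cost <= budget


print(get_assert("any output", {"config": {"budget": 0.05}, "cost_usd": 0.01}))
print(get_assert("any output", {"config": {"budget": 0.05}, "cost_usd": 0.10}))
```

Because the function returns a plain `bool`, it matches the manifest's `returns: "bool"` declaration.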
Generic assertion features still work
Custom assertions use the same outer assertion config object as built-ins, so these features still apply:

- `weight`
- `metric`
- `transform`
- `not-` negation, for example `not-custom:blocked_word`
Context Dict Reference
Custom assertions receive a plain dict, not an `AssertionContext` object.

| Key | Type | Description |
|---|---|---|
| `vars` | dict | Workflow variables for the assertion context |
| `config` | any | The per-assertion config value from workflow YAML |
| `prompt` | string | Prompt text in the current assertion context |
| `prompt_hash` | string | Prompt hash for the current run |
| `soul_id` | string | Soul ID for the block being evaluated |
| `soul_version` | string | Soul version hash or identifier |
| `block_id` | string | Block ID |
| `block_type` | string | Block type |
| `cost_usd` | float | Execution cost in USD |
| `total_tokens` | int | Total tokens used |
| `latency_ms` | float | Block latency in milliseconds |
| `run_id` | string | Run ID |
| `workflow_id` | string | Workflow identifier from the current assertion context. In live API runs this is currently the workflow name; offline eval uses an empty string |
Notes:
- The key is `vars`, not `variables`.
- The key is `config`, even when the config value is `None`.
- Offline eval populates most context fields with empty strings or zeros.
- Live API execution fills prompt, soul, cost, token, latency, run, and workflow fields from the run context.
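For local testing it can help to build a context dict with every documented key. The sketch below follows the note that offline eval fills most fields with empty strings or zeros; `make_offline_context` is a hypothetical helper, `max_latency_ms` is a made-up config field, and the exact values Runsight supplies may differ.

```python
def make_offline_context(config=None, variables=None):
    """Build a context dict with the documented keys.

    Hypothetical test helper. Defaults follow the note that offline eval
    fills most fields with empty strings or zeros.
    """
    return {
        "vars": dict(variables or {}),
        "config": config,
        "prompt": "",
        "prompt_hash": "",
        "soul_id": "",
        "soul_version": "",
        "block_id": "",
        "block_type": "",
        "cost_usd": 0.0,
        "total_tokens": 0,
        "latency_ms": 0.0,
        "run_id": "",
        "workflow_id": "",
    }


def get_assert(output, context):
    # "config" is always present but may be None, so fall back to an empty dict.
    config = context.get("config") or {}
    max_ms = config.get("max_latency_ms", 1000)  # hypothetical config field
    return context["latency_ms"] <= max_ms


ctx = make_offline_context()
print(get_assert("any output", ctx))
```

Using `context.get("config") or {}` instead of `context.get("config", {})` matters here: the default in `.get` only applies when the key is missing, not when its value is `None`.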
Migrating from Promptfoo
Runsight’s custom assertion contract is intentionally close to promptfoo-style Python assertions.
A promptfoo-style function body like this works unchanged:

```python
def get_assert(output, context):
    config = context.get("config", {})
    return {
        "pass": output.startswith(config.get("prefix", "")),
        "score": 0.9,
        "reason": f"prefix={config.get('prefix', '')}",
    }
```

The main Runsight-specific additions are:

- Put the code in `custom/assertions/<id>.py`
- Add a matching manifest in `custom/assertions/<id>.yaml`
- Reference it in workflow YAML as `custom:<id>`

Example:

```yaml
assertions:
  - type: custom:tone_check
    config:
      prefix: calm
```

Remember that `tone_check` comes from the filename, not the manifest `name`.
Limitations
Current custom assertion support is intentionally narrow:

- Discovery only looks for YAML manifests in `custom/assertions/*.yaml`.
- The Python contract is only `def get_assert(output, context)`.
- Supported return contracts are only `bool` and `grading_result`.
- Manifest fields are fixed to `version`, `id`, `kind`, `name`, `description`, `returns`, `source`, and optional `params`.
- Plugins run in a separate subprocess with a minimal environment and a 30 second timeout.
- API keys are not forwarded into that subprocess environment.
- Offline eval auto-discovers custom assertions only from workflow file paths, not raw YAML strings.
- Simple custom assertions (return `bool` or `grading_result`) run in a minimal subprocess with no API keys and no IPC access.
LLM-Graded Assertions (llm_judge)
For assertions that need to call an LLM to grade output (e.g., rubric-based evaluation, factual consistency checks), use the `llm_judge` assertion type. These assertions run through the same process isolation path as regular LLM blocks: the judge LLM call is proxied through the IPC channel with full budget enforcement and observability.

```yaml
assertions:
  - type: llm_judge
    config:
      model: claude-haiku
      rubric: "Grade the output for factual accuracy and completeness."
```

Key differences from simple custom assertions:

- The `llm_judge` assertion runs inside an isolated subprocess with IPC access (not `_minimal_subprocess_env`)
- The judge’s LLM call goes through the `BudgetInterceptor`, so assertion costs count toward the block’s and workflow’s budget
- A `judge_soul` is constructed from the assertion config and used to call the LLM
- The `GradingResult` includes `assertion_type: "llm_judge"` and `metadata.judge_model` for traceability
- Simple `get_assert()` custom plugins still use the minimal subprocess with no IPC; only `llm_judge` type assertions use the full isolation path
Error Messages
These are the most common failure modes you will see.
Manifest and discovery errors
These happen before the assertion is registered:

- Missing or invalid required manifest fields
- Unsupported extra manifest fields
- Invalid `returns` value
- Built-in ID collision such as `contains.yaml`
- Invalid Python signature such as anything other than `def get_assert(output, context)`
Runtime assertion errors
These return failing results instead of crashing the run:

- `Config validation failed: ...`
- `Custom assertion 'name' failed: plugin exploded`
- `Custom assertion 'name' declares returns: bool but get_assert returned 'dict'`
- `custom assertion plugin timed out after 30s`
What happens on failure
In both offline eval and live API execution:
- The assertion result is recorded as failed
- The run continues
- Other assertions on the same block still produce their own results
- Live API runs still persist `eval_score`, `eval_passed`, and `eval_results`, and still emit `node_eval_complete` SSE events
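The failure behavior described here, where one broken assertion is recorded as failed while the others still report, can be pictured with a small wrapper. This is a simplified in-process sketch for `bool` plugins only: `run_safely` is a hypothetical name, the result dict shape is an assumption, and Runsight itself runs plugins in a subprocess with a timeout.

```python
def run_safely(name, plugin, output, context):
    """Call a bool-returning plugin; turn any exception into a failing result.

    Hypothetical wrapper for illustration, not Runsight's runner.
    """
    try:
        passed = bool(plugin(output, context))
    except Exception as exc:
        # Record the failure and keep going instead of crashing the run.
        return {
            "passed": False,
            "score": 0.0,
            "reason": f"Custom assertion '{name}' failed: {exc}",
        }
    return {"passed": passed, "score": 1.0 if passed else 0.0, "reason": ""}


def exploding(output, context):
    raise RuntimeError("plugin exploded")


def contains_calm(output, context):
    return "calm" in output


plugins = [("broken", exploding), ("calm", contains_calm)]
results = [run_safely(name, fn, "calm response", {}) for name, fn in plugins]
for result in results:
    print(result)
```

The first result is a failure carrying the plugin's error text, and the second assertion still runs and passes.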