Aggregate scores averaged across all samples.
Per-sample detailed results.
Optional stats?: Record<…>
Per-metric score distribution statistics (min, max, stddev, count).
Keys are metric names (same as keys in scores, minus overall).
Useful for understanding score variance and identifying which questions
score poorly. overall is excluded — compute it from individual metric stats.
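A minimal sketch of consuming these stats, assuming each entry has the { min, max, stddev, count } shape described above; the MetricStats interface and highVarianceMetrics helper are illustrative, not library exports:

```typescript
// Hypothetical per-metric stats entry shape, as described above.
interface MetricStats {
  min: number;
  max: number;
  stddev: number;
  count: number;
}

// Flag metrics whose scores vary widely across samples (high stddev),
// which often points at specific questions scoring poorly.
function highVarianceMetrics(
  stats: Record<string, MetricStats>,
  maxStddev = 0.15
): string[] {
  return Object.entries(stats)
    .filter(([, s]) => s.stddev > maxStddev)
    .map(([name]) => name);
}
```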
Metadata about the evaluation run.
Total number of samples evaluated.
Names of the metrics that were evaluated.
LLM provider used (e.g. 'anthropic', 'openai').
LLM model used (e.g. 'claude-opus-4-6').
ISO 8601 timestamp when evaluation started.
ISO 8601 timestamp when evaluation completed.
Wall-clock duration of the evaluation in milliseconds.
Readonly failures
Map of metric names to their actual score and required minimum. Only metrics that failed the threshold are included.
Iterate with Object.entries(e.failures) to get [metric, { score, threshold }] pairs.
Readonly result
The complete EvaluationResult that triggered this error.
All per-sample scores and aggregate scores are present — only the threshold gate failed. Use this to export reports (SARIF, JUnit, HTML, Markdown) even when the quality gate fails, so you can diagnose exactly which samples caused the regression.
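The iteration described above can be sketched as follows; the Failures type alias and reportFailures helper are illustrative assumptions, not library exports:

```typescript
// Hypothetical shape of the failures map: metric name -> actual score
// and the minimum it was required to meet.
type Failures = Record<string, { score: number; threshold: number }>;

// Turn each failed metric into a human-readable line for CI logs.
function reportFailures(failures: Failures): string[] {
  return Object.entries(failures).map(
    ([metric, { score, threshold }]) =>
      `${metric}: scored ${score.toFixed(2)}, required ${threshold.toFixed(2)}`
  );
}
```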
Static stackTraceLimit
The Error.stackTraceLimit property specifies the number of stack frames
collected by a stack trace (whether generated by new Error().stack or
Error.captureStackTrace(obj)).
The default value is 10 but may be set to any valid JavaScript number. Changes
will affect any stack trace captured after the value has been changed.
If set to a non-number value, or set to a negative number, stack traces will not capture any frames.
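For instance, a short sketch of temporarily lowering the limit and restoring it (standard Node.js/V8 behavior):

```typescript
// Temporarily capture at most 2 frames, then restore the previous limit.
const saved = Error.stackTraceLimit;
Error.stackTraceLimit = 2;
const err = new Error("boom");
Error.stackTraceLimit = saved;

// The first line of .stack is the message; the remaining lines are frames.
const frames = (err.stack ?? "").split("\n").slice(1);
console.log(frames.length); // at most 2
```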
Optional cause
Optional stack
Static captureStackTrace
Creates a .stack property on targetObject, which when accessed returns
a string representing the location in the code at which
Error.captureStackTrace() was called.
const myObject = {};
Error.captureStackTrace(myObject);
myObject.stack; // Similar to `new Error().stack`
The first line of the trace will be prefixed with
${myObject.name}: ${myObject.message}.
The optional constructorOpt argument accepts a function. If given, all frames
above constructorOpt, including constructorOpt, will be omitted from the
generated stack trace.
The constructorOpt argument is useful for hiding implementation
details of error generation from the user. For instance:
function a() {
  b();
}

function b() {
  c();
}

function c() {
  // Create an error without stack trace to avoid calculating the stack trace twice.
  const { stackTraceLimit } = Error;
  Error.stackTraceLimit = 0;
  const error = new Error();
  Error.stackTraceLimit = stackTraceLimit;

  // Capture the stack trace above function b
  Error.captureStackTrace(error, b); // Neither function c, nor b is included in the stack trace
  throw error;
}

a();
Optional constructorOpt: Function
Static prepareStackTrace
Thrown by evaluate when one or more metric aggregate scores fall below their configured ScoreThresholds.
Carries both the failing metric details (failures) and the full EvaluationResult (result) so you can export SARIF, JUnit, or HTML reports even when the quality gate fails.
Use this in CI pipelines to fail a build when RAG quality regresses:
Example
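A hedged sketch of such a CI gate. The class name EvaluationThresholdError and the runGate helper below are assumptions for illustration; catch the library's actual exported error class in real code:

```typescript
// Hypothetical stand-in for the library's threshold error described above.
type Failures = Record<string, { score: number; threshold: number }>;

class EvaluationThresholdError extends Error {
  constructor(readonly failures: Failures, readonly result: unknown) {
    super("RAG quality gate failed");
  }
}

// Run an evaluation and turn a threshold failure into a nonzero exit code,
// logging each failing metric so CI output shows exactly what regressed.
function runGate(run: () => void): number {
  try {
    run();
    return 0;
  } catch (e) {
    if (e instanceof EvaluationThresholdError) {
      for (const [metric, { score, threshold }] of Object.entries(e.failures)) {
        console.error(`${metric}: ${score} < ${threshold}`);
      }
      // e.result still holds the full EvaluationResult for report export.
      return 1;
    }
    throw e;
  }
}
```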