score
The metric score for this sample, in the range [0.0, 1.0]. Higher is always better. Scores are clamped to [0, 1] even if the LLM returns values outside that range.
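The clamping rule can be sketched as follows; clampScore is a hypothetical helper for illustration, not part of the library's API:

```typescript
// Sketch of the clamping described above: a raw LLM score outside
// [0, 1] is clipped into range before being reported.
const clampScore = (raw: number): number => Math.min(1, Math.max(0, raw));

console.log(clampScore(1.4));  // 1
console.log(clampScore(-0.2)); // 0
console.log(clampScore(0.7));  // 0.7
```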
Optional reasoning
The LLM judge's explanation of why it assigned this score. Only populated when includeReasoning: true is passed to evaluate(). Useful for debugging unexpectedly low or high scores.
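Since reasoning is only present when includeReasoning: true was passed, a debugging pass over outlier scores might look like this sketch; explainOutliers, its thresholds, and the inline type are assumptions for illustration, not the library's API:

```typescript
// Hypothetical helper: collect the judge's explanations for scores that
// look suspiciously low or high. Field names match this page; the
// inline object type is an assumption, not the library's declaration.
function explainOutliers(
  outputs: { score: number; reasoning?: string; skipped?: boolean }[],
  low = 0.3,
  high = 0.9,
): string[] {
  return outputs
    .filter((o) => !o.skipped && (o.score < low || o.score > high))
    .map((o) => o.reasoning ?? "(no reasoning captured)");
}

console.log(explainOutliers([
  { score: 0.1, reasoning: "Answer contradicts the retrieved context." },
  { score: 0.55 },
]));
```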
Optional skipped
When true, this metric could not be computed for this sample and should be excluded from all aggregates. The most common cause is contextRecall being evaluated on a sample without a groundTruth field. evaluate() detects skipped: true and omits the score from both the per-sample scores and the per-metric aggregate; it is never counted as a 0. This prevents silent score distortion. The score field is still set to 0 for backward compatibility with code that reads raw MetricOutput without checking skipped.
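The exclusion rule can be sketched as follows: aggregate is a hypothetical illustration of the behavior described above, not the library's actual implementation, and the object shape is inferred from this page.

```typescript
// Sketch of the aggregation rule described above: outputs with
// skipped: true are excluded from the mean entirely, never counted as 0.
function aggregate(
  outputs: { score: number; skipped?: boolean }[],
): number | undefined {
  const counted = outputs.filter((o) => !o.skipped);
  if (counted.length === 0) return undefined; // nothing computable
  return counted.reduce((sum, o) => sum + o.score, 0) / counted.length;
}

const perMetric = [
  { score: 0.5 },
  { score: 0, skipped: true }, // score is 0, but must not drag the mean down
  { score: 0.75 },
];
console.log(aggregate(perMetric)); // 0.625, not 1.25 / 3
```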
Output from a single metric evaluation on one sample. Returned by every metric's score() method and collected by evaluate() into the final EvaluationResult.
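Putting the three fields together, the shape might be declared roughly like this; it is a reconstruction from the property descriptions above, and the library's actual declaration may differ:

```typescript
// Hypothetical reconstruction of MetricOutput from this page's docs.
interface MetricOutput {
  /** Clamped to [0, 1]; higher is better. */
  score: number;
  /** Set only when includeReasoning: true is passed to evaluate(). */
  reasoning?: string;
  /** True when the metric could not be computed for this sample. */
  skipped?: boolean;
}

const out: MetricOutput = {
  score: 0.82,
  reasoning: "The answer is faithful to the retrieved context.",
};
console.log(out.score);
```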