rageval - v0.1.1

    Interface EvaluateOptions

    Configuration options for evaluate.

    interface EvaluateOptions {
        provider: ProviderConfig;
        dataset: {
            id?: string;
            question: string;
            answer: string;
            contexts: string[];
            groundTruth?: string;
            tenantId?: string;
            metadata?: Record<string, unknown>;
        }[];
        metrics?: Metric[];
        includeReasoning?: boolean;
        concurrency?: number;
        thresholds?: Partial<Record<string, number>>;
        onProgress?: (completed: number, total: number) => void;
        checkpoint?: string;
    }

    Properties

    provider: ProviderConfig

    The LLM provider to use as the judge. Pass { type, client, model }, where type is 'anthropic', 'openai', or 'azure'.
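
    A minimal sketch of an Anthropic judge configuration, assuming the official @anthropic-ai/sdk client; the model name is illustrative, not a library default.

    import Anthropic from '@anthropic-ai/sdk';

    // The SDK reads ANTHROPIC_API_KEY from the environment by default.
    const client = new Anthropic();

    const provider = {
      type: 'anthropic' as const,
      client,
      model: 'claude-3-5-sonnet-latest', // illustrative model id
    };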

    dataset: {
        id?: string;
        question: string;
        answer: string;
        contexts: string[];
        groundTruth?: string;
        tenantId?: string;
        metadata?: Record<string, unknown>;
    }[]

    Array of RAG samples to evaluate. Each sample must have question, answer, and contexts. groundTruth is optional but required for the contextRecall metric. tenantId and metadata are optional and propagate to per-sample results.
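
    An illustrative two-sample dataset. groundTruth is supplied only on the first sample, so only that sample participates in contextRecall.

    const dataset = [
      {
        id: 'q-001',
        question: 'What is the capital of France?',
        answer: 'Paris is the capital of France.',
        contexts: ['Paris is the capital and largest city of France.'],
        groundTruth: 'Paris', // enables contextRecall for this sample
      },
      {
        question: 'Who wrote Dune?',
        answer: 'Frank Herbert wrote Dune.',
        contexts: ['Dune is a 1965 novel by Frank Herbert.'],
        // no groundTruth: skipped by contextRecall
      },
    ];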

    metrics?: Metric[]

    Which metrics to compute. Defaults to all five built-in metrics.

    Available: faithfulness, contextRelevance, answerRelevance, contextRecall, contextPrecision.

    Note: contextRecall requires groundTruth on each sample. Samples without groundTruth are automatically skipped for that metric and excluded from its aggregate score.
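
    For example, to compute only two of the five metrics (this sketch assumes Metric values are the metric names listed above):

    await evaluate({
      provider,
      dataset,
      metrics: ['faithfulness', 'answerRelevance'],
    });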

    includeReasoning?: boolean

    When true, each metric's LLM reasoning is included in sample results. Useful for debugging unexpected scores.

    Default: false
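
    A sketch of debugging a surprising score with reasoning enabled. The result fields shown are assumptions for illustration; see SampleResult for the actual shape.

    const result = await evaluate({ provider, dataset, includeReasoning: true });

    // Hypothetical field names; consult SampleResult for the real ones.
    for (const sample of result.samples) {
      console.log(sample.question, sample.scores?.faithfulness?.reasoning);
    }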
    
    concurrency?: number

    Maximum number of samples evaluated simultaneously. Higher values are faster but consume more API quota.

    Default: 5
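
    For example, to reduce API pressure when rate limits are tight:

    // Two samples in flight at a time instead of the default five.
    await evaluate({ provider, dataset, concurrency: 2 });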
    
    thresholds?: Partial<Record<string, number>>

    Minimum acceptable score per metric. If any aggregate score falls below its threshold after evaluation, a ThresholdError is thrown containing the full result.

    This is intended for CI quality gates — use it in combination with process.exit(1) to fail a build when RAG quality regresses.

    thresholds: { faithfulness: 0.8, answerRelevance: 0.75 }
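
    A sketch of a CI quality gate built on the fragment above, assuming evaluate and ThresholdError are both exported by rageval:

    import { evaluate, ThresholdError } from 'rageval';

    try {
      await evaluate({
        provider,
        dataset,
        thresholds: { faithfulness: 0.8, answerRelevance: 0.75 },
      });
    } catch (err) {
      if (err instanceof ThresholdError) {
        console.error('RAG quality regressed:', err.message);
        process.exit(1); // fail the build
      }
      throw err;
    }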
    
    onProgress?: (completed: number, total: number) => void

    Called after each sample completes evaluation. Use for progress bars, logging, or UI updates during large evaluations.

    Type Declaration

      • (completed: number, total: number): void
      • Parameters

        • completed: number

          Number of samples evaluated so far.

        • total: number

          Total number of samples in the dataset.

        Returns void

    onProgress: (done, total) => {
      process.stderr.write(`\r${done}/${total} evaluated`)
    }

    checkpoint?: string

    File path for checkpoint-based resumable evaluation.

    When provided, evaluate() will:

    1. On start — read the checkpoint file if it exists, and skip any samples whose results are already recorded (matched by id if present, otherwise by question text). This lets you resume a large batch that was interrupted.
    2. After each new sample — write the accumulated results (prior + new) to the checkpoint file as JSON so progress is never lost.

    The checkpoint file is a plain JSON file with the shape:

    { "version": 1, "samples": [ ...SampleResult[] ] }
    

    Delete the checkpoint file when you want to start a fresh evaluation.
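
    Because the checkpoint is plain JSON with the shape above, partial progress can be inspected directly:

    import { readFileSync } from 'node:fs';

    const ckpt = JSON.parse(readFileSync('./eval-progress.json', 'utf8'));
    console.log(`${ckpt.samples.length} samples recorded (version ${ckpt.version})`);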

    // Large 500-sample evaluation — safe to Ctrl+C and restart
    await evaluate({
      provider: { type: 'anthropic', client },
      dataset: largeDataset,
      checkpoint: './eval-progress.json',
      onProgress: (done, total) => process.stderr.write(`\r${done}/${total}`),
    })