The evaluation result to export.
Aggregate scores averaged across all samples.
Per-sample detailed results.
Optional. Per-metric score distribution statistics (min, max, stddev, count).
Keys are metric names (the same keys as in scores). overall is excluded;
compute it from the individual metric stats if needed. Useful for understanding
score variance and identifying which questions score poorly.
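As an illustration (not part of the documented API), a consumer might scan the stats record to surface unstable metrics; the MetricStats shape and the flagUnstableMetrics helper below are assumptions based on the description above:

```ts
// Sketch of the assumed per-metric stats shape; the field names follow
// the description above but are not guaranteed to match the real types.
interface MetricStats {
  min: number;
  max: number;
  stddev: number;
  count: number;
}

// Hypothetical helper: surface metrics whose scores vary a lot or whose
// floor is zero, which often points at questions that score poorly.
function flagUnstableMetrics(
  stats: Record<string, MetricStats>,
  stddevThreshold = 0.25,
): string[] {
  return Object.entries(stats)
    .filter(([, s]) => s.stddev > stddevThreshold || s.min === 0)
    .map(([name]) => name);
}
```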
Metadata about the evaluation run.
Total number of samples evaluated.
Names of the metrics that were evaluated.
LLM provider used (e.g. 'anthropic', 'openai').
LLM model used (e.g. 'claude-opus-4-6').
ISO 8601 timestamp when evaluation started.
ISO 8601 timestamp when evaluation completed.
Wall-clock duration of the evaluation in milliseconds.
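Taken together, the fields above suggest a result shape along the following lines; this is a sketch inferred from the descriptions, and the actual property names in the library may differ:

```ts
// Assumed shape of the exported result, inferred from the field
// descriptions above; real property names may differ.
interface EvaluationResult {
  // Aggregate scores averaged across all samples, keyed by metric name
  // (plus an "overall" key).
  scores: Record<string, number>;
  // Per-sample detailed results; the per-sample shape is not specified here.
  samples: unknown[];
  // Optional per-metric distribution stats ("overall" is excluded).
  stats?: Record<string, { min: number; max: number; stddev: number; count: number }>;
  metadata: {
    sampleCount: number;   // total number of samples evaluated
    metrics: string[];     // names of the metrics that were evaluated
    provider: string;      // e.g. "anthropic", "openai"
    model: string;         // e.g. "claude-opus-4-6"
    startedAt: string;     // ISO 8601 timestamp when evaluation started
    completedAt: string;   // ISO 8601 timestamp when evaluation completed
    durationMs: number;    // wall-clock duration in milliseconds
  };
}
```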
Whether to pretty-print with 2-space indentation. Default: true.
Pass false for compact output (e.g. when storing in a database).
JSON string representation of the full evaluation result.
Serializes an EvaluationResult to a JSON string.
The JSON structure matches the EvaluationResult schema exactly and can be parsed back with
JSON.parse(), which is useful for caching results or comparing evaluation runs over time.
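A minimal usage sketch, assuming a serializer named serializeResult and reusing the EvaluationResult shape sketched above (both names are illustrative, not the library's actual exports):

```ts
// Hypothetical serializer consistent with the behaviour described above;
// the real function's name and signature may differ.
function serializeResult(result: EvaluationResult, pretty = true): string {
  // Pretty-print with 2-space indentation by default; pass false for
  // compact output, e.g. when storing the string in a database.
  return JSON.stringify(result, null, pretty ? 2 : 0);
}

declare const result: EvaluationResult;

// Round-trip: the JSON structure matches the schema, so the string can
// be parsed straight back for caching or comparing runs over time.
const json = serializeResult(result);
const restored: EvaluationResult = JSON.parse(json);
```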