# rageval

TypeScript RAG pipeline evaluation library: the RAGAS-inspired equivalent for Node.js. Evaluate the quality of your Retrieval-Augmented Generation pipeline with LLM-as-judge scoring.

## Quick Start

```ts
import Anthropic from '@anthropic-ai/sdk'
import { evaluate, faithfulness, contextRelevance, answerRelevance } from 'rageval'

const results = await evaluate({
  provider: {
    type: 'anthropic',
    client: new Anthropic(),
    model: 'claude-haiku-4-5-20251001',
  },
  dataset: [
    {
      question: 'What is the capital of France?',
      answer: 'The capital of France is Paris.',
      contexts: ['France is a country in Western Europe. Its capital city is Paris.'],
      groundTruth: 'Paris',
    },
  ],
  metrics: [faithfulness, contextRelevance, answerRelevance],
})
```
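To act on the results, you can aggregate the per-sample scores. The shape used below (a `samples` array, each entry carrying a `scores` record keyed by metric name) is an assumption for illustration, not the documented API; check the library's exported types for the real structure.

```ts
// Continues from the Quick Start. NOTE: this result shape is assumed for
// illustration; adapt the cast to the library's actual types.
type SampleResult = { scores: Record<string, number> }
const { samples } = results as unknown as { samples: SampleResult[] }

// Average each metric across the dataset.
const totals = new Map<string, { sum: number; n: number }>()
for (const { scores } of samples) {
  for (const [metric, score] of Object.entries(scores)) {
    const t = totals.get(metric) ?? { sum: 0, n: 0 }
    t.sum += score
    t.n += 1
    totals.set(metric, t)
  }
}
for (const [metric, { sum, n }] of totals) {
  console.log(`${metric}: ${(sum / n).toFixed(3)}`)
}
```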
## Score Interpretation
All scores are in the range [0, 1], where higher is better:

- `faithfulness`: how well the claims in the answer are supported by the retrieved contexts.
- `contextRelevance`: how relevant the retrieved contexts are to the question.
- `answerRelevance`: how directly the answer addresses the question.
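One way to use these scores in CI is a quality gate that fails the run when an averaged metric drops below a floor. This is a minimal sketch with illustrative thresholds, not library defaults; leave headroom for the ±0.03 noise described under Important Notes below.

```ts
// Illustrative CI gate: throw if any averaged metric falls below its floor.
// The floor values are examples; tune them against your own baseline runs.
const floors: Record<string, number> = {
  faithfulness: 0.8,
  contextRelevance: 0.7,
  answerRelevance: 0.8,
}

export function assertQuality(averages: Record<string, number>): void {
  for (const [metric, floor] of Object.entries(floors)) {
    const score = averages[metric]
    if (score === undefined || score < floor) {
      throw new Error(`${metric} = ${score ?? 'missing'}, below the ${floor} floor`)
    }
  }
}
```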
## Important Notes
Scores are non-deterministic by nature (LLM outputs vary). Treat differences smaller than ±0.03 as noise. Use `temperature: 0` in your provider config for reproducible benchmarks. See the README for full guidance.
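For benchmark runs, that advice translates into a provider config with the temperature pinned. This sketch reuses the Quick Start provider shape; `benchmarkProvider` is just an illustrative name.

```ts
import Anthropic from '@anthropic-ai/sdk'

// Benchmark-ready provider config: temperature pinned to 0, as recommended
// above. Pass this object as `provider` to evaluate().
const benchmarkProvider = {
  type: 'anthropic',
  client: new Anthropic(),
  model: 'claude-haiku-4-5-20251001',
  temperature: 0,
} as const
```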