rageval - v0.1.1

    Module rageval

    rageval — TypeScript RAG pipeline evaluation library.

    A RAGAS-inspired evaluation library for Node.js. Evaluate the quality of your Retrieval-Augmented Generation (RAG) pipeline with LLM-as-judge scoring.

    import Anthropic from '@anthropic-ai/sdk'
    import { evaluate, faithfulness, contextRelevance, answerRelevance } from 'rageval'

    const results = await evaluate({
      provider: { type: 'anthropic', client: new Anthropic(), model: 'claude-haiku-4-5-20251001' },
      dataset: [
        {
          question: 'What is the capital of France?',
          answer: 'The capital of France is Paris.',
          contexts: ['France is a country in Western Europe. Its capital city is Paris.'],
          groundTruth: 'Paris',
        },
      ],
      metrics: [faithfulness, contextRelevance, answerRelevance],
    })

    console.log(results.scores)
    // { faithfulness: 0.97, contextRelevance: 0.91, answerRelevance: 0.95, overall: 0.94 }

    All scores are in the range [0, 1]:

    • 0.9 – 1.0 — Excellent
    • 0.7 – 0.9 — Good
    • 0.5 – 0.7 — Fair — consider reviewing retrieval or prompts
    • < 0.5 — Poor — pipeline needs attention
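
    For quick triage (for example in CI logs), the bands above can be expressed as a small helper. This is a hypothetical utility written for illustration, not part of rageval's API:

    // Hypothetical helper: map a score to the bands listed above (lower bound inclusive).
    function scoreBand(score: number): 'Excellent' | 'Good' | 'Fair' | 'Poor' {
      if (score >= 0.9) return 'Excellent'
      if (score >= 0.7) return 'Good'
      if (score >= 0.5) return 'Fair'
      return 'Poor'
    }

    console.log(scoreBand(results.scores.overall)) // 'Excellent' for the 0.94 shown above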

    Scores are non-deterministic by nature (LLM outputs vary). Treat differences smaller than ±0.03 as noise. Set temperature: 0 in your provider config for more reproducible benchmarks. See the README for full guidance.
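
    A minimal sketch of a benchmark run, assuming the inline provider config accepts a temperature field as the note above suggests; the exact field placement is not documented on this page:

    const benchmark = await evaluate({
      provider: {
        type: 'anthropic',
        client: new Anthropic(),
        model: 'claude-haiku-4-5-20251001',
        temperature: 0, // assumed field, per the reproducibility note above
      },
      dataset, // same shape as the dataset in the example above
      metrics: [faithfulness, contextRelevance, answerRelevance],
    })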

    Classes

    ThresholdError

    Interfaces - Exports

    PrintReportOptions

    Interfaces - Other

    EvaluateOptions
    MetricInput
    MetricOutput
    Metric
    LlmProvider
    AnthropicProviderConfig
    OpenAIProviderConfig
    AzureOpenAIProviderConfig

    Type Aliases - Other

    ScoreThresholds
    ProviderConfig

    Type Aliases - Types

    RagSample
    Dataset
    MetricName
    MetricScore
    SampleResult
    AggregateScores
    MetricStats
    EvaluationResult

    Variables

    answerRelevance
    contextPrecision
    contextRecall
    contextRelevance
    faithfulness
    RagSampleSchema
    DatasetSchema
    ProviderTypeSchema
    MetricNameSchema
    MetricScoreSchema
    SampleScoresSchema
    SampleResultSchema
    AggregateScoresSchema
    MetricStatsSchema
    EvaluationResultSchema

    Functions - Core

    evaluate

    Functions - Providers

    createAnthropicProvider
    createAzureOpenAIProvider
    createOpenAIProvider

    Functions - Exports

    cosineSimilarity
    toJson
    toCsv
    toHtml
    toJUnit
    toMarkdown
    printReport
    toSarif

    Functions - Utilities

    parseLlmScore
    jsonInstruction