Context Relevance — measures whether the retrieved context is relevant
to the question being asked (retriever quality signal).
Score 1.0 = all retrieved chunks are highly relevant and directly useful.
Score 0.0 = retrieved chunks are entirely off-topic / irrelevant to the question.
When to use: Use contextRelevance to diagnose retrieval quality issues. A
low score typically indicates that the embedding model, chunking strategy, or
similarity threshold is not filtering out irrelevant chunks well enough.
Difference from contextPrecision: Both measure retrieval quality, but from
different angles. contextRelevance makes a holistic judgment ("is this context
useful overall?") while contextPrecision computes an explicit ratio ("what
fraction of chunks are relevant?"). Use both for a complete retrieval picture.
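The explicit-ratio side of that contrast can be sketched as follows. This is an illustrative sketch, not this library's actual API: the `ChunkVerdict` shape and `precisionScore` name are hypothetical, and the per-chunk boolean verdicts stand in for whatever the LLM judge would return.

```typescript
// Hypothetical per-chunk relevance verdicts, e.g. produced by an LLM judge
// asked "is this chunk relevant to the question?" for each retrieved chunk.
type ChunkVerdict = { chunk: string; relevant: boolean };

// contextPrecision-style score: the explicit fraction of relevant chunks,
// as opposed to contextRelevance's single holistic judgment.
function precisionScore(verdicts: ChunkVerdict[]): number {
  if (verdicts.length === 0) return 0;
  const relevant = verdicts.filter((v) => v.relevant).length;
  return relevant / verdicts.length;
}
```

For example, two relevant chunks out of four retrieved yields a score of 0.5.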
Score interpretation (5-point scale):
1.0: All chunks are directly relevant — excellent retriever precision
0.75: Most chunks are relevant; one or two contain minor tangential content
0.5: Mixed — roughly half the retrieved content is relevant to the question
0.25: Most retrieved content is off-topic; only minor relevant signals
0.0: Entirely irrelevant — retriever is fetching the wrong documents completely
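One way to turn raw per-chunk judgments into this 5-point scale is to snap the relevant fraction to the nearest band. The snapping logic below is a sketch of that idea under that assumption, not this metric's actual implementation:

```typescript
// Snap a raw relevant-fraction in [0, 1] to the nearest of the five bands
// (0.0, 0.25, 0.5, 0.75, 1.0). Ties break toward the lower band.
function snapToScale(fraction: number): number {
  const bands = [0, 0.25, 0.5, 0.75, 1.0];
  return bands.reduce((best, b) =>
    Math.abs(b - fraction) < Math.abs(best - fraction) ? b : best
  );
}
```

So a retrieval where roughly 60% of chunks are relevant would land in the 0.5 ("Mixed") band.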
Uses LLM-as-judge pattern — see arXiv:2306.05685 ("Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena").