Context Precision — measures what fraction of the retrieved context
chunks are actually relevant to answering the question (a retrieval-noise signal).
Score 1.0 = every retrieved chunk is relevant and useful.
Score 0.0 = none of the retrieved chunks are relevant to the question.
What it measures: High precision = low retrieval noise. Low precision =
the retriever is returning irrelevant chunks alongside the useful ones, which
wastes token budget and can confuse the LLM generator.
Difference from contextRelevance: contextPrecision instructs the judge
to evaluate each chunk independently and compute an explicit ratio
(relevant / total). contextRelevance makes a holistic judgment. Use both
together for a comprehensive retrieval quality picture.
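The explicit-ratio computation described above can be sketched as follows. This is a minimal illustration, not the library's actual implementation: it assumes the per-chunk relevance verdicts have already been produced by an upstream LLM-judge step (hypothetical here), and only shows the ratio arithmetic.

```typescript
// Sketch: compute context precision from per-chunk relevance verdicts.
// `verdicts[i]` is true if the judge deemed chunk i relevant to the
// question (the judge call itself is out of scope for this sketch).
function contextPrecision(verdicts: boolean[]): number {
  if (verdicts.length === 0) return 0; // no retrieved chunks -> no signal
  const relevant = verdicts.filter(Boolean).length;
  return relevant / verdicts.length; // explicit ratio: relevant / total
}

// Example: 3 of 4 retrieved chunks judged relevant -> 0.75
console.log(contextPrecision([true, true, false, true])); // 0.75
```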
Score interpretation (5-point scale):
1.0: All chunks are relevant — retriever precision is excellent
0.75: Most chunks are relevant; one or two are not directly useful
0.5: About half the chunks are relevant; half are noise
0.25: Most chunks are noise; only a small fraction are useful
0.0: No retrieved chunk is relevant to the question — pure noise
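The five levels above sit at quartile steps, so a raw relevant/total ratio can be snapped to the nearest level for reporting. A sketch (the snapping rule is an assumption for illustration, not necessarily how the judge formats its score):

```typescript
// Snap a raw relevant/total ratio to the nearest 5-point level
// (0.0, 0.25, 0.5, 0.75, 1.0) from the interpretation table.
function snapToScale(ratio: number): number {
  return Math.round(ratio * 4) / 4;
}

console.log(snapToScale(3 / 5)); // 0.5 — "about half the chunks are relevant"
```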
Uses the LLM-as-judge pattern — see arXiv:2306.05685 (Zheng et al., "Judging
LLM-as-a-Judge with MT-Bench and Chatbot Arena"); the metric itself follows
context precision from RAGAS (arXiv:2309.15217).