Answer Relevance — measures whether the generated answer actually
addresses the question that was asked (on-topic response signal).
Score 1.0 = the answer directly and completely responds to the question.
Score 0.0 = the answer is entirely off-topic or does not address the question.
Important distinction: This metric measures topicality, not accuracy.
An answer can score 1.0 on answerRelevance while still being factually wrong.
Use in combination with faithfulness for a complete quality picture:
faithfulness: Is the answer grounded in context? (no hallucinations)
answerRelevance: Is the answer on-topic and responsive? (no evasions)
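The two scores can be combined into one quality gate. As a minimal sketch (the combination rule here is an illustrative assumption, not part of the metric spec), taking the minimum means an answer must be both grounded and on-topic to score well:

```typescript
// Hedged sketch: combine faithfulness and answerRelevance into a single
// quality signal. Using Math.min is an illustrative choice -- a failure
// on either dimension fails the answer overall.
interface EvalScores {
  faithfulness: number;    // 0.0-1.0: grounded in context?
  answerRelevance: number; // 0.0-1.0: on-topic and responsive?
}

function overallQuality(scores: EvalScores): number {
  return Math.min(scores.faithfulness, scores.answerRelevance);
}

// A fluent, on-topic hallucination: relevant but unfaithful.
const hallucinated: EvalScores = { faithfulness: 0.25, answerRelevance: 1.0 };
// A grounded evasion: faithful but off-topic.
const evasive: EvalScores = { faithfulness: 1.0, answerRelevance: 0.25 };

console.log(overallQuality(hallucinated)); // 0.25
console.log(overallQuality(evasive));      // 0.25
```

Both failure modes bottom out at the same low score, which is the point of pairing the metrics: neither a hallucination nor an evasion can hide behind a high score on the other axis.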
When to use: answerRelevance catches LLM responses that are technically
"about the right topic" but fail to answer the specific question asked —
e.g., an answer that recites background information instead of the specific
fact the user requested.
Score interpretation (5-point scale):
1.0: Answer directly and completely addresses the question — nothing missing
0.75: Answer mostly addresses the question with minor gaps or tangents
0.5: Answer partially addresses the question; significant gaps or tangents
0.25: Answer barely addresses the question; mostly off-topic or evasive
0.0: Answer does not address the question at all — entirely off-topic
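One way to produce these five values is to ask the judge model for a discrete 0-4 rating and snap it onto the scale. A minimal sketch, assuming a judge reply of the form "Rating: N" (the reply format and the 0-4 convention are assumptions; real judge prompts and output formats vary):

```typescript
// Hedged sketch: extract an assumed "Rating: N" line from a judge reply
// and map N in 0..4 onto the five-point scale documented above
// (0 -> 0.0, 1 -> 0.25, 2 -> 0.5, 3 -> 0.75, 4 -> 1.0).
function parseJudgeRating(reply: string): number {
  const match = reply.match(/Rating:\s*([0-4])/i);
  if (!match) {
    throw new Error(`could not find a 0-4 rating in judge reply: ${reply}`);
  }
  return Number(match[1]) / 4;
}

console.log(parseJudgeRating("Rating: 3")); // 0.75
```

Forcing the judge onto a discrete rubric and dividing by 4 keeps scores anchored to the interpretations above, rather than letting the judge emit arbitrary decimals.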
Uses the LLM-as-judge pattern (arXiv:2306.05685, MT-Bench); the metric itself follows the RAGAS framework (arXiv:2309.15217).