Preface
This article takes a look at Spring AI's Evaluator.
Evaluator
spring-ai-client-chat/src/main/java/org/springframework/ai/evaluation/Evaluator.java
@FunctionalInterface
public interface Evaluator {

    EvaluationResponse evaluate(EvaluationRequest evaluationRequest);

    default String doGetSupportingData(EvaluationRequest evaluationRequest) {
        List<Document> data = evaluationRequest.getDataList();
        return data.stream()
            .map(Document::getText)
            .filter(StringUtils::hasText)
            .collect(Collectors.joining(System.lineSeparator()));
    }

}
The Evaluator interface defines an evaluate method for assessing AI-generated content, helping to detect hallucinated responses. It has two implementations: RelevancyEvaluator and FactCheckingEvaluator.
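The default doGetSupportingData method simply concatenates the non-blank text of every Document in the request. As a standalone sketch of that joining logic (plain Java, no Spring dependency; the class and method names are illustrative, and the filter mirrors what StringUtils::hasText does):

```java
import java.util.List;
import java.util.stream.Collectors;

class SupportingDataDemo {

    // Mirrors doGetSupportingData: keep only non-blank texts and join them
    // with the platform line separator.
    static String join(List<String> texts) {
        return texts.stream()
            .filter(t -> t != null && !t.isBlank()) // stands in for StringUtils::hasText
            .collect(Collectors.joining(System.lineSeparator()));
    }

    public static void main(String[] args) {
        // Blank entries are dropped; the rest become one newline-joined context string.
        System.out.println(join(List.of("doc one", "", "doc two")));
    }
}
```

The joined string is what the evaluators later substitute into their prompt as the {context} or {document} parameter.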
EvaluationRequest
org/springframework/ai/evaluation/EvaluationRequest.java
public class EvaluationRequest {

    private final String userText;

    private final List<Document> dataList;

    private final String responseContent;

    public EvaluationRequest(String userText, String responseContent) {
        this(userText, Collections.emptyList(), responseContent);
    }

    public EvaluationRequest(List<Document> dataList, String responseContent) {
        this("", dataList, responseContent);
    }

    public EvaluationRequest(String userText, List<Document> dataList, String responseContent) {
        this.userText = userText;
        this.dataList = dataList;
        this.responseContent = responseContent;
    }

    //......
}
EvaluationRequest defines the userText, dataList, and responseContent properties: userText is the user's input, dataList is the context data (for example, content retrieved and appended by RAG), and responseContent is the AI model's response.
EvaluationResponse
org/springframework/ai/evaluation/EvaluationResponse.java
public class EvaluationResponse {

    private final boolean pass;

    private final float score;

    private final String feedback;

    private final Map<String, Object> metadata;

    @Deprecated
    public EvaluationResponse(boolean pass, float score, String feedback, Map<String, Object> metadata) {
        this.pass = pass;
        this.score = score;
        this.feedback = feedback;
        this.metadata = metadata;
    }

    public EvaluationResponse(boolean pass, String feedback, Map<String, Object> metadata) {
        this.pass = pass;
        this.score = 0;
        this.feedback = feedback;
        this.metadata = metadata;
    }

    //......
}
EvaluationResponse defines the pass, score, feedback, and metadata properties; note that the constructor taking a score is deprecated, and the remaining constructor defaults score to 0.
RelevancyEvaluator
org/springframework/ai/evaluation/RelevancyEvaluator.java
public class RelevancyEvaluator implements Evaluator {

    private static final String DEFAULT_EVALUATION_PROMPT_TEXT = """
            Your task is to evaluate if the response for the query
            is in line with the context information provided.\n
            You have two options to answer. Either YES/ NO.\n
            Answer - YES, if the response for the query
            is in line with context information otherwise NO.\n
            Query: \n {query}\n
            Response: \n {response}\n
            Context: \n {context}\n
            Answer: "
            """;

    private final ChatClient.Builder chatClientBuilder;

    public RelevancyEvaluator(ChatClient.Builder chatClientBuilder) {
        this.chatClientBuilder = chatClientBuilder;
    }

    @Override
    public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {
        var response = evaluationRequest.getResponseContent();
        var context = doGetSupportingData(evaluationRequest);
        String evaluationResponse = this.chatClientBuilder.build()
            .prompt()
            .user(userSpec -> userSpec.text(DEFAULT_EVALUATION_PROMPT_TEXT)
                .param("query", evaluationRequest.getUserText())
                .param("response", response)
                .param("context", context))
            .call()
            .content();
        boolean passing = false;
        float score = 0;
        if (evaluationResponse.toLowerCase().contains("yes")) {
            passing = true;
            score = 1;
        }
        return new EvaluationResponse(passing, score, "", Collections.emptyMap());
    }

}
RelevancyEvaluator asks an AI model to judge whether the response is consistent with the context information, answering yes or no. If the answer contains yes, passing is true and score is 1; otherwise passing stays false and score stays 0.
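Note how lenient that check is: the evaluator only tests whether the model's answer contains "yes" anywhere, case-insensitively. A standalone sketch of the parsing (class and method names are illustrative):

```java
class RelevancyParseDemo {

    // Mirrors RelevancyEvaluator's check: any answer containing "yes"
    // (case-insensitive) counts as passing, so a verbose answer like
    // "Yes, it is relevant." passes too.
    static boolean passes(String evaluationResponse) {
        return evaluationResponse.toLowerCase().contains("yes");
    }

    public static void main(String[] args) {
        System.out.println(passes("YES"));                  // true
        System.out.println(passes("Yes, it is relevant.")); // true
        System.out.println(passes("NO"));                   // false
    }
}
```

The contains-based check tolerates extra wording from general-purpose chat models, at the cost of also passing an answer that merely mentions the word "yes".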
Example
@Test
void testEvaluation() {
    dataController.delete();
    dataController.load();
    String userText = "What is the purpose of Carina?";
    ChatResponse response = ChatClient.builder(chatModel)
        .build().prompt()
        .advisors(new QuestionAnswerAdvisor(vectorStore))
        .user(userText)
        .call()
        .chatResponse();
    String responseContent = response.getResult().getOutput().getContent();
    var relevancyEvaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));
    EvaluationRequest evaluationRequest = new EvaluationRequest(userText,
            (List<Content>) response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS), responseContent);
    EvaluationResponse evaluationResponse = relevancyEvaluator.evaluate(evaluationRequest);
    assertTrue(evaluationResponse.isPass(), "Response is not relevant to the question");
}
Here we first ask the model with userText, then hand responseContent together with the documents stored under QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS in the response metadata to relevancyEvaluator, which uses an AI model to evaluate the result.
FactCheckingEvaluator
org/springframework/ai/evaluation/FactCheckingEvaluator.java
public class FactCheckingEvaluator implements Evaluator {

    private static final String DEFAULT_EVALUATION_PROMPT_TEXT = """
            Evaluate whether or not the following claim is supported by the provided document.
            Respond with "yes" if the claim is supported, or "no" if it is not.
            Document: \n {document}\n
            Claim: \n {claim}
            """;

    private static final String BESPOKE_EVALUATION_PROMPT_TEXT = """
            Document: \n {document}\n
            Claim: \n {claim}
            """;

    private final ChatClient.Builder chatClientBuilder;

    private final String evaluationPrompt;

    /**
     * Constructs a new FactCheckingEvaluator with the provided ChatClient.Builder. Uses
     * the default evaluation prompt suitable for general purpose LLMs.
     * @param chatClientBuilder The builder for the ChatClient used to perform the
     * evaluation
     */
    public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder) {
        this(chatClientBuilder, DEFAULT_EVALUATION_PROMPT_TEXT);
    }

    /**
     * Constructs a new FactCheckingEvaluator with the provided ChatClient.Builder and
     * evaluation prompt.
     * @param chatClientBuilder The builder for the ChatClient used to perform the
     * evaluation
     * @param evaluationPrompt The prompt text to use for evaluation
     */
    public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder, String evaluationPrompt) {
        this.chatClientBuilder = chatClientBuilder;
        this.evaluationPrompt = evaluationPrompt;
    }

    /**
     * Creates a FactCheckingEvaluator configured for use with the Bespoke Minicheck
     * model.
     * @param chatClientBuilder The builder for the ChatClient used to perform the
     * evaluation
     * @return A FactCheckingEvaluator configured for Bespoke Minicheck
     */
    public static FactCheckingEvaluator forBespokeMinicheck(ChatClient.Builder chatClientBuilder) {
        return new FactCheckingEvaluator(chatClientBuilder, BESPOKE_EVALUATION_PROMPT_TEXT);
    }

    /**
     * Evaluates whether the response content in the EvaluationRequest is factually
     * supported by the context provided in the same request.
     * @param evaluationRequest The request containing the response to be evaluated and
     * the supporting context
     * @return An EvaluationResponse indicating whether the claim is supported by the
     * document
     */
    @Override
    public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {
        var response = evaluationRequest.getResponseContent();
        var context = doGetSupportingData(evaluationRequest);
        String evaluationResponse = this.chatClientBuilder.build()
            .prompt()
            .user(userSpec -> userSpec.text(this.evaluationPrompt).param("document", context).param("claim", response))
            .call()
            .content();
        boolean passing = evaluationResponse.equalsIgnoreCase("yes");
        return new EvaluationResponse(passing, "", Collections.emptyMap());
    }

}
FactCheckingEvaluator is designed to assess the factual accuracy of AI-generated responses against a given context. It helps detect and reduce hallucinations by verifying whether a given claim is logically supported by the provided context (the document). When using FactCheckingEvaluator, the claim and the document are submitted to an AI model for evaluation.
To do this more efficiently, a smaller, purpose-built model such as Bespoke's Minicheck can be used. Minicheck is a compact model designed specifically for fact checking: given a piece of factual information and a generated output, it verifies whether the claim is consistent with the document, answering "yes" if the document supports the claim and "no" otherwise. This makes it particularly well suited to retrieval-augmented generation (RAG) applications, where it helps ensure that generated answers are grounded in the retrieved context.
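Unlike RelevancyEvaluator's contains-based parsing, FactCheckingEvaluator compares the whole answer with equalsIgnoreCase("yes"), so any extra wording fails the check. A standalone sketch of the difference (class and method names are illustrative):

```java
class FactCheckParseDemo {

    // Mirrors FactCheckingEvaluator's check: the entire answer must equal
    // "yes" (ignoring case); anything longer fails.
    static boolean passes(String evaluationResponse) {
        return evaluationResponse.equalsIgnoreCase("yes");
    }

    public static void main(String[] args) {
        System.out.println(passes("Yes"));                  // true
        System.out.println(passes("Yes, it is supported.")); // false
    }
}
```

This strict comparison is a reasonable fit for a fact-checking model trained to emit a bare yes/no, and it explains why it helps to cap the model's output length when calling it (as the Ollama example does with numPredict).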
Example
@Test
void testFactChecking() {
    // Set up the Ollama API
    OllamaApi ollamaApi = new OllamaApi("http://localhost:11434");
    ChatModel chatModel = new OllamaChatModel(ollamaApi,
            OllamaOptions.builder().model(BESPOKE_MINICHECK).numPredict(2).temperature(0.0d).build());
    // Create the FactCheckingEvaluator
    var factCheckingEvaluator = new FactCheckingEvaluator(ChatClient.builder(chatModel));
    // Example context and claim
    String context = "The Earth is the third planet from the Sun and the only astronomical object known to harbor life.";
    String claim = "The Earth is the fourth planet from the Sun.";
    // Create an EvaluationRequest
    EvaluationRequest evaluationRequest = new EvaluationRequest(context, Collections.emptyList(), claim);
    // Perform the evaluation
    EvaluationResponse evaluationResponse = factCheckingEvaluator.evaluate(evaluationRequest);
    assertFalse(evaluationResponse.isPass(), "The claim should not be supported by the context");
}
This example uses Ollama to call the bespoke-minicheck model with temperature set to 0.0 (and numPredict capping the output at 2 tokens), then passes both the context and the claim to factCheckingEvaluator for evaluation.
Summary
Spring AI provides the Evaluator interface, which defines an evaluate method for assessing AI-generated content and detecting hallucinated responses. It has two implementations, RelevancyEvaluator and FactCheckingEvaluator: RelevancyEvaluator evaluates relevance, while FactCheckingEvaluator evaluates factual accuracy.