A Look at Spring AI's Evaluator

This article takes a look at Spring AI's Evaluator.

Evaluator

spring-ai-client-chat/src/main/java/org/springframework/ai/evaluation/Evaluator.java

@FunctionalInterface
public interface Evaluator {

    EvaluationResponse evaluate(EvaluationRequest evaluationRequest);

    default String doGetSupportingData(EvaluationRequest evaluationRequest) {
        List<Document> data = evaluationRequest.getDataList();
        return data.stream()
            .map(Document::getText)
            .filter(StringUtils::hasText)
            .collect(Collectors.joining(System.lineSeparator()));
    }

}

The Evaluator interface defines the evaluate method, which is used to assess AI-generated content and helps detect whether the AI has produced hallucinated responses. It has two implementations: RelevancyEvaluator and FactCheckingEvaluator.
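Since Evaluator is a @FunctionalInterface, a custom evaluator can be expressed as a lambda. The snippet below is a purely illustrative sketch (not part of Spring AI) that passes the evaluation whenever the response mentions a term from the user's question; the built-in implementations instead delegate this judgement to a chat model.

// Naive, illustrative Evaluator: passes if the response text mentions any term
// from the user's question. RelevancyEvaluator and FactCheckingEvaluator delegate
// this judgement to a chat model instead of doing simple string matching.
Evaluator naiveEvaluator = request -> {
    String response = request.getResponseContent().toLowerCase();
    boolean mentionsQuery = Arrays.stream(request.getUserText().toLowerCase().split("\\s+"))
        .anyMatch(response::contains);
    return new EvaluationResponse(mentionsQuery,
            mentionsQuery ? "" : "Response does not mention the query terms",
            Collections.emptyMap());
};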

EvaluationRequest

org/springframework/ai/evaluation/EvaluationRequest.java

public class EvaluationRequest {

    private final String userText;

    private final List<Document> dataList;

    private final String responseContent;

    public EvaluationRequest(String userText, String responseContent) {
        this(userText, Collections.emptyList(), responseContent);
    }

    public EvaluationRequest(List<Document> dataList, String responseContent) {
        this("", dataList, responseContent);
    }

    public EvaluationRequest(String userText, List<Document> dataList, String responseContent) {
        this.userText = userText;
        this.dataList = dataList;
        this.responseContent = responseContent;
    }

    //......
}   

EvaluationRequest defines the userText, dataList and responseContent properties: userText is the user's input, dataList is the contextual data (for example, the content appended by RAG), and responseContent is the AI model's response.
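As a quick sketch, the three constructors can be used as follows (answerText and retrievedDocuments are hypothetical placeholders for the model's answer and a List<Document> of retrieved context):

// 1. Only the user's question and the model's answer (no supporting context)
EvaluationRequest byTextOnly = new EvaluationRequest(userText, answerText);

// 2. Only the retrieved context documents and the model's answer
EvaluationRequest byContextOnly = new EvaluationRequest(retrievedDocuments, answerText);

// 3. The user's question, the retrieved context documents and the model's answer
EvaluationRequest full = new EvaluationRequest(userText, retrievedDocuments, answerText);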

EvaluationResponse

org/springframework/ai/evaluation/EvaluationResponse.java

public class EvaluationResponse {

    private final boolean pass;

    private final float score;

    private final String feedback;

    private final Map<String, Object> metadata;

    @Deprecated
    public EvaluationResponse(boolean pass, float score, String feedback, Map<String, Object> metadata) {
        this.pass = pass;
        this.score = score;
        this.feedback = feedback;
        this.metadata = metadata;
    }

    public EvaluationResponse(boolean pass, String feedback, Map<String, Object> metadata) {
        this.pass = pass;
        this.score = 0;
        this.feedback = feedback;
        this.metadata = metadata;
    }

    //......
}   

EvaluationResponse defines the pass, score, feedback and metadata properties.
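The examples later in this article only call isPass(); assuming the remaining fields follow the usual getter convention (getScore(), getFeedback(), getMetadata(), elided in the excerpt above), a result from any Evaluator could be inspected roughly like this (evaluator and evaluationRequest are placeholders):

EvaluationResponse evaluationResponse = evaluator.evaluate(evaluationRequest);

// isPass() is used in the tests below; the other accessors are assumed to be
// standard getters for the fields shown above
if (!evaluationResponse.isPass()) {
    System.out.printf("evaluation failed, score=%s, feedback=%s%n",
            evaluationResponse.getScore(), evaluationResponse.getFeedback());
}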

RelevancyEvaluator

org/springframework/ai/evaluation/RelevancyEvaluator.java

public class RelevancyEvaluator implements Evaluator {

    private static final String DEFAULT_EVALUATION_PROMPT_TEXT = """
                Your task is to evaluate if the response for the query
                is in line with the context information provided.\n
                You have two options to answer. Either YES/ NO.\n
                Answer - YES, if the response for the query
                is in line with context information otherwise NO.\n
                Query: \n {query}\n
                Response: \n {response}\n
                Context: \n {context}\n
                Answer: "
            """;

    private final ChatClient.Builder chatClientBuilder;

    public RelevancyEvaluator(ChatClient.Builder chatClientBuilder) {
        this.chatClientBuilder = chatClientBuilder;
    }

    @Override
    public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {

        var response = evaluationRequest.getResponseContent();
        var context = doGetSupportingData(evaluationRequest);

        String evaluationResponse = this.chatClientBuilder.build()
            .prompt()
            .user(userSpec -> userSpec.text(DEFAULT_EVALUATION_PROMPT_TEXT)
                .param("query", evaluationRequest.getUserText())
                .param("response", response)
                .param("context", context))
            .call()
            .content();

        boolean passing = false;
        float score = 0;
        if (evaluationResponse.toLowerCase().contains("yes")) {
            passing = true;
            score = 1;
        }

        return new EvaluationResponse(passing, score, "", Collections.emptyMap());
    }

}

RelevancyEvaluator asks the AI to judge whether the response is in line with the context information and to answer YES or NO. If the answer contains yes, passing is true and score is 1; otherwise passing defaults to false and score to 0.

Example

@Test
void testEvaluation() {

    dataController.delete();
    dataController.load();

    String userText = "What is the purpose of Carina?";

    ChatResponse response = ChatClient.builder(chatModel)
            .build().prompt()
            .advisors(new QuestionAnswerAdvisor(vectorStore))
            .user(userText)
            .call()
            .chatResponse();
    String responseContent = response.getResult().getOutput().getContent();

    var relevancyEvaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));

    EvaluationRequest evaluationRequest = new EvaluationRequest(userText,
            (List<Document>) response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS), responseContent);

    EvaluationResponse evaluationResponse = relevancyEvaluator.evaluate(evaluationRequest);

    assertTrue(evaluationResponse.isPass(), "Response is not relevant to the question");

}

Here the AI is first asked with userText; then responseContent, together with the documents stored under QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS, is handed to relevancyEvaluator, which asks the AI to evaluate the answer.

FactCheckingEvaluator

org/springframework/ai/evaluation/FactCheckingEvaluator.java

public class FactCheckingEvaluator implements Evaluator {

    private static final String DEFAULT_EVALUATION_PROMPT_TEXT = """
                Evaluate whether or not the following claim is supported by the provided document.
                Respond with "yes" if the claim is supported, or "no" if it is not.
                Document: \n {document}\n
                Claim: \n {claim}
            """;

    private static final String BESPOKE_EVALUATION_PROMPT_TEXT = """
                Document: \n {document}\n
                Claim: \n {claim}
            """;

    private final ChatClient.Builder chatClientBuilder;

    private final String evaluationPrompt;

    /**
     * Constructs a new FactCheckingEvaluator with the provided ChatClient.Builder. Uses
     * the default evaluation prompt suitable for general purpose LLMs.
     * @param chatClientBuilder The builder for the ChatClient used to perform the
     * evaluation
     */
    public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder) {
        this(chatClientBuilder, DEFAULT_EVALUATION_PROMPT_TEXT);
    }

    /**
     * Constructs a new FactCheckingEvaluator with the provided ChatClient.Builder and
     * evaluation prompt.
     * @param chatClientBuilder The builder for the ChatClient used to perform the
     * evaluation
     * @param evaluationPrompt The prompt text to use for evaluation
     */
    public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder, String evaluationPrompt) {
        this.chatClientBuilder = chatClientBuilder;
        this.evaluationPrompt = evaluationPrompt;
    }

    /**
     * Creates a FactCheckingEvaluator configured for use with the Bespoke Minicheck
     * model.
     * @param chatClientBuilder The builder for the ChatClient used to perform the
     * evaluation
     * @return A FactCheckingEvaluator configured for Bespoke Minicheck
     */
    public static FactCheckingEvaluator forBespokeMinicheck(ChatClient.Builder chatClientBuilder) {
        return new FactCheckingEvaluator(chatClientBuilder, BESPOKE_EVALUATION_PROMPT_TEXT);
    }

    /**
     * Evaluates whether the response content in the EvaluationRequest is factually
     * supported by the context provided in the same request.
     * @param evaluationRequest The request containing the response to be evaluated and
     * the supporting context
     * @return An EvaluationResponse indicating whether the claim is supported by the
     * document
     */
    @Override
    public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {
        var response = evaluationRequest.getResponseContent();
        var context = doGetSupportingData(evaluationRequest);

        String evaluationResponse = this.chatClientBuilder.build()
            .prompt()
            .user(userSpec -> userSpec.text(this.evaluationPrompt).param("document", context).param("claim", response))
            .call()
            .content();

        boolean passing = evaluationResponse.equalsIgnoreCase("yes");
        return new EvaluationResponse(passing, "", Collections.emptyMap());
    }

}

FactCheckingEvaluator is designed to evaluate the factual accuracy of AI-generated responses against a given context. It helps detect and reduce hallucinations in AI output by verifying whether a given claim is logically supported by the provided context (the document). When using FactCheckingEvaluator, the claim and the document are submitted to an AI model for evaluation. To make this more efficient, a smaller, purpose-built model such as Bespoke's Minicheck can be used. Minicheck is a small, efficient model designed specifically for fact checking: it analyzes a piece of factual information and the generated output, verifies whether the claim is consistent with the document, and answers "yes" if the document supports the claim, otherwise "no". This kind of model is particularly well suited to retrieval-augmented generation (RAG) applications, where it helps ensure that generated answers stay grounded in the context.

Example

@Test
void testFactChecking() {
  // Set up the Ollama API
  OllamaApi ollamaApi = new OllamaApi("http://localhost:11434");

  ChatModel chatModel = new OllamaChatModel(ollamaApi,
          OllamaOptions.builder().model(BESPOKE_MINICHECK).numPredict(2).temperature(0.0d).build());


  // Create the FactCheckingEvaluator
  var factCheckingEvaluator = new FactCheckingEvaluator(ChatClient.builder(chatModel));

  // Example context and claim
  String context = "The Earth is the third planet from the Sun and the only astronomical object known to harbor life.";
  String claim = "The Earth is the fourth planet from the Sun.";

  // Create an EvaluationRequest: the supporting context goes into the dataList,
  // since doGetSupportingData builds the {document} from the request's dataList
  EvaluationRequest evaluationRequest = new EvaluationRequest(List.of(new Document(context)), claim);

  // Perform the evaluation
  EvaluationResponse evaluationResponse = factCheckingEvaluator.evaluate(evaluationRequest);

  assertFalse(evaluationResponse.isPass(), "The claim should not be supported by the context");

}

Here Ollama is used to call the bespoke-minicheck model with temperature set to 0.0; the context (wrapped as a Document in the dataList) and the claim are then handed to factCheckingEvaluator for evaluation.
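Since FactCheckingEvaluator also exposes the forBespokeMinicheck factory method shown in the source above, which switches to the shorter BESPOKE_EVALUATION_PROMPT_TEXT, the same check could be sketched with it as follows (reusing the chatModel, context and claim from the test above):

// Build the evaluator through the factory method that selects the Bespoke-specific
// prompt instead of the general-purpose default prompt
var bespokeEvaluator = FactCheckingEvaluator.forBespokeMinicheck(ChatClient.builder(chatModel));

EvaluationResponse bespokeResult = bespokeEvaluator.evaluate(
        new EvaluationRequest(List.of(new Document(context)), claim));

assertFalse(bespokeResult.isPass(), "The claim should not be supported by the context");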

Summary

Spring AI provides the Evaluator interface, which defines the evaluate method for assessing AI-generated content and detecting hallucinated responses. It has two implementations, RelevancyEvaluator and FactCheckingEvaluator: RelevancyEvaluator evaluates relevance, while FactCheckingEvaluator evaluates factual accuracy.
