Embedding and RAG

Java突击队 · 2026/4/30 · about 5 min read

Chapter 7: Embedding and Vector Retrieval (Introduction to RAG)


[Figure: RAG flow diagram]

7.1 What is RAG, and what pain points does it address?

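The core idea of RAG (Retrieval-Augmented Generation) is simple:

First split your enterprise knowledge (documents, FAQs, policies, product descriptions) into chunks and embed them as vectors
When answering a question, first retrieve the most relevant chunks to use as context
Then have the model answer based on that context instead of improvising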
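The first step, splitting documents into chunks, can be sketched as a fixed-size character window with overlap. This is a minimal illustration, not a recommended production splitter: the class name `TextChunker` and the `chunkSize` and `overlap` parameters are hypothetical, and real documents usually call for splitting on sentence or paragraph boundaries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits text into overlapping fixed-size character windows. */
final class TextChunker {

    private TextChunker() {
    }

    /**
     * @param text      full document text (null or blank yields an empty list)
     * @param chunkSize maximum characters per chunk, must be positive
     * @param overlap   characters shared by neighbouring chunks, 0 <= overlap < chunkSize
     */
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        if (text == null || text.isBlank()) {
            return chunks;
        }
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(text.length(), start + chunkSize);
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // this window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

For example, `TextChunker.split("abcdefghij", 4, 1)` yields `abcd`, `defg`, `ghij`: each chunk repeats the last character of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.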

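It noticeably improves three kinds of problems:

Fewer hallucinations: the model no longer makes up details from vague memory
Higher hit rate: answers stay closer to your own business material
Updatable knowledge: edit the document → re-ingest it → no need to retrain the model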

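7.2 Embedding: turning text into vectors

The result of an embedding is a high-dimensional vector (a float array). For engineering purposes you only need to remember two points:

Similar texts have vectors that are closer together
RAG does not feed the full text to the model; it feeds the retrieved TopK chunks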
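To make "similar texts are closer together" concrete, here is a toy sketch that uses hand-written 3-dimensional vectors in place of real embeddings (a real model returns hundreds or thousands of dimensions). The vectors, labels, and the class name `ToySimilarityDemo` are made up for illustration; only the ranking logic carries over.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ToySimilarityDemo {

    /** Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            na += (double) a[i] * a[i];
            nb += (double) b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Returns the labels of the topK vectors most similar to the query, best first. */
    static List<String> topK(Map<String, float[]> docs, float[] query, int topK) {
        return docs.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> cosine(query, e.getValue())).reversed())
                .limit(topK)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, float[]> docs = new LinkedHashMap<>();
        docs.put("refund policy", new float[]{0.9f, 0.1f, 0.0f});
        docs.put("return an order", new float[]{0.8f, 0.3f, 0.1f});
        docs.put("office wifi setup", new float[]{0.0f, 0.2f, 0.9f});
        // Pretend this is the embedding of "how do I get my money back?"
        float[] query = {1.0f, 0.2f, 0.0f};
        System.out.println(topK(docs, query, 2)); // [refund policy, return an order]
    }
}
```

The two refund-related entries point in nearly the same direction as the query, so they rank first; the unrelated entry is almost orthogonal and is left out of the Top 2.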

7.3 Configuring DashScope Embedding (Tongyi vectorization)

Keeping the key in an environment variable is recommended:

export AI_DASHSCOPE_API_KEY="your-dashscope-api-key"

application.yml example (with embedding enabled as well):

spring:
  ai:
    model:
      embedding: dashscope
    # Chat and embedding settings share a single dashscope key (YAML keys must be unique)
    dashscope:
      api-key: ${AI_DASHSCOPE_API_KEY:}
      chat:
        options:
          model: ${DASHSCOPE_CHAT_MODEL:qwen-plus}
          temperature: 0.2
      embedding:
        options:
          model: ${DASHSCOPE_EMBEDDING_MODEL:text-embedding-v2}
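Because the `api-key` placeholder falls back to an empty string when the environment variable is missing, a misconfiguration may only surface at the first remote call. A small fail-fast check, sketched below in plain Java, can catch it earlier. The class name `ApiKeyCheck` is hypothetical, and where you call it (for example from a startup hook) is left as an assumption.

```java
import java.util.Map;

/** Hypothetical startup check for the environment variable used in application.yml. */
final class ApiKeyCheck {

    private ApiKeyCheck() {
    }

    /** Returns the trimmed key, or throws if AI_DASHSCOPE_API_KEY is missing or blank. */
    static String requireApiKey(Map<String, String> env) {
        String key = env.get("AI_DASHSCOPE_API_KEY");
        if (key == null || key.isBlank()) {
            throw new IllegalStateException("AI_DASHSCOPE_API_KEY is not set");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        requireApiKey(System.getenv());
        // Never print the key itself; only confirm that it exists.
        System.out.println("DashScope API key found");
    }
}
```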

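7.4 Two implementation approaches: get it running first, then upgrade step by step

Whether a vector store module is pulled in automatically varies across Spring AI versions and dependency combinations, so two options are given here:

Plan A: use VectorStore (recommended): closer to the Spring AI ecosystem, and smoother to switch to pgvector/Redis/Elasticsearch later
Plan B: implement a minimal vector store yourself (no extra dependencies): fewest dependencies, easy to understand and quick to validate

Plan B is implemented first below so that it runs as soon as you copy it; the integration path for Plan A follows.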

7.5 Plan B: minimal vector store + RAG (copy and run)

7.5.1 Minimal vector store (in-memory)

package com.example.saa.rag;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.stereotype.Component;

@Component
public class InMemoryVectorIndex {

    private final EmbeddingModel embeddingModel;
    private final List<Entry> entries = new ArrayList<>();

    public InMemoryVectorIndex(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    /** Embeds the text (null is embedded as "") and stores it with its metadata; returns the generated id. */
    public synchronized String add(String text, Map<String, Object> metadata) {
        String id = UUID.randomUUID().toString();
        float[] vector = embeddingModel.embed(Objects.toString(text, ""));
        entries.add(new Entry(id, text, metadata == null ? Map.of() : metadata, vector));
        return id;
    }

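    /** Embeds the query and returns the stored entries most similar to it, best match first. */
    public synchronized List<Entry> search(String query, int topK) {
        // An empty index short-circuits before any embedding call is made.
        if (entries.isEmpty()) {
            return List.of();
        }
        float[] q = embeddingModel.embed(Objects.toString(query, ""));

        // Brute-force scan: score every entry, sort descending, keep at least one result.
        return entries.stream()
                .map(e -> new Scored(e, cosineSimilarity(q, e.vector())))
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(Math.max(1, topK))
                .map(Scored::entry)
                .toList();
    }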

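    /**
     * Cosine similarity in [-1, 1]; returns 0 when either vector has zero norm.
     * If the dimensions differ, only the first min(a.length, b.length) components are compared.
     */
    static double cosineSimilarity(float[] a, float[] b) {
        int n = Math.min(a.length, b.length);
        double dot = 0;
        double na = 0;
        double nb = 0;
        for (int i = 0; i < n; i++) {
            dot += (double) a[i] * b[i];
            na += (double) a[i] * a[i];
            nb += (double) b[i] * b[i];
        }
        if (na == 0 || nb == 0) {
            return 0;
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }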

    public record Entry(String id, String text, Map<String, Object> metadata, float[] vector) {
    }

    private record Scored(Entry entry, double score) {
    }
}
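The index above sorts every scored entry and then keeps the first few. For a small in-memory demo that is fine; if the entry count grows, one common alternative is a bounded min-heap that holds only the current best k. The sketch below is a standalone illustration of that idea, not a change to the chapter's code; the names `TopKSelector` and `Scored` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopKSelector {

    /** Hypothetical pair of an entry id and its similarity score. */
    record Scored(String id, double score) {
    }

    private TopKSelector() {
    }

    /** Keeps only the k highest-scoring items using a min-heap of size k; result is best first. */
    static List<Scored> topK(List<Scored> candidates, int k) {
        if (k <= 0) {
            return List.of();
        }
        // The heap root is always the weakest of the items kept so far.
        PriorityQueue<Scored> heap = new PriorityQueue<>(Comparator.comparingDouble(Scored::score));
        for (Scored s : candidates) {
            if (heap.size() < k) {
                heap.add(s);
            } else if (s.score() > heap.peek().score()) {
                heap.poll(); // drop the current weakest
                heap.add(s);
            }
        }
        List<Scored> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(Scored::score).reversed());
        return result;
    }
}
```

With n entries this costs roughly O(n log k) instead of O(n log n) for a full sort. The difference only matters when n is much larger than k.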

7.5.2 RAG Service: retrieve TopK → assemble the prompt → generate the answer

package com.example.saa.rag;

import java.util.List;
import java.util.StringJoiner;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class RagService {

    private static final int DEFAULT_TOP_K = 4;

    private final ChatClient chatClient;
    private final InMemoryVectorIndex vectorIndex;

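    // Assumption: a ChatClient bean is available in the context, for example one
    // built from the auto-configured ChatClient.Builder in a @Configuration class.
    public RagService(ChatClient chatClient, InMemoryVectorIndex vectorIndex) {
        this.chatClient = chatClient;
        this.vectorIndex = vectorIndex;
    }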

    public String ask(String question) {
        List<InMemoryVectorIndex.Entry> contexts = vectorIndex.search(question, DEFAULT_TOP_K);

        StringJoiner ctx = new StringJoiner("\n---\n");
        for (InMemoryVectorIndex.Entry e : contexts) {
            ctx.add(e.text());
        }

        String answer = chatClient.prompt()
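                .system("""
                        You are an enterprise knowledge base assistant.
                        You must answer primarily based on the [Context].
                        If the context does not contain the answer, reply explicitly with "Not sure" and state what kind of material would need to be added.
                        """)
                .user("""
                        [Context]
                        %s

                        [Question]
                        %s
                        """.formatted(ctx.toString(), question))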
                .call()
                .content();

        return answer;
    }
}
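The service joins every retrieved chunk into the prompt as-is. Two practical concerns are what to send when nothing was retrieved and how to keep the context within a size budget. The sketch below handles both in plain Java, independent of Spring. The class name `PromptAssembler`, the `maxChars` budget, and the placeholder text are assumptions for illustration; the two-part layout mirrors the context/question user message in `RagService`.

```java
import java.util.List;

final class PromptAssembler {

    private static final String SEPARATOR = "\n---\n";

    private PromptAssembler() {
    }

    /**
     * Joins chunks in rank order and stops before the first chunk that would push
     * the total past maxChars. Returns a placeholder when no chunk fits or none was retrieved.
     */
    static String buildContext(List<String> chunks, int maxChars) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            if (chunk == null || chunk.isBlank()) {
                continue;
            }
            int extra = (sb.length() == 0 ? 0 : SEPARATOR.length()) + chunk.length();
            if (sb.length() + extra > maxChars) {
                break; // budget exhausted; lower-ranked chunks are dropped
            }
            if (sb.length() > 0) {
                sb.append(SEPARATOR);
            }
            sb.append(chunk);
        }
        return sb.length() == 0 ? "(no relevant context found)" : sb.toString();
    }

    /** Builds the user message with the context block first and the question second. */
    static String buildUserMessage(List<String> chunks, String question, int maxChars) {
        return """
                [Context]
                %s

                [Question]
                %s
                """.formatted(buildContext(chunks, maxChars), question);
    }
}
```

Counting characters is only a rough stand-in for tokens; a real budget would depend on the chat model's context window.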

7.5.3 Controller: import and question-answering endpoints

package com.example.saa.api;

import com.example.saa.rag.InMemoryVectorIndex;
import com.example.saa.rag.RagService;
import java.util.Map;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RagController {

    private final InMemoryVectorIndex vectorIndex;
    private final RagService ragService;

    public RagController(InMemoryVectorIndex vectorIndex, RagService ragService) {
        this.vectorIndex = vectorIndex;
        this.ragService = ragService;
    }

    @PostMapping(value = "/rag/import", consumes = MediaType.TEXT_PLAIN_VALUE)
    public Map<String, Object> importDoc(@RequestBody String text) {
        String id = vectorIndex.add(text, Map.of("source", "manual"));
        return Map.of("id", id);
    }

    @PostMapping(value = "/rag/ask", produces = MediaType.TEXT_PLAIN_VALUE)
    public String ask(@RequestParam String q) {
        return ragService.ask(q);
    }
}
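To try the two endpoints, a client can be sketched with the JDK's built-in `java.net.http` API. The base URL `http://localhost:8080`, the class name `RagApiClient`, and the sample sentences are assumptions for illustration; the paths, the `text/plain` body, and the `q` parameter follow the controller above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class RagApiClient {

    private final String baseUrl; // assumption: where the Spring Boot app is listening

    RagApiClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** POST /rag/import with a text/plain body, matching the endpoint's consumes setting. */
    HttpRequest importRequest(String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/rag/import"))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    /** POST /rag/ask with the question URL-encoded into the q query parameter. */
    HttpRequest askRequest(String question) {
        String q = URLEncoder.encode(question, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/rag/ask?q=" + q))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        RagApiClient api = new RagApiClient("http://localhost:8080");
        HttpClient http = HttpClient.newHttpClient();
        // Made-up sample content: import one sentence, then ask about it.
        System.out.println(http.send(api.importRequest("Refunds are processed within 7 days."),
                HttpResponse.BodyHandlers.ofString()).body());
        System.out.println(http.send(api.askRequest("How long do refunds take?"),
                HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

Running `main` requires the application to be started first; the request-building methods themselves have no network side effects.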

7.5.4 Unit test example (edge cases: zero vector, same direction)

package com.example.saa.rag;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

public class InMemoryVectorIndexTest {

    @Test
    void should_returnZero_when_vectorIsAllZeros() {
        double score = InMemoryVectorIndex.cosineSimilarity(new float[]{0, 0}, new float[]{1, 2});
        assertEquals(0, score);
    }

    @Test
    void should_returnOne_when_vectorsAreSameDirection() {
        double score = InMemoryVectorIndex.cosineSimilarity(new float[]{1, 0}, new float[]{2, 0});
        assertTrue(score > 0.999);
    }
}

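7.6 Plan A: use VectorStore (the recommended route)

When your project's dependencies include a Spring AI vector store module, you can use VectorStore for a standardized integration. The usual approach is:

Inject VectorStore
When importing documents, call vectorStore.add(List<Document>)
When querying, call vectorStore.similaritySearch(SearchRequest.builder().query(q).topK(k).build()); earlier milestone releases used SearchRequest.query(q).withTopK(k), so check the form against your Spring AI version

You can get Plan B running first and then replace the "storage and retrieval" part with VectorStore, without changing the structure of the upper-layer RagService.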
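One way to keep that swap cheap is to put a small interface between the service and whatever performs retrieval. The sketch below is plain Java with hypothetical names (`ChunkRetriever`, `ContextBuilder`, `RetrieverSwapDemo`); it is not part of Spring AI. A Plan B implementation would wrap the in-memory index, and a Plan A implementation would delegate to `VectorStore.similaritySearch` and map the returned documents to text.

```java
import java.util.List;

/** Hypothetical seam between the RAG service and the retrieval backend. */
interface ChunkRetriever {
    List<String> retrieve(String query, int topK);
}

/** Upper layer: depends only on the interface, so the backing store can be swapped. */
final class ContextBuilder {

    private final ChunkRetriever retriever;

    ContextBuilder(ChunkRetriever retriever) {
        this.retriever = retriever;
    }

    /** Joins the retrieved chunks with the same separator the chapter's service uses. */
    String contextFor(String question, int topK) {
        return String.join("\n---\n", retriever.retrieve(question, topK));
    }
}

final class RetrieverSwapDemo {

    public static void main(String[] args) {
        // Stand-in retriever with canned results; a real one would query a vector index.
        ChunkRetriever canned = (query, topK) ->
                List.of("chunk one", "chunk two", "chunk three").stream().limit(topK).toList();
        System.out.println(new ContextBuilder(canned).contextFor("any question", 2));
    }
}
```

Because `ContextBuilder` never names a concrete store, replacing the canned retriever with an in-memory or VectorStore-backed one does not touch the code that assembles the prompt.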

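7.7 Chapter summary

You have now completed the minimal RAG loop:

Embed text as vectors with DashScope Embedding
Retrieve the TopK chunks by similarity
Hand those chunks to the model as context to reduce hallucinations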

The next chapter covers multimodality (optional): getting the model to understand images, and spelling out the engineering limits of image input (size, format, security).