Performance and Cost

Java突击队 · 2026/4/30 · about 4 min read

Chapter 11: Performance and Cost Optimization


[Figure: overview of performance and cost optimization]

11.1 Optimization order: stop the bleeding first, then improve efficiency

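The recommended order of priority is:

1. Concurrency control and timeouts: first make sure the system cannot be overwhelmed
2. Caching: do not pay twice for a repeated question
3. Prompt/context compression: reduce token usage
4. Model routing: choose the model per task (a cheap one for simple tasks, a strong one for hard tasks)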
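
Model routing is the one item in this list without code later in the chapter, so here is a minimal sketch of what "choose the model per task" could look like. The model names, the `TaskType` enum, and the 4,000-character threshold are all assumptions made for illustration; they do not come from any real provider or from this project.

```java
/** Hypothetical router: picks a model name from the task type and the prompt size. */
final class ModelRouter {

    // Assumed placeholder names; replace with the models your provider actually exposes.
    static final String CHEAP_MODEL = "small-model";
    static final String STRONG_MODEL = "large-model";

    // Assumed threshold: prompts longer than this are sent to the stronger model.
    static final int LONG_PROMPT_CHARS = 4_000;

    enum TaskType { CLASSIFY, EXTRACT, SUMMARIZE, REASON, GENERATE_CODE }

    private ModelRouter() {
    }

    static String route(TaskType type, String prompt) {
        // Tasks that need multi-step reasoning or code generation go to the strong model.
        boolean hardTask = type == TaskType.REASON || type == TaskType.GENERATE_CODE;
        // Very long prompts are also treated as hard, regardless of task type.
        boolean longPrompt = prompt != null && prompt.length() > LONG_PROMPT_CHARS;
        return (hardTask || longPrompt) ? STRONG_MODEL : CHEAP_MODEL;
    }
}
```

A static rule like this is easy to audit and costs nothing at runtime; whether task type and prompt length are good enough signals depends on your workload.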

11.2 Concurrency control: avoiding a 429 avalanche

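When you hit a 429 (Too Many Requests), your first reaction should not be to raise the retry count but to bring concurrency down: extra retries against an endpoint that is already rate-limiting you only add load.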
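
One possible reading of "bring concurrency down" is an additive-increase / multiplicative-decrease rule: halve the allowed concurrency on each 429 and recover one slot per successful call. The sketch below only tracks the limit value; the class name and its methods are hypothetical, and wiring the value into an actual gate is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that tracks an allowed concurrency level between min and max. */
final class AdaptiveConcurrencyLimit {

    private final int min;
    private final int max;
    private final AtomicInteger limit;

    AdaptiveConcurrencyLimit(int min, int max) {
        if (min < 1 || max < min) {
            throw new IllegalArgumentException("require 1 <= min <= max");
        }
        this.min = min;
        this.max = max;
        this.limit = new AtomicInteger(max); // start optimistic, shrink on 429
    }

    int current() {
        return limit.get();
    }

    /** Call when the provider answers 429: halve the limit, never below min. */
    int onRateLimited() {
        return limit.updateAndGet(v -> Math.max(min, v / 2));
    }

    /** Call after a successful response: recover one slot, never above max. */
    int onSuccess() {
        return limit.updateAndGet(v -> Math.min(max, v + 1));
    }
}
```

Backing off fast and recovering slowly means a burst of 429s sheds load quickly, while a run of successes restores throughput gradually.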

Here is a minimal concurrency gate that limits the number of AI calls in flight at the same time.

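package com.example.saa.performance;

import java.time.Duration;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import org.springframework.stereotype.Component;

/** Caps the number of AI calls that may run at the same time. */
@Component
public class AiConcurrencyGuard {

    // Maximum number of AI calls allowed in flight at once.
    private static final int MAX_CONCURRENT_CALLS = 8;

    private final Semaphore semaphore;

    public AiConcurrencyGuard() {
        this.semaphore = new Semaphore(MAX_CONCURRENT_CALLS);
    }

    /**
     * Runs the call once a permit is available, waiting at most {@code timeout} for one.
     * The timeout covers only the wait for a permit, not the call itself.
     */
    public <T> T execute(Duration timeout, GuardedCall<T> call) throws Exception {
        boolean acquired = semaphore.tryAcquire(timeout.toMillis(), TimeUnit.MILLISECONDS);
        if (!acquired) {
            throw new IllegalStateException("Too many concurrent AI calls, please retry later");
        }
        try {
            return call.invoke();
        } finally {
            // Always return the permit, even when the call throws.
            semaphore.release();
        }
    }

    @FunctionalInterface
    public interface GuardedCall<T> {
        T invoke() throws Exception;
    }
}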

Unit test example (edge case: permits exhausted)

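package com.example.saa.performance;

import static org.junit.jupiter.api.Assertions.assertThrows;

import java.time.Duration;
import org.junit.jupiter.api.Test;

public class AiConcurrencyGuardTest {

    // Matches the permit count hard-coded in AiConcurrencyGuard.
    private static final int PERMITS = 8;

    @Test
    void should_throw_when_cannotAcquire() {
        AiConcurrencyGuard guard = new AiConcurrencyGuard();
        // Hold every permit by nesting calls on one thread (a Semaphore is not reentrant),
        // then attempt one more call, which must time out and throw.
        assertThrows(IllegalStateException.class, () -> nest(guard, PERMITS));
    }

    // Takes one permit per level; at depth 0 all permits are held, so the extra call must fail.
    private static String nest(AiConcurrencyGuard guard, int depth) throws Exception {
        if (depth == 0) {
            return guard.execute(Duration.ofMillis(1), () -> "ok");
        }
        return guard.execute(Duration.ofMillis(1), () -> nest(guard, depth - 1));
    }
}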
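
The gate above bounds how long a caller waits for a permit, but not how long the model call itself runs. The following is a minimal sketch of bounding the call as well, using the standard `CompletableFuture.completeOnTimeout` (Java 9+). The `TimedCall` class and the fallback-value approach are assumptions for illustration, not part of the original design.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper that bounds how long a caller waits for a result. */
final class TimedCall {

    private TimedCall() {
    }

    /**
     * Runs the call asynchronously and waits at most `timeout` for its result.
     * If the call has not finished by then, the fallback is returned instead.
     * The underlying task is not interrupted and may keep running in the background.
     * If the call throws, join() rethrows it wrapped in a CompletionException.
     */
    static <T> T callWithTimeout(Supplier<T> call, Duration timeout, T fallback) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(call)
                .completeOnTimeout(fallback, timeout.toMillis(), TimeUnit.MILLISECONDS);
        return future.join();
    }
}
```

Because the background task is not cancelled, this only protects the caller's latency; releasing the connection still depends on the HTTP client's own read timeout.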

11.3 Caching: do not pay twice for the same question

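Caching is a good fit for:

FAQs, policy Q&A, and explanations of fixed rules
Structured extraction (the same text being extracted repeatedly)

It is a poor fit for:

Strongly real-time data (inventory, prices), unless you set a very short TTL
Tool calls that perform writes requiring strong consistency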
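
To make the TTL point concrete, here is a minimal in-memory sketch of a cache whose entries expire after a fixed time. `TtlCache` and its injectable clock are hypothetical names introduced for this example; a production system would more likely use an existing cache library or an external store.

```java
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical in-memory cache whose entries expire a fixed time after being loaded. */
final class TtlCache<V> {

    private record Entry<V>(V value, long expiresAtMillis) {
    }

    private final ConcurrentHashMap<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so tests can control time

    TtlCache(Duration ttl, LongSupplier clock) {
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /**
     * Returns the cached value for key, or calls loader and stores the result
     * when the entry is missing or expired. The loader runs inside compute(),
     * so concurrent callers for the same key wait instead of loading twice.
     * Expired entries are replaced on access only; nothing evicts them in the background.
     */
    V get(String key, Supplier<V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && old.expiresAtMillis() > now)
                        ? old
                        : new Entry<>(loader.get(), now + ttlMillis));
        return entry.value();
    }
}
```

In real use the clock would be `System::currentTimeMillis`, and the TTL would be minutes or hours for FAQ-style answers and only a few seconds, if cached at all, for data such as prices.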

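Make the cache key a stable fingerprint: a hash of the system prompt, the user message, and the key context. This avoids storing sensitive raw text as a key in the cache system.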

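package com.example.saa.performance;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Objects;

/** Builds a stable cache key from the prompt parts without storing the raw text. */
public final class PromptFingerprint {

    private PromptFingerprint() {
    }

    /**
     * Returns the lowercase hex SHA-256 of system, user and context joined by newlines.
     * Null parts are treated as empty strings. HexFormat requires Java 17 or later.
     */
    public static String sha256(String system, String user, String context) {
        String raw = Objects.toString(system, "") + "\n" + Objects.toString(user, "") + "\n" + Objects.toString(context, "");
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(raw.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(bytes);
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to provide SHA-256, so this is not expected.
            throw new IllegalStateException("Failed to compute hash", ex);
        }
    }
}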

Unit test example (edge cases: stability and sensitivity to change)

package com.example.saa.performance;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotEquals;

import org.junit.jupiter.api.Test;

public class PromptFingerprintTest {

    @Test
    void should_beStable_when_sameInputs() {
        String a = PromptFingerprint.sha256("s", "u", "c");
        String b = PromptFingerprint.sha256("s", "u", "c");
        assertEquals(a, b);
    }

    @Test
    void should_change_when_inputsChange() {
        String a = PromptFingerprint.sha256("s", "u", "c");
        String b = PromptFingerprint.sha256("s", "u2", "c");
        assertNotEquals(a, b);
    }
}