
Chapter 8: Multimodality (Optional): Image Understanding with Qwen-VL

(Figure: schematic of multimodal input)

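8.1 What can multimodality do?

Taking image understanding as an example, common business scenarios include:

Product images: identify the category, selling points, and defects
Receipts/screenshots: OCR plus extraction of key fields
Content moderation: flag political, pornographic, or violent risk (must be combined with your business risk-control policies)
Operations and troubleshooting: analyze error messages in screenshots and suggest fixes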

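8.2 Define the engineering boundaries first

Multimodality is not a matter of "just throw the image at the model". At a minimum, you need these boundaries:

Size limits: cap the dimensions and byte size of each image so memory and bandwidth are not exhausted
Format whitelist: allow only png/jpg/webp and similar formats
Input source: URL downloads need a domain whitelist and timeouts to avoid SSRF (server-side request forgery)
Privacy and compliance: images may contain PII, so storage and logs must be masked, or the image must not be persisted at all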
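A format whitelist that only trusts the client-supplied Content-Type header is easy to bypass. A minimal sketch, assuming you also want to check the file's leading "magic bytes" together with the size limit, could look like this. The class name ImageGuard, its method, and the 2MB limit are hypothetical choices for illustration, and only the PNG, JPEG, and WebP signatures are covered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper: enforces the size limit and detects the real image type from
// its magic bytes instead of trusting the client-supplied Content-Type header.
final class ImageGuard {

    static final long MAX_IMAGE_BYTES = 2L * 1024 * 1024; // assumed 2MB limit

    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    private ImageGuard() {}

    /** Returns the detected MIME type, or empty if the bytes are not an allowed image. */
    static Optional<String> detectAllowedType(byte[] data) {
        if (data == null || data.length == 0 || data.length > MAX_IMAGE_BYTES) {
            return Optional.empty();
        }
        if (startsWith(data, PNG)) {
            return Optional.of("image/png");
        }
        if (startsWith(data, JPEG)) {
            return Optional.of("image/jpeg");
        }
        // WebP layout: "RIFF" + 4-byte chunk size + "WEBP"
        if (data.length >= 12
                && new String(data, 0, 4, StandardCharsets.US_ASCII).equals("RIFF")
                && new String(data, 8, 4, StandardCharsets.US_ASCII).equals("WEBP")) {
            return Optional.of("image/webp");
        }
        return Optional.empty();
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

In a controller you would call this on the uploaded bytes and reject the request when the result is empty; the detected type can then be used instead of the header value.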

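8.3 Two integration approaches

Teams make different choices about where images come from:

Approach A: file upload (most common): the frontend uploads the file as multipart/form-data, the backend receives it as a MultipartFile and sends the image to the model as a Resource
Approach B: image URL: the backend downloads the image and converts it into a Resource (requires strict security controls)

Below is a minimal example of approach A.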

8.4 Minimal runnable example: upload an image → the model describes it

8.4.1 pom.xml

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <dependency>
        <groupId>com.alibaba.cloud.ai</groupId>
        <artifactId>spring-ai-alibaba-starter-dashscope</artifactId>
        <!-- No version shown: declare one here or manage it via a BOM in <dependencyManagement> -->
    </dependency>
</dependencies>

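8.4.2 Controller: convert the MultipartFile to a Resource and attach it to the prompt

The key point of this example: attach the image to the user message as media (backed by a Resource), then ask the model to "describe the image and output structured key points". The configured DashScope chat model must be a vision-capable one (a Qwen-VL model); a text-only model cannot read the image.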

package com.example.saa.api;

import java.util.Map;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.io.ByteArrayResource;
import org.springframework.http.MediaType;
import org.springframework.util.MimeTypeUtils;
import org.springframework.util.StringUtils;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class VisionController {

    // 2MB per image. Spring Boot's default spring.servlet.multipart.max-file-size is 1MB,
    // so raise it to at least 2MB, otherwise larger uploads are rejected before reaching this check.
    private static final long MAX_IMAGE_BYTES = 2 * 1024 * 1024;

    private static final String SYSTEM_PROMPT = """
            You are an image-understanding assistant.
            Describe the main content of the image and output 3 to 6 key points.
            If the image is unclear, say "uncertain" and ask for a clearer image.
            """;

    private final ChatClient chatClient;

    // Spring AI auto-configures a ChatClient.Builder (not a ChatClient), so build the client here.
    // The underlying DashScope model must be a vision model (Qwen-VL); how you select it
    // depends on your spring-ai-alibaba version and configuration.
    public VisionController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @PostMapping(value = "/vision/describe", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    public Map<String, Object> describe(
            @RequestParam("image") MultipartFile image,
            @RequestParam(value = "question", required = false) String question
    ) throws Exception {
        if (image == null || image.isEmpty()) {
            return Map.of("success", false, "error", "image is empty");
        }
        if (image.getSize() > MAX_IMAGE_BYTES) {
            return Map.of("success", false, "error", "image too large, maximum is 2MB");
        }
        String contentType = image.getContentType();
        if (!isAllowedImageType(contentType)) {
            return Map.of("success", false, "error", "unsupported image type: " + contentType);
        }

        // Wrap the uploaded bytes as a Resource; overriding getFilename gives it a readable name.
        ByteArrayResource resource = new ByteArrayResource(image.getBytes()) {
            @Override
            public String getFilename() {
                return StringUtils.hasText(image.getOriginalFilename()) ? image.getOriginalFilename() : "upload";
            }
        };

        String userText = StringUtils.hasText(question) ? question : "Describe this image and list the key points.";

        // Text and image go into the SAME user message: the image is attached as media with its MIME type.
        // (new UserMessage(resource) would read the Resource as text, not as an image.)
        String answer = chatClient.prompt()
                .system(SYSTEM_PROMPT)
                .user(u -> u.text(userText).media(MimeTypeUtils.parseMimeType(contentType), resource))
                .call()
                .content();
        // Map.of rejects null values, so guard against an empty model response.
        return Map.of("success", true, "answer", answer != null ? answer : "");
    }

    private boolean isAllowedImageType(String contentType) {
        if (!StringUtils.hasText(contentType)) {
            return false;
        }
        return contentType.equalsIgnoreCase(MediaType.IMAGE_JPEG_VALUE)
                || contentType.equalsIgnoreCase(MediaType.IMAGE_PNG_VALUE)
                || contentType.equalsIgnoreCase("image/webp");
    }
}
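To try the endpoint you can send a multipart request from any HTTP client. Below is a stdlib-only sketch using java.net.http.HttpClient; the address http://localhost:8080, the file name cat.png, and the class name VisionClient are assumptions for illustration. The multipart body is assembled by hand because the JDK has no built-in multipart encoder.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.UUID;

// Hypothetical client for POST /vision/describe (assumed to run on localhost:8080).
final class VisionClient {

    private VisionClient() {}

    /** Builds a multipart/form-data body with an "image" file part and an optional "question" text part. */
    static byte[] buildMultipartBody(String boundary, String filename, String imageType,
                                     byte[] imageBytes, String question) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String imageHeader = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"image\"; filename=\"" + filename + "\"\r\n"
                + "Content-Type: " + imageType + "\r\n\r\n";
        out.writeBytes(imageHeader.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(imageBytes);
        out.writeBytes("\r\n".getBytes(StandardCharsets.UTF_8));
        if (question != null && !question.isBlank()) {
            String questionPart = "--" + boundary + "\r\n"
                    + "Content-Disposition: form-data; name=\"question\"\r\n\r\n"
                    + question + "\r\n";
            out.writeBytes(questionPart.getBytes(StandardCharsets.UTF_8));
        }
        out.writeBytes(("--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] image = Files.readAllBytes(Path.of("cat.png")); // assumed local test image
        String boundary = "----vision-" + UUID.randomUUID();
        byte[] body = buildMultipartBody(boundary, "cat.png", "image/png", image, "What is in this picture?");

        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(5)).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/vision/describe"))
                .timeout(Duration.ofSeconds(60)) // vision models can take several seconds to answer
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The "image" and "question" part names match the @RequestParam names in the controller; if you rename one side, rename the other.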

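8.5 Approach B: image URL (recommended only for intranet or whitelisted environments)

If you need to support URLs, at a minimum do the following:

Domain whitelist (e.g., allow only your OSS object-storage domain)
Connect and read timeouts
Maximum download size
Block private network addresses (to prevent SSRF)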
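The four safeguards above can be sketched with the JDK's HttpClient as follows. This is a minimal sketch under stated assumptions, not a complete SSRF defense: SafeImageDownloader is a hypothetical class, img.example-oss.com is a placeholder for your OSS domain, and the 2MB cap is an example. Known gaps: the address check runs before the connection, so DNS rebinding can still slip through; InetAddress.isSiteLocalAddress does not cover IPv6 unique-local addresses (fc00::/7); and the request timeout only bounds the wait for response headers, not the body read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;
import java.util.Set;

// Hypothetical guarded downloader for approach B. The whitelist value is a placeholder.
final class SafeImageDownloader {

    static final Set<String> ALLOWED_HOSTS = Set.of("img.example-oss.com"); // assumed OSS domain
    static final int MAX_BYTES = 2 * 1024 * 1024;

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(3))       // connect timeout
            .followRedirects(HttpClient.Redirect.NEVER)  // a redirect could point at an internal host
            .build();

    private SafeImageDownloader() {}

    /** Only HTTPS URLs whose host is in the whitelist are accepted. */
    static boolean isAllowedUrl(URI uri) {
        return "https".equalsIgnoreCase(uri.getScheme())
                && uri.getHost() != null
                && ALLOWED_HOSTS.contains(uri.getHost().toLowerCase(Locale.ROOT));
    }

    /** True for loopback, IPv4 private ranges, link-local (incl. 169.254.x.x), wildcard and multicast. */
    static boolean isInternal(InetAddress addr) {
        return addr.isLoopbackAddress() || addr.isSiteLocalAddress() || addr.isLinkLocalAddress()
                || addr.isAnyLocalAddress() || addr.isMulticastAddress();
    }

    static byte[] download(URI uri) throws IOException, InterruptedException {
        if (!isAllowedUrl(uri)) {
            throw new IllegalArgumentException("host not in whitelist: " + uri.getHost());
        }
        for (InetAddress addr : InetAddress.getAllByName(uri.getHost())) {
            if (isInternal(addr)) {
                throw new IllegalArgumentException("host resolves to internal address: " + addr);
            }
        }
        HttpRequest request = HttpRequest.newBuilder(uri).timeout(Duration.ofSeconds(5)).GET().build();
        HttpResponse<InputStream> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream in = response.body()) {
            if (response.statusCode() != 200) {
                throw new IOException("unexpected status " + response.statusCode());
            }
            // Read at most MAX_BYTES + 1 so an oversized body is detected without buffering all of it.
            byte[] data = in.readNBytes(MAX_BYTES + 1);
            if (data.length > MAX_BYTES) {
                throw new IOException("image exceeds " + MAX_BYTES + " bytes");
            }
            return data;
        }
    }
}
```

The downloaded bytes can then go through the same format and size checks as an upload before being wrapped in a ByteArrayResource.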

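8.6 Chapter summary

What matters in multimodality is not whether the model can read images, but whether you keep input boundaries and security risks under control:

File size limits and a format whitelist
SSRF protection for URL downloads
Privacy masking and auditing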
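For privacy masking and auditing, one common approach, sketched here under the assumption that you need an audit trail but must not persist the image or the user's question, is to log a SHA-256 fingerprint plus metadata instead of the raw content. VisionAudit and its log fields are hypothetical; HexFormat requires Java 17 or later.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical audit helper: records WHAT was processed without keeping the image or the question text.
final class VisionAudit {

    private VisionAudit() {}

    /** Lowercase hex SHA-256 of the data, usable to correlate requests without storing the image. */
    static String sha256Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", e);
        }
    }

    /** One audit line: user, image fingerprint, size and MIME type, but no pixels and only the question length. */
    static String auditLine(String userId, byte[] image, String mimeType, String question) {
        return String.format("vision user=%s image_sha256=%s bytes=%d type=%s question_chars=%d",
                userId, sha256Hex(image), image.length, mimeType, question == null ? 0 : question.length());
    }
}
```

The same fingerprint lets you later confirm whether a given image was processed, without the log itself ever containing the image.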

In the next chapter we make security systematic: from key management to prompt injection, unauthorized tool calls, and output masking, we will assemble a practical minimal security baseline.