[Getting Started with LangChain 10] Multimodal Data Input

hjxu2016 · Published 2025-03-31 18:41:47

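Contents

1. Image Data Input in LangChain
2. Multimodal Content Input via the type Field

The multimodal model used here is gemma3:latest; DeepSeek does not currently support multimodal input.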

1. Image Data Input in LangChain

First define the message. The image has to be encoded as Base64.

import base64

from langchain_core.messages import HumanMessage
from langchain_ollama import ChatOllama

# Read the image and encode it as a Base64 string.
# The context manager closes the file once it has been read.
with open("./天气.jpg", "rb") as file:
    image_data = base64.b64encode(file.read()).decode("utf-8")

llm = ChatOllama(model="gemma3:latest")

# Prompt text: "Describe the weather in this picture, in Chinese"
message = HumanMessage(
    content=[
        {"type": "text", "text": "用中文描述这张图片中的天气"},
        # The image is passed inline as a Base64 data URL (JPEG MIME type is image/jpeg)
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
    ]
)
print("begin----------------")
response = llm.invoke([message])
print(response)

The output looks like this:
[Figure: screenshot of the model's response]
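
The Base64 encoding step above can be factored into a small helper. The following is a minimal sketch, not part of LangChain: the function name image_to_data_url is hypothetical, and it assumes the MIME type can be guessed from the file extension using the standard library.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Read an image file and return it as a Base64 data URL.

    Hypothetical helper for illustration; it is not a LangChain API.
    """
    # Guess the MIME type from the file extension (e.g. .jpg -> image/jpeg).
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    # Encode the raw bytes as Base64 text so they can be embedded in a URL.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the "url" value of an image_url block, in place of the hand-built f-string in the example above.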

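2. Multimodal Content Input via the type Field

Besides plain text, the content of a HumanMessage can also carry the following multimodal content types. Which of them a given model actually accepts depends on the model and its provider integration; image_url is the one used with gemma3 in the example above.

type: image_url: the URL of an image to embed.
type: audio_url: the URL of an audio clip to embed.
type: video_url: the URL of a video to embed.
type: file_url: the URL of a file (such as a PDF) to embed.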

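from langchain_core.messages import HumanMessage

message = HumanMessage(
    content=[
        # Text and image parts are each expressed as a typed content block
        {"type": "text", "text": "Please analyze the following image and tell me what you see."},
        # The URL is nested under the "image_url" key, as in the first example
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
    ]
)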
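
Because the content blocks are plain dictionaries, they can be assembled separately from the message itself. The sketch below assumes the text plus image_url layout shown above; build_image_prompt is a hypothetical helper name, not a LangChain function.

```python
from typing import Any


def build_image_prompt(text: str, image_url: str) -> list[dict[str, Any]]:
    """Return a text block followed by an image_url block.

    Hypothetical helper for illustration; it is not a LangChain API.
    """
    return [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]


content = build_image_prompt(
    "Please analyze the following image and tell me what you see.",
    "https://example.com/image.jpg",
)
# The list can then be passed on as HumanMessage(content=content).
```

Keeping the block construction in one place makes it easier to swap a remote URL for a Base64 data URL without touching the code that sends the message.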