.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "build_tutorial/multimodality.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_build_tutorial_multimodality.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_build_tutorial_multimodality.py:


.. _multimodality:

Multimodality
============================

This section shows how to build multimodal applications in AgentScope.

Building a Vision Agent
------------------------------

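For most LLM APIs, vision and non-vision models share the same API and differ only in their input format.
In AgentScope, the ``format`` function of the model wrapper converts the input ``Msg`` objects into the format that the vision model requires.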

In other words, we only need to specify a vision LLM; the agent's code stays unchanged.
For the vision LLM APIs supported by AgentScope, see the :ref:`model_api` section.
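
To picture what such a ``format`` step does, here is a minimal sketch in plain Python. It is not AgentScope's implementation: ``to_vision_message`` is a hypothetical helper, and the ``image`` / ``text`` part names are assumptions loosely modeled on the content-list payloads that several vision APIs accept. The real conversion happens inside the model wrapper and differs per API.

.. code-block:: Python

    from typing import Any, Dict, List, Optional


    def to_vision_message(
        role: str,
        text: str,
        url: Optional[str] = None,
    ) -> Dict[str, Any]:
        """Hypothetical helper: merge text and an optional image reference
        into one message whose content is a list of typed parts."""
        parts: List[Dict[str, str]] = []
        if url is not None:
            # Assumed convention: the image part comes before the text part.
            parts.append({"image": url})
        parts.append({"text": text})
        return {"role": role, "content": parts}


    formatted = to_vision_message("user", "Describe this image.", "./bar.png")
    print(formatted)

Because this step lives in the model wrapper, a message without an image goes through the same path and simply produces a content list with a single text part.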

Taking "qwen-vl-max" as an example, we will build an agent with a vision LLM.

.. GENERATED FROM PYTHON SOURCE LINES 21-28

.. code-block:: Python


    model_config = {
        "config_name": "my-qwen-vl",
        "model_type": "dashscope_multimodal",
        "model_name": "qwen-vl-max",
    }








.. GENERATED FROM PYTHON SOURCE LINES 29-31

As usual, we initialize AgentScope with the configuration above and create a new agent backed by the vision LLM.


.. GENERATED FROM PYTHON SOURCE LINES 32-44

.. code-block:: Python


    from agentscope.agents import DialogAgent
    import agentscope

    agentscope.init(model_configs=model_config)

    agent = DialogAgent(
        name="Monday",
        sys_prompt="You are an assistant named Monday.",
        model_config_name="my-qwen-vl",
    )





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2025-03-10 03:39:31 | INFO     | agentscope.manager._model:load_model_configs:138 - Load configs for model wrapper: my-qwen-vl




.. GENERATED FROM PYTHON SOURCE LINES 45-50

To exchange multimodal data with an agent, the ``Msg`` class provides a ``url`` field.
You can put either a local path or a web URL of an image in the ``url`` field.
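
Since the ``url`` field may hold either kind of reference, application code sometimes needs to tell them apart, for example to check that a local file exists before sending it. The following is a minimal sketch using only the standard library; ``is_web_url`` is a hypothetical helper and not part of AgentScope.

.. code-block:: Python

    from urllib.parse import urlparse


    def is_web_url(url: str) -> bool:
        """Hypothetical helper: True for http(s) URLs, False otherwise."""
        return urlparse(url).scheme in ("http", "https")


    print(is_web_url("./bar.png"))
    print(is_web_url("https://example.com/bar.png"))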

First, let's create an image with matplotlib.


.. GENERATED FROM PYTHON SOURCE LINES 50-62

.. code-block:: Python


    import matplotlib.pyplot as plt

    plt.figure(figsize=(6, 6))
    plt.bar(range(3), [2, 1, 4])
    plt.xticks(range(3), ["Alice", "Bob", "Charlie"])
    plt.title("The Apples Each Person Has in 2023")
    plt.xlabel("Number of Apples")

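    # Save before showing: with interactive backends the figure may be
    # cleared once the window opened by plt.show() is closed.
    plt.savefig("./bar.png")
    plt.show()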




.. image-sg:: /build_tutorial/images/sphx_glr_multimodality_001.png
   :alt: The Apples Each Person Has in 2023
   :srcset: /build_tutorial/images/sphx_glr_multimodality_001.png
   :class: sphx-glr-single-img
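
Before passing ``./bar.png`` to the agent, it can help to confirm that the file was actually written and is a PNG, since a missing or empty file would otherwise only surface later as a model error. The sketch below checks the eight-byte PNG signature using only the standard library; ``looks_like_png`` is a hypothetical helper, not an AgentScope or matplotlib function.

.. code-block:: Python

    from pathlib import Path

    # Every PNG file starts with this fixed eight-byte signature.
    PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


    def looks_like_png(path: str) -> bool:
        """Hypothetical helper: True if the file exists and starts with
        the PNG signature."""
        p = Path(path)
        if not p.is_file():
            return False
        with p.open("rb") as f:
            return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


    print(looks_like_png("./bar.png"))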





.. GENERATED FROM PYTHON SOURCE LINES 63-65

Then, we create a ``Msg`` object that contains the image URL.


.. GENERATED FROM PYTHON SOURCE LINES 65-75

.. code-block:: Python


    from agentscope.message import Msg

    msg = Msg(
        name="User",
        content="Describe this image in detail for me.",
        role="user",
        url="./bar.png",
    )








.. GENERATED FROM PYTHON SOURCE LINES 76-77

After that, we can send the message to the vision agent and get its response.

.. GENERATED FROM PYTHON SOURCE LINES 77-79

.. code-block:: Python


    response = agent(msg)




.. rst-class:: sphx-glr-script-out

 .. code-block:: none

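    Monday: This image is a vertical bar chart titled "The Apples Each Person Has in 2023". The chart shows the number of apples that three people (Alice, Bob, and Charlie) had in 2023.

    - The horizontal axis is labeled "Number of Apples", indicating the number of apples.
    - The vertical axis has no explicit label, but judging from the bar heights it presumably represents the different people.
    - The bars are blue.

    The specific data are as follows:
    - Alice has 2 apples.
    - Bob has 1 apple.
    - Charlie has 4 apples.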





.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 15.666 seconds)


.. _sphx_glr_download_build_tutorial_multimodality.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: multimodality.ipynb <multimodality.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: multimodality.py <multimodality.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: multimodality.zip <multimodality.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_