.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "build_tutorial/multimodality.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_build_tutorial_multimodality.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_build_tutorial_multimodality.py:

.. _multimodality:

MultiModality
============================

In this section, we will show how to build multimodal applications in
AgentScope with two examples.

- The first example demonstrates how to use vision LLMs within an agent, and
- the second example shows how to use text-to-image generation within an agent.

Building Vision Agent
------------------------------

For most LLM APIs, the vision and non-vision models share the same interface
and differ only in the input format. In AgentScope, the `format` function of
the model wrapper is responsible for converting the input `Msg` objects into
the format required by vision LLMs. That is, we only need to specify a vision
LLM, without changing the agent's code. Taking "qwen-vl-max" as an example,
its model configuration is the same as that of the non-vision LLMs in the
DashScope Chat API. Refer to section :ref:`model_api` for the vision LLM APIs
supported in AgentScope.

.. GENERATED FROM PYTHON SOURCE LINES 24-31

.. code-block:: Python

    model_config = {
        "config_name": "my-qwen-vl",
        "model_type": "dashscope_multimodal",
        "model_name": "qwen-vl-max",
    }

.. GENERATED FROM PYTHON SOURCE LINES 32-33

As usual, we initialize AgentScope with the above configuration and create a
new agent with the vision LLM.

.. GENERATED FROM PYTHON SOURCE LINES 34-46

.. code-block:: Python

    from agentscope.agents import DialogAgent
    import agentscope

    agentscope.init(model_configs=model_config)

    agent = DialogAgent(
        name="Monday",
        sys_prompt="You're a helpful assistant named Monday.",
        model_config_name="my-qwen-vl",
    )
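To make the role of the `format` function described above more concrete, here is a rough, hypothetical sketch of the kind of conversion a model wrapper performs for a vision LLM. The function name `format_for_vision` and the exact multimodal payload schema below are illustrative assumptions, not AgentScope's actual implementation.

```python
# Illustrative sketch only: roughly what a model wrapper's `format`
# function does for a vision LLM -- turning a message (sender, text,
# image URL) into a multimodal chat payload. The schema below is an
# assumption for illustration, not AgentScope's real code.
def format_for_vision(name: str, text: str, image_url: str) -> list:
    return [
        {
            "role": "user",
            "content": [
                {"text": f"{name}: {text}"},   # textual part of the turn
                {"image": image_url},          # attached image reference
            ],
        },
    ]


payload = format_for_vision("User", "Describe the attached image.", "./bar.png")
print(payload[0]["content"])
# [{'text': 'User: Describe the attached image.'}, {'image': './bar.png'}]
```

Because this conversion lives in the model wrapper rather than in the agent, swapping a text-only model for a vision model requires no change to the agent code itself.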
.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2025-07-07 07:26:11 | INFO | agentscope.manager._model:load_model_configs:138 - Load configs for model wrapper: my-qwen-vl

.. GENERATED FROM PYTHON SOURCE LINES 47-51

To communicate with the vision agent using pictures, the `Msg` class provides
an `url` field. You can put local or online image URL(s) in the `url` field.

Let's first create an image with matplotlib.

.. GENERATED FROM PYTHON SOURCE LINES 51-63

.. code-block:: Python

    import matplotlib.pyplot as plt

    plt.figure(figsize=(6, 6))
    plt.bar(range(3), [2, 1, 4])
    plt.xticks(range(3), ["Alice", "Bob", "Charlie"])
    plt.title("The Apples Each Person Has in 2023")
    plt.xlabel("Number of Apples")
    # Save the figure before plt.show(), which may clear the current figure
    plt.savefig("./bar.png")
    plt.show()

.. image-sg:: /build_tutorial/images/sphx_glr_multimodality_001.png
   :alt: The Apples Each Person Has in 2023
   :srcset: /build_tutorial/images/sphx_glr_multimodality_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 64-65

Then, we create a `Msg` object with the image URL.

.. GENERATED FROM PYTHON SOURCE LINES 65-75

.. code-block:: Python

    from agentscope.message import Msg

    msg = Msg(
        name="User",
        content="Describe the attached image for me.",
        role="user",
        url="./bar.png",
    )

.. GENERATED FROM PYTHON SOURCE LINES 76-77

After that, we can send the message to the vision agent and get the response.

.. GENERATED FROM PYTHON SOURCE LINES 77-79

.. code-block:: Python

    response = agent(msg)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2025-07-07 07:26:11 | ERROR | agentscope.message.msg:__init__:112 - The url argument will be deprecated in the future. Consider using the ContentBlock instead to attach files to the message
    Monday: The image is a bar chart titled **"The Apples Each Person Has in 2023"**. It displays the number of apples that three individuals—Alice, Bob, and Charlie—possess. Here are the details:

    - **X-axis**: Labeled "Number of Apples," it represents the names of the individuals: Alice, Bob, and Charlie.
    - **Y-axis**: Represents the quantity of apples, ranging from 0 to 4.

    ### Data Representation:
    - **Alice**: Has **2 apples**.
    - **Bob**: Has **1 apple**.
    - **Charlie**: Has **4 apples**.

    ### Observations:
    - Charlie has the most apples (4), followed by Alice (2), and then Bob (1).
    - The bars are colored in blue, and the chart is simple and easy to read.

    This visualization effectively compares the number of apples each person has in the year 2023.

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 10.175 seconds)

.. _sphx_glr_download_build_tutorial_multimodality.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: multimodality.ipynb <multimodality.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: multimodality.py <multimodality.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: multimodality.zip <multimodality.zip>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_