.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "build_tutorial/multimodality.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_build_tutorial_multimodality.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_build_tutorial_multimodality.py:

.. _multimodality:

MultiModality
============================

In this section, we show how to build multimodal applications in AgentScope
with two examples:

- the first example demonstrates how to use a vision LLM within an agent, and
- the second example shows how to use text-to-image generation within an agent.

Building Vision Agent
------------------------------

For most LLM providers, the vision and non-vision models share the same API
and differ only in the input format. In AgentScope, the `format` function of
the model wrapper converts the input `Msg` objects into the format required
by vision LLMs. That means we only need to specify a vision LLM in the model
configuration, without changing the agent's code.

Taking "qwen-vl-max" as an example, its model configuration is the same as
that of the non-vision LLMs in the DashScope Chat API. Refer to section
:ref:`model_api` for the vision LLM APIs supported in AgentScope.

.. GENERATED FROM PYTHON SOURCE LINES 24-31

.. code-block:: Python

    model_config = {
        "config_name": "my-qwen-vl",
        "model_type": "dashscope_multimodal",
        "model_name": "qwen-vl-max",
    }

.. GENERATED FROM PYTHON SOURCE LINES 32-33

As usual, we initialize AgentScope with the above configuration and create a
new agent with the vision LLM.

.. GENERATED FROM PYTHON SOURCE LINES 34-46

.. code-block:: Python

    from agentscope.agents import DialogAgent
    import agentscope

    agentscope.init(model_configs=model_config)

    agent = DialogAgent(
        name="Monday",
        sys_prompt="You're a helpful assistant named Monday.",
        model_config_name="my-qwen-vl",
    )

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2025-03-10 03:33:19 | INFO | agentscope.manager._model:load_model_configs:138 - Load configs for model wrapper: my-qwen-vl

.. GENERATED FROM PYTHON SOURCE LINES 47-51

To send images to the vision agent, the `Msg` class provides a `url` field,
which accepts both local file paths and web URLs. Let's first create an image
with matplotlib.

.. GENERATED FROM PYTHON SOURCE LINES 51-63

.. code-block:: Python

    import matplotlib.pyplot as plt

    plt.figure(figsize=(6, 6))
    plt.bar(range(3), [2, 1, 4])
    plt.xticks(range(3), ["Alice", "Bob", "Charlie"])
    plt.title("The Apples Each Person Has in 2023")
    plt.ylabel("Number of Apples")
    # Save before calling plt.show(), which may clear the current figure
    plt.savefig("./bar.png")
    plt.show()

.. image-sg:: /build_tutorial/images/sphx_glr_multimodality_001.png
   :alt: The Apples Each Person Has in 2023
   :srcset: /build_tutorial/images/sphx_glr_multimodality_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 64-65

Then, we create a `Msg` object with the image URL.

.. GENERATED FROM PYTHON SOURCE LINES 65-75

.. code-block:: Python

    from agentscope.message import Msg

    msg = Msg(
        name="User",
        content="Describe the attached image for me.",
        role="user",
        url="./bar.png",
    )

.. GENERATED FROM PYTHON SOURCE LINES 76-77

After that, we can send the message to the vision agent and get its response.

.. GENERATED FROM PYTHON SOURCE LINES 77-79

.. code-block:: Python

    response = agent(msg)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Monday: The image is a bar chart titled "The Apples Each Person Has in 2023." It shows the number of apples that three individuals, Alice, Bob, and Charlie, have. The y-axis represents the number of apples, ranging from 0 to 4, while the x-axis lists the names of the individuals.

    - Alice has 2 apples.
    - Bob has 1 apple.
    - Charlie has 4 apples.

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 9.579 seconds)

.. _sphx_glr_download_build_tutorial_multimodality.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: multimodality.ipynb <multimodality.ipynb>`

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: multimodality.py <multimodality.py>`

        .. container:: sphx-glr-download sphx-glr-download-zip

            :download:`Download zipped: multimodality.zip <multimodality.zip>`

.. only:: html

    .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
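The introduction promises a second example on text-to-image generation, which
does not appear in this generated page. As a minimal, hedged sketch of what
that configuration might look like: the `model_type`
(`"dashscope_image_synthesis"`), `model_name` (`"wanx-v1"`), and config name
below are assumptions based on AgentScope's DashScope wrappers and may differ
across versions; check the :ref:`model_api` section of your installed release
before use.

```python
# Hypothetical sketch of the second (text-to-image) example; the
# "dashscope_image_synthesis" / "wanx-v1" identifiers are assumed names for
# AgentScope's DashScope image-synthesis wrapper, not verified against any
# specific release.
t2i_config = {
    "config_name": "my-wanx",
    "model_type": "dashscope_image_synthesis",
    "model_name": "wanx-v1",
}

# Usage would mirror the vision example: initialize AgentScope with this
# config and build an agent on top of it. The calls are commented out so the
# sketch stays self-contained without DashScope API credentials:
#
# import agentscope
# agentscope.init(model_configs=t2i_config)
# ...create an agent with model_config_name="my-wanx" and send it a prompt
# describing the image to generate.
```

As with the vision agent, only the model configuration changes; the
surrounding agent code stays the same.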