Model

In this tutorial, we introduce the model APIs integrated in AgentScope, how to use them and how to integrate new model APIs. The supported model APIs and providers include:

API        | Class              | Compatible
---------- | ------------------ | --------------
OpenAI     | OpenAIChatModel    | vLLM, DeepSeek
DashScope  | DashScopeChatModel |
Anthropic  | AnthropicChatModel |
Gemini     | GeminiChatModel    |
Ollama     | OllamaChatModel    |

Note

When using vLLM, you need to configure the appropriate tool calling parameters for different models during deployment, such as --enable-auto-tool-choice, --tool-call-parser, etc. For more details, refer to the official vLLM documentation.

Note

For OpenAI-compatible models (e.g., vLLM, DeepSeek), use the OpenAIChatModel class and specify the API endpoint with the client_kwargs parameter: client_kwargs={"base_url": "http://your-api-endpoint"}. For example:

OpenAIChatModel(client_kwargs={"base_url": "http://localhost:8000/v1"})

Note

Model behavior parameters (such as temperature and maximum output length) can be preset in the constructor via the generate_kwargs parameter. For example:

OpenAIChatModel(generate_kwargs={"temperature": 0.3, "max_tokens": 1000})

To provide a unified interface, all of the model classes above share the following conventions:

  • The first three arguments of the __call__ method are messages, tools, and tool_choice, representing the input messages, the JSON schemas of the tool functions, and the tool selection mode, respectively.

  • The __call__ method returns either a ChatResponse instance or, in streaming mode, an async generator that yields ChatResponse instances.
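Because __call__ returns either a single ChatResponse or an async generator depending on the stream setting, calling code may want to normalize the two cases. The following is a minimal sketch, not part of AgentScope: final_response and FakeResponse are hypothetical names, and FakeResponse only stands in for ChatResponse so the snippet runs without AgentScope or an API key.

```python
import asyncio
import inspect
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeResponse:
    """Hypothetical stand-in for ChatResponse, used only to run offline."""

    content: list = field(default_factory=list)


async def final_response(result: Any) -> Any:
    """Return the complete response for both streaming and non-streaming calls."""
    if inspect.isasyncgen(result):
        last = None
        # Streaming chunks in AgentScope are cumulative, so the last chunk
        # already holds the full content
        async for chunk in result:
            last = chunk
        return last
    # Non-streaming calls already return the complete response
    return result


async def _fake_stream():
    """Simulate a cumulative stream of two chunks."""
    for text in ["Hel", "Hello!"]:
        yield FakeResponse([{"type": "text", "text": text}])


async def main() -> tuple:
    streamed = await final_response(_fake_stream())
    direct = await final_response(FakeResponse([{"type": "text", "text": "Hello!"}]))
    return streamed, direct


streamed, direct = asyncio.run(main())
print(streamed.content)  # [{'type': 'text', 'text': 'Hello!'}]
```

With a real model, the helper would be applied to whatever the call returns, e.g. full = await final_response(await model(messages=...)), so the same code path works whether stream is True or False.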

Note

Model APIs differ in their input message formats; refer to Prompt Formatter for more details.

A ChatResponse instance contains the generated content (thinking, text, and tool use blocks), its ID, its creation time, and usage information.

import asyncio
import json
import os

from agentscope.message import TextBlock, ToolUseBlock, ThinkingBlock, Msg
from agentscope.model import ChatResponse, DashScopeChatModel

response = ChatResponse(
    content=[
        ThinkingBlock(
            type="thinking",
            thinking="I should search for AgentScope on Google.",
        ),
        TextBlock(type="text", text="I'll search for AgentScope on Google."),
        ToolUseBlock(
            type="tool_use",
            id="642n298gjna",
            name="google_search",
            input={"query": "AgentScope?"},
        ),
    ],
)

print(response)
ChatResponse(content=[{'type': 'thinking', 'thinking': 'I should search for AgentScope on Google.'}, {'type': 'text', 'text': "I'll search for AgentScope on Google."}, {'type': 'tool_use', 'id': '642n298gjna', 'name': 'google_search', 'input': {'query': 'AgentScope?'}}], id='2026-02-07 17:19:03.227_1df985', created_at='2026-02-07 17:19:03.227', type='chat', usage=None, metadata=None)
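The printed output shows that each content block is a plain dictionary with a "type" key. Assuming that shape, the tool_use blocks of a response can be dispatched to local Python functions. This is a minimal sketch: run_tool_calls and the registry dictionary are hypothetical, and the result dictionaries (with id, name, and output keys) are chosen for illustration rather than being AgentScope's own tool result type.

```python
from typing import Any, Callable, Dict, List


def run_tool_calls(
    content: List[Dict[str, Any]],
    registry: Dict[str, Callable[..., Any]],
) -> List[Dict[str, Any]]:
    """Execute every ``tool_use`` block in ``content`` with a local function."""
    results = []
    for block in content:
        # Thinking and text blocks carry no tool call, so skip them
        if block.get("type") != "tool_use":
            continue
        func = registry.get(block["name"])
        if func is None:
            output = f"Error: unknown tool {block['name']!r}"
        else:
            # The model's arguments arrive as a dict matching the tool's schema
            output = func(**block["input"])
        # Keep the call id so each result can be matched to its request
        results.append({"id": block["id"], "name": block["name"], "output": output})
    return results


content = [
    {"type": "thinking", "thinking": "I should search for AgentScope on Google."},
    {"type": "text", "text": "I'll search for AgentScope on Google."},
    {
        "type": "tool_use",
        "id": "642n298gjna",
        "name": "google_search",
        "input": {"query": "AgentScope?"},
    },
]

# A hypothetical offline stand-in for a real search tool
registry = {"google_search": lambda query: f"Results for {query}"}
print(run_tool_calls(content, registry))
```

Matching results to calls by id, rather than by position, keeps the mapping correct if a model issues several tool calls in one response.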

Taking DashScopeChatModel as an example, we can create a chat model instance and call it with a list of messages:

async def example_model_call() -> None:
    """An example of using the DashScopeChatModel."""
    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        stream=False,
    )

    res = await model(
        messages=[
            {"role": "user", "content": "Hi!"},
        ],
    )

    # You can directly create a ``Msg`` object with the response content
    msg_res = Msg("Friday", res.content, "assistant")

    print("The response:", res)
    print("The response as Msg:", msg_res)


asyncio.run(example_model_call())
The response: ChatResponse(content=[{'type': 'text', 'text': 'Hello! How can I assist you today?'}], id='2026-02-07 17:19:05.304_030a1e', created_at='2026-02-07 17:19:05.304', type='chat', usage=ChatUsage(input_tokens=10, output_tokens=9, time=2.076061, type='chat', metadata=GenerationUsage(input_tokens=10, output_tokens=9)), metadata=None)
The response as Msg: Msg(id='DQ9FEP9Cg9oPQ6Z3ToFLW5', name='Friday', content=[{'type': 'text', 'text': 'Hello! How can I assist you today?'}], role='assistant', metadata=None, timestamp='2026-02-07 17:19:05.304', invocation_id='None')

Streaming

To enable streaming mode, set the stream parameter in the model constructor to True. When streaming is enabled, the __call__ method returns an async generator that yields ChatResponse instances as the model generates them.

Note

The streaming mode in AgentScope is designed to be cumulative, meaning the content in each chunk contains all the previous content plus the newly generated content.

async def example_streaming() -> None:
    """An example of using the streaming model."""
    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        stream=True,
    )

    generator = await model(
        messages=[
            {
                "role": "user",
                "content": "Count from 1 to 20, and just report the number without any other information.",
            },
        ],
    )
    print("The type of the response:", type(generator))

    i = 0
    async for chunk in generator:
        print(f"Chunk {i}")
        print(f"\ttype: {type(chunk.content)}")
        print(f"\t{chunk}\n")
        i += 1


asyncio.run(example_streaming())
The type of the response: <class 'async_generator'>
Chunk 0
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1'}], id='2026-02-07 17:19:06.671_ecdd79', created_at='2026-02-07 17:19:06.671', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=1, time=1.364635, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=1)), metadata=None)

Chunk 1
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n'}], id='2026-02-07 17:19:06.748_c37668', created_at='2026-02-07 17:19:06.748', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=4, time=1.441684, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=4)), metadata=None)

Chunk 2
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4'}], id='2026-02-07 17:19:06.822_d377e6', created_at='2026-02-07 17:19:06.822', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=7, time=1.515707, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=7)), metadata=None)

Chunk 3
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n'}], id='2026-02-07 17:19:06.894_84f055', created_at='2026-02-07 17:19:06.894', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=10, time=1.587621, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=10)), metadata=None)

Chunk 4
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n'}], id='2026-02-07 17:19:07.669_e78f8f', created_at='2026-02-07 17:19:07.669', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=16, time=2.363373, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=16)), metadata=None)

Chunk 5
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n1'}], id='2026-02-07 17:19:07.816_023059', created_at='2026-02-07 17:19:07.816', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=22, time=2.510118, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=22)), metadata=None)

Chunk 6
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n1'}], id='2026-02-07 17:19:07.970_111604', created_at='2026-02-07 17:19:07.970', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=28, time=2.664182, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=28)), metadata=None)

Chunk 7
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n1'}], id='2026-02-07 17:19:08.134_d58443', created_at='2026-02-07 17:19:08.135', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=34, time=2.828537, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=34)), metadata=None)

Chunk 8
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n1'}], id='2026-02-07 17:19:08.269_124e56', created_at='2026-02-07 17:19:08.269', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=40, time=2.96277, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=40)), metadata=None)

Chunk 9
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n1'}], id='2026-02-07 17:19:08.417_045f19', created_at='2026-02-07 17:19:08.417', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=46, time=3.111216, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=46)), metadata=None)

Chunk 10
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n19\n20'}], id='2026-02-07 17:19:08.589_f4e5f9', created_at='2026-02-07 17:19:08.589', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=50, time=3.283434, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=50)), metadata=None)

Chunk 11
        type: <class 'list'>
        ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n19\n20'}], id='2026-02-07 17:19:08.608_635b05', created_at='2026-02-07 17:19:08.608', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=50, time=3.301949, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=50)), metadata=None)
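Since each chunk repeats all earlier content, as the output above shows, printing a live stream without duplication requires computing what each chunk adds. The following is a minimal sketch under two assumptions: each chunk's text extends the previous chunk's text, and only text blocks matter. iter_text_deltas and text_of are hypothetical helpers, and SimpleNamespace objects stand in for ChatResponse so the sketch runs offline.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, Dict, List


def text_of(content: List[Dict[str, Any]]) -> str:
    """Concatenate the text of all ``text`` blocks in a content list."""
    return "".join(b.get("text", "") for b in content if b.get("type") == "text")


async def iter_text_deltas(chunks: AsyncIterator[Any]) -> AsyncIterator[str]:
    """Yield only the text that each cumulative chunk adds."""
    previous = ""
    async for chunk in chunks:
        current = text_of(chunk.content)
        # Cumulative chunks extend the previous text, so the new part is the suffix
        delta = current[len(previous):]
        previous = current
        if delta:  # the final chunk may repeat the previous one unchanged
            yield delta


async def fake_stream():
    """Replay the first chunks above, plus a repeated last chunk like Chunk 11."""
    for text in ["1", "1\n2\n", "1\n2\n3\n4", "1\n2\n3\n4"]:
        yield SimpleNamespace(content=[{"type": "text", "text": text}])


async def main() -> List[str]:
    return [delta async for delta in iter_text_deltas(fake_stream())]


deltas = asyncio.run(main())
print(deltas)  # ['1', '\n2\n', '3\n4']
```

With a real streaming model, the same helper could wrap the generator directly, e.g. async for delta in iter_text_deltas(generator): print(delta, end="").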

Reasoning

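AgentScope supports reasoning models by returning their reasoning content as ThinkingBlock entries in the response content. For DashScope models, thinking is enabled with the enable_thinking parameter: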

async def example_reasoning() -> None:
    """An example of using the reasoning model."""
    model = DashScopeChatModel(
        model_name="qwen-turbo",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        enable_thinking=True,
    )

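    # ``stream`` is not set, so the model streams by default and ``res`` is
    # an async generator of cumulative ChatResponse chunks
    res = await model(
        messages=[
            {"role": "user", "content": "Who am I?"},
        ],
    )

    # Keep only the last chunk, which contains the complete thinking and text
    last_chunk = None
    async for chunk in res:
        last_chunk = chunk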
    print("The final response:")
    print(last_chunk)


asyncio.run(example_reasoning())
The final response:
ChatResponse(content=[{'type': 'thinking', 'thinking': 'Okay, the user asked "Who am I?" which is a pretty open-ended question. I need to figure out how to respond appropriately. First, I should consider the context. Since this is a chatbot, the user might be asking about their identity in a general sense or maybe they want to know more about themselves through the conversation.\n\nI should start by acknowledging the question and then guide them to provide more details. Maybe they have a specific aspect in mind, like their role, purpose, or personal information. But since I can\'t know that, I should ask for clarification. However, I need to make sure my response is helpful and not too vague. \n\nI should also remember that as an AI, I don\'t have a personal identity. So I should explain that while I can\'t answer who they are, I can help them explore their own identity. Maybe ask them to share more about what they\'re thinking or what they want to know. That way, I can offer more tailored assistance.\n\nI should keep the tone friendly and open. Avoid making assumptions. Let them know I\'m here to help if they have specific questions. Maybe suggest examples of areas they could explore, like their values, goals, or experiences. But I need to phrase it in a way that invites them to elaborate without being pushy.'}, {'type': 'text', 'text': 'The question "Who am I?" is a profound and deeply personal one. As an AI, I don’t have a personal identity, but I can help you explore this question in a way that’s meaningful to you. Here are a few angles to consider:\n\n1. **Self-Reflection**: Think about your values, passions, experiences, and goals. What defines you? Are you shaped by your relationships, your culture, your beliefs, or your aspirations?\n\n2. **Perspectives**: Others might see you differently than you see yourself. How do your friends, family, or colleagues perceive you? How does that align with your own view?\n\n3. **Growth**: Who are you today, and who do you want to become? Identity is often a journey, not a fixed point.\n\n4. **Philosophical Angle**: Existentialists might say you create your own meaning, while others might look to biology, psychology, or spirituality for answers.\n\nIf you’d like, I can help you brainstorm or reflect on this further. What aspects of "who you are" feel most meaningful or confusing to you? 🌟'}], id='2026-02-07 17:19:17.575_22e4cf', created_at='2026-02-07 17:19:17.575', type='chat', usage=ChatUsage(input_tokens=12, output_tokens=497, time=8.962455, type='chat', metadata=GenerationUsage(input_tokens=12, output_tokens=497)), metadata=None)
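The final chunk above contains a thinking block followed by a text block. Assuming the same dictionary layout, the reasoning and the visible answer can be separated, for example to log the reasoning while showing users only the answer. split_reasoning is a hypothetical helper, and the sample strings are shortened from the output above.

```python
from typing import Any, Dict, List, Tuple


def split_reasoning(content: List[Dict[str, Any]]) -> Tuple[str, str]:
    """Split a content list into (reasoning, answer) strings."""
    # Join all thinking blocks, in order, into one reasoning string
    thinking = "\n".join(
        block.get("thinking", "") for block in content if block.get("type") == "thinking"
    )
    # Join all text blocks, in order, into the user-facing answer
    answer = "".join(
        block.get("text", "") for block in content if block.get("type") == "text"
    )
    return thinking, answer


content = [
    {
        "type": "thinking",
        "thinking": 'Okay, the user asked "Who am I?" which is a pretty open-ended question.',
    },
    {
        "type": "text",
        "text": 'The question "Who am I?" is a profound and deeply personal one.',
    },
]
reasoning, answer = split_reasoning(content)
print(answer)
```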

Tools API

Model providers differ in their tool APIs, e.g., in the tool JSON schema and in the tool call and tool response formats. To provide a unified interface, AgentScope addresses this by:

  • Providing a unified tool call block, ToolUseBlock, and a unified tool response block, ToolResultBlock.

  • Providing a unified tools interface in the __call__ method of the model classes, which accepts a list of tool JSON schemas in the following format:

json_schemas = [
    {
        "type": "function",
        "function": {
            "name": "google_search",
            "description": "Search for a query on Google.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query.",
                    },
                },
                "required": ["query"],
            },
        },
    },
]
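Writing these schemas by hand is repetitive. As a minimal sketch, assuming only str, int, float, and bool parameters, a schema in the format above can be derived from a Python function's signature and docstring with the standard inspect module. function_to_schema and its param_docs argument are hypothetical names, not part of AgentScope.

```python
import inspect
from typing import Any, Callable, Dict, Optional

# Mapping from Python annotations to JSON schema types (assumed subset)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_schema(
    func: Callable[..., Any],
    param_docs: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build a tool JSON schema in the format above from a function signature."""
    param_docs = param_docs or {}
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unsupported types fall back to "string"
        prop: Dict[str, Any] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if name in param_docs:
            prop["description"] = param_docs[name]
        properties[name] = prop
        # Parameters without a default value are required
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def google_search(query: str) -> str:
    """Search for a query on Google."""
    return f"Results for {query}"  # placeholder body for the sketch


schema = function_to_schema(google_search, {"query": "The search query."})
print(schema["function"]["parameters"]["required"])  # ['query']
```

For google_search, the generated dictionary is identical to the hand-written entry in json_schemas. Parameter descriptions are passed separately here because inspect does not parse per-argument docstring sections.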

