Model¶
In this tutorial, we introduce the model APIs integrated in AgentScope, how to use them and how to integrate new model APIs. The supported model APIs and providers include:
| API | Class | Compatible | Streaming | Tools | Vision | Reasoning |
|---|---|---|---|---|---|---|
| OpenAI | OpenAIChatModel | vLLM, DeepSeek | ✅ | ✅ | ✅ | ✅ |
| DashScope | DashScopeChatModel | | ✅ | ✅ | ✅ | ✅ |
| Anthropic | AnthropicChatModel | | ✅ | ✅ | ✅ | ✅ |
| Gemini | GeminiChatModel | | ✅ | ✅ | ✅ | ✅ |
| Ollama | OllamaChatModel | | ✅ | ✅ | ✅ | ✅ |
Note
When using vLLM, you need to configure the appropriate tool-calling parameters for different models during deployment, such as --enable-auto-tool-choice and --tool-call-parser. For more details, refer to the official vLLM documentation.
Note
For OpenAI-compatible models (e.g. vLLM, DeepSeek), developers can use the OpenAIChatModel class and specify the API endpoint via the client_kwargs parameter: client_kwargs={"base_url": "http://your-api-endpoint"}. For example:
OpenAIChatModel(client_kwargs={"base_url": "http://localhost:8000/v1"})
Note
Model behavior parameters (such as temperature and maximum length) can be preset in the constructor via the generate_kwargs parameter. For example:
OpenAIChatModel(generate_kwargs={"temperature": 0.3, "max_tokens": 1000})
To provide unified model interfaces, the above model classes have the following in common:
The first three arguments of the __call__ method are messages, tools, and tool_choice, representing the input messages, the JSON schemas of the tool functions, and the tool selection mode, respectively.
The return type is either a ChatResponse instance, or an async generator of ChatResponse in streaming mode.
Note
Different model APIs differ in their input message formats; refer to Prompt Formatter for more details.
The ChatResponse instance contains the generated thinking/text/tool-use content, as well as the response identity, creation time, and usage information.
import asyncio
import json
import os
from agentscope.message import TextBlock, ToolUseBlock, ThinkingBlock, Msg
from agentscope.model import ChatResponse, DashScopeChatModel
response = ChatResponse(
    content=[
        ThinkingBlock(
            type="thinking",
            thinking="I should search for AgentScope on Google.",
        ),
        TextBlock(type="text", text="I'll search for AgentScope on Google."),
        ToolUseBlock(
            type="tool_use",
            id="642n298gjna",
            name="google_search",
            input={"query": "AgentScope?"},
        ),
    ],
)

print(response)
ChatResponse(content=[{'type': 'thinking', 'thinking': 'I should search for AgentScope on Google.'}, {'type': 'text', 'text': "I'll search for AgentScope on Google."}, {'type': 'tool_use', 'id': '642n298gjna', 'name': 'google_search', 'input': {'query': 'AgentScope?'}}], id='2026-02-15 02:57:35.022_2578be', created_at='2026-02-15 02:57:35.022', type='chat', usage=None, metadata=None)
Taking DashScopeChatModel as an example, we can create a chat model instance and call it with input messages:
async def example_model_call() -> None:
    """An example of using the DashScopeChatModel."""
    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        stream=False,
    )

    res = await model(
        messages=[
            {"role": "user", "content": "Hi!"},
        ],
    )

    # You can directly create a ``Msg`` object with the response content
    msg_res = Msg("Friday", res.content, "assistant")

    print("The response:", res)
    print("The response as Msg:", msg_res)


asyncio.run(example_model_call())
The response: ChatResponse(content=[{'type': 'text', 'text': 'Hello! How can I assist you today?'}], id='2026-02-15 02:57:36.784_2d89c2', created_at='2026-02-15 02:57:36.784', type='chat', usage=ChatUsage(input_tokens=10, output_tokens=9, time=1.761315, type='chat', metadata=GenerationUsage(input_tokens=10, output_tokens=9)), metadata=None)
The response as Msg: Msg(id='RJcKEWMrLQj58ABjgzNttG', name='Friday', content=[{'type': 'text', 'text': 'Hello! How can I assist you today?'}], role='assistant', metadata={}, timestamp='2026-02-15 02:57:36.784', invocation_id='None')
Streaming¶
To enable streaming mode, set the stream parameter of the model constructor to True.
When streaming is enabled, the __call__ method will return an async generator that yields ChatResponse instances as they are generated by the model.
Note
The streaming mode in AgentScope is designed to be cumulative, meaning the content in each chunk contains all the previous content plus the newly generated content.
async def example_streaming() -> None:
    """An example of using the streaming model."""
    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        stream=True,
    )

    generator = await model(
        messages=[
            {
                "role": "user",
                "content": "Count from 1 to 20, and just report the number without any other information.",
            },
        ],
    )
    print("The type of the response:", type(generator))

    i = 0
    async for chunk in generator:
        print(f"Chunk {i}")
        print(f"\ttype: {type(chunk.content)}")
        print(f"\t{chunk}\n")
        i += 1


asyncio.run(example_streaming())
The type of the response: <class 'async_generator'>
Chunk 0
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1'}], id='2026-02-15 02:57:38.134_569470', created_at='2026-02-15 02:57:38.134', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=1, time=1.348886, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=1)), metadata=None)
Chunk 1
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n'}], id='2026-02-15 02:57:38.191_00ff4f', created_at='2026-02-15 02:57:38.192', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=4, time=1.405954, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=4)), metadata=None)
Chunk 2
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4'}], id='2026-02-15 02:57:38.259_604ac0', created_at='2026-02-15 02:57:38.259', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=7, time=1.47328, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=7)), metadata=None)
Chunk 3
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n'}], id='2026-02-15 02:57:38.523_64df65', created_at='2026-02-15 02:57:38.523', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=10, time=1.737262, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=10)), metadata=None)
Chunk 4
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n'}], id='2026-02-15 02:57:38.675_0edf44', created_at='2026-02-15 02:57:38.675', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=16, time=1.8899, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=16)), metadata=None)
Chunk 5
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n1'}], id='2026-02-15 02:57:38.791_7a6bea', created_at='2026-02-15 02:57:38.791', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=22, time=2.005497, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=22)), metadata=None)
Chunk 6
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n1'}], id='2026-02-15 02:57:39.025_b277f9', created_at='2026-02-15 02:57:39.025', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=28, time=2.239314, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=28)), metadata=None)
Chunk 7
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n1'}], id='2026-02-15 02:57:39.177_2ae189', created_at='2026-02-15 02:57:39.177', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=34, time=2.39141, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=34)), metadata=None)
Chunk 8
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n1'}], id='2026-02-15 02:57:39.496_a2dff0', created_at='2026-02-15 02:57:39.496', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=40, time=2.710102, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=40)), metadata=None)
Chunk 9
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n1'}], id='2026-02-15 02:57:39.664_9e2a39', created_at='2026-02-15 02:57:39.664', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=46, time=2.878117, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=46)), metadata=None)
Chunk 10
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n19\n20'}], id='2026-02-15 02:57:39.784_05db46', created_at='2026-02-15 02:57:39.784', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=50, time=2.998322, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=50)), metadata=None)
Chunk 11
type: <class 'list'>
ChatResponse(content=[{'type': 'text', 'text': '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n19\n20'}], id='2026-02-15 02:57:39.802_c58878', created_at='2026-02-15 02:57:39.802', type='chat', usage=ChatUsage(input_tokens=27, output_tokens=50, time=3.016847, type='chat', metadata=GenerationUsage(input_tokens=27, output_tokens=50)), metadata=None)
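Since the chunks are cumulative, the last chunk always contains the complete response. To display only the newly generated text of each chunk, you can track how many characters have already been shown. Below is a minimal sketch of this delta handling (our own illustration, assuming text-only content blocks, not a built-in AgentScope API):

async def example_streaming_delta() -> None:
    """Print only the incremental text of each cumulative chunk."""
    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        stream=True,
    )

    generator = await model(
        messages=[{"role": "user", "content": "Count from 1 to 5."}],
    )

    printed = 0  # number of characters already displayed
    async for chunk in generator:
        # Join the text blocks of the cumulative content
        text = "".join(
            block["text"] for block in chunk.content if block["type"] == "text"
        )
        # Print only the part that has not been displayed yet
        print(text[printed:], end="", flush=True)
        printed = len(text)
    print()


asyncio.run(example_streaming_delta())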
Reasoning¶
AgentScope supports reasoning models by providing the ThinkingBlock.
async def example_reasoning() -> None:
    """An example of using the reasoning model."""
    model = DashScopeChatModel(
        model_name="qwen-turbo",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        enable_thinking=True,
    )

    res = await model(
        messages=[
            {"role": "user", "content": "Who am I?"},
        ],
    )

    last_chunk = None
    async for chunk in res:
        last_chunk = chunk
    print("The final response:")
    print(last_chunk)


asyncio.run(example_reasoning())
The final response:
ChatResponse(content=[{'type': 'thinking', 'thinking': 'Okay, the user asked, "Who am I?" That\'s pretty vague. I need to figure out what they\'re really asking. Maybe they want to know about their identity, purpose, or something else. Since I\'m an AI, I can\'t know their personal details unless they tell me. I should respond by asking for more context.\n\nFirst, I should acknowledge that the question is broad. Then, I can ask them to clarify what aspect of their identity they\'re interested in. Are they asking about their role, their existence, or something else? Maybe they\'re feeling lost or curious about their purpose. I should keep the tone empathetic and open-ended.\n\nI need to make sure not to assume anything about their situation. Let them explain further so I can provide a more accurate response. Also, I should stay within my capabilities as an AI assistant. If they share personal information, I can help them explore it, but I can\'t know their identity without that input.\n\nSo, the best approach is to ask them to elaborate on what they mean by "Who am I?" and offer assistance based on their specific needs.\n'}, {'type': 'text', 'text': 'The question "Who am I?" is deeply personal and can mean different things depending on your perspective. Here are a few ways to explore it:\n\n1. **Philosophical/Existential**: If you\'re reflecting on your purpose or identity, it might involve considering your values, goals, and how you relate to the world. This is a journey many people undertake to find meaning.\n\n2. **Personal/Individual**: If you\'re seeking to understand your unique traits, experiences, or role in life, it could involve introspection, self-discovery, or conversations with others.\n\n3. **Spiritual/Religious**: Some people explore this question through faith, meditation, or practices that connect them to something greater than themselves.\n\n4. **Practical**: If you\'re feeling lost or uncertain, it might help to break it down into smaller questions: *What do I value? What are my strengths? What brings me fulfillment?*\n\nSince I don’t have access to your personal experiences or context, I can’t define *your* identity. However, I’m here to help you explore these questions further if you’d like! What would you like to discuss? 😊'}], id='2026-02-15 02:57:47.300_6e17a5', created_at='2026-02-15 02:57:47.300', type='chat', usage=ChatUsage(input_tokens=12, output_tokens=467, time=7.491742, type='chat', metadata=GenerationUsage(input_tokens=12, output_tokens=467)), metadata=None)
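Since the thinking content arrives as ThinkingBlock entries alongside the ordinary TextBlock entries, you can separate the model's reasoning from its final answer by filtering the content blocks by type. A minimal sketch (our own helper, not part of AgentScope):

def split_reasoning(response: ChatResponse) -> tuple[str, str]:
    """Separate the thinking content from the final text content."""
    thinking = "".join(
        block["thinking"]
        for block in response.content
        if block["type"] == "thinking"
    )
    text = "".join(
        block["text"]
        for block in response.content
        if block["type"] == "text"
    )
    return thinking, text

For the example above, split_reasoning(last_chunk) would return the reasoning trace and the final answer as two separate strings.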
Tools API¶
Different model providers differ in their tools APIs, e.g. the JSON schema of the tool functions and the tool call/response formats. AgentScope provides a unified interface by:
Providing the unified tool call block ToolUseBlock and tool result block ToolResultBlock.
Providing a unified tools interface in the __call__ method of the model classes, which accepts a list of tool JSON schemas as follows:
json_schemas = [
    {
        "type": "function",
        "function": {
            "name": "google_search",
            "description": "Search for a query on Google.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query.",
                    },
                },
                "required": ["query"],
            },
        },
    },
]
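These schemas can then be passed to the tools argument of the __call__ method, and any tool calls made by the model come back as ToolUseBlock entries in the response content. A minimal sketch (the tool_choice value shown is an assumption; check your model class for the supported modes):

async def example_tool_call() -> None:
    """Call the model with the tool schemas defined above."""
    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        stream=False,
    )

    res = await model(
        messages=[
            {"role": "user", "content": "Search AgentScope on Google."},
        ],
        tools=json_schemas,
        tool_choice="auto",  # assumed mode; see your model class for supported values
    )

    # Collect the tool calls from the response content
    for block in res.content:
        if block["type"] == "tool_use":
            print("Tool:", block["name"], "Input:", block["input"])


asyncio.run(example_tool_call())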
Further Reading¶