agentscope.tool¶
The tool module in agentscope.
- class Toolkit[source]¶
Bases:
StateModule
The class that supports both function- and group-level tool management.
Use the following methods to manage the tool functions:
register_tool_function
remove_tool_function
For group-level management:
create_tool_group
update_tool_groups
remove_tool_groups
MCP related methods:
register_mcp_client
remove_mcp_clients
To run the tool functions or get the data from the activated tools:
call_tool_function
get_json_schemas
get_activated_notes
- create_tool_group(group_name, description, active=False, notes=None)[source]¶
Create a tool group to organize tool functions.
- Parameters:
group_name (str) – The name of the tool group.
description (str) – The description of the tool group.
active (bool, defaults to False) – Whether the group is active, i.e., whether the tool functions in this group are included in the JSON schema.
notes (str | None, optional) – The notes used to remind the agent how to use the tool functions properly, which can be combined into the system prompt.
- Return type:
None
- update_tool_groups(group_names, active)[source]¶
Update the activation status of the given tool groups.
- Parameters:
group_names (list[str]) – The list of tool group names to be updated.
active (bool) – If the tool groups should be activated or deactivated.
- Return type:
None
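The group semantics described above can be sketched with a small stand-in. This is not AgentScope's implementation: `ToolGroup` and `visible_tools` are hypothetical names, and the sketch only assumes what the text states, namely that tools in the "basic" group are always exposed while other groups are exposed only when active.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGroup:
    """Hypothetical stand-in for a tool group record."""

    description: str
    active: bool = False
    tools: list[str] = field(default_factory=list)


def visible_tools(basic_tools: list[str], groups: dict[str, ToolGroup]) -> list[str]:
    """Return the tool names that would appear in the JSON schemas."""
    names = list(basic_tools)  # the "basic" group is always included
    for group in groups.values():
        if group.active:
            names.extend(group.tools)
    return names


groups = {"browser": ToolGroup("Web browsing tools", tools=["open_url", "click"])}
before = visible_tools(["execute_python_code"], groups)

# Comparable in spirit to update_tool_groups(["browser"], active=True).
groups["browser"].active = True
after = visible_tools(["execute_python_code"], groups)
```

Here `before` holds only the basic tool, while `after` also lists the two tools of the activated group.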
- remove_tool_groups(group_names)[source]¶
Remove tool functions from the toolkit by their group names.
- Parameters:
group_names (str) – The group names to be removed from the toolkit.
- Return type:
None
- register_tool_function(tool_func, group_name='basic', preset_kwargs=None, func_description=None, json_schema=None, include_long_description=True, include_var_positional=False, include_var_keyword=False, postprocess_func=None)[source]¶
Register a tool function to the toolkit.
- Parameters:
tool_func (ToolFunction) – The tool function, which can be async or sync, streaming or not-streaming, but the response must be a ToolResponse object.
group_name (str | Literal[“basic”], defaults to “basic”) – The group that the tool function belongs to. Tools in the “basic” group are always included in the JSON schema, while the others are only included when their group is active.
preset_kwargs (dict[str, JSONSerializableObject] | None, optional) – Preset arguments by the user, which will not be included in the JSON schema, nor exposed to the agent.
func_description (str | None, optional) – The function description. If not provided, the description will be extracted from the docstring automatically.
json_schema (dict | None, optional) – A manually provided JSON schema for the tool function, which should follow the format {"type": "function", "function": {"name": "function_name", "description": "xx", "parameters": {...}}}
include_long_description (bool, defaults to True) – Whether to include the long description when extracting the function description from the docstring.
include_var_positional (bool, defaults to False) – Whether to include the variable positional arguments (*args) in the function schema.
include_var_keyword (bool, defaults to False) – Whether to include the variable keyword arguments (**kwargs) in the function schema.
postprocess_func (Callable[[ToolUseBlock, ToolResponse], ToolResponse | None] | None, optional) – A post-processing function that will be called after the tool function is executed, taking the tool call block and tool response as arguments. If it returns None, the tool result will be returned as is. If it returns a ToolResponse, the returned block will be used as the final tool result.
- Return type:
None
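The effect of preset_kwargs can be illustrated with the standard library alone. The sketch below is an assumption-laden stand-in, not the library's code: `search` and `exposed_parameters` are hypothetical, and it only shows the idea that preset arguments are hidden from what the agent sees yet still supplied when the function runs.

```python
import inspect
from functools import partial


def search(query: str, api_key: str, top_k: int = 3) -> str:
    """Hypothetical tool function used only for illustration."""
    return f"{top_k} results for {query!r} (key ends with {api_key[-2:]})"


def exposed_parameters(func, preset_kwargs: dict) -> list[str]:
    """List the parameter names left visible once preset arguments are hidden."""
    return [
        name
        for name in inspect.signature(func).parameters
        if name not in preset_kwargs
    ]


preset = {"api_key": "sk-demo-42"}
visible = exposed_parameters(search, preset)  # what the agent would see
bound = partial(search, **preset)  # what actually gets called
result = bound(query="agentscope")
```

The agent-facing parameter list omits `api_key`, while the bound callable still receives it.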
- remove_tool_function(tool_name)[source]¶
Remove a tool function from the toolkit by its name.
- Parameters:
tool_name (str) – The name of the tool function to be removed.
- Return type:
None
- get_json_schemas()[source]¶
Get the JSON schemas from the tool functions that belong to the active groups.
Note
The preset keyword arguments are removed from the JSON schema, and the extended model is applied if it is set.
Example
Example of tool function JSON schemas:
[
    {
        "type": "function",
        "function": {
            "name": "google_search",
            "description": "Search on Google.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query."
                    }
                },
                "required": ["query"]
            }
        }
    },
    ...
]
- Returns:
A list of function JSON schemas.
- Return type:
list[dict]
- set_extended_model(func_name, model)[source]¶
Set the extended model for a tool function, so that the original JSON schema will be extended.
- Parameters:
func_name (str) – The name of the tool function.
model (Union[Type[BaseModel], None]) – The extended model to be set.
- Return type:
None
- async remove_mcp_clients(client_names)[source]¶
Remove tool functions from the MCP clients by their names.
- Parameters:
client_names (list[str]) – The names of the MCP clients, which were used to initialize the client instances.
- Return type:
None
- async call_tool_function(tool_call)[source]¶
Execute the tool function by the ToolUseBlock and return the tool response chunk in unified streaming mode, i.e. an async generator of ToolResponse objects.
Note
Each tool response chunk is accumulated, i.e., it carries the content produced so far rather than only the newest piece.
- Parameters:
tool_call (ToolUseBlock) – A tool call block.
- Yields:
ToolResponse – The tool response chunk, in an accumulative manner.
- Return type:
AsyncGenerator[ToolResponse, None]
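A minimal sketch of consuming an accumulative stream follows. `Chunk` and `fake_tool_stream` are hypothetical stand-ins for ToolResponse and a streaming tool, so the code shows only the consumption pattern, not the behavior of call_tool_function itself.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class Chunk:
    """Hypothetical stand-in for ToolResponse, reduced to two fields."""

    text: str
    is_last: bool = True


async def fake_tool_stream() -> AsyncGenerator[Chunk, None]:
    """Yield accumulated chunks: each one repeats everything sent so far."""
    text = ""
    parts = ["Hello", ", ", "world"]
    for index, part in enumerate(parts):
        text += part
        yield Chunk(text=text, is_last=index == len(parts) - 1)


async def collect() -> list[Chunk]:
    """Gather every chunk from the stream into a list."""
    return [chunk async for chunk in fake_tool_stream()]


chunks = asyncio.run(collect())
final_text = chunks[-1].text  # the last chunk already holds the full output
```

Because chunks are accumulated, a consumer can keep only the most recent one instead of concatenating pieces.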
- async register_mcp_client(mcp_client, group_name='basic', enable_funcs=None, disable_funcs=None, preset_kwargs_mapping=None, postprocess_func=None)[source]¶
Register tool functions from an MCP client.
- Parameters:
mcp_client (MCPClientBase) – The MCP client instance to connect to the MCP server.
group_name (str, defaults to “basic”) – The group name that the tool functions will be added to.
enable_funcs (list[str] | None, optional) – The functions to be added into the toolkit. If None, all tool functions within the MCP servers will be added.
disable_funcs (list[str] | None, optional) – The functions that will be filtered out. If None, no tool functions will be filtered out.
preset_kwargs_mapping (dict[str, dict[str, Any]] | None, defaults to None) – The preset keyword arguments mapping, whose keys are the tool function names and values are the preset keyword arguments.
postprocess_func (Callable[[ToolUseBlock, ToolResponse], ToolResponse | None] | None, optional) – A post-processing function that will be called after the tool function is executed, taking the tool call block and tool response as arguments. If it returns None, the tool result will be returned as is. If it returns a ToolResponse, the returned block will be used as the final tool result.
- Return type:
None
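The postprocess_func contract (return None to keep the result, return a new response to replace it) can be sketched as below. Plain `dict` and `str` values stand in for ToolUseBlock and ToolResponse, and `apply_postprocess` and `truncate` are hypothetical helpers rather than library functions.

```python
from typing import Callable, Optional


def apply_postprocess(
    tool_call: dict,
    response: str,
    postprocess_func: Optional[Callable[[dict, str], Optional[str]]] = None,
) -> str:
    """Return the hook's result, or the original response if it returns None."""
    if postprocess_func is None:
        return response
    processed = postprocess_func(tool_call, response)
    return response if processed is None else processed


def truncate(tool_call: dict, response: str) -> Optional[str]:
    """Hypothetical hook: shorten long outputs, leave short ones untouched."""
    if len(response) <= 10:
        return None
    return response[:10] + "..."


call = {"name": "search", "input": {"query": "x"}}
short = apply_postprocess(call, "ok", truncate)
long = apply_postprocess(call, "a" * 25, truncate)
```

The short output passes through unchanged because the hook returns None, while the long one is replaced by the truncated version.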
- state_dict()[source]¶
Get the state dictionary of the toolkit.
- Returns:
A dictionary containing the active tool group names.
- Return type:
dict[str, Any]
- load_state_dict(state_dict, strict=True)[source]¶
Load the state dictionary into the toolkit.
- Parameters:
state_dict (dict) – The state dictionary to load, which should have “active_groups” key and its value must be a list of group names.
strict (bool, defaults to True) – If True, raises an error if any key in the module is not found in the state_dict. If False, skips missing keys.
- Return type:
None
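The state round trip can be pictured with a toy class. `GroupState` is a hypothetical stand-in that assumes only what is stated above: the state dictionary carries an "active_groups" key whose value is a list of group names. The strict flag is not modeled.

```python
class GroupState:
    """Hypothetical stand-in that tracks only group activation."""

    def __init__(self, group_names: list[str]) -> None:
        self.active = {name: False for name in group_names}

    def state_dict(self) -> dict:
        """Export the names of the currently active groups."""
        return {"active_groups": [n for n, on in self.active.items() if on]}

    def load_state_dict(self, state_dict: dict) -> None:
        """Activate exactly the groups named in the state dictionary."""
        wanted = set(state_dict["active_groups"])
        for name in self.active:
            self.active[name] = name in wanted


source = GroupState(["browser", "files"])
source.active["files"] = True
snapshot = source.state_dict()

restored = GroupState(["browser", "files"])
restored.load_state_dict(snapshot)
```

After loading, `restored` has the same activation flags as `source`.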
- get_activated_notes()[source]¶
Get the notes from the active tool groups, which can be used to construct the system prompt for the agent.
- Returns:
The combined notes from the active tool groups.
- Return type:
str
- reset_equipped_tools(**kwargs)[source]¶
Choose appropriate tools to equip yourself with, so that you can finish your task. Each argument in this function represents a group of related tools, and the value indicates whether to activate the group or not. Besides, the tool response of this function will contain the precaution notes for using them, which you MUST pay attention to and follow. You can also reuse this function to check the notes of the tool groups.
Note this function will reset the tools, so that the original tools will be removed first.
- Parameters:
kwargs (Any)
- Return type:
ToolResponse
- class ToolResponse[source]¶
Bases:
object
The result chunk of a tool call.
- content: List[TextBlock | ImageBlock | AudioBlock]¶
The execution output of the tool function.
- metadata: dict | None = None¶
The metadata to be accessed within the agent, so that we don’t need to parse the tool result block.
- stream: bool = False¶
Whether the tool output is streamed.
- __init__(content, metadata=None, stream=False, is_last=True, is_interrupted=False, id=<factory>)¶
- Parameters:
content (List[TextBlock | ImageBlock | AudioBlock])
metadata (dict | None)
stream (bool)
is_last (bool)
is_interrupted (bool)
id (str)
- Return type:
None
- is_last: bool = True¶
Whether this is the last response in a streaming tool execution.
- is_interrupted: bool = False¶
Whether the tool execution is interrupted.
- id: str¶
The identity of the tool response.
- async execute_python_code(code, timeout=300, **kwargs)[source]¶
Execute the given Python code in a temporary file and capture the return code, standard output, and standard error. Note that you must print the output to get the result, and the temporary file is removed right after the execution.
- Parameters:
code (str) – The Python code to be executed.
timeout (float, defaults to 300) – The maximum time (in seconds) allowed for the code to run.
kwargs (Any)
- Returns:
The response containing the return code, standard output, and standard error of the executed code.
- Return type:
ToolResponse
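The approach described above (write the code to a temporary file, run it, capture the three results) can be sketched with the standard library. `run_python_snippet` is a hypothetical helper, not the library's implementation, and it is synchronous for brevity.

```python
import os
import subprocess
import sys
import tempfile


def run_python_snippet(code: str, timeout: float = 300) -> tuple[int, str, str]:
    """Run code in a temporary file and return (returncode, stdout, stderr)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "snippet.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(code)
        completed = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    # The temporary directory and the file inside it are gone at this point.
    return completed.returncode, completed.stdout, completed.stderr


# Only printed values are captured; a bare expression produces no output.
printed = run_python_snippet("print(1 + 1)", timeout=30)
silent = run_python_snippet("1 + 1", timeout=30)
```

This is why the code passed to the tool has to print whatever it wants to report.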
- async execute_shell_command(command, timeout=300, **kwargs)[source]¶
Execute the given command and return the return code, standard output, and standard error within <returncode></returncode>, <stdout></stdout> and <stderr></stderr> tags.
- Parameters:
command (str) – The shell command to execute.
timeout (float, defaults to 300) – The maximum time (in seconds) allowed for the command to run.
kwargs (Any)
- Returns:
The tool response containing the return code, standard output, and standard error of the executed command.
- Return type:
ToolResponse
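A rough sketch of the tagged output format, using asyncio subprocesses. `run_shell` is a hypothetical helper, and both the exact whitespace inside the tags and the handling of a timeout (here the process is killed and the error re-raised) are assumptions.

```python
import asyncio


async def run_shell(command: str, timeout: float = 300) -> str:
    """Run a shell command and wrap its results in the three tags."""
    process = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(process.communicate(), timeout)
    except asyncio.TimeoutError:
        # Assumption: stop the process and let the caller see the timeout.
        process.kill()
        await process.wait()
        raise
    return (
        f"<returncode>{process.returncode}</returncode>"
        f"<stdout>{stdout.decode()}</stdout>"
        f"<stderr>{stderr.decode()}</stderr>"
    )


report = asyncio.run(run_shell("echo hello", timeout=30))
```

Keeping the three values in separate tags lets an agent distinguish a failing exit code from output written to standard error.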
- async view_text_file(file_path, ranges=None)[source]¶
View the file content in the specified range with line numbers. If ranges is not provided, the entire file will be returned.
- Parameters:
file_path (str) – The target file path.
ranges (list[int] | None) – The range of lines to be viewed (e.g. lines 1 to 100: [1, 100]), inclusive. If not provided, the entire file will be returned. To view the last 100 lines, use [-100, -1].
- Returns:
The tool response containing the file content or an error message.
- Return type:
ToolResponse
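The range semantics (1-based, inclusive, negative values counting from the end) can be sketched on an in-memory list of lines. `select_lines` is a hypothetical helper, and the "N: text" numbering format is an assumption rather than the tool's actual output format.

```python
from typing import Optional


def select_lines(lines: list[str], ranges: Optional[list[int]] = None) -> list[str]:
    """Return numbered lines within an inclusive, 1-based [start, end] range."""
    total = len(lines)
    if ranges is None:
        start, end = 1, total
    else:
        start, end = ranges
        # Negative values count from the end, so -1 is the last line.
        if start < 0:
            start = total + start + 1
        if end < 0:
            end = total + end + 1
    start, end = max(start, 1), min(end, total)
    return [f"{number}: {lines[number - 1]}" for number in range(start, end + 1)]


sample = ["alpha", "beta", "gamma", "delta"]
head = select_lines(sample, [1, 2])
tail = select_lines(sample, [-2, -1])
```

With four lines, `[-2, -1]` resolves to lines 3 and 4, mirroring the `[-100, -1]` example for the last 100 lines.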
- async write_text_file(file_path, content, ranges=None)[source]¶
Create/Replace/Overwrite content in a text file. When ranges is provided, the content will be replaced in the specified range. Otherwise, the entire file (if exists) will be overwritten.
- Parameters:
file_path (str) – The target file path.
content (str) – The content to be written.
ranges (list[int] | None, defaults to None) – The range of lines to be replaced. If None, the entire file will be overwritten.
- Returns:
The tool response containing the result of the writing operation.
- Return type:
ToolResponse
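The two write modes can be sketched on strings instead of files. `replace_lines` is a hypothetical helper; it assumes the range is 1-based and inclusive like the one used by view_text_file, and that the new content ends with a newline, neither of which is stated above.

```python
from typing import Optional


def replace_lines(original: str, content: str, ranges: Optional[list[int]] = None) -> str:
    """Overwrite everything, or only the inclusive 1-based [start, end] lines."""
    if ranges is None:
        return content
    start, end = ranges
    lines = original.splitlines(keepends=True)
    # Make sure neighbouring pieces cannot be glued onto the same line.
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    if not content.endswith("\n"):
        content += "\n"
    return "".join(lines[: start - 1] + [content] + lines[end:])


text = "a\nb\nc\nd\n"
partial_edit = replace_lines(text, "X", [2, 3])
full_edit = replace_lines(text, "new file\n")
```

In the ranged call, lines 2 and 3 collapse into the single new line, while the call without ranges discards the old text entirely.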
- async insert_text_file(file_path, content, line_number)[source]¶
Insert the content at the specified line number in a text file.
- Parameters:
file_path (str) – The target file path.
content (str) – The content to be inserted.
line_number (int) – The line number at which the content should be inserted, starting from 1. If it exceeds the number of lines in the file, the content is appended to the end of the file.
- Returns:
The tool response containing the result of the insertion operation.
- Return type:
ToolResponse
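The insertion rule can be sketched on a string. `insert_line` is a hypothetical helper, and it assumes that the new content is placed before the line currently at `line_number`, which the description above does not spell out.

```python
def insert_line(original: str, content: str, line_number: int) -> str:
    """Insert content before the 1-based line_number, appending if out of range."""
    if line_number < 1:
        raise ValueError("line_number starts from 1")
    lines = original.splitlines(keepends=True)
    # Make sure neighbouring pieces cannot be glued onto the same line.
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    if not content.endswith("\n"):
        content += "\n"
    index = min(line_number - 1, len(lines))
    return "".join(lines[:index] + [content] + lines[index:])


text = "a\nb\n"
middle = insert_line(text, "X", 2)
appended = insert_line(text, "Z", 99)
```

A line number past the end of the text falls back to appending, as in the second call.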
- dashscope_text_to_image(prompt, api_key, n=1, size='1024*1024', model='wanx-v1', use_base64=False)[source]¶
Generate image(s) based on the given prompt, and return image url(s) or base64 data.
- Parameters:
prompt (str) – The text prompt to generate image.
api_key (str) – The api key for the dashscope api.
n (int, defaults to 1) – The number of images to generate.
size (Literal[“1024*1024”, “720*1280”, “1280*720”], defaults to “1024*1024”) – Size of the image.
model (str, defaults to “wanx-v1”) – The model to use, such as “wanx-v1”, “qwen-image”, “wan2.2-t2i-flash”, etc.
use_base64 (bool, defaults to False) – Whether to use base64 data for images.
- Returns:
A ToolResponse containing the generated content (ImageBlock/TextBlock/AudioBlock) or error information if the operation failed.
- Return type:
ToolResponse
- dashscope_text_to_audio(text, api_key, model='sambert-zhichu-v1', sample_rate=48000)[source]¶
Convert the given text to audio.
- Parameters:
text (str) – The text to be converted into audio.
api_key (str) – The api key for the dashscope API.
model (str, defaults to ‘sambert-zhichu-v1’) – The model to use. Full model list can be found in https://help.aliyun.com/zh/dashscope/model-list
sample_rate (int, defaults to 48000) – Sample rate of the audio.
- Returns:
A ToolResponse containing the generated content (ImageBlock/TextBlock/AudioBlock) or error information if the operation failed.
- Return type:
ToolResponse
- dashscope_image_to_text(image_urls, api_key, prompt='Describe the image', model='qwen-vl-plus')[source]¶
Generate text based on the given images.
- Parameters:
image_urls (str | Sequence[str]) – The url of single or multiple images.
api_key (str) – The api key for the dashscope api.
prompt (str, defaults to ‘Describe the image’) – The text prompt.
model (str, defaults to ‘qwen-vl-plus’) – The model to use in DashScope MultiModal API.
- Returns:
A ToolResponse containing the generated content (ImageBlock/TextBlock/AudioBlock) or error information if the operation failed.
- Return type:
ToolResponse
- openai_text_to_image(prompt, api_key, n=1, model='dall-e-2', size='256x256', quality='auto', style='vivid', response_format='url')[source]¶
Generate image(s) based on the given prompt, and return image URL(s) or base64 data.
- Parameters:
prompt (str) – The text prompt to generate images.
api_key (str) – The API key for the OpenAI API.
n (int, defaults to 1) – The number of images to generate.
model (Literal[“dall-e-2”, “dall-e-3”], defaults to “dall-e-2”) – The model to use for image generation.
size (Literal[“256x256”, “512x512”, “1024x1024”, “1792x1024”, “1024x1792”], defaults to “256x256”) – The size of the generated images. Must be one of:
- 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait), or auto (default value) for gpt-image-1,
- 256x256, 512x512, or 1024x1024 for dall-e-2,
- 1024x1024, 1792x1024, or 1024x1792 for dall-e-3.
quality (Literal[“auto”, “standard”, “hd”, “high”, “medium”, “low”], defaults to “auto”) – The quality of the image that will be generated.
- auto (default value) will automatically select the best quality for the given model.
- high, medium and low are supported for gpt-image-1.
- hd and standard are supported for dall-e-3.
- standard is the only option for dall-e-2.
style (Literal[“vivid”, “natural”], defaults to “vivid”) – The style of the generated images. This parameter is only supported for dall-e-3. Must be one of vivid or natural.
- Vivid causes the model to lean towards generating hyper-real and dramatic images.
- Natural causes the model to produce more natural, less hyper-real looking images.
response_format (Literal[“url”, “b64_json”], defaults to “url”) – The format in which generated images with dall-e-2 and dall-e-3 are returned. Must be one of “url” or “b64_json”.
- URLs are only valid for 60 minutes after the image has been generated.
- This parameter isn’t supported for gpt-image-1, which will always return base64-encoded images.
- Returns:
A ToolResponse containing the generated content (ImageBlock/TextBlock/AudioBlock) or error information if the operation failed.
- Return type:
ToolResponse
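The per-model size constraints listed above can be checked before calling the API. This is a minimal sketch: `ALLOWED_SIZES` and `is_supported_size` are hypothetical names, and the table merely restates the sizes given in the parameter description.

```python
ALLOWED_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
    "gpt-image-1": {"1024x1024", "1536x1024", "1024x1536", "auto"},
}


def is_supported_size(model: str, size: str) -> bool:
    """Return True if the size is listed for the model, False otherwise."""
    return size in ALLOWED_SIZES.get(model, set())


default_ok = is_supported_size("dall-e-2", "256x256")
landscape_on_dalle2 = is_supported_size("dall-e-2", "1792x1024")
landscape_on_dalle3 = is_supported_size("dall-e-3", "1792x1024")
```

The default size of 256x256 is valid for dall-e-2 only, so switching the model usually means choosing a different size as well.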
- openai_text_to_audio(text, api_key, model='tts-1', voice='alloy', speed=1.0, res_format='mp3')[source]¶
Convert text to an audio file using a specified model and voice.
- Parameters:
text (str) – The text to convert to audio.
api_key (str) – The API key for the OpenAI API.
model (Literal[“tts-1”, “tts-1-hd”, “gpt-4o-mini-tts”], defaults to “tts-1”) – The model to use for text-to-speech conversion.
voice (Literal[“alloy”, “ash”, “ballad”, “coral”, “echo”, “fable”, “nova”, “onyx”, “sage”, “shimmer”], defaults to “alloy”) – The voice to use for the audio output.
speed (float, defaults to 1.0) – The speed of the audio playback. A value of 1.0 is normal speed.
res_format (Literal[“mp3”, “opus”, “aac”, “flac”, “wav”, “pcm”], defaults to “mp3”) – The format of the audio file.
- Returns:
A ToolResponse containing the generated content (ImageBlock/TextBlock/AudioBlock) or error information if the operation failed.
- Return type:
ToolResponse
- openai_edit_image(image_url, prompt, api_key, model='dall-e-2', mask_url=None, n=1, size='256x256', response_format='url')[source]¶
Edit an image based on the provided mask and prompt, and return the edited image URL(s) or base64 data.
- Parameters:
image_url (str) – The file path or URL to the image that needs editing.
prompt (str) – The text prompt describing the edits to be made to the image.
api_key (str) – The API key for the OpenAI API.
model (Literal[“dall-e-2”, “gpt-image-1”], defaults to “dall-e-2”) – The model to use for image generation.
mask_url (str | None, defaults to None) – The file path or URL to the mask image that specifies the regions to be edited.
n (int, defaults to 1) – The number of edited images to generate.
size (Literal[“256x256”, “512x512”, “1024x1024”], defaults to “256x256”) – The size of the edited images.
response_format (Literal[“url”, “b64_json”], defaults to “url”) – The format in which generated images are returned. Must be one of “url” or “b64_json”.
- URLs are only valid for 60 minutes after generation.
- This parameter isn’t supported for gpt-image-1, which will always return base64-encoded images.
- Returns:
A ToolResponse containing the generated content (ImageBlock/TextBlock/AudioBlock) or error information if the operation failed.
- Return type:
ToolResponse
- openai_create_image_variation(image_url, api_key, n=1, model='dall-e-2', size='256x256', response_format='url')[source]¶
Create variations of an image and return the image URL(s) or base64 data.
- Parameters:
image_url (str) – The file path or URL to the image from which variations will be generated.
api_key (str) – The API key for the OpenAI API.
n (int, defaults to 1) – The number of image variations to generate.
model (Literal[“dall-e-2”], defaults to “dall-e-2”) – The model to use for image variation.
size (Literal[“256x256”, “512x512”, “1024x1024”], defaults to “256x256”) – The size of the generated image variations.
response_format (Literal[“url”, “b64_json”], defaults to “url”) – The format in which generated images are returned. Must be one of url or b64_json. URLs are only valid for 60 minutes after the image has been generated.
- Returns:
A ToolResponse containing the generated content (ImageBlock/TextBlock/AudioBlock) or error information if the operation failed.
- Return type:
ToolResponse
- openai_image_to_text(image_urls, api_key, prompt='Describe the image', model='gpt-4o')[source]¶
Generate descriptive text for given image(s) using a specified model, and return the generated text.
- Parameters:
image_urls (str | list[str]) – The URL or list of URLs pointing to the images that need to be described.
api_key (str) – The API key for the OpenAI API.
prompt (str, defaults to “Describe the image”) – The prompt that instructs the model on how to describe the image(s).
model (str, defaults to “gpt-4o”) – The model to use for generating the text descriptions.
- Returns:
A ToolResponse containing the generated content (ImageBlock/TextBlock/AudioBlock) or error information if the operation failed.
- Return type:
ToolResponse
- openai_audio_to_text(audio_file_url, api_key, language='en', temperature=0.2)[source]¶
Convert an audio file to text using OpenAI’s transcription service.
- Parameters:
audio_file_url (str) – The file path or URL to the audio file that needs to be transcribed.
api_key (str) – The API key for the OpenAI API.
language (str, defaults to “en”) – The language of the input audio in ISO-639-1 format (e.g., “en”, “zh”, “fr”). Improves accuracy and latency.
temperature (float, defaults to 0.2) – The temperature for the transcription, which affects the randomness of the output.
- Returns:
A ToolResponse containing the generated content (ImageBlock/TextBlock/AudioBlock) or error information if the operation failed.
- Return type:
ToolResponse