Tuner
AgentScope provides the tuner module for training agent applications using reinforcement learning (RL).
This tutorial will guide you through how to leverage the tuner module to improve agent performance on specific tasks, including:
Introducing the core components of the tuner module
Demonstrating the key code required for the tuning workflow
Showing how to configure and run the tuning process
Main Components
The tuner module introduces three core components essential for RL-based agent training:
Task Dataset: A collection of tasks for training and evaluating the agent.
Workflow Function: Encapsulates the agent’s logic to be tuned.
Judge Function: Evaluates the agent’s performance on tasks and provides reward signals for tuning.
In addition, tuner provides several configuration classes for customizing the tuning process, including:
TunerModelConfig: Model configurations for tuning purposes.
AlgorithmConfig: Specifies the RL algorithm (e.g., GRPO, PPO) and its parameters.
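These components map directly onto the arguments of the tune entry point used later in this tutorial. As a quick preview of how they fit together (a sketch only; the concrete values are filled in throughout the sections below):
from agentscope.tuner import tune, AlgorithmConfig, DatasetConfig, TunerModelConfig

# tune(
#     workflow_func=...,                 # the workflow function wrapping your agent logic
#     judge_func=...,                    # the judge function producing reward signals
#     train_dataset=DatasetConfig(...),  # the task dataset
#     model=TunerModelConfig(...),       # the trainable model
#     algorithm=AlgorithmConfig(...),    # the RL algorithm and its hyperparameters
# )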
Implementation
This section demonstrates how to use tuner to train a simple math agent.
Task Dataset
The task dataset contains tasks for training and evaluating your agent.
Your dataset should follow the Hugging Face datasets format, so that it can be loaded with datasets.load_dataset. For example:
my_dataset/
├── train.jsonl # training samples
└── test.jsonl # evaluation samples
Suppose your train.jsonl contains:
{"question": "What is 2 + 2?", "answer": "4"}
{"question": "What is 4 + 4?", "answer": "8"}
Before starting tuning, you can verify that your dataset is loaded correctly with:
from agentscope.tuner import DatasetConfig
dataset = DatasetConfig(path="my_dataset", split="train")
dataset.preview(n=2)
# Output the first two samples to verify correct loading
# [
# {
# "question": "What is 2 + 2?",
# "answer": "4"
# },
# {
# "question": "What is 4 + 4?",
# "answer": "8"
# }
# ]
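Because the dataset follows the Hugging Face datasets format, you can also load it directly with datasets.load_dataset as an additional sanity check (a sketch, assuming the directory layout shown above):
from datasets import load_dataset

raw = load_dataset(
    "json",
    data_files={
        "train": "my_dataset/train.jsonl",
        "test": "my_dataset/test.jsonl",
    },
)
print(raw["train"][0])  # {'question': 'What is 2 + 2?', 'answer': '4'}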
Workflow Function
The workflow function defines how the agent interacts with the environment and makes decisions. All workflow functions should follow the input/output signature defined in agentscope.tuner.WorkflowType.
Below is an example workflow function using a ReAct agent to answer math questions:
from typing import Dict, Optional
from agentscope.agent import ReActAgent
from agentscope.formatter import OpenAIChatFormatter
from agentscope.message import Msg
from agentscope.model import ChatModelBase
from agentscope.tuner import WorkflowOutput
async def example_workflow_function(
    task: Dict,
    model: ChatModelBase,
    auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
) -> WorkflowOutput:
    """An example workflow function for tuning.

    Args:
        task (`Dict`): The task information.
        model (`ChatModelBase`): The chat model used by the agent.
        auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): Additional
            chat models, generally used to simulate the behavior of other
            non-training agents in multi-agent scenarios.

    Returns:
        `WorkflowOutput`: The output generated by the workflow.
    """
    agent = ReActAgent(
        name="react_agent",
        sys_prompt="You are a helpful math problem solving agent.",
        model=model,
        formatter=OpenAIChatFormatter(),
    )
    response = await agent.reply(
        msg=Msg(
            "user",
            task["question"],  # extract the question from the task
            role="user",
        ),
    )
    return WorkflowOutput(  # return the response
        response=response,
    )
You can directly run this workflow function with a task dictionary and a DashScopeChatModel / OpenAIChatModel to test its correctness before formal training. For example:
import asyncio
import os
from agentscope.model import DashScopeChatModel
task = {"question": "What is 123 plus 456?", "answer": "579"}
model = DashScopeChatModel(
    model_name="qwen-max",
    api_key=os.environ["DASHSCOPE_API_KEY"],
)
workflow_output = asyncio.run(example_workflow_function(task, model))
assert isinstance(
    workflow_output.response,
    Msg,
), "In this example, the response should be a Msg instance."
print("\nWorkflow response:", workflow_output.response.get_text_content())
react_agent: To find the sum of 123 and 456, you simply add the two numbers together:
\[ 123 + 456 = 579 \]
So, 123 plus 456 is 579.
Workflow response: To find the sum of 123 and 456, you simply add the two numbers together:
\[ 123 + 456 = 579 \]
So, 123 plus 456 is 579.
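In multi-agent scenarios, the auxiliary_models argument can supply models for agents that are not being trained. The sketch below is illustrative only: the key "critic" is a hypothetical name, and the critique loop is just one possible way to use an auxiliary model alongside the tuned one.
async def workflow_with_auxiliary_model(
    task: Dict,
    model: ChatModelBase,
    auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
) -> WorkflowOutput:
    """A sketch: the tuned solver answers, a non-trained critic reviews."""
    solver = ReActAgent(
        name="solver",
        sys_prompt="You are a helpful math problem solving agent.",
        model=model,  # the model being tuned
        formatter=OpenAIChatFormatter(),
    )
    critic = ReActAgent(
        name="critic",
        sys_prompt="You review math solutions and point out mistakes.",
        model=auxiliary_models["critic"],  # "critic" is a hypothetical key
        formatter=OpenAIChatFormatter(),
    )
    answer = await solver.reply(msg=Msg("user", task["question"], role="user"))
    review = await critic.reply(msg=answer)
    revised = await solver.reply(msg=review)
    return WorkflowOutput(response=revised)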
Judge Function
The judge function evaluates the agent’s performance on a given task and provides a reward signal for tuning.
All judge functions should follow the input/output signature defined in agentscope.tuner.JudgeType.
Below is a simple judge function that compares the agent’s response with the ground truth answer:
from typing import Any
from agentscope.tuner import JudgeOutput
async def example_judge_function(
    task: Dict,
    response: Any,
    auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
) -> JudgeOutput:
    """A very simple judge function, only for demonstration.

    Args:
        task (`Dict`): The task information.
        response (`Any`): The response field from the WorkflowOutput.
        auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): Additional
            chat models for LLM-as-a-Judge purposes.

    Returns:
        `JudgeOutput`: The reward assigned by the judge.
    """
    ground_truth = task["answer"]
    reward = 1.0 if ground_truth in response.get_text_content() else 0.0
    return JudgeOutput(reward=reward)
judge_output = asyncio.run(
    example_judge_function(
        task,
        workflow_output.response,
    ),
)
print(f"Judge reward: {judge_output.reward}")
Judge reward: 1.0
As shown above, the judge function can also be tested locally before formal training to ensure its logic is correct.
Tip
You can leverage existing MetricBase implementations in your judge function to compute more sophisticated metrics and combine them into a composite reward.
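For example, a judge could combine an exact-match check with a brevity bonus into one weighted reward. The sketch below uses plain Python only; the weights and the length threshold are arbitrary illustrative choices, and either term could be replaced by a MetricBase-based metric.
async def composite_judge_function(
    task: Dict,
    response: Any,
    auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
) -> JudgeOutput:
    """A sketch: combine correctness and brevity into a composite reward."""
    text = response.get_text_content()
    correctness = 1.0 if task["answer"] in text else 0.0
    brevity = 1.0 if len(text) < 500 else 0.0  # arbitrary length threshold
    reward = 0.9 * correctness + 0.1 * brevity  # weighted composite reward
    return JudgeOutput(reward=reward)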
Configuration and Running
Finally, you can configure and run the tuning process using the tuner module.
Before starting, ensure that Trinity-RFT is installed in your environment, as it is required for tuning.
Below is an example of configuring and starting the tuning process:
Note
This example is for demonstration only. For a complete runnable example, see Tune ReActAgent.
from agentscope.tuner import tune, AlgorithmConfig, DatasetConfig, TunerModelConfig
# your workflow / judge function here...
if __name__ == "__main__":
dataset = DatasetConfig(path="my_dataset", split="train")
model = TunerModelConfig(model_path="Qwen/Qwen3-0.6B", max_model_len=16384)
algorithm = AlgorithmConfig(
algorithm_type="multi_step_grpo",
group_size=8,
batch_size=32,
learning_rate=1e-6,
)
tune(
workflow_func=example_workflow_function,
judge_func=example_judge_function,
model=model,
train_dataset=dataset,
algorithm=algorithm,
)
Here, DatasetConfig configures the training dataset, TunerModelConfig sets the parameters for the trainable model, and AlgorithmConfig specifies the reinforcement learning algorithm and its hyperparameters.
Tip
The tune function is based on Trinity-RFT and internally converts input parameters to a YAML configuration.
Advanced users can skip the model, train_dataset, and algorithm arguments and instead provide a YAML config file path via the config_path argument.
Using a configuration file is recommended for fine-grained control and to leverage advanced Trinity-RFT features. See the Trinity-RFT Configuration Guide for more options.
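A sketch of this alternative invocation (my_config.yaml is a placeholder path; its contents follow the Trinity-RFT configuration format):
tune(
    workflow_func=example_workflow_function,
    judge_func=example_judge_function,
    config_path="my_config.yaml",  # placeholder: path to a Trinity-RFT YAML config
)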
Save the above code as main.py and run it with:
ray start --head
python main.py
Checkpoints and logs are automatically saved to the checkpoints/AgentScope directory under your workspace, with each run in a timestamped sub-directory. Tensorboard logs can be found in monitor/tensorboard within the checkpoint directory.
your_workspace/
└── checkpoints/
    └── AgentScope/
        └── Experiment-20260104185355/   # each run saved in a timestamped sub-directory
            ├── monitor/
            │   └── tensorboard/         # tensorboard logs
            └── global_step_x/           # saved model checkpoints at step x
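To inspect training curves, you can point TensorBoard at the run's log directory (the experiment directory name below is the example from the tree above; it will differ for each run):
tensorboard --logdir checkpoints/AgentScope/Experiment-20260104185355/monitor/tensorboard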
Tip
For more tuning examples, refer to the tuner directory of the AgentScope-Samples repository.