.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorial/task_tuner.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorial_task_tuner.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorial_task_tuner.py:

.. _tuner:

Tuner
=================

AgentScope provides the ``tuner`` module for training agent applications with reinforcement learning (RL). This tutorial guides you through using the ``tuner`` module to improve agent performance on specific tasks, including:

- Introducing the core components of the ``tuner`` module
- Demonstrating the key code required for the tuning workflow
- Showing how to configure and run the tuning process

Main Components
~~~~~~~~~~~~~~~~~~~

The ``tuner`` module introduces three core components essential for RL-based agent training:

- **Task Dataset**: A collection of tasks for training and evaluating the agent.
- **Workflow Function**: Encapsulates the agent's logic to be tuned.
- **Judge Function**: Evaluates the agent's performance on tasks and provides reward signals for tuning.

In addition, ``tuner`` provides several configuration classes for customizing the tuning process, including:

- **TunerModelConfig**: Model configurations for tuning purposes.
- **AlgorithmConfig**: Specifies the RL algorithm (e.g., GRPO, PPO) and its parameters.

Implementation
~~~~~~~~~~~~~~~~~~~

This section demonstrates how to use ``tuner`` to train a simple math agent.

Task Dataset
--------------------

The task dataset contains the tasks used for training and evaluating your agent. Your dataset should follow the Hugging Face ``datasets`` format, so that it can be loaded with ``datasets.load_dataset``. For example:

.. code-block:: text

    my_dataset/
    ├── train.jsonl   # training samples
    └── test.jsonl    # evaluation samples

Suppose your ``train.jsonl`` contains:

.. code-block:: json

    {"question": "What is 2 + 2?", "answer": "4"}
    {"question": "What is 4 + 4?", "answer": "8"}
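If you want to double-check that the files are in a format ``datasets.load_dataset`` accepts, you can load them directly with the Hugging Face ``datasets`` library. The snippet below is only a quick sanity-check sketch; it uses the generic ``json`` loader rather than any ``tuner`` API, and the file paths are the hypothetical ones from the layout above:

.. code-block:: python

    from datasets import load_dataset

    # Load the JSONL files shown above with the generic "json" loader
    raw = load_dataset(
        "json",
        data_files={
            "train": "my_dataset/train.jsonl",
            "test": "my_dataset/test.jsonl",
        },
    )

    # Inspect the first training sample
    print(raw["train"][0])
    # {'question': 'What is 2 + 2?', 'answer': '4'}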
Before starting tuning, you can verify that your dataset is loaded correctly with:

.. code-block:: python

    from agentscope.tuner import DatasetConfig

    dataset = DatasetConfig(path="my_dataset", split="train")
    dataset.preview(n=2)  # Output the first two samples to verify correct loading
    # [
    #     {
    #         "question": "What is 2 + 2?",
    #         "answer": "4"
    #     },
    #     {
    #         "question": "What is 4 + 4?",
    #         "answer": "8"
    #     }
    # ]

Workflow Function
--------------------

The workflow function defines how the agent interacts with the environment and makes decisions. All workflow functions should follow the input/output signature defined in ``agentscope.tuner.WorkflowType``. Below is an example workflow function that uses a ReAct agent to answer math questions:

.. GENERATED FROM PYTHON SOURCE LINES 77-123

.. code-block:: Python

    from typing import Dict, Optional

    from agentscope.agent import ReActAgent
    from agentscope.formatter import OpenAIChatFormatter
    from agentscope.message import Msg
    from agentscope.model import ChatModelBase
    from agentscope.tuner import WorkflowOutput


    async def example_workflow_function(
        task: Dict,
        model: ChatModelBase,
        auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
    ) -> WorkflowOutput:
        """An example workflow function for tuning.

        Args:
            task (`Dict`):
                The task information.
            model (`ChatModelBase`):
                The chat model used by the agent.
            auxiliary_models (`Optional[Dict[str, ChatModelBase]]`):
                Additional chat models, generally used to simulate the
                behavior of other non-training agents in multi-agent
                scenarios.

        Returns:
            `WorkflowOutput`:
                The output generated by the workflow.
        """
        agent = ReActAgent(
            name="react_agent",
            sys_prompt="You are a helpful math problem solving agent.",
            model=model,
            formatter=OpenAIChatFormatter(),
        )
        response = await agent.reply(
            msg=Msg(
                "user",
                task["question"],
                role="user",
            ),  # extract the question from the task
        )
        return WorkflowOutput(  # return the response
            response=response,
        )

.. GENERATED FROM PYTHON SOURCE LINES 124-125

You can run this workflow function directly with a task dictionary and a ``DashScopeChatModel`` / ``OpenAIChatModel`` to test its correctness before formal training. For example:

.. GENERATED FROM PYTHON SOURCE LINES 125-142

.. code-block:: Python

    import asyncio
    import os

    from agentscope.model import DashScopeChatModel

    task = {"question": "What is 123 plus 456?", "answer": "579"}

    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
    )

    workflow_output = asyncio.run(example_workflow_function(task, model))

    assert isinstance(
        workflow_output.response,
        Msg,
    ), "In this example, the response should be a Msg instance."
    print("\nWorkflow response:", workflow_output.response.get_text_content())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    react_agent: To find the sum of 123 and 456, you simply add the two numbers together:

    \[ 123 + 456 = 579 \]

    So, 123 plus 456 is 579.

    Workflow response: To find the sum of 123 and 456, you simply add the two numbers together:

    \[ 123 + 456 = 579 \]

    So, 123 plus 456 is 579.

.. GENERATED FROM PYTHON SOURCE LINES 143-148

Judge Function
--------------------

The judge function evaluates the agent's performance on a given task and provides a reward signal for tuning. All judge functions should follow the input/output signature defined in ``agentscope.tuner.JudgeType``. Below is a simple judge function that compares the agent's response with the ground-truth answer:

.. GENERATED FROM PYTHON SOURCE LINES 149-182

.. code-block:: Python

    from typing import Any

    from agentscope.tuner import JudgeOutput


    async def example_judge_function(
        task: Dict,
        response: Any,
        auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
    ) -> JudgeOutput:
        """A very simple judge function only for demonstration.

        Args:
            task (`Dict`):
                The task information.
            response (`Any`):
                The response field from the WorkflowOutput.
            auxiliary_models (`Optional[Dict[str, ChatModelBase]]`):
                Additional chat models for LLM-as-a-Judge purposes.

        Returns:
            `JudgeOutput`:
                The reward assigned by the judge.
        """
        ground_truth = task["answer"]
        reward = 1.0 if ground_truth in response.get_text_content() else 0.0
        return JudgeOutput(reward=reward)


    judge_output = asyncio.run(
        example_judge_function(
            task,
            workflow_output.response,
        ),
    )
    print(f"Judge reward: {judge_output.reward}")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Judge reward: 1.0

.. GENERATED FROM PYTHON SOURCE LINES 183-248

The judge function can also be tested locally in the same way as shown above before formal training, to ensure its logic is correct.

.. tip:: You can leverage existing ``MetricBase`` implementations in your judge function to compute more sophisticated metrics and combine them into a composite reward, as sketched below.
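As a rough illustration of a composite reward, the sketch below blends two hand-written signals (an exact-match check and a brevity heuristic) into a single weighted reward. The function name, the heuristics, and the weights are made up for demonstration and do not use ``MetricBase``; in practice you could replace the hand-written checks with ``MetricBase`` implementations:

.. code-block:: python

    from typing import Any, Dict, Optional

    from agentscope.model import ChatModelBase
    from agentscope.tuner import JudgeOutput


    async def composite_judge_function(  # hypothetical example, not part of the tuner API
        task: Dict,
        response: Any,
        auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
    ) -> JudgeOutput:
        """Blend two simple signals into one composite reward (for demonstration)."""
        text = response.get_text_content() or ""

        # Signal 1: does the response contain the ground-truth answer?
        correctness = 1.0 if task["answer"] in text else 0.0

        # Signal 2: mildly prefer concise answers (made-up heuristic and threshold)
        brevity = 1.0 if len(text) <= 500 else 0.5

        # Combine the signals with hand-picked weights into a composite reward
        reward = 0.9 * correctness + 0.1 * brevity
        return JudgeOutput(reward=reward)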
Configuration and Running
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, you can configure and run the tuning process using the ``tuner`` module. Before starting, ensure that Trinity-RFT is installed in your environment, as it is required for tuning.

Below is an example of configuring and starting the tuning process:

.. note:: This example is for demonstration only. For a complete runnable example, see the Tune ReActAgent example.

.. code-block:: python

    from agentscope.tuner import tune, AlgorithmConfig, DatasetConfig, TunerModelConfig

    # your workflow / judge function here...

    if __name__ == "__main__":
        dataset = DatasetConfig(path="my_dataset", split="train")
        model = TunerModelConfig(model_path="Qwen/Qwen3-0.6B", max_model_len=16384)
        algorithm = AlgorithmConfig(
            algorithm_type="multi_step_grpo",
            group_size=8,
            batch_size=32,
            learning_rate=1e-6,
        )
        tune(
            workflow_func=example_workflow_function,
            judge_func=example_judge_function,
            model=model,
            train_dataset=dataset,
            algorithm=algorithm,
        )

Here, ``DatasetConfig`` configures the training dataset, ``TunerModelConfig`` sets the parameters of the trainable model, and ``AlgorithmConfig`` specifies the reinforcement learning algorithm and its hyperparameters.

.. tip:: The ``tune`` function is based on Trinity-RFT and internally converts the input parameters to a YAML configuration. Advanced users can skip the ``model``, ``train_dataset``, and ``algorithm`` arguments and instead provide a YAML config file path via the ``config_path`` argument. Using a configuration file is recommended for fine-grained control and for leveraging advanced Trinity-RFT features. See the Trinity-RFT Configuration Guide for more options.

Save the above code as ``main.py`` and run it with:

.. code-block:: bash

    ray start --head
    python main.py

Checkpoints and logs are automatically saved to the ``checkpoints/AgentScope`` directory under your workspace, with each run in a timestamped sub-directory. TensorBoard logs can be found in ``monitor/tensorboard`` within the checkpoint directory.

.. code-block:: text

    your_workspace/
    └── checkpoints/
        └── AgentScope/
            └── Experiment-20260104185355/   # each run saved in a timestamped sub-directory
                ├── monitor/
                │   └── tensorboard/         # tensorboard logs
                └── global_step_x/           # saved model checkpoints at step x

.. tip:: For more tuning examples, refer to the tuner directory of the AgentScope-Samples repository.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.984 seconds)


.. _sphx_glr_download_tutorial_task_tuner.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: task_tuner.ipynb <task_tuner.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: task_tuner.py <task_tuner.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: task_tuner.zip <task_tuner.zip>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_