.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorial/task_tuner.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorial_task_tuner.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorial_task_tuner.py:

.. _tuner:

Tuner
=================

AgentScope provides the ``tuner`` module for training agent applications with reinforcement learning (RL). This tutorial guides you through using the ``tuner`` module to improve agent performance on specific tasks, including:

- Introducing the core components of the ``tuner`` module
- Demonstrating the key code required for the tuning workflow
- Showing how to configure and run the tuning process

Main Components
~~~~~~~~~~~~~~~~~~~

The ``tuner`` module introduces three core components essential for RL-based agent training:

- **Task Dataset**: A collection of tasks for training and evaluating the agent.
- **Workflow Function**: Encapsulates the agent's logic to be tuned.
- **Judge Function**: Evaluates the agent's performance on tasks and provides reward signals for tuning.

In addition, ``tuner`` provides several configuration classes for customizing the tuning process, including:

- **TunerModelConfig**: Model configurations for tuning purposes.
- **AlgorithmConfig**: Specifies the RL algorithm (e.g., GRPO, PPO) and its parameters.

Implementation
~~~~~~~~~~~~~~~~~~~

This section demonstrates how to use ``tuner`` to train a simple math agent.

Task Dataset
--------------------

The task dataset contains the tasks used for training and evaluating your agent. Your dataset should follow the Hugging Face ``datasets`` format, so that it can be loaded with ``datasets.load_dataset``. For example:

.. code-block:: text

    my_dataset/
    ├── train.jsonl   # training samples
    └── test.jsonl    # evaluation samples

Suppose your ``train.jsonl`` contains:

.. code-block:: json

    {"question": "What is 2 + 2?", "answer": "4"}
    {"question": "What is 4 + 4?", "answer": "8"}
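If you want to double-check that the files are in a format ``datasets.load_dataset`` accepts, you can load them directly with the Hugging Face ``datasets`` library. The snippet below is only a quick sanity-check sketch; it uses the generic ``json`` loader rather than any ``tuner`` API, and the file paths are the hypothetical ones from the layout above:

.. code-block:: python

    from datasets import load_dataset

    # Load the JSONL files shown above with the generic "json" loader
    raw = load_dataset(
        "json",
        data_files={
            "train": "my_dataset/train.jsonl",
            "test": "my_dataset/test.jsonl",
        },
    )

    # Inspect the first training sample
    print(raw["train"][0])
    # {'question': 'What is 2 + 2?', 'answer': '4'}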
Before starting tuning, you can verify that your dataset is loaded correctly with:

.. code-block:: python

    from agentscope.tuner import DatasetConfig

    dataset = DatasetConfig(path="my_dataset", split="train")
    dataset.preview(n=2)  # Output the first two samples to verify correct loading
    # [
    #     {
    #         "question": "What is 2 + 2?",
    #         "answer": "4"
    #     },
    #     {
    #         "question": "What is 4 + 4?",
    #         "answer": "8"
    #     }
    # ]

Workflow Function
--------------------

The workflow function defines how the agent interacts with the environment and makes decisions. All workflow functions should follow the input/output signature defined in ``agentscope.tuner.WorkflowType``. Below is an example workflow function that uses a ReAct agent to answer math questions:

.. GENERATED FROM PYTHON SOURCE LINES 77-123

.. code-block:: Python

    from typing import Dict, Optional

    from agentscope.agent import ReActAgent
    from agentscope.formatter import OpenAIChatFormatter
    from agentscope.message import Msg
    from agentscope.model import ChatModelBase
    from agentscope.tuner import WorkflowOutput


    async def example_workflow_function(
        task: Dict,
        model: ChatModelBase,
        auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
    ) -> WorkflowOutput:
        """An example workflow function for tuning.

        Args:
            task (`Dict`):
                The task information.
            model (`ChatModelBase`):
                The chat model used by the agent.
            auxiliary_models (`Optional[Dict[str, ChatModelBase]]`):
                Additional chat models, generally used to simulate the
                behavior of other non-training agents in multi-agent
                scenarios.

        Returns:
            `WorkflowOutput`:
                The output generated by the workflow.
        """
        agent = ReActAgent(
            name="react_agent",
            sys_prompt="You are a helpful math problem solving agent.",
            model=model,
            formatter=OpenAIChatFormatter(),
        )
        response = await agent.reply(
            msg=Msg(
                "user",
                task["question"],
                role="user",
            ),  # extract the question from the task
        )
        return WorkflowOutput(  # return the response
            response=response,
        )

.. GENERATED FROM PYTHON SOURCE LINES 124-125

You can run this workflow function directly with a task dictionary and a ``DashScopeChatModel`` / ``OpenAIChatModel`` to test its correctness before formal training. For example:

.. GENERATED FROM PYTHON SOURCE LINES 125-142

.. code-block:: Python

    import asyncio
    import os

    from agentscope.model import DashScopeChatModel

    task = {"question": "What is 123 plus 456?", "answer": "579"}

    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
    )

    workflow_output = asyncio.run(example_workflow_function(task, model))

    assert isinstance(
        workflow_output.response,
        Msg,
    ), "In this example, the response should be a Msg instance."
    print("\nWorkflow response:", workflow_output.response.get_text_content())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    react_agent: To find the sum of 123 and 456, you simply add the two numbers together:

    \[ 123 + 456 = 579 \]

    So, 123 plus 456 is 579.

    Workflow response: To find the sum of 123 and 456, you simply add the two numbers together:

    \[ 123 + 456 = 579 \]

    So, 123 plus 456 is 579.

.. GENERATED FROM PYTHON SOURCE LINES 143-148

Judge Function
--------------------

The judge function evaluates the agent's performance on a given task and provides a reward signal for tuning. All judge functions should follow the input/output signature defined in ``agentscope.tuner.JudgeType``. Below is a simple judge function that compares the agent's response with the ground-truth answer:

.. GENERATED FROM PYTHON SOURCE LINES 149-182

.. code-block:: Python

    from typing import Any

    from agentscope.tuner import JudgeOutput


    async def example_judge_function(
        task: Dict,
        response: Any,
        auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
    ) -> JudgeOutput:
        """A very simple judge function only for demonstration.

        Args:
            task (`Dict`):
                The task information.
            response (`Any`):
                The response field from the WorkflowOutput.
            auxiliary_models (`Optional[Dict[str, ChatModelBase]]`):
                Additional chat models for LLM-as-a-Judge purposes.

        Returns:
            `JudgeOutput`:
                The reward assigned by the judge.
        """
        ground_truth = task["answer"]
        reward = 1.0 if ground_truth in response.get_text_content() else 0.0
        return JudgeOutput(reward=reward)


    judge_output = asyncio.run(
        example_judge_function(
            task,
            workflow_output.response,
        ),
    )
    print(f"Judge reward: {judge_output.reward}")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Judge reward: 1.0

.. GENERATED FROM PYTHON SOURCE LINES 183-248

The judge function can also be tested locally in the same way as shown above before formal training, to ensure its logic is correct.

.. tip:: You can leverage existing ``MetricBase`` implementations in your judge function to compute more sophisticated metrics and combine them into a composite reward, as sketched below.
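As a rough illustration of a composite reward, the sketch below blends two hand-written signals (an exact-match check and a brevity heuristic) into a single weighted reward. The function name, the heuristics, and the weights are made up for demonstration and do not use ``MetricBase``; in practice you could replace the hand-written checks with ``MetricBase`` implementations:

.. code-block:: python

    from typing import Any, Dict, Optional

    from agentscope.model import ChatModelBase
    from agentscope.tuner import JudgeOutput


    async def composite_judge_function(  # hypothetical example, not part of the tuner API
        task: Dict,
        response: Any,
        auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
    ) -> JudgeOutput:
        """Blend two simple signals into one composite reward (for demonstration)."""
        text = response.get_text_content() or ""

        # Signal 1: does the response contain the ground-truth answer?
        correctness = 1.0 if task["answer"] in text else 0.0

        # Signal 2: mildly prefer concise answers (made-up heuristic and threshold)
        brevity = 1.0 if len(text) <= 500 else 0.5

        # Combine the signals with hand-picked weights into a composite reward
        reward = 0.9 * correctness + 0.1 * brevity
        return JudgeOutput(reward=reward)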
Configuration and Running
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, you can configure and run the tuning process using the ``tuner`` module. Before starting, ensure that Trinity-RFT is installed in your environment, as it is required for tuning.

Below is an example of configuring and starting the tuning process:

.. note:: This example is for demonstration only. For a complete runnable example, see the Tune ReActAgent example.

.. code-block:: python

    from agentscope.tuner import tune, AlgorithmConfig, DatasetConfig, TunerModelConfig

    # your workflow / judge function here...

    if __name__ == "__main__":
        dataset = DatasetConfig(path="my_dataset", split="train")
        model = TunerModelConfig(model_path="Qwen/Qwen3-0.6B", max_model_len=16384)
        algorithm = AlgorithmConfig(
            algorithm_type="multi_step_grpo",
            group_size=8,
            batch_size=32,
            learning_rate=1e-6,
        )
        tune(
            workflow_func=example_workflow_function,
            judge_func=example_judge_function,
            model=model,
            train_dataset=dataset,
            algorithm=algorithm,
        )

Here, ``DatasetConfig`` configures the training dataset, ``TunerModelConfig`` sets the parameters of the trainable model, and ``AlgorithmConfig`` specifies the reinforcement learning algorithm and its hyperparameters.

.. tip:: The ``tune`` function is based on Trinity-RFT and internally converts the input parameters to a YAML configuration. Advanced users can skip the ``model``, ``train_dataset``, and ``algorithm`` arguments and instead provide a YAML config file path via the ``config_path`` argument. Using a configuration file is recommended for fine-grained control and for leveraging advanced Trinity-RFT features. See the Trinity-RFT Configuration Guide for more options.

Save the above code as ``main.py`` and run it with:

.. code-block:: bash

    ray start --head
    python main.py

Checkpoints and logs are automatically saved to the ``checkpoints/AgentScope`` directory under your workspace, with each run in a timestamped sub-directory. TensorBoard logs can be found in ``monitor/tensorboard`` within the checkpoint directory.

.. code-block:: text

    your_workspace/
    └── checkpoints/
        └── AgentScope/
            └── Experiment-20260104185355/   # each run saved in a timestamped sub-directory
                ├── monitor/
                │   └── tensorboard/         # tensorboard logs
                └── global_step_x/           # saved model checkpoints at step x

.. tip:: For more tuning examples, refer to the tuner directory of the AgentScope-Samples repository.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.984 seconds)


.. _sphx_glr_download_tutorial_task_tuner.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: task_tuner.ipynb <task_tuner.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: task_tuner.py <task_tuner.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: task_tuner.zip <task_tuner.zip>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_