Response Parser

Background

When building an LLM-empowered application, parsing the LLM-generated string into a specific format and extracting the required information is an important step. It is also a complex one, for the following reasons:

  1. Diversity: The target format of parsing is diverse, and the information to be extracted may be a specific text, a JSON object, or a complex data structure.

  2. Complexity: Result parsing involves more than converting the text generated by the LLM into the target format; it also raises issues such as prompt engineering (reminding the LLM what format of output to generate) and error handling.

  3. Flexibility: Even in the same application, different stages may also require the agent to generate output in different formats.

For the convenience of developers, AgentScope provides a parser module that parses LLM responses into a specific format. With it, developers can obtain the target format through simple configuration and switch target formats flexibly.

In AgentScope, the parser module features:

  1. Flexibility: Developers can set the required format and switch parsers without modifying the code of the agent class. That is, the specific “target format” and the agent’s reply function are decoupled.

  2. Freedom: The format instruction, result parsing, and prompt engineering are all performed explicitly in the reply function. Developers and users are free to use the parser or to parse the LLM response with their own code.

  3. Transparency: When the parser is used, the process and results of prompt construction are fully visible to developers in the reply function, so they can debug their applications precisely.

Parser Module

Overview

The main functions of the parser module include:

  1. Provide a “format instruction”, that is, remind the LLM what output to generate and where, for example:

You should generate python code in a fenced code block as follows
```python
{your_python_code}
```
  2. Provide a parse function, which directly parses the text generated by the LLM into the target data format.

  3. Provide post-processing for the dictionary format. After the text is parsed into a dictionary, different fields may serve different purposes.

AgentScope provides multiple built-in parsers, and developers can choose according to their needs.

| Target Format | Parser Class | Description |
| --- | --- | --- |
| String | MarkdownCodeBlockParser | Requires LLM to generate specified text within a Markdown code block marked by ```. The result is a string. |
| Dictionary | MarkdownJsonDictParser | Requires LLM to produce a specified dictionary within the code block marked by ```json and ```. The result is a Python dictionary. |
| Dictionary | MultiTaggedContentParser | Requires LLM to generate specified content within multiple tags. Contents from different tags will be parsed into a single Python dictionary with different key-value pairs. |
| Dictionary | RegexTaggedContentParser | For uncertain tag names and quantities, allows users to modify regular expressions, and the return result is a dictionary. |
| JSON / Python Object Type | MarkdownJsonObjectParser | Requires LLM to produce specified content within the code block marked by ```json and ```. The result will be converted into a Python object via json.loads. |

NOTE: Compared to MarkdownJsonDictParser, MultiTaggedContentParser is more suitable for weak LLMs and for complex target formats. For example, when the LLM is required to generate Python code and the code is returned directly within a dictionary, the LLM must escape special characters correctly (\t, \n, …) and handle the difference between double and single quotes so that json.loads can parse the result.

In contrast, MultiTaggedContentParser guides LLM to generate each key-value pair separately in individual tags and then combines them into a dictionary, thus reducing the difficulty.
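The escaping problem described above can be reproduced with the standard library alone. The following is a minimal sketch, not AgentScope code: the sample responses and the [CODE] tag are made up for illustration. It shows that a raw line break inside a JSON string is rejected by json.loads, while the same code placed between tags needs no escaping.

```python
import json

# A response in which the model forgot to escape the line break
# inside the JSON string (the "\n" below is a real newline character).
bad_json = '{"code": "for i in range(3):\n    print(i)"}'

try:
    json.loads(bad_json)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# The correctly escaped version: the model has to emit a literal backslash-n.
good_json = '{"code": "for i in range(3):\\n    print(i)"}'
code_from_json = json.loads(good_json)["code"]

# With tags (hypothetical [CODE] tag), the raw text is taken as-is.
tagged = "[CODE]for i in range(3):\n    print(i)[/CODE]"
code_from_tags = tagged[len("[CODE]"):-len("[/CODE]")]

print(json_ok)                           # False
print(code_from_json == code_from_tags)  # True
```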

NOTE: The built-in strategies for constructing format instructions are only examples. In AgentScope, developers have complete control over prompt construction, so they can choose not to use the format instructions provided by the parsers. Writing format instructions by hand and implementing a new parser class are both feasible.

In the following sections, we will introduce the usage of these parsers based on different target formats.

String Type

MarkdownCodeBlockParser

Initialization
  • MarkdownCodeBlockParser requires the LLM to generate specific text within a specified code block in Markdown format. The language can be set with the language_name parameter to make use of the LLM’s ability to produce output in that language. For example, when asking the LLM to produce Python code, initialize the parser as follows:

    from agentscope.parsers import MarkdownCodeBlockParser
    
    parser = MarkdownCodeBlockParser(language_name="python", content_hint="your python code")
    
Format Instruction Template
  • MarkdownCodeBlockParser provides the following format instruction template. When the format_instruction attribute is accessed, {language_name} and {content_hint} will be replaced with the strings given at initialization:

    You should generate {language_name} code in a {language_name} fenced code block as follows:
    ```{language_name}
    {content_hint}
    ```
    
  • For the above initialization with language_name as "python", accessing the format_instruction attribute returns the following string:

    print(parser.format_instruction)
    
    You should generate python code in a python fenced code block as follows:
    ```python
    your python code
    ```
    
Parse Function
  • MarkdownCodeBlockParser provides a parse method to parse the text generated by the LLM. Its input and output are both ModelResponse objects, and the parsing result is stored in the parsed attribute of the output object.

    from agentscope.models import ModelResponse
    
    res = parser.parse(
        ModelResponse(
            text="""The following is generated python code
    ```python
    print("Hello world!")
    ```
    """
        )
    )
    
    print(res.parsed)
    
    print("Hello world!")
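To make the parsing step concrete, here is a minimal standard-library sketch of extracting the content of a fenced code block. It is an illustration under assumptions, not AgentScope's implementation: the helper name extract_code_block and the choice to return None when no block is found are hypothetical.

```python
import re
from typing import Optional

FENCE = "`" * 3  # the Markdown code fence marker


def extract_code_block(text: str, language_name: str) -> Optional[str]:
    """Return the content of the first fenced block for the language, or None."""
    pattern = re.escape(FENCE + language_name) + r"\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


response_text = (
    "The following is generated python code\n"
    f'{FENCE}python\nprint("Hello world!")\n{FENCE}\n'
)
print(extract_code_block(response_text, "python"))  # print("Hello world!")
```

A real parser would more likely raise a dedicated error when the block is missing, so that the agent can ask the LLM to correct its output.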
    

Dictionary Type

Unlike strings and general JSON/Python objects, dictionaries are an especially powerful format in LLM applications, so AgentScope provides additional post-processing functions for the dictionary type. When initializing the parser, you can set the keys_to_content, keys_to_memory, and keys_to_metadata parameters to filter key-value pairs when calling the parser’s to_content, to_memory, and to_metadata methods.

  • keys_to_content specifies the key-value pairs that will be placed in the content field of the returned Msg object. The content field will be returned to other agents, participate in their prompt construction, and will also be used by the self.speak function for display.

  • keys_to_memory specifies the key-value pairs that will be stored in the memory of the agent.

  • keys_to_metadata specifies the key-value pairs that will be placed in the metadata field of the returned Msg object. This field can be used for control-flow decisions in the application, or to carry information that does not need to be returned to other agents.

Each of the three parameters accepts a bool, a string, or a list of strings. The values have the following meanings:

  • False: The corresponding filter function will return None.

  • True: The whole dictionary will be returned.

  • str: The corresponding value will be directly returned.

  • List[str]: A filtered dictionary will be returned according to the list of keys.

By default, keys_to_content and keys_to_memory are True, that is, the whole dictionary will be returned. keys_to_metadata defaults to False, that is, the corresponding filter function will return None.
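The four cases above can be summarized in a few lines of plain Python. This is a minimal sketch of the filtering rule, assuming a hypothetical helper named filter_by_keys; it is not the code behind to_content, to_memory, or to_metadata, and silently skipping keys that are absent from the dictionary is an assumption of the sketch.

```python
from typing import Any, Dict, List, Union

KeysSpec = Union[bool, str, List[str]]


def filter_by_keys(parsed: Dict[str, Any], keys: KeysSpec) -> Any:
    """Apply the True / False / str / list-of-str rules to a parsed dictionary."""
    if isinstance(keys, bool):
        # True returns the whole dictionary, False returns None
        return dict(parsed) if keys else None
    if isinstance(keys, str):
        # A single key returns the bare value, not a dictionary
        return parsed[keys]
    # A list of keys returns a filtered dictionary (missing keys are skipped)
    return {key: parsed[key] for key in keys if key in parsed}


parsed = {"thought": "abc", "speak": "def", "finish_discussion": True}
print(filter_by_keys(parsed, True))
print(filter_by_keys(parsed, False))
print(filter_by_keys(parsed, "speak"))
print(filter_by_keys(parsed, ["thought", "speak"]))
```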

For example, consider the dictionary generated by a werewolf during the daytime discussion in a werewolf game. In this example,

  • "thought" should not be returned to other agents, but should be stored in the agent’s memory to ensure the continuity of the werewolf strategy;

  • "speak" should be returned to other agents and stored in the agent’s memory;

  • "finish_discussion" is used in the application’s control flow to determine whether the discussion has ended. To save tokens, this field should not be returned to other agents or stored in the agent’s memory.

    {
        "thought": "The others didn't realize I was a werewolf. I should end the discussion soon.",
        "speak": "I agree with you.",
        "finish_discussion": True
    }
    

In AgentScope, we achieve post-processing by calling the to_content, to_memory, and to_metadata methods, as shown in the following code:

  • In the application’s control flow, create the corresponding parser object and load it into the agent

    from agentscope.agents import DictDialogAgent
    from agentscope.parsers import MarkdownJsonDictParser
    
    # ...
    
    agent = DictDialogAgent(...)
    
    # Take MarkdownJsonDictParser as example
    parser = MarkdownJsonDictParser(
        content_hint={
            "thought": "what you thought",
            "speak": "what you speak",
            "finish_discussion": "whether the discussion is finished"
        },
        keys_to_content="speak",
        keys_to_memory=["thought", "speak"],
        keys_to_metadata=["finish_discussion"]
    )
    
    # Load parser, which is equivalent to specifying the required format
    agent.set_parser(parser)
    
    # The discussion process
    while True:
        # ...
        x = agent(x)
        # Break the loop according to the finish_discussion field in metadata
        if x.metadata["finish_discussion"]:
            break
    
  • Filter the dictionary in the agent’s reply function

        # ...
        def reply(self, x: Optional[Union[Msg, Sequence[Msg]]] = None) -> Msg:
    
            # ...
            res = self.model(prompt, parse_func=self.parser.parse)
    
            # Store the thought and speak fields into memory
            self.memory.add(
                Msg(
                    self.name,
                    content=self.parser.to_memory(res.parsed),
                    role="assistant",
                )
            )
    
            # Store in content and metadata fields in the returned Msg object
            msg = Msg(
                self.name,
                content=self.parser.to_content(res.parsed),
                role="assistant",
                metadata=self.parser.to_metadata(res.parsed),
            )
            self.speak(msg)
    
            return msg
    

Note: keys_to_content, keys_to_memory, and keys_to_metadata parameters can be a string, a list of strings, or a bool value.

  • For True, the to_content, to_memory, and to_metadata methods will directly return the whole dictionary.

  • For False, the to_content, to_memory, and to_metadata methods will directly return None.

  • For a string, the to_content, to_memory, and to_metadata methods will directly extract the corresponding value. For example, if keys_to_content="speak", the to_content method will put res.parsed["speak"] into the content field of the Msg object, and the content field will be a string rather than a dictionary.

  • For a list of strings, the to_content, to_memory, and to_metadata methods will filter the dictionary according to the list of keys.

      parser = MarkdownJsonDictParser(
         content_hint={
             "thought": "what you thought",
             "speak": "what you speak",
         },
         keys_to_content="speak",
         keys_to_memory=["thought", "speak"],
      )
    
      example_dict = {"thought": "abc", "speak": "def"}
      print(parser.to_content(example_dict))   # def
      print(parser.to_memory(example_dict))    # {'thought': 'abc', 'speak': 'def'}
      print(parser.to_metadata(example_dict))  # None
    
    def
    {'thought': 'abc', 'speak': 'def'}
    None
    

Parsers

For dictionary type return values, AgentScope provides multiple parsers for developers to choose from according to their needs.

RegexTaggedContentParser
Initialization

RegexTaggedContentParser is designed for scenarios where 1) the tag names are uncertain, and 2) the number of tags is uncertain. In this case, the parser cannot provide a general format instruction, so developers need to provide the corresponding format instruction (format_instruction) at initialization. Alternatively, developers can handle the prompt engineering themselves.

from agentscope.parsers import RegexTaggedContentParser

parser = RegexTaggedContentParser(
    format_instruction="""Respond with specific tags as outlined below
<thought>what you thought</thought>
<speak>what you speak</speak>
""",
    try_parse_json=True,                    # Try to parse the content of the tag as JSON object
    required_keys=["thought", "speak"]      # Required keys in the returned dictionary
)
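The idea of collecting an unknown set of tags with a regular expression can be sketched with the standard library. The snippet below is illustrative only: the function parse_tagged, its regular expression, and its error type are assumptions and do not reproduce AgentScope's implementation. It mirrors the two options shown above, try_parse_json and required_keys.

```python
import json
import re
from typing import Any, Dict, List


def parse_tagged(
    text: str,
    required_keys: List[str],
    try_parse_json: bool = True,
) -> Dict[str, Any]:
    """Collect every <name>content</name> pair found in the text into a dict."""
    result: Dict[str, Any] = {}
    for name, content in re.findall(r"<(\w+)>(.*?)</\1>", text, flags=re.DOTALL):
        value: Any = content.strip()
        if try_parse_json:
            try:
                value = json.loads(value)
            except json.JSONDecodeError:
                pass  # not valid JSON, keep the raw string
        result[name] = value
    missing = [key for key in required_keys if key not in result]
    if missing:
        raise ValueError(f"Missing required tags: {missing}")
    return result


sample = "<thought>keep quiet</thought>\n<speak>I agree with you.</speak>\n<vote>3</vote>"
print(parse_tagged(sample, required_keys=["thought", "speak"]))
```

Because the tag names are captured by the pattern rather than declared in advance, the extra vote tag is picked up without any change to the parser, and its content is converted to an integer by the JSON attempt.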
MarkdownJsonDictParser
Initialization & Format Instruction Template
  • MarkdownJsonDictParser requires the LLM to generate a dictionary within a code block fenced by ```json and ``` tags.

  • In addition to keys_to_content, keys_to_memory, and keys_to_metadata, the content_hint parameter can be provided to give an example and explanation of the expected response, that is, to remind the LLM where and what kind of dictionary should be generated. This parameter can be a string or a dictionary. A dictionary is automatically converted to a string when the format instruction is constructed.

    from agentscope.parsers import MarkdownJsonDictParser
    
    # dictionary as content_hint
    MarkdownJsonDictParser(
        content_hint={
          "thought": "what you thought",
          "speak": "what you speak",
        }
    )
    # or string as content_hint
    MarkdownJsonDictParser(
        content_hint="""{
      "thought": "what you thought",
      "speak": "what you speak",
    }"""
    )
    
  • The corresponding format_instruction attribute

    You should respond a json object in a json fenced code block as follows:
    ```json
    {content_hint}
    ```
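How a dictionary hint could end up in the instruction text can be sketched as follows. This is an assumption-laden illustration: the helper build_format_instruction is hypothetical, and serializing the dictionary with json.dumps is only one plausible way to convert it to a string; the actual conversion used by AgentScope may differ.

```python
import json
from typing import Any, Dict, Union

FENCE = "`" * 3  # the Markdown code fence marker


def build_format_instruction(content_hint: Union[str, Dict[str, Any]]) -> str:
    """Fill the instruction template with a string or dictionary hint."""
    if isinstance(content_hint, dict):
        # Assumed conversion: serialize the dictionary as indented JSON
        content_hint = json.dumps(content_hint, indent=4, ensure_ascii=False)
    return (
        "You should respond a json object in a json fenced code block as follows:\n"
        f"{FENCE}json\n{content_hint}\n{FENCE}"
    )


instruction = build_format_instruction(
    {"thought": "what you thought", "speak": "what you speak"}
)
print(instruction)
```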
    
Validation

The content_hint parameter in MarkdownJsonDictParser also supports type validation based on Pydantic. When initializing, you can set content_hint to a Pydantic model class, and AgentScope will modify the format_instruction attribute based on this class. In addition, Pydantic will be used to validate the dictionary returned by the LLM during parsing.

A simple example is as follows. The "..." in Field(...) marks a field as required, and further validation rules can be added as described in the Pydantic documentation.

from pydantic import BaseModel, Field
from agentscope.parsers import MarkdownJsonDictParser

class Schema(BaseModel):
    thought: str = Field(..., description="what you thought")
    speak: str = Field(..., description="what you speak")
    end_discussion: bool = Field(..., description="whether the discussion is finished")

parser = MarkdownJsonDictParser(content_hint=Schema)
  • The corresponding format_instruction attribute

Respond a JSON dictionary in a markdown's fenced code block as follows:
```json
{a_JSON_dictionary}
```
The generated JSON dictionary MUST follow this schema:
{'properties': {'thought': {'description': 'what you thought', 'title': 'Thought', 'type': 'string'}, 'speak': {'description': 'what you speak', 'title': 'Speak', 'type': 'string'}, 'end_discussion': {'description': 'whether the discussion is finished', 'title': 'End Discussion', 'type': 'boolean'}}, 'required': ['thought', 'speak', 'end_discussion'], 'title': 'Schema', 'type': 'object'}
  • During parsing, Pydantic is used for type validation, and an exception is raised if the validation fails. Pydantic also provides some fault tolerance, such as converting the string "true" to Python’s True:

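from agentscope.models import ModelResponse

# parse expects a ModelResponse object rather than a raw string
res = parser.parse(ModelResponse(text="""
```json
{
  "thought": "The others didn't realize I was a werewolf. I should end the discussion soon.",
  "speak": "I agree with you.",
  "end_discussion": "true"
}
```
"""))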
MultiTaggedContentParser

MultiTaggedContentParser asks LLM to generate specific content within multiple tag pairs. The content from different tag pairs will be parsed into a single Python dictionary. Its usage is similar to MarkdownJsonDictParser, but the initialization method is different, and it is more suitable for weak LLMs or complex return content.

Initialization & Format Instruction Template

Within MultiTaggedContentParser, each tag pair is specified by a TaggedContent object, which contains:

  • Tag name (name), the key in the returned dictionary

  • Start tag (tag_begin)

  • Hint for content (content_hint)

  • End tag (tag_end)

  • Content parsing indication (parse_json), which defaults to False. When set to True, the parser automatically adds a hint that requires a JSON object between the tags, and the extracted content is parsed into a Python object via json.loads

from agentscope.parsers import MultiTaggedContentParser, TaggedContent
parser = MultiTaggedContentParser(
  TaggedContent(
    name="thought",
    tag_begin="[THOUGHT]",
    content_hint="what you thought",
    tag_end="[/THOUGHT]"
  ),
  TaggedContent(
    name="speak",
    tag_begin="[SPEAK]",
    content_hint="what you speak",
    tag_end="[/SPEAK]"
  ),
  TaggedContent(
    name="finish_discussion",
    tag_begin="[FINISH_DISCUSSION]",
    content_hint="true/false, whether the discussion is finished",
    tag_end="[/FINISH_DISCUSSION]",
    parse_json=True,         # we expect the content of this field to be parsed directly into a Python boolean value
  )
)

print(parser.format_instruction)
Respond with specific tags as outlined below, and the content between [FINISH_DISCUSSION] and [/FINISH_DISCUSSION] MUST be a JSON object:
[THOUGHT]what you thought[/THOUGHT]
[SPEAK]what you speak[/SPEAK]
[FINISH_DISCUSSION]true/false, whether the discussion is finished[/FINISH_DISCUSSION]
Parse Function
  • MultiTaggedContentParser’s parsing result is a dictionary whose keys are the name values of the TaggedContent objects. The following is an example of parsing the LLM response in the werewolf game:

from agentscope.models import ModelResponse

res = parser.parse(
    ModelResponse(
        text="""As a werewolf, I should keep pretending to be a villager
[THOUGHT]The others didn't realize I was a werewolf. I should end the discussion soon.[/THOUGHT]
[SPEAK]I agree with you.[/SPEAK]
[FINISH_DISCUSSION]true[/FINISH_DISCUSSION]"""
    )
)

# The parsed dictionary is stored in the parsed attribute
print(res.parsed)
{
  "thought": "The others didn't realize I was a werewolf. I should end the discussion soon.",
  "speak": "I agree with you.",
  "finish_discussion": True
}
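The tag-pair extraction can be illustrated without AgentScope. In this minimal sketch, TagSpec is a hypothetical stand-in for a tag description (it is not the TaggedContent class), and parse_multi_tagged is an assumed helper that looks up each declared tag pair and optionally applies json.loads to its content.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class TagSpec:
    """Hypothetical stand-in describing one tag pair."""

    name: str
    tag_begin: str
    tag_end: str
    parse_json: bool = False


def parse_multi_tagged(text: str, specs: List[TagSpec]) -> Dict[str, Any]:
    """Extract the content of every declared tag pair into one dictionary."""
    result: Dict[str, Any] = {}
    for spec in specs:
        start = text.find(spec.tag_begin)
        if start == -1:
            raise ValueError(f"Missing tag {spec.tag_begin}")
        start += len(spec.tag_begin)
        end = text.find(spec.tag_end, start)
        if end == -1:
            raise ValueError(f"Missing tag {spec.tag_end}")
        content = text[start:end].strip()
        # Only fields marked parse_json are converted to Python objects
        result[spec.name] = json.loads(content) if spec.parse_json else content
    return result


specs = [
    TagSpec("thought", "[THOUGHT]", "[/THOUGHT]"),
    TagSpec("speak", "[SPEAK]", "[/SPEAK]"),
    TagSpec("finish_discussion", "[FINISH_DISCUSSION]", "[/FINISH_DISCUSSION]", parse_json=True),
]
text = (
    "As a werewolf, I should keep pretending to be a villager\n"
    "[THOUGHT]I should end the discussion soon.[/THOUGHT]\n"
    "[SPEAK]I agree with you.[/SPEAK]\n"
    "[FINISH_DISCUSSION]true[/FINISH_DISCUSSION]"
)
parsed = parse_multi_tagged(text, specs)
print(parsed["finish_discussion"])  # True
```

Text outside the declared tags, such as the first line of the response, is simply ignored, which is part of what makes this format forgiving for weaker models.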

JSON / Python Object Type

MarkdownJsonObjectParser

MarkdownJsonObjectParser also uses the ```json and ``` tags in Markdown, but does not limit the content type. It can be a list, dictionary, number, string, etc., which can be parsed into a Python object via json.loads.

Initialization & Format Instruction Template
from agentscope.parsers import MarkdownJsonObjectParser

parser = MarkdownJsonObjectParser(
  content_hint="{A list of numbers.}"
)

print(parser.format_instruction)
You should respond a json object in a json fenced code block as follows:
```json
{A list of numbers.}
```
Parse Function
from agentscope.models import ModelResponse

res = parser.parse(
    ModelResponse(
        text="""Yes, here is the generated list
```json
[1,2,3,4,5]
```
""")
)

# The parsed object is stored in the parsed attribute of the response
print(type(res.parsed))
print(res.parsed)
<class 'list'>
[1, 2, 3, 4, 5]
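Since the content type is not restricted, the same extraction step can yield a list, a dictionary, a number, or a string. The following standard-library sketch shows this; parse_json_block is a hypothetical helper written for this illustration, not the parser's actual code.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # the Markdown code fence marker


def parse_json_block(text: str) -> Any:
    """Load whatever JSON value sits inside the first json fenced block."""
    pattern = re.escape(FENCE) + r"json\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("No json fenced code block found in the response")
    return json.loads(match.group(1))


def wrap(payload: str) -> str:
    """Build a sample response around a JSON payload (for this demo only)."""
    return f"Yes, here is the result\n{FENCE}json\n{payload}\n{FENCE}\n"


print(parse_json_block(wrap("[1,2,3,4,5]")))    # [1, 2, 3, 4, 5]
print(parse_json_block(wrap('{"count": 2}')))   # {'count': 2}
print(parse_json_block(wrap('"done"')))         # done
```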

Typical Use Cases

Werewolf Game

The werewolf game is a classic use case for dictionary parsers. In different stages of the game, the same agent needs to generate different identification fields in addition to "thought" and "speak", such as whether the discussion is over, whether the seer uses its ability, whether the witch uses the antidote or the poison, and the vote.

AgentScope includes a built-in werewolf game example, which uses the DictDialogAgent class and different parsers to switch target formats flexibly. By using the post-processing functions of the parser, it separates “thought” from “speak” and controls the progress of the game. More details can be found in the werewolf game source code.

ReAct Agent and Tool Usage

ReActAgent is an agent class built for tool usage in AgentScope. It is based on the ReAct algorithm and can be used with different tool functions. Its tool calling and format parsing follow an approach similar to that of the parsers described here. For the detailed implementation, please refer to the source code.

Customized Parser

AgentScope provides a base class, ParserBase, for parsers. Developers can inherit from this base class and implement the format_instruction attribute and the parse method to create their own parser.

For dictionary type parsing, you can also inherit from the agentscope.parsers.DictFilterMixin class to add post-processing for the dictionary type.

from abc import ABC, abstractmethod

from agentscope.models import ModelResponse


class ParserBase(ABC):
    """The base class for model response parser."""

    format_instruction: str
    """The instruction for the response format."""

    @abstractmethod
    def parse(self, response: ModelResponse) -> ModelResponse:
        """Parse the response text into a specific object and store it in the
        parsed field of the response object."""

    # ...
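As an illustration of the pattern, the sketch below defines a custom parser against simplified stand-ins. The ModelResponse dataclass and ParserBase class here are reduced copies written so that the example runs without AgentScope installed, and CommaSeparatedListParser is a hypothetical parser invented for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ModelResponse:
    """Simplified stand-in for agentscope.models.ModelResponse."""

    text: str
    parsed: Optional[Any] = None


class ParserBase(ABC):
    """Simplified stand-in mirroring the base class shown above."""

    format_instruction: str

    @abstractmethod
    def parse(self, response: ModelResponse) -> ModelResponse:
        """Parse response.text and store the result in response.parsed."""


class CommaSeparatedListParser(ParserBase):
    """Hypothetical parser that turns 'a, b, c' into ['a', 'b', 'c']."""

    format_instruction = (
        "Respond with a single line of comma-separated items, "
        "for example: apple, banana, cherry"
    )

    def parse(self, response: ModelResponse) -> ModelResponse:
        items: List[str] = [
            item.strip() for item in response.text.split(",") if item.strip()
        ]
        if not items:
            raise ValueError("The response does not contain any item")
        response.parsed = items
        return response


parser = CommaSeparatedListParser()
res = parser.parse(ModelResponse(text="apple, banana , cherry"))
print(res.parsed)  # ['apple', 'banana', 'cherry']
```

With the real library, the stand-in classes would be replaced by imports from agentscope.models and agentscope.parsers, and only the subclass would remain.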