Retrieval-Augmented Generation (RAG)

AgentScope provides built-in support for retrieval-augmented generation (RAG). The two key RAG-related modules in AgentScope are Knowledge and KnowledgeBank.

Creating and Using Knowledge Instances

Although Knowledge is a base class, AgentScope currently provides one concrete built-in knowledge class. (An online-search knowledge class will be added soon.)

  • LlamaIndexKnowledge: designed to work with LlamaIndex, one of the most popular RAG libraries. It serves as local knowledge and supports most of LlamaIndex's features through configuration.

Creating a LlamaIndexKnowledge Instance

A quick way to create a LlamaIndexKnowledge instance is to use the build_knowledge_instance function, which takes three parameters:

knowledge_id: a unique identifier for the knowledge instance

data_dirs_and_types: a dictionary whose keys are the directories (as strings) where the data is located and whose values are the extensions of the data files

emb_model_config_name: the name of the embedding model configuration in AgentScope (which needs to be initialized in AgentScope beforehand)

A simple example is shown below.

import os
import agentscope
from agentscope.rag.llama_index_knowledge import LlamaIndexKnowledge

agentscope.init(
    model_configs=[
        {
            "model_type": "dashscope_text_embedding",
            "config_name": "qwen_emb_config",
            "model_name": "text-embedding-v2",
            "api_key": os.getenv("DASHSCOPE_API_KEY"),
        },
    ],
)

local_knowledge = LlamaIndexKnowledge.build_knowledge_instance(
    knowledge_id="agentscope_qa",
    data_dirs_and_types={"./": [".md"]},
    emb_model_config_name="qwen_emb_config",
)


nodes = local_knowledge.retrieve(
    "what is agentscope?",
    similarity_top_k=1,
)

print(f"\nThe retrieved content:\n{nodes[0].content}")
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /opt/hostedtoolcache/Python/3.9.21/x64/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.

Parsing nodes: 100%|██████████| 17/17 [00:00<00:00, 3220.27it/s]

Generating embeddings: 100%|██████████| 17/17 [00:16<00:00,  1.04it/s]

The retrieved content:
关于 AgentScope
_**Q**:AgentScope 与其他代理平台/框架有什么区别?_

**A**:AgentScope 是一个面向开发者的多智能体平台,旨在简化**多智能体应用程序**的开发、部署和监控。

If you want more control over the data preprocessing, you can pass a knowledge configuration to the function instead. In particular, SimpleDirectoryReader is a class from the LlamaIndex library, and init_args holds the initialization arguments of SimpleDirectoryReader. For the preprocessing itself, developers can choose among the various LlamaIndex transformation operations.

flex_knowledge_config = {
    "knowledge_id": "agentscope_qa_flex",
    "knowledge_type": "llamaindex_knowledge",
    "emb_model_config_name": "qwen_emb_config",
    "chunk_size": 1024,
    "chunk_overlap": 40,
    "data_processing": [
        {
            "load_data": {
                "loader": {
                    "create_object": True,
                    "module": "llama_index.core",
                    "class": "SimpleDirectoryReader",
                    "init_args": {
                        "input_dir": "./",
                        "required_exts": [
                            ".md",
                        ],
                    },
                },
            },
            "store_and_index": {
                "transformations": [
                    {
                        "create_object": True,
                        "module": "llama_index.core.node_parser",
                        "class": "SentenceSplitter",
                        "init_args": {
                            "chunk_size": 1024,
                        },
                    },
                ],
            },
        },
    ],
}

local_knowledge_flex = LlamaIndexKnowledge.build_knowledge_instance(
    knowledge_id="agentscope_qa_flex",
    knowledge_config=flex_knowledge_config,
)


nodes = local_knowledge_flex.retrieve(
    "what is agentscope?",
    similarity_top_k=1,
)

print(f"\nThe retrieved content:\n{nodes[0].content}")
Parsing nodes: 100%|██████████| 17/17 [00:00<00:00, 2811.75it/s]

Generating embeddings: 100%|██████████| 17/17 [00:16<00:00,  1.01it/s]

The retrieved content:
关于 AgentScope
_**Q**:AgentScope 与其他代理平台/框架有什么区别?_

**A**:AgentScope 是一个面向开发者的多智能体平台,旨在简化**多智能体应用程序**的开发、部署和监控。

Create a Batch of Knowledge Instances

For cases where different knowledge sources exist and require different preprocessing and/or post-processing, a good strategy is to create multiple knowledge instances. To this end, we introduce KnowledgeBank to better manage the knowledge instances. One can initialize a batch of knowledge instances from a file containing multiple knowledge configurations.

from agentscope.rag import KnowledgeBank

knowledge_bank = KnowledgeBank(configs=path_to_knowledge_configs_json)
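
For reference, such a JSON file simply contains a list of knowledge configurations in the same format as flex_knowledge_config above. The snippet below is an illustrative sketch (its content just mirrors the earlier configuration and is not taken from the AgentScope repository):

[
    {
        "knowledge_id": "agentscope_qa_flex",
        "knowledge_type": "llamaindex_knowledge",
        "emb_model_config_name": "qwen_emb_config",
        "chunk_size": 1024,
        "chunk_overlap": 40,
        "data_processing": [
            {
                "load_data": {
                    "loader": {
                        "create_object": true,
                        "module": "llama_index.core",
                        "class": "SimpleDirectoryReader",
                        "init_args": {
                            "input_dir": "./",
                            "required_exts": [".md"]
                        }
                    }
                }
            }
        ]
    }
]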

Alternatively, knowledge instances can also be added to the knowledge bank dynamically.

knowledge_bank.add_data_as_knowledge(
    knowledge_id="agentscope_tutorial_rag",
    emb_model_name="qwen_emb_config",
    data_dirs_and_types={
        "../../docs/sphinx_doc/en/source/tutorial": [".md"],
    },
)

Here, knowledge_id should be unique. If developers have their own new knowledge classes, they can register the new classes in advance.

from your_knowledge import NewKnowledgeClass1, NewKnowledgeClass2
knowledge_bank = KnowledgeBank(
  configs="configs/knowledge_config.json",
  new_knowledge_types=[NewKnowledgeClass1, NewKnowledgeClass2]
)
# or
knowledge_bank.register_knowledge_type(NewKnowledgeClass2)
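
Once the bank is set up, a knowledge instance can be fetched back and queried just like before. The sketch below assumes that KnowledgeBank exposes a get_knowledge method for looking up instances by knowledge_id; please check the KnowledgeBank API of your AgentScope version.

# Assumption: get_knowledge(knowledge_id) returns the stored knowledge instance.
knowledge = knowledge_bank.get_knowledge("agentscope_tutorial_rag")
nodes = knowledge.retrieve(
    "what is agentscope?",
    similarity_top_k=1,
)
print(nodes[0].content)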

(Advanced) Setting Up Your Own Embedding Model Service

For users interested in hosting a local embedding model, we also provide the following example. It is based on the `sentence_transformers` package, a popular choice for embedding models (built on top of `transformers` and compatible with both HuggingFace and ModelScope models). In this example, we use `gte-Qwen2-7B-instruct`, one of the best text embedding models currently available.
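
The server script below relies on flask and sentence_transformers; if they are not installed yet, both are available from PyPI:

pip install flask sentence-transformers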

import datetime
import argparse

from flask import Flask
from flask import request
from sentence_transformers import SentenceTransformer

def create_timestamp(format_: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Get current timestamp."""
    return datetime.datetime.now().strftime(format_)

app = Flask(__name__)

@app.route("/embedding/", methods=["POST"])
def get_embedding() -> dict:
    """Receive post request and return response"""
    json = request.get_json()

    inputs = json.pop("inputs")

    global model

    if isinstance(inputs, str):
        inputs = [inputs]

    embeddings = model.encode(inputs)

    return {
        "data": {
            "completion_tokens": 0,
            "messages": {},
            "prompt_tokens": 0,
            "response": {
                "data": [
                    {
                        "embedding": emb.astype(float).tolist(),
                    }
                    for emb in embeddings
                ],
                "created": "",
                "id": create_timestamp(),
                "model": "flask_model",
                "object": "text_completion",
                "usage": {
                    "completion_tokens": 0,
                    "prompt_tokens": 0,
                    "total_tokens": 0,
                },
            },
            "total_tokens": 0,
            "username": "",
        },
    }

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name_or_path", type=str, required=True)
    parser.add_argument("--device", type=str, default="auto")
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args()

    global model

    print("setting up for embedding model....")
    model = SentenceTransformer(
        args.model_name_or_path
    )

    app.run(port=args.port)
  • Next, start the server.

python setup_ms_service.py --model_name_or_path {$PATH_TO_gte_Qwen2_7B_instruct}

Test whether the service has been started successfully.

from agentscope.models.post_model import PostAPIEmbeddingWrapper


model = PostAPIEmbeddingWrapper(
    config_name="test_config",
    api_url="http://127.0.0.1:8000/embedding/",
    json_args={
        "max_length": 4096,
        "temperature": 0.5
    }
)

print(model("testing"))
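
Once the service responds correctly, it can also be registered as an embedding model in agentscope.init and then referenced as the emb_model_config_name of a knowledge instance. The sketch below assumes that the model_type string "post_api_embedding" maps to PostAPIEmbeddingWrapper in your AgentScope version; the config name local_emb_config is illustrative.

import agentscope

agentscope.init(
    model_configs=[
        {
            # Assumption: "post_api_embedding" is the model_type registered
            # for PostAPIEmbeddingWrapper; verify against your version.
            "model_type": "post_api_embedding",
            "config_name": "local_emb_config",
            "api_url": "http://127.0.0.1:8000/embedding/",
            "json_args": {
                "max_length": 4096,
                "temperature": 0.5,
            },
        },
    ],
)

# The local service can then back a knowledge instance, e.g.:
# local_knowledge = LlamaIndexKnowledge.build_knowledge_instance(
#     knowledge_id="agentscope_qa_local_emb",
#     data_dirs_and_types={"./": [".md"]},
#     emb_model_config_name="local_emb_config",
# )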

Total running time of the script: (0 minutes 37.369 seconds)
