.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "build_tutorial/rag.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_build_tutorial_rag.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_build_tutorial_rag.py:


.. _rag:

Retrieval-Augmented Generation (RAG)
==================================================================

AgentScope has built-in support for retrieval-augmented generation
(RAG). Two key modules relate to RAG in AgentScope: `Knowledge` and
`KnowledgeBank`.

Create and Use Knowledge Instances
----------------------------------------------

`Knowledge` is a base class. AgentScope currently ships one built-in
knowledge class. (Online search is coming soon.)


- `LlamaIndexKnowledge`: Designed to work with `LlamaIndex <https://www.llamaindex.ai/>`_, one of the most popular RAG libraries, as local knowledge. It supports most LlamaIndex functionality through configuration.


Create a `LlamaIndexKnowledge` instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A quick way to create a `LlamaIndexKnowledge` instance is to use the `build_knowledge_instance` function.
Three parameters need to be passed to the function:

- `knowledge_id`: a unique identifier for this knowledge instance

- `data_dirs_and_types`: a dictionary whose keys are data directory paths (strings) and whose values are lists of the file extensions to load from each directory

- `emb_model_config_name`: the name of an embedding model configuration in AgentScope (the configuration must be initialized in AgentScope beforehand)

A simple example is as follows.

.. GENERATED FROM PYTHON SOURCE LINES 36-67

.. code-block:: Python

    import os
    import agentscope
    from agentscope.rag.llama_index_knowledge import LlamaIndexKnowledge

    agentscope.init(
        model_configs=[
            {
                "model_type": "dashscope_text_embedding",
                "config_name": "qwen_emb_config",
                "model_name": "text-embedding-v2",
                "api_key": os.getenv("DASHSCOPE_API_KEY"),
            },
        ],
    )

    local_knowledge = LlamaIndexKnowledge.build_knowledge_instance(
        knowledge_id="agentscope_qa",
        data_dirs_and_types={"./": [".md"]},
        emb_model_config_name="qwen_emb_config",
    )


    nodes = local_knowledge.retrieve(
        "what is agentscope?",
        similarity_top_k=1,
    )


    print(f"\nThe retrieved content:\n{nodes[0].content}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [nltk_data] Downloading package punkt_tab to /opt/hostedtoolcache/Pyth
    [nltk_data]     on/3.10.16/x64/lib/python3.10/site-
    [nltk_data]     packages/llama_index/core/_static/nltk_cache...
    [nltk_data]   Unzipping tokenizers/punkt_tab.zip.
    Parsing nodes:   0%|          | 0/17 [00:00<?, ?it/s]    Parsing nodes: 100%|██████████| 17/17 [00:00<00:00, 3383.95it/s]
    Generating embeddings:   0%|          | 0/17 [00:00<?, ?it/s]    Generating embeddings:   6%|▌         | 1/17 [00:01<00:18,  1.16s/it]    Generating embeddings:  12%|█▏        | 2/17 [00:02<00:15,  1.03s/it]    Generating embeddings:  18%|█▊        | 3/17 [00:03<00:14,  1.01s/it]    Generating embeddings:  24%|██▎       | 4/17 [00:04<00:12,  1.01it/s]    Generating embeddings:  29%|██▉       | 5/17 [00:05<00:12,  1.03s/it]    Generating embeddings:  35%|███▌      | 6/17 [00:06<00:10,  1.02it/s]    Generating embeddings:  41%|████      | 7/17 [00:06<00:09,  1.04it/s]    Generating embeddings:  47%|████▋     | 8/17 [00:07<00:08,  1.04it/s]    Generating embeddings:  53%|█████▎    | 9/17 [00:08<00:07,  1.00it/s]    Generating embeddings:  59%|█████▉    | 10/17 [00:10<00:07,  1.01s/it]    Generating embeddings:  65%|██████▍   | 11/17 [00:10<00:05,  1.02it/s]    Generating embeddings:  71%|███████   | 12/17 [00:11<00:04,  1.03it/s]    Generating embeddings:  76%|███████▋  | 13/17 [00:12<00:03,  1.00it/s]    Generating embeddings:  82%|████████▏ | 14/17 [00:13<00:03,  1.01s/it]    Generating embeddings:  88%|████████▊ | 15/17 [00:14<00:01,  1.01it/s]    Generating embeddings:  94%|█████████▍| 16/17 [00:15<00:00,  1.03it/s]    Generating embeddings: 100%|██████████| 17/17 [00:16<00:00,  1.01it/s]    Generating embeddings: 100%|██████████| 17/17 [00:16<00:00,  1.01it/s]
    [03/27/25 11:01:19] DEBUG    Building index from IDs objects     __init__.py:362

    The retrieved content:
    About AgentScope
    _**Q**: What's the difference between AgentScope and other agent platforms/frameworks/packages?_

    **A**: AgentScope is a developer-centric and multi-agent platform, aiming to ease the development, deployment and monitoring of **multi-agent applications**.
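
Conceptually, `retrieve` ranks the indexed chunks by how similar their embeddings are to the query embedding and returns the `similarity_top_k` best matches. The sketch below illustrates that idea, assuming cosine similarity over hand-written three-dimensional vectors. It is an illustration only, not the LlamaIndex implementation; `cosine_similarity`, `top_k`, and the toy embeddings are hypothetical.

.. code-block:: python

    import math


    def cosine_similarity(a, b):
        """Cosine of the angle between two equal-length vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)


    def top_k(query_embedding, chunks, k=1):
        """Return the k chunks whose embeddings are closest to the query."""
        ranked = sorted(
            chunks,
            key=lambda c: cosine_similarity(query_embedding, c["embedding"]),
            reverse=True,
        )
        return ranked[:k]


    # Toy data: in practice the embedding model produces these vectors.
    chunks = [
        {"content": "About AgentScope", "embedding": [0.9, 0.1, 0.0]},
        {"content": "Installation", "embedding": [0.1, 0.9, 0.0]},
        {"content": "License", "embedding": [0.0, 0.2, 0.9]},
    ]

    best = top_k([1.0, 0.0, 0.0], chunks, k=1)
    print(best[0]["content"])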


.. GENERATED FROM PYTHON SOURCE LINES 68-72

For more control over how the data are preprocessed,
a knowledge configuration can be passed to the function.
In particular, `SimpleDirectoryReader` is a class in the LlamaIndex library, and `init_args` holds its initialization parameters.
For data preprocessing, developers can choose among different LlamaIndex `transformation operations <https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/transformations/>`_.

.. GENERATED FROM PYTHON SOURCE LINES 72-124

.. code-block:: Python


    flex_knowledge_config = {
        "knowledge_id": "agentscope_qa_flex",
        "knowledge_type": "llamaindex_knowledge",
        "emb_model_config_name": "qwen_emb_config",
        "chunk_size": 1024,
        "chunk_overlap": 40,
        "data_processing": [
            {
                "load_data": {
                    "loader": {
                        "create_object": True,
                        "module": "llama_index.core",
                        "class": "SimpleDirectoryReader",
                        "init_args": {
                            "input_dir": "./",
                            "required_exts": [
                                ".md",
                            ],
                        },
                    },
                },
                "store_and_index": {
                    "transformations": [
                        {
                            "create_object": True,
                            "module": "llama_index.core.node_parser",
                            "class": "SentenceSplitter",
                            "init_args": {
                                "chunk_size": 1024,
                            },
                        },
                    ],
                },
            },
        ],
    }

    local_knowledge_flex = LlamaIndexKnowledge.build_knowledge_instance(
        knowledge_id="agentscope_qa_flex",
        knowledge_config=flex_knowledge_config,
    )


    nodes = local_knowledge_flex.retrieve(
        "what is agentscope?",
        similarity_top_k=1,
    )

    print(f"\nThe retrieved content:\n{nodes[0].content}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Parsing nodes:   0%|          | 0/17 [00:00<?, ?it/s]    Parsing nodes: 100%|██████████| 17/17 [00:00<00:00, 3430.35it/s]
    Generating embeddings:   0%|          | 0/17 [00:00<?, ?it/s]    Generating embeddings:   6%|▌         | 1/17 [00:00<00:14,  1.07it/s]    Generating embeddings:  12%|█▏        | 2/17 [00:02<00:16,  1.10s/it]    Generating embeddings:  18%|█▊        | 3/17 [00:03<00:14,  1.02s/it]    Generating embeddings:  24%|██▎       | 4/17 [00:04<00:13,  1.02s/it]    Generating embeddings:  29%|██▉       | 5/17 [00:05<00:11,  1.01it/s]    Generating embeddings:  35%|███▌      | 6/17 [00:06<00:10,  1.01it/s]    Generating embeddings:  41%|████      | 7/17 [00:07<00:10,  1.01s/it]    Generating embeddings:  47%|████▋     | 8/17 [00:08<00:08,  1.00it/s]    Generating embeddings:  53%|█████▎    | 9/17 [00:09<00:08,  1.02s/it]    Generating embeddings:  59%|█████▉    | 10/17 [00:10<00:07,  1.04s/it]    Generating embeddings:  65%|██████▍   | 11/17 [00:11<00:06,  1.00s/it]    Generating embeddings:  71%|███████   | 12/17 [00:12<00:05,  1.01s/it]    Generating embeddings:  76%|███████▋  | 13/17 [00:13<00:04,  1.04s/it]    Generating embeddings:  82%|████████▏ | 14/17 [00:14<00:03,  1.01s/it]    Generating embeddings:  88%|████████▊ | 15/17 [00:15<00:01,  1.00it/s]    Generating embeddings:  94%|█████████▍| 16/17 [00:16<00:00,  1.04it/s]    Generating embeddings: 100%|██████████| 17/17 [00:17<00:00,  1.02it/s]    Generating embeddings: 100%|██████████| 17/17 [00:17<00:00,  1.00s/it]
    [03/27/25 11:01:37] DEBUG    Building index from IDs objects     __init__.py:362

    The retrieved content:
    About AgentScope
    _**Q**: What's the difference between AgentScope and other agent platforms/frameworks/packages?_

    **A**: AgentScope is a developer-centric and multi-agent platform, aiming to ease the development, deployment and monitoring of **multi-agent applications**.
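
The `chunk_size` and `chunk_overlap` settings in the configuration above control how each document is split before embedding: consecutive chunks share `chunk_overlap` tokens, so context around a chunk boundary is not lost entirely. A simplified sketch of that idea, assuming a fixed-size sliding window over a token list. LlamaIndex's `SentenceSplitter` additionally tries to respect sentence boundaries, and `chunk_tokens` is a hypothetical helper.

.. code-block:: python

    def chunk_tokens(tokens, chunk_size, chunk_overlap):
        """Split tokens into windows of chunk_size that overlap by chunk_overlap."""
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        step = chunk_size - chunk_overlap
        chunks = []
        for start in range(0, len(tokens), step):
            chunks.append(tokens[start:start + chunk_size])
            # Stop once the current window already reaches the end.
            if start + chunk_size >= len(tokens):
                break
        return chunks


    # Ten integer "tokens", windows of four, one shared token per boundary.
    for chunk in chunk_tokens(list(range(10)), chunk_size=4, chunk_overlap=1):
        print(chunk)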


.. GENERATED FROM PYTHON SOURCE LINES 125-159

Create a Batch of Knowledge Instances
----------------------------------------------
When different knowledge sources exist and require different preprocessing and/or post-processing, a good strategy is to create multiple knowledge instances.
AgentScope therefore provides `KnowledgeBank` to manage knowledge instances. A batch of knowledge instances can be initialized from a file containing multiple knowledge configurations.

.. code-block:: python

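   from agentscope.rag import KnowledgeBank

   # `path_to_knowledge_configs_json` is the path to a JSON file of knowledge configurations
   knowledge_bank = KnowledgeBank(configs=path_to_knowledge_configs_json)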
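
A minimal sketch of producing such a configuration file, assuming it holds a JSON list of knowledge configurations shaped like `flex_knowledge_config` above. The helper `make_config`, the directories, and the file name `knowledge_configs.json` are hypothetical, and which fields are optional is not covered here.

.. code-block:: python

    import json
    from pathlib import Path


    def make_config(knowledge_id, input_dir, exts):
        """Build one knowledge configuration (shape assumed from the example above)."""
        return {
            "knowledge_id": knowledge_id,
            "knowledge_type": "llamaindex_knowledge",
            "emb_model_config_name": "qwen_emb_config",
            "chunk_size": 1024,
            "chunk_overlap": 40,
            "data_processing": [
                {
                    "load_data": {
                        "loader": {
                            "create_object": True,
                            "module": "llama_index.core",
                            "class": "SimpleDirectoryReader",
                            "init_args": {
                                "input_dir": input_dir,
                                "required_exts": exts,
                            },
                        },
                    },
                },
            ],
        }


    configs = [
        make_config("agentscope_tutorial", "./docs", [".md"]),
        make_config("agentscope_code", "./src", [".py"]),
    ]

    # Each knowledge_id must be unique within one knowledge bank.
    ids = [config["knowledge_id"] for config in configs]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate knowledge_id found")

    config_path = Path("knowledge_configs.json")
    config_path.write_text(json.dumps(configs, indent=4), encoding="utf-8")
    path_to_knowledge_configs_json = str(config_path)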


Alternatively, a knowledge instance can be added to the knowledge bank dynamically.

.. code-block:: python

   knowledge_bank.add_data_as_knowledge(
       knowledge_id="agentscope_tutorial_rag",
       emb_model_name="qwen_emb_config",
       data_dirs_and_types={
           "../../docs/sphinx_doc/en/source/tutorial": [".md"],
       },
   )

Here, the `knowledge_id` must be unique.
Developers who have implemented their own knowledge classes can register them beforehand:

.. code-block:: python

   from your_knowledge import NewKnowledgeClass1, NewKnowledgeClass2
   knowledge_bank = KnowledgeBank(
     configs="configs/knowledge_config.json",
     new_knowledge_types=[NewKnowledgeClass1, NewKnowledgeClass2]
   )
   # or
   knowledge_bank.register_knowledge_type(NewKnowledgeClass2)

.. GENERATED FROM PYTHON SOURCE LINES 161-274

(Optional) Setting up a local embedding model service
-----------------------------------------------------------

For those interested in setting up a local embedding service, we provide the following example based on the
`sentence_transformers` package, a popular package specialized for embedding models (built on the `transformers` package and compatible with both HuggingFace and ModelScope models).
This example uses one of the state-of-the-art embedding models, `gte-Qwen2-7B-instruct`.

* Step 1: Follow the instructions on `HuggingFace <https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct>`_ or `ModelScope <https://www.modelscope.cn/models/iic/gte_Qwen2-7B-instruct>`_ to download the embedding model.
  (If you cannot access HuggingFace directly, you can use a HuggingFace mirror by running the bash command
  `export HF_ENDPOINT=https://hf-mirror.com` or by adding the line `os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"` to your Python code.)
* Step 2: Set up the server. The following code is for reference.

.. code-block:: python

    import datetime
    import argparse

    from flask import Flask
    from flask import request
    from sentence_transformers import SentenceTransformer

    def create_timestamp(format_: str = "%Y-%m-%d %H:%M:%S") -> str:
        """Get current timestamp."""
        return datetime.datetime.now().strftime(format_)

    app = Flask(__name__)

    @app.route("/embedding/", methods=["POST"])
    def get_embedding() -> dict:
        """Receive post request and return response"""
        json = request.get_json()

        inputs = json.pop("inputs")

        global model

        if isinstance(inputs, str):
            inputs = [inputs]

        embeddings = model.encode(inputs)

        return {
            "data": {
                "completion_tokens": 0,
                "messages": {},
                "prompt_tokens": 0,
                "response": {
                    "data": [
                        {
                            "embedding": emb.astype(float).tolist(),
                        }
                        for emb in embeddings
                    ],
                    "created": "",
                    "id": create_timestamp(),
                    "model": "flask_model",
                    "object": "text_completion",
                    "usage": {
                        "completion_tokens": 0,
                        "prompt_tokens": 0,
                        "total_tokens": 0,
                    },
                },
                "total_tokens": 0,
                "username": "",
            },
        }

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--model_name_or_path", type=str, required=True)
        parser.add_argument("--device", type=str, default="auto")
        parser.add_argument("--port", type=int, default=8000)
        args = parser.parse_args()

        print("setting up for embedding model....")
        # `model` is a module-level name here, so `get_embedding` can read it.
        # With "auto", sentence_transformers picks the device itself.
        model = SentenceTransformer(
            args.model_name_or_path,
            device=None if args.device == "auto" else args.device,
        )

        app.run(port=args.port)


* Step 3: Start the server.

.. code-block:: bash

    python setup_ms_service.py --model_name_or_path {$PATH_TO_gte_Qwen2_7B_instruct}


Test whether the service is running successfully:

.. code-block:: python

    from agentscope.models.post_model import PostAPIEmbeddingWrapper


    model = PostAPIEmbeddingWrapper(
        config_name="test_config",
        api_url="http://127.0.0.1:8000/embedding/",
        json_args={
            "max_length": 4096,
            "temperature": 0.5
        }
    )

    print(model("testing"))
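
To inspect the raw HTTP response independently of AgentScope, the endpoint can also be queried with the Python standard library. A minimal sketch, assuming the server from Step 2 is listening on `127.0.0.1:8000` and returns the JSON layout shown in its code; `request_embeddings` and `extract_embeddings` are hypothetical helper names.

.. code-block:: python

    import json
    import urllib.request


    def extract_embeddings(payload):
        """Pull the embedding vectors out of the server's nested JSON response."""
        return [item["embedding"] for item in payload["data"]["response"]["data"]]


    def request_embeddings(texts, url="http://127.0.0.1:8000/embedding/"):
        """POST the texts to the embedding service and return their vectors."""
        body = json.dumps({"inputs": texts}).encode("utf-8")
        request = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=60) as response:
            return extract_embeddings(json.load(response))


    # Offline check of the parsing logic with a hand-written response.
    sample = {
        "data": {
            "response": {
                "data": [
                    {"embedding": [0.1, 0.2]},
                    {"embedding": [0.3, 0.4]},
                ],
            },
        },
    }
    print(extract_embeddings(sample))

With the server running, `request_embeddings(["testing"])` should return a list containing one vector per input text.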


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 38.224 seconds)


.. _sphx_glr_download_build_tutorial_rag.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: rag.ipynb <rag.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: rag.py <rag.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: rag.zip <rag.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_