agentscope.rag.llama_index_knowledge module

This module integrates LlamaIndex RAG into the AgentScope package.

class agentscope.rag.llama_index_knowledge.LlamaIndexKnowledge(knowledge_id: str, emb_model: ModelWrapperBase | BaseEmbedding | None = None, knowledge_config: dict | None = None, model: ModelWrapperBase | None = None, persist_root: str | None = None, overwrite_index: bool | None = False, showprogress: bool | None = True, **kwargs: Any)[source]

Bases: Knowledge

This class is a wrapper around the LlamaIndex RAG framework.

__init__(knowledge_id: str, emb_model: ModelWrapperBase | BaseEmbedding | None = None, knowledge_config: dict | None = None, model: ModelWrapperBase | None = None, persist_root: str | None = None, overwrite_index: bool | None = False, showprogress: bool | None = True, **kwargs: Any) None[source]

Initialize the knowledge component based on the llama-index framework: https://github.com/run-llama/llama_index

Notes

In LlamaIndex, one of the most important concepts is the index, a data structure composed of Document objects that is designed to enable querying by an LLM. The core workflow of initializing RAG is to convert data into an index and then retrieve information from that index. For example: 1) preprocess documents with data loaders; 2) generate embeddings by configuring a pipeline with embedding models; 3) store the embedding-content pairs in a vector database. A minimal sketch of this workflow follows below.

The default persist directory is “./rag_storage/knowledge_id”.
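
The following is a minimal sketch of that workflow using plain LlamaIndex APIs, independent of this wrapper. The directory paths are illustrative, and an embedding model is assumed to be available (e.g. via an OpenAI API key picked up by LlamaIndex’s defaults):

    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # 1) preprocess documents with a data loader
    documents = SimpleDirectoryReader("./docs").load_data()

    # 2) generate embeddings and 3) build a vector index over them
    index = VectorStoreIndex.from_documents(documents, show_progress=True)

    # persist the index so it can be reloaded later
    index.storage_context.persist(persist_dir="./rag_storage/example_knowledge")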

Parameters:
  • knowledge_id (str) – The id of the RAG knowledge unit.

  • emb_model (ModelWrapperBase | BaseEmbedding) – The embedding model used to generate embeddings.

  • knowledge_config (dict) – The configuration for llama-index to generate or load the index.

  • model (ModelWrapperBase) – The language model used for final synthesis.

  • persist_root (str) – The root directory for persisting the index.

  • overwrite_index (Optional[bool]) – Whether to overwrite the index while refreshing

  • showprogress (Optional[bool]) – Whether to show the indexing progress
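
A hedged construction sketch; the embedding model, identifiers, and paths below are illustrative, and knowledge_config is omitted for brevity (its schema is defined by the AgentScope configuration in use):

    from agentscope.rag.llama_index_knowledge import LlamaIndexKnowledge
    from llama_index.embeddings.openai import OpenAIEmbedding

    # illustrative: any llama-index BaseEmbedding (or an AgentScope
    # ModelWrapperBase) can serve as emb_model
    knowledge = LlamaIndexKnowledge(
        knowledge_id="example_knowledge",
        emb_model=OpenAIEmbedding(),
        persist_root="./rag_storage",  # index files live under this root
        overwrite_index=False,
        showprogress=True,
    )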

retrieve(query: str, similarity_top_k: int | None = None, to_list_strs: bool = False, retriever: BaseRetriever | None = None, **kwargs: Any) list[Any][source]

This is a basic retrieve function for knowledge. It builds a retriever on the fly and returns the result of the query.

Parameters:
  • query (str) – The query, expected to be a question in string form.

  • similarity_top_k (int) – The number of most similar results returned by the retriever.

  • to_list_strs (bool) – Whether to return a list of strings; if False, return a list of NodeWithScore objects.

  • retriever (BaseRetriever) – For advanced usage, users can pass in their own retriever.

Returns:

A list of strings (if to_list_strs is True) or a list of NodeWithScore objects.

Return type:

list[Any]

For more advanced query processing, see https://docs.llamaindex.ai/en/stable/examples/query_transformations/query_transform_cookbook.html
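
A usage sketch for retrieve, reusing the knowledge object constructed above (the query text and top-k value are illustrative):

    # by default, retrieve returns a list of NodeWithScore objects
    nodes = knowledge.retrieve("What is AgentScope?", similarity_top_k=3)
    for node in nodes:
        print(node.score, node.node.get_content()[:80])

    # with to_list_strs=True, it returns plain strings instead
    texts = knowledge.retrieve(
        "What is AgentScope?",
        similarity_top_k=3,
        to_list_strs=True,
    )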

refresh_index() None[source]

Refresh the index when needed.
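
A brief usage sketch, assuming the documents behind the index have changed since it was built; whether stale index data is replaced depends on the overwrite_index flag passed at construction:

    # re-index so new or modified documents become retrievable
    knowledge.refresh_index()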