agentscope.rag.knowledge module

Base class module for retrieval-augmented generation (RAG). To accommodate the RAG processes of different packages, the RAG process is abstracted into four stages:

- data loading: loading data into memory for subsequent processing;
- data indexing and storage: chunking documents, generating embeddings, and offloading the data into a vector database (VDB);
- data retrieval: taking a query and returning a batch of documents or document chunks;
- post-processing of the retrieved data: using the retrieved data to generate an answer.
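The four stages can be illustrated with a minimal, library-independent sketch. Everything in it is an assumption made for illustration: the function names (`load`, `index`, `retrieve`, `post_process`), the bag-of-words "embedding", and the in-memory list standing in for a vector database are not part of AgentScope.

```python
import math
import re
from collections import Counter


def load(texts):
    """Stage 1: load raw documents into memory (here they are already strings)."""
    return list(texts)


def chunk(doc, size=8):
    """Split a document into chunks of at most `size` words."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: word counts, standing in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def index(docs):
    """Stage 2: chunk, embed, and store; a list stands in for the vector database."""
    return [(c, embed(c)) for d in docs for c in chunk(d)]


def retrieve(store, query, top_k=2):
    """Stage 3: return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def post_process(retrieved, prompt):
    """Stage 4: build the text that would be sent to an LLM."""
    return prompt.format("\n".join(retrieved))


docs = load(["AgentScope is a multi-agent platform.", "Bananas are rich in potassium."])
store = index(docs)
hits = retrieve(store, "What is AgentScope?", top_k=1)
final_prompt = post_process(hits, "Context:\n{}\nAnswer the question.")
```

In a real setup the embedding and storage steps would be delegated to an embedding model and a vector database; the sketch only shows how the stages hand data to each other.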

class agentscope.rag.knowledge.Knowledge(knowledge_id: str, emb_model: Any | None = None, knowledge_config: dict | None = None, model: ModelWrapperBase | None = None, **kwargs: Any)[source]

Bases: ABC

Base class for RAG. It CANNOT be instantiated directly.

__init__(knowledge_id: str, emb_model: Any | None = None, knowledge_config: dict | None = None, model: ModelWrapperBase | None = None, **kwargs: Any) None[source]

Initialize the knowledge component.

Parameters:

knowledge_id (str) – The ID of the knowledge unit.

emb_model (ModelWrapperBase) – The embedding model used to generate embeddings.

knowledge_config (dict) – The configuration used to generate or load the index.

abstract retrieve(query: Any, similarity_top_k: int | None = None, to_list_strs: bool = False, **kwargs: Any) list[Any][source]

Retrieve a list of content from the database (vector store index) into memory.

Parameters:

query (Any) – query for retrieval

similarity_top_k (int) – the number of most similar data items returned by the retriever

to_list_strs (bool) – whether to return a list of str

Returns:

a list of retrieved documents (as strings)
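Because `retrieve` is abstract, a concrete subclass has to supply it. The sketch below does not import AgentScope: `KnowledgeSketch` is a hypothetical stand-in that only mirrors the documented `retrieve` signature, and `KeywordKnowledge`, its word-overlap ranking, and the fallback of 2 for `similarity_top_k` are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class KnowledgeSketch(ABC):
    """Hypothetical stand-in mirroring the documented interface."""

    def __init__(self, knowledge_id: str) -> None:
        self.knowledge_id = knowledge_id

    @abstractmethod
    def retrieve(self, query: Any, similarity_top_k: Optional[int] = None,
                 to_list_strs: bool = False, **kwargs: Any) -> list:
        """Return the entries most similar to the query."""


class KeywordKnowledge(KnowledgeSketch):
    """Toy subclass: ranks stored texts by the words they share with the query."""

    def __init__(self, knowledge_id: str, texts: list) -> None:
        super().__init__(knowledge_id)
        self.entries = [{"text": t} for t in texts]

    def retrieve(self, query, similarity_top_k=None, to_list_strs=False, **kwargs):
        # Assumed fallback when the caller does not set similarity_top_k.
        top_k = similarity_top_k if similarity_top_k is not None else 2
        q_words = set(str(query).lower().split())
        ranked = sorted(
            self.entries,
            key=lambda e: len(q_words & set(e["text"].lower().split())),
            reverse=True,
        )[:top_k]
        # to_list_strs switches between plain strings and the stored records.
        return [e["text"] for e in ranked] if to_list_strs else ranked


try:
    KnowledgeSketch("kb")  # abstract class: instantiation raises TypeError
    instantiated = True
except TypeError:
    instantiated = False

kb = KeywordKnowledge("kb", ["vector databases store embeddings", "cats sleep a lot"])
as_strings = kb.retrieve("how are embeddings stored", similarity_top_k=1, to_list_strs=True)
as_records = kb.retrieve("how are embeddings stored", similarity_top_k=1)
```

Here `to_list_strs=True` yields plain strings, while the default returns whatever record type the subclass stores; what a real subclass returns in the default case depends on its backend.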

post_processing(retrieved_docs: list[str], prompt: str, **kwargs: Any) Any[source]

A default post-processing function that generates an answer based on the retrieved documents.

Parameters:

retrieved_docs (list[str]) – list of retrieved documents

prompt (str) – prompt for the LLM to generate an answer with the retrieved documents

Returns:

an answer synthesized by the LLM from the retrieved documents

Return type:

Any

Example

self.postprocessing_model(prompt.format(retrieved_docs))
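The example line can be tried out with a stub in place of the model. In this sketch `echo_model` is a hypothetical stand-in for an LLM call and the free function `post_processing` is not the library's method; it only follows the `prompt.format(retrieved_docs)` pattern from the line above.

```python
def echo_model(text: str) -> str:
    """Hypothetical stand-in for an LLM call; returns its input unchanged."""
    return text


def post_processing(retrieved_docs, prompt, model=echo_model):
    """Fill the prompt with the retrieved documents and pass it to the model."""
    return model(prompt.format(retrieved_docs))


docs = ["Paris is the capital of France."]
answer = post_processing(docs, "Use these documents: {}\nQuestion: capital of France?")
```

Note that formatting a list directly inserts its Python representation, brackets and quotes included, into the prompt. Joining the documents first, for example with `"\n".join(retrieved_docs)`, is one way to get cleaner prompt text if that matters for the model.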