agentscope.rag.knowledge module
Base class module for retrieval-augmented generation (RAG). To accommodate the RAG processes of different packages, we abstract the RAG process into four stages:
- data loading: loading data into memory for subsequent processing;
- data indexing and storage: chunking documents, generating embeddings, and offloading the data into a vector database (VDB);
- data retrieval: taking a query and returning a batch of documents or document chunks;
- post-processing of the retrieved data: using the retrieved data to generate an answer.
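The four stages can be sketched in plain Python. This is a toy illustration only: every function name below is hypothetical and not part of agentscope, the "embedding" is a bag-of-words count vector, and the "vector database" is an in-memory list.

```python
import math
import re
from collections import Counter


def load_documents() -> list[str]:
    """Stage 1, data loading: bring raw text into memory (hard-coded here)."""
    return [
        "AgentScope is a multi-agent platform. It supports RAG.",
        "Bananas are yellow. They grow in warm climates.",
    ]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts, punctuation dropped."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(docs: list[str]) -> list[tuple[str, Counter]]:
    """Stage 2, indexing and storage: chunk by sentence, embed, store in a list."""
    chunks = [s.strip() for d in docs for s in d.split(".") if s.strip()]
    return [(chunk, embed(chunk)) for chunk in chunks]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(index: list[tuple[str, Counter]], query: str, top_k: int = 1) -> list[str]:
    """Stage 3, retrieval: return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def post_process(chunks: list[str], question: str) -> str:
    """Stage 4, post-processing: build the prompt a real system would send to an LLM."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


index = build_index(load_documents())
top_chunks = retrieve(index, "What color are bananas?", top_k=1)
prompt = post_process(top_chunks, "What color are bananas?")
print(prompt)
```

A real deployment would replace the toy embedding with an embedding model and the list with a vector store; the stage boundaries are what the base class abstracts.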
- class agentscope.rag.knowledge.Knowledge(knowledge_id: str, emb_model: Any | None = None, knowledge_config: dict | None = None, model: ModelWrapperBase | None = None, **kwargs: Any)[source]
Bases: ABC
Base class for RAG. It is abstract and cannot be instantiated directly.
- __init__(knowledge_id: str, emb_model: Any | None = None, knowledge_config: dict | None = None, model: ModelWrapperBase | None = None, **kwargs: Any) None [source]
Initialize the knowledge component.
- Parameters:
knowledge_id (str) – the ID of the knowledge unit
emb_model (ModelWrapperBase) – the embedding model used to generate embeddings
knowledge_config (dict) – the configuration used to generate or load the index
- abstract retrieve(query: Any, similarity_top_k: int | None = None, to_list_strs: bool = False, **kwargs: Any) list[Any] [source]
Retrieve a list of content from the database (vector store index) into memory.
- Parameters:
query (Any) – query for retrieval
similarity_top_k (int) – the number of most similar data items returned by the retriever
to_list_strs (bool) – whether to return a list of str
- Returns:
a list of the retrieved documents (as strings)
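Because `retrieve` is abstract, a concrete subclass has to supply it. The sketch below uses a stand-in base class, `KnowledgeSketch`, that mirrors only the documented `retrieve` signature; it is not the agentscope class. The keyword-overlap scoring and the dict shape returned when `to_list_strs` is false are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class KnowledgeSketch(ABC):
    """Stand-in for the documented base class (hypothetical, not agentscope)."""

    def __init__(self, knowledge_id: str, knowledge_config: Optional[dict] = None) -> None:
        self.knowledge_id = knowledge_id
        self.knowledge_config = knowledge_config or {}

    @abstractmethod
    def retrieve(
        self,
        query: Any,
        similarity_top_k: Optional[int] = None,
        to_list_strs: bool = False,
        **kwargs: Any,
    ) -> list[Any]:
        """Return the items most similar to the query."""


class KeywordKnowledge(KnowledgeSketch):
    """Toy subclass: ranks documents by the number of words shared with the query."""

    def __init__(self, knowledge_id: str, docs: list[str]) -> None:
        super().__init__(knowledge_id)
        self.docs = docs

    def retrieve(self, query, similarity_top_k=None, to_list_strs=False, **kwargs):
        terms = set(str(query).lower().split())
        scored = [(len(terms & set(doc.lower().split())), doc) for doc in self.docs]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        if similarity_top_k is not None:
            ranked = ranked[:similarity_top_k]
        if to_list_strs:
            return [doc for _, doc in ranked]
        # Assumed non-string form: one dict per hit with its score.
        return [{"score": score, "text": doc} for score, doc in ranked]


# The abstract base cannot be instantiated directly.
instantiation_error = None
try:
    KnowledgeSketch("base")
except TypeError as err:
    instantiation_error = err

kb = KeywordKnowledge(
    "demo",
    ["vector databases store embeddings", "agents exchange messages"],
)
hits = kb.retrieve("how are embeddings stored", similarity_top_k=1, to_list_strs=True)
print(hits)
```

Passing `to_list_strs=True` yields plain strings that can be fed straight into a prompt, while the default leaves room for richer objects such as scored nodes.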
- post_processing(retrieved_docs: list[str], prompt: str, **kwargs: Any) Any [source]
A default post-processing function that generates an answer based on the retrieved documents.
- Parameters:
retrieved_docs (list[str]) – list of retrieved documents
prompt (str) – prompt for the LLM to generate an answer with the retrieved documents
- Returns:
an answer synthesized by the LLM from the retrieved documents
- Return type:
Any
Example
self.postprocessing_model(prompt.format(retrieved_docs))
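To make the one-line example concrete, here is a minimal standalone sketch of the same call pattern. The free function `post_processing` and the stub `echo_model` are hypothetical stand-ins for the method and for `self.postprocessing_model`; no real LLM is called.

```python
from typing import Any, Callable


def post_processing(
    model: Callable[[str], Any],
    retrieved_docs: list[str],
    prompt: str,
) -> Any:
    """Fill the prompt template with the retrieved documents and call the model."""
    # Same pattern as the documented example: the prompt needs a "{}" placeholder.
    return model(prompt.format(retrieved_docs))


def echo_model(text: str) -> str:
    """Stub model (assumption): echoes the prompt instead of generating an answer."""
    return f"[stub answer] {text}"


answer = post_processing(
    echo_model,
    ["doc A", "doc B"],
    "Answer using these documents: {}",
)
print(answer)
```

Note that `str.format` inserts the list's Python representation (brackets and quotes included), so joining the documents into a single string before formatting may give a cleaner prompt.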