agentscope.rag.knowledge module
Base class module for retrieval-augmented generation (RAG). To accommodate the RAG processes of different packages, we abstract RAG into four stages:
- data loading: load data into memory for subsequent processing;
- data indexing and storage: chunk documents, generate embeddings, and offload the data into a vector database (VDB);
- data retrieval: take a query and return a batch of documents or document chunks;
- post-processing of the retrieved data: use the retrieved data to generate an answer.
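The four stages can be illustrated with a toy, dependency-free pipeline. This is only a sketch of the abstraction, not how agentscope implements it: the function names, the bag-of-words "embedding", and the in-memory list standing in for a vector database are all assumptions made for illustration.

```python
import math
from collections import Counter


def load(raw_texts):
    """Stage 1: data loading. The 'files' here are just in-memory strings."""
    return list(raw_texts)


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def index(docs):
    """Stage 2: indexing and storage. A list of (embedding, chunk) pairs
    stands in for the vector database."""
    return [(embed(doc), doc) for doc in docs]


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(store, query, top_k=1):
    """Stage 3: retrieval. Rank stored chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


def post_process(chunks, prompt):
    """Stage 4: post-processing. A real system would send this prompt to an LLM."""
    return prompt.format("\n".join(chunks))


store = index(load(["cats purr when happy", "dogs bark at strangers"]))
top = retrieve(store, "why do cats purr", top_k=1)
answer_prompt = post_process(top, "Answer using this context:\n{}")
```

Each function maps onto one stage, so a concrete backend only has to swap the body of a stage while the overall flow stays the same.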
- class Knowledge(knowledge_id: str, emb_model: Any | None = None, knowledge_config: dict | None = None, model: ModelWrapperBase | None = None, **kwargs: Any)[source]
Bases:
ABC
Base class for RAG. It is abstract and cannot be instantiated directly.
- classmethod build_knowledge_instance(knowledge_id: str, knowledge_config: dict | None = None, **kwargs: Any) Knowledge [source]
A constructor to build a knowledge base instance.
- Parameters:
knowledge_id (str) – The id of the knowledge instance.
knowledge_config (dict) – The configuration for the knowledge instance.
- Returns:
a Knowledge instance
- Return type:
Knowledge
- classmethod default_config(**kwargs: Any) dict [source]
Return a default config for a knowledge class.
- Parameters:
kwargs (Any) – Parameters for config
- Returns:
a default config of the knowledge class
- Return type:
dict
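The two class methods above follow the familiar "alternate constructor plus default configuration" pattern. The sketch below shows one way they could interact. It uses a hypothetical stand-in class (`KnowledgeSketch`) rather than the real `Knowledge`; the merge behaviour in `default_config`, the fallback in `build_knowledge_instance`, and the `chunk_size` key are all assumptions, since the documentation does not specify them.

```python
from abc import ABC
from typing import Any, Optional


class KnowledgeSketch(ABC):
    """Hypothetical stand-in; NOT the real agentscope Knowledge class."""

    knowledge_type: str = "base_knowledge"

    def __init__(
        self, knowledge_id: str, knowledge_config: Optional[dict] = None
    ) -> None:
        self.knowledge_id = knowledge_id
        self.knowledge_config = knowledge_config or {}

    @classmethod
    def default_config(cls, **kwargs: Any) -> dict:
        # Assumed behaviour: start from class-level defaults, let kwargs override.
        config = {"knowledge_type": cls.knowledge_type}
        config.update(kwargs)
        return config

    @classmethod
    def build_knowledge_instance(
        cls,
        knowledge_id: str,
        knowledge_config: Optional[dict] = None,
        **kwargs: Any,
    ) -> "KnowledgeSketch":
        # Assumed behaviour: fall back to the default config when none is given.
        if knowledge_config is None:
            knowledge_config = cls.default_config(**kwargs)
        return cls(knowledge_id, knowledge_config)


class ToyKnowledge(KnowledgeSketch):
    knowledge_type = "toy_knowledge"


# "chunk_size" is an illustrative key, not a documented option.
kb = ToyKnowledge.build_knowledge_instance("faq", chunk_size=256)
```

Because both methods are class methods, a subclass only needs to override `knowledge_type` (or `default_config`) for the constructor to pick up its own defaults.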
- post_processing(retrieved_docs: list[str], prompt: str, **kwargs: Any) Any [source]
A default post-processing implementation that generates an answer from the retrieved documents.
- Parameters:
retrieved_docs (list[str]) – List of retrieved documents
prompt (str) – Prompt for the LLM to generate an answer from the retrieved documents
- Returns:
An answer synthesized by the LLM from the retrieved documents
- Return type:
Any
Example
self.postprocessing_model(prompt.format(retrieved_docs))
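To make the one-line example concrete, here is a minimal standalone sketch of the same call. The `echo_model` function is a hypothetical stand-in for the LLM wrapper, and passing the model as an argument (instead of reading `self.postprocessing_model`) is an assumption made so the snippet runs on its own.

```python
from typing import Any, Callable, List


def post_processing(
    retrieved_docs: List[str],
    prompt: str,
    postprocessing_model: Callable[[str], Any],
) -> Any:
    # Mirrors the documented one-liner: the whole list fills the first "{}".
    return postprocessing_model(prompt.format(retrieved_docs))


def echo_model(text: str) -> str:
    """Hypothetical stand-in for an LLM wrapper: returns its input unchanged."""
    return text


docs = ["Paris is the capital of France.", "France is in Europe."]
result = post_processing(
    docs,
    "Context: {}\nQuestion: What is the capital of France?",
    echo_model,
)
```

Note that `str.format` receives the list object itself, so the prompt contains the list's Python representation (brackets and quotes included). If plain text is preferred, join the documents into a single string before formatting.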
- abstract retrieve(query: Any, similarity_top_k: int | None = None, to_list_strs: bool = False, **kwargs: Any) list[RetrievedChunk | str] [source]
Retrieve a list of content from the database (vector store index) into memory
- Parameters:
query (Any) – Query for retrieval
similarity_top_k (int) – The number of most similar items returned by the retriever.
to_list_strs (bool) – Whether to return a list of str
- Returns:
A list of retrieved documents (as strings when to_list_strs is True)
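A subclass has to supply `retrieve`. The sketch below shows one possible shape of such a method, keeping the documented parameter names. It does not import agentscope: `Chunk` is a reduced, hypothetical stand-in for `RetrievedChunk`, `InMemoryKnowledge` is not a real class, and word overlap is used as a toy similarity score in place of embeddings.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass
class Chunk:
    """Hypothetical stand-in for RetrievedChunk with only two fields."""

    score: float = 0.0
    content: Any = None


class InMemoryKnowledge:
    """Toy retriever over a list of texts; not an agentscope class."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def retrieve(
        self,
        query: str,
        similarity_top_k: Optional[int] = None,
        to_list_strs: bool = False,
        **kwargs: Any,
    ) -> List[Union[Chunk, str]]:
        query_words = set(query.lower().split())
        # Toy similarity: number of words shared with the query.
        scored = [
            Chunk(score=float(len(query_words & set(t.lower().split()))), content=t)
            for t in self.texts
        ]
        scored.sort(key=lambda c: c.score, reverse=True)
        top = scored[:similarity_top_k]  # a value of None keeps every chunk
        if to_list_strs:
            return [str(c.content) for c in top]
        return top


kb = InMemoryKnowledge(["the sky is blue", "grass is green", "blue whales are large"])
chunks = kb.retrieve("is the sky blue", similarity_top_k=2)
strings = kb.retrieve("is the sky blue", similarity_top_k=2, to_list_strs=True)
```

Returning scored objects by default and plain strings on request lets callers that need scores or metadata keep them, while prompt-building code can ask for strings directly.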
- knowledge_type: str = 'base_knowledge'
A string to identify a knowledge base class
- class RetrievedChunk(score: float = 0.0, content: Any | None = None, metadata: dict | None = None, embedding: Any | None = None, hash: str | None = None)[source]
Bases:
object
Retrieved content with score and meta information
- score
Similarity score of this retrieved chunk
- Type:
float
- content
The retrieved content
- Type:
Any
- metadata
The metadata of this retrieved chunk, such as file path
- Type:
Optional[dict]
- embedding
The embedding of the chunk
- Type:
Optional[Any]
- hash
The hash of the retrieved content
- Type:
Optional[str]
- content: Any = None
- embedding: Any | None = None
- hash: str | None = None
- metadata: dict | None = None
- score: float = 0.0
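As an illustration of how these fields might be used together, the sketch below removes duplicate chunks by `hash` and orders the rest by `score`. `RetrievedChunkSketch` is a local stand-in that copies the documented field names and defaults; the `make_chunk` and `dedupe` helpers, the `file_path` metadata key, and the choice of SHA-256 are assumptions, since the documentation does not say how the hash is computed.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class RetrievedChunkSketch:
    """Stand-in mirroring the documented fields and defaults."""

    score: float = 0.0
    content: Any = None
    metadata: Optional[dict] = None
    embedding: Optional[Any] = None
    hash: Optional[str] = None


def make_chunk(text: str, score: float, path: str) -> RetrievedChunkSketch:
    # SHA-256 is an assumption; the documentation does not name a hash function.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return RetrievedChunkSketch(
        score=score, content=text, metadata={"file_path": path}, hash=digest
    )


def dedupe(chunks: List[RetrievedChunkSketch]) -> List[RetrievedChunkSketch]:
    """Keep the highest-scoring chunk per distinct hash, best score first."""
    best = {}
    for chunk in chunks:
        kept = best.get(chunk.hash)
        if kept is None or chunk.score > kept.score:
            best[chunk.hash] = chunk
    return sorted(best.values(), key=lambda c: c.score, reverse=True)


chunks = [
    make_chunk("alpha", 0.4, "a.txt"),
    make_chunk("beta", 0.9, "b.txt"),
    make_chunk("alpha", 0.7, "c.txt"),
]
unique = dedupe(chunks)
```

Here the two "alpha" chunks share a hash, so only the higher-scoring one (from `c.txt`) survives.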