agentscope.rag.knowledge
Base class module for retrieval-augmented generation (RAG). To accommodate the RAG processes of different packages, we abstract the RAG process into four stages:
- data loading: loading data into memory for subsequent processing;
- data indexing and storage: chunking documents, generating embeddings, and offloading the data into a vector database (VDB);
- data retrieval: taking a query and returning a batch of documents or document chunks;
- post-processing of the retrieved data: using the retrieved data to generate an answer.
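The four stages can be illustrated with a toy, in-memory pipeline. This is only a sketch under stated assumptions: the function names (`load`, `index`, `retrieve`, `post_process`), the bag-of-words "embedding", and the list used as a store are all hypothetical stand-ins and are not part of agentscope. A real setup would use an embedding model, a vector database, and an LLM call in the last stage.

```python
import math
from collections import Counter


def load(texts):
    """Stage 1, data loading: here the 'files' are already strings."""
    return list(texts)


def index(docs, chunk_size=8):
    """Stage 2, indexing and storage: chunk by words, embed as word counts."""
    store = []
    for doc in docs:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            store.append((chunk, Counter(chunk.lower().split())))
    return store


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(store, query, top_k=2):
    """Stage 3, retrieval: rank stored chunks against the query."""
    q = Counter(query.lower().split())
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def post_process(chunks, prompt):
    """Stage 4, post-processing: build the prompt an LLM would receive."""
    return prompt.format("\n".join(chunks))


docs = load([
    "AgentScope is a multi-agent platform.",
    "RAG retrieves documents before generating an answer.",
])
store = index(docs)
top = retrieve(store, "what does RAG retrieve", top_k=1)
answer_prompt = post_process(top, "Answer using this context:\n{}")
```

In this sketch the last stage stops at building the prompt; passing `answer_prompt` to a model is left out so that the example stays self-contained.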
- class Knowledge(knowledge_id: str, emb_model: Any | None = None, knowledge_config: dict | None = None, model: ModelWrapperBase | None = None, **kwargs: Any)
Bases:
ABC
Base class for RAG; it CANNOT be instantiated directly.
- classmethod build_knowledge_instance(knowledge_id: str, knowledge_config: dict | None = None, **kwargs: Any) -> Knowledge
A constructor to build a knowledge base instance.
- Parameters:
knowledge_id (str) – The id of the knowledge instance.
knowledge_config (dict) – The configuration of the knowledge instance.
- Returns:
a Knowledge instance
- Return type:
Knowledge
- classmethod default_config(**kwargs: Any) -> dict
Return a default config for a knowledge class.
- Parameters:
kwargs (Any) – Parameters for config
- Returns:
a default config of the knowledge class
- Return type:
dict
- post_processing(retrieved_docs: list[str], prompt: str, **kwargs: Any) -> Any
A default post-processing implementation that generates an answer based on the retrieved documents.
- Parameters:
retrieved_docs (list[str]) – List of retrieved documents
prompt (str) – Prompt for the LLM to generate an answer from the retrieved documents
- Returns:
An answer synthesized by the LLM from the retrieved documents
- Return type:
Any
Example
self.postprocessing_model(prompt.format(retrieved_docs))
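A standalone sketch of the one-line example above, assuming the prompt contains a single `{}` placeholder and that the model is any callable taking a string. The `model` parameter and `echo_model` are hypothetical stand-ins for `self.postprocessing_model`, not agentscope names.

```python
from typing import Any, Callable


def post_processing(
    retrieved_docs: list[str],
    prompt: str,
    model: Callable[[str], Any],
) -> Any:
    """Fill the prompt with the retrieved documents and pass it to the model."""
    # str.format() inserts the list's string representation into "{}".
    return model(prompt.format(retrieved_docs))


def echo_model(text: str) -> str:
    """Stand-in for an LLM: returns the prompt it was given."""
    return text


answer = post_processing(
    ["doc A", "doc B"],
    "Context: {}\nAnswer the question.",
    echo_model,
)
# answer == "Context: ['doc A', 'doc B']\nAnswer the question."
```

Because the list is formatted as-is, the documents appear in the prompt in Python list notation; joining them into one string before formatting is a common alternative when cleaner prompt text is wanted.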
- abstract retrieve(query: Any, similarity_top_k: int | None = None, to_list_strs: bool = False, **kwargs: Any) -> list[RetrievedChunk | str]
Retrieve a list of content from the database (vector store index) into memory
- Parameters:
query (Any) – Query for retrieval
similarity_top_k (int) – The number of most similar items returned by the retriever.
to_list_strs (bool) – Whether to return a list of str
- Returns:
A list of retrieved documents (as strings when to_list_strs is True)
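A minimal sketch of how a subclass might implement `retrieve`, written against hypothetical stand-ins (`BaseKnowledgeSketch`, `Chunk`, `KeywordKnowledge`) so that it runs without agentscope. The keyword-overlap score is an assumption chosen for brevity; a real subclass would query its vector index instead.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class Chunk:
    """Hypothetical stand-in for RetrievedChunk (score and content only)."""
    score: float = 0.0
    content: Any = None


class BaseKnowledgeSketch(ABC):
    """Hypothetical stand-in for the abstract Knowledge base class."""

    def __init__(self, knowledge_id: str) -> None:
        self.knowledge_id = knowledge_id

    @abstractmethod
    def retrieve(self, query, similarity_top_k=None, to_list_strs=False, **kwargs):
        """Return the most similar chunks for the query."""


class KeywordKnowledge(BaseKnowledgeSketch):
    """Toy subclass that scores documents by shared lowercase words."""

    def __init__(self, knowledge_id: str, docs) -> None:
        super().__init__(knowledge_id)
        self.docs = list(docs)

    def retrieve(self, query, similarity_top_k=None, to_list_strs=False, **kwargs):
        terms = set(str(query).lower().split())
        scored = [
            Chunk(score=float(len(terms & set(d.lower().split()))), content=d)
            for d in self.docs
        ]
        scored.sort(key=lambda c: c.score, reverse=True)
        top = scored[:similarity_top_k]  # a None limit keeps every chunk
        return [c.content for c in top] if to_list_strs else top


kb = KeywordKnowledge("demo", ["red apple pie", "green apple", "blue sky"])
as_strs = kb.retrieve("apple pie", similarity_top_k=2, to_list_strs=True)
as_chunks = kb.retrieve("apple pie", similarity_top_k=1)
```

With `to_list_strs=True` the caller gets plain strings; otherwise it gets scored chunk objects, which mirrors the `list[RetrievedChunk | str]` return annotation.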
- knowledge_type: str = 'base_knowledge'
A string to identify a knowledge base class
- class RetrievedChunk(score: float = 0.0, content: Any | None = None, metadata: dict | None = None, embedding: Any | None = None, hash: str | None = None)
Bases:
object
Retrieved content with its score and metadata
- score
Similarity score of this retrieved chunk
- Type:
float
- content
The retrieved content
- Type:
Any
- metadata
The metadata of this retrieved chunk, such as the file path
- Type:
Optional[dict]
- embedding
The embedding of the chunk
- Type:
Optional[Any]
- hash
The hash of the retrieved content
- Type:
Optional[str]
- content: Any = None
- embedding: Any | None = None
- hash: str | None = None
- metadata: dict | None = None
- score: float = 0.0
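The fields and defaults listed above can be mirrored in a small dataclass to show how such chunks might be built and ranked. `RetrievedChunkSketch` and `make_chunk` are hypothetical names, and using SHA-256 of the text for `hash` is an assumption: this page does not say how the library computes the hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RetrievedChunkSketch:
    """Hypothetical mirror of the documented RetrievedChunk fields."""
    score: float = 0.0
    content: Any = None
    metadata: Optional[dict] = None
    embedding: Optional[Any] = None
    hash: Optional[str] = None


def make_chunk(text: str, score: float, path: str) -> RetrievedChunkSketch:
    """Build a chunk; the SHA-256 content hash is an assumption."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return RetrievedChunkSketch(
        score=score,
        content=text,
        metadata={"file_path": path},
        hash=digest,
    )


chunks = [make_chunk("alpha", 0.2, "a.txt"), make_chunk("beta", 0.9, "b.txt")]
# Pick the chunk with the highest similarity score.
best = max(chunks, key=lambda c: c.score)
```

Here `best.content` is `"beta"` and `best.metadata` carries the file path, which is the kind of metadata the attribute description mentions.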