agentscope.rag.knowledge

Base class module for retrieval-augmented generation (RAG). To accommodate the RAG processes of different packages, we abstract the RAG process into four stages:

- data loading: loading data into memory for subsequent processing;
- data indexing and storage: chunking documents, generating embeddings, and offloading the data into a vector database (VDB);
- data retrieval: taking a query and returning a batch of documents or document chunks;
- post-processing of the retrieved data: using the retrieved data to generate an answer.
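The four stages can be pictured with a minimal, library-free sketch. Everything here is hypothetical and only illustrates the flow: the function names, the word-overlap "embedding", and the `model` callable are assumptions of this example, not part of agentscope.

```python
from typing import Callable, List, Set, Tuple


def load_data(raw_texts: List[str]) -> List[str]:
    """Stage 1: bring documents into memory (here they already are strings)."""
    return [text.strip() for text in raw_texts if text.strip()]


def index_data(docs: List[str], chunk_words: int = 8) -> List[Tuple[str, Set[str]]]:
    """Stage 2: chunk documents and store a toy 'embedding' (a word set) per chunk."""
    index = []
    for doc in docs:
        words = doc.split()
        for start in range(0, len(words), chunk_words):
            chunk = " ".join(words[start:start + chunk_words])
            index.append((chunk, set(chunk.lower().split())))
    return index


def retrieve(index: List[Tuple[str, Set[str]]], query: str, top_k: int = 2) -> List[str]:
    """Stage 3: rank chunks by word overlap with the query and keep the best ones."""
    query_words = set(query.lower().split())
    ranked = sorted(index, key=lambda item: len(item[1] & query_words), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def post_process(retrieved: List[str], query: str, model: Callable[[str], str]) -> str:
    """Stage 4: pass the retrieved chunks to a model to produce an answer."""
    prompt = "Context:\n" + "\n".join(retrieved) + "\nQuestion: " + query
    return model(prompt)


docs = load_data(["AgentScope is a multi-agent platform.", "Bananas are yellow."])
index = index_data(docs)
hits = retrieve(index, "what colour are bananas", top_k=1)
answer = post_process(hits, "what colour are bananas", model=lambda p: p.upper())
```

A real knowledge class would replace the word-set index with embeddings stored in a vector database and the lambda with an LLM call.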

class Knowledge(knowledge_id: str, emb_model: Any | None = None, knowledge_config: dict | None = None, model: ModelWrapperBase | None = None, **kwargs: Any)[source]

Bases: ABC

Base class for RAG. It cannot be instantiated directly.
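Because the class derives from ABC and declares an abstract retrieve method, Python itself refuses to instantiate it. The sketch below shows that behavior with a hypothetical stand-in (`KnowledgeSketch`) and a hypothetical subclass (`EchoKnowledge`); neither is the library class, and the real constructor takes more arguments.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class KnowledgeSketch(ABC):
    """Hypothetical stand-in for Knowledge; not the library class."""

    def __init__(self, knowledge_id: str) -> None:
        self.knowledge_id = knowledge_id

    @abstractmethod
    def retrieve(
        self,
        query: Any,
        similarity_top_k: Optional[int] = None,
        to_list_strs: bool = False,
        **kwargs: Any,
    ) -> List[Any]:
        """Subclasses must provide the retrieval logic."""


class EchoKnowledge(KnowledgeSketch):
    """Hypothetical concrete subclass that simply echoes the query back."""

    def retrieve(self, query, similarity_top_k=None, to_list_strs=False, **kwargs):
        return [str(query)]


try:
    KnowledgeSketch("kb")  # abstract method is missing, so this raises TypeError
    instantiated = True
except TypeError:
    instantiated = False

echo = EchoKnowledge("kb")  # works because retrieve is implemented
```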

classmethod build_knowledge_instance(knowledge_id: str, knowledge_config: dict | None = None, **kwargs: Any) → Knowledge[source]

A constructor to build a knowledge base instance.

Parameters:

knowledge_id (str) – The id of the knowledge instance.
knowledge_config (dict) – The configuration for the knowledge instance.

Returns:

a Knowledge instance

Return type:

Knowledge

classmethod default_config(**kwargs: Any) → dict[source]

Return a default config for a knowledge class.

Parameters:

kwargs (Any) – Parameters for the config

Returns:

A default config for the knowledge class

Return type:

dict
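As a rough illustration of how the two classmethods could fit together, the sketch below uses a hypothetical class. The `chunk_size` key and the fallback to `default_config()` when no config is passed are assumptions of this example, not documented behavior of the library.

```python
from typing import Any, Optional


class ConfigurableKnowledgeSketch:
    """Hypothetical class showing the factory-plus-default-config pattern."""

    def __init__(
        self,
        knowledge_id: str,
        knowledge_config: Optional[dict] = None,
        **kwargs: Any,
    ) -> None:
        self.knowledge_id = knowledge_id
        self.knowledge_config = knowledge_config or {}

    @classmethod
    def default_config(cls, **kwargs: Any) -> dict:
        # "chunk_size" is a made-up key used only for illustration.
        config = {"chunk_size": 512}
        config.update(kwargs)  # caller-supplied values override the defaults
        return config

    @classmethod
    def build_knowledge_instance(
        cls,
        knowledge_id: str,
        knowledge_config: Optional[dict] = None,
        **kwargs: Any,
    ) -> "ConfigurableKnowledgeSketch":
        # Assumption: fall back to the class defaults when no config is given.
        if knowledge_config is None:
            knowledge_config = cls.default_config()
        return cls(knowledge_id=knowledge_id, knowledge_config=knowledge_config, **kwargs)


kb = ConfigurableKnowledgeSketch.build_knowledge_instance("docs")
custom = ConfigurableKnowledgeSketch.default_config(chunk_size=128)
```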

post_processing(retrieved_docs: list[str], prompt: str, **kwargs: Any) → Any[source]

A default post-processing implementation that generates an answer based on the retrieved documents.

Parameters:

retrieved_docs (list[str]) – List of retrieved documents
prompt (str) – Prompt used by the LLM to generate an answer from the retrieved documents

Returns:

An answer synthesized by the LLM from the retrieved documents

Return type:

Any

Example

self.postprocessing_model(prompt.format(retrieved_docs))
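The one-line example above implies that the prompt carries a `{}` placeholder that is filled with the retrieved documents before the model is called. A standalone sketch of that idea follows. The function name and the `model` callable are hypothetical, and joining the documents with newlines before formatting is a choice of this sketch (the example above formats the list directly).

```python
from typing import Any, Callable, List


def post_processing_sketch(
    retrieved_docs: List[str],
    prompt: str,
    model: Callable[[str], Any],
) -> Any:
    """Fill the prompt's placeholder with the documents and call the model."""
    # Join the documents so the prompt shows plain text, not a list repr.
    context = "\n".join(retrieved_docs)
    return model(prompt.format(context))


# A fake model that wraps its input, so the final prompt is visible.
fake_model = lambda text: "ANSWER<" + text + ">"
result = post_processing_sketch(["a", "b"], "Docs:\n{}\nAnswer:", fake_model)
```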

abstract retrieve(query: Any, similarity_top_k: int | None = None, to_list_strs: bool = False, **kwargs: Any) → list[RetrievedChunk | str][source]

Retrieve a list of content from the database (vector store index) into memory

Parameters:

query (Any) – Query for retrieval
similarity_top_k (int) – The number of most similar items returned by the retriever.
to_list_strs (bool) – Whether to return a list of strings

Returns:

A list of retrieved documents, as RetrievedChunk objects or, when to_list_strs is set, as strings
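The interplay of `similarity_top_k` and `to_list_strs` can be sketched with a toy retriever. `ChunkSketch` and `toy_retrieve` are hypothetical names, Jaccard word overlap stands in for embedding similarity, and treating `similarity_top_k=None` as "no limit" is an assumption of this sketch; a concrete subclass may fall back to a configured default instead.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ChunkSketch:
    """Hypothetical stand-in for RetrievedChunk (score and content only)."""
    score: float
    content: str


def toy_retrieve(
    corpus: List[str],
    query: str,
    similarity_top_k: Optional[int] = None,
    to_list_strs: bool = False,
) -> List[Union[ChunkSketch, str]]:
    query_words = set(query.lower().split())
    scored = []
    for text in corpus:
        words = set(text.lower().split())
        union = words | query_words
        # Jaccard overlap stands in for a real embedding similarity.
        score = len(words & query_words) / len(union) if union else 0.0
        scored.append(ChunkSketch(score=score, content=text))
    scored.sort(key=lambda chunk: chunk.score, reverse=True)
    if similarity_top_k is not None:  # assumption: None means "no limit"
        scored = scored[:similarity_top_k]
    if to_list_strs:
        return [chunk.content for chunk in scored]
    return scored


corpus = ["the cat sat", "dogs bark loudly", "the cat"]
as_strings = toy_retrieve(corpus, "the cat", similarity_top_k=2, to_list_strs=True)
as_chunks = toy_retrieve(corpus, "the cat", similarity_top_k=1)
```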

knowledge_type: str = 'base_knowledge'

A string to identify a knowledge base class

class RetrievedChunk(score: float = 0.0, content: Any | None = None, metadata: dict | None = None, embedding: Any | None = None, hash: str | None = None)[source]

Bases: object

Retrieved content with score and meta information

score

Similarity score of this retrieved chunk

Type: float

content

The retrieved content

Type: Any

metadata

The metadata of this retrieved chunk, such as the file path

Type: Optional[dict]

embedding

The embedding of the chunk

Type: Optional[Any]

hash

The hash of the retrieved content

Type: Optional[str]

to_dict() → dict[source]

Convert the object to a dict

content: Any = None

embedding: Any | None = None

hash: str | None = None

metadata: dict | None = None

score: float = 0.0
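The fields and defaults listed above are consistent with a plain data container. The sketch below models it as a dataclass, which is an assumption of this example: the source does not state how RetrievedChunk is implemented, and `RetrievedChunkSketch` is a hypothetical name.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class RetrievedChunkSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""
    score: float = 0.0
    content: Any = None
    metadata: Optional[dict] = None
    embedding: Optional[Any] = None
    hash: Optional[str] = None

    def to_dict(self) -> dict:
        """Convert the object to a dict keyed by field name."""
        return asdict(self)


chunk = RetrievedChunkSketch(
    score=0.8,
    content="Bananas are yellow.",
    metadata={"file_path": "fruit.txt"},
)
chunk_dict = chunk.to_dict()
```

Fields that are not supplied keep their defaults, so `embedding` and `hash` appear in the dict as `None`.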