agentscope.rag.knowledge module

Base class module for retrieval-augmented generation (RAG). To accommodate the RAG processes of different packages, we abstract the RAG process into four stages:

- data loading: loading data into memory for subsequent processing;
- data indexing and storage: chunking documents, generating embeddings, and offloading the data into a vector database (VDB);
- data retrieval: taking a query and returning a batch of documents or document chunks;
- post-processing of the retrieved data: using the retrieved data to generate an answer.
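The four stages can be illustrated with a minimal, self-contained sketch. Every function name here (`load_data`, `index_data`, `retrieve`, `post_process`) is hypothetical and not part of agentscope. A dict stands in for the vector database, word overlap stands in for embedding similarity, and a lambda stands in for the LLM.

```python
from typing import Callable


def load_data(raw_texts: list[str]) -> list[str]:
    # Stage 1 (data loading): the texts are already in memory, so just copy them.
    return list(raw_texts)


def index_data(docs: list[str], chunk_size: int = 200) -> dict[int, str]:
    # Stage 2 (indexing and storage): split each document into fixed-size
    # character chunks and store them in a dict standing in for a VDB.
    store: dict[int, str] = {}
    for doc in docs:
        for start in range(0, len(doc), chunk_size):
            store[len(store)] = doc[start:start + chunk_size]
    return store


def retrieve(store: dict[int, str], query: str, top_k: int = 1) -> list[str]:
    # Stage 3 (retrieval): rank chunks by how many query words they contain,
    # a toy substitute for embedding similarity.
    query_words = set(query.lower().split())
    ranked = sorted(
        store.values(),
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def post_process(chunks: list[str], prompt: str, llm: Callable[[str], str]) -> str:
    # Stage 4 (post-processing): put the retrieved chunks into the prompt
    # and pass it to an LLM-like callable.
    return llm(prompt.format("\n".join(chunks)))


docs = load_data([
    "AgentScope is a multi-agent framework.",
    "RAG combines retrieval with generation.",
])
store = index_data(docs)
top = retrieve(store, "how does rag use retrieval")
answer = post_process(top, "Context:\n{}\nAnswer the question.", llm=lambda p: p.upper())
print(top)  # ['RAG combines retrieval with generation.']
print(answer)
```

Keeping each stage behind its own function mirrors the motivation above: a different package can replace one stage (for example, the store) without touching the others.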

class Knowledge(knowledge_id: str, emb_model: Any | None = None, knowledge_config: dict | None = None, model: ModelWrapperBase | None = None, **kwargs: Any)

Bases: ABC

Base class for RAG. It is abstract and CANNOT be instantiated directly; instantiate a concrete subclass instead.
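Because the class derives from ABC and declares `retrieve` as abstract, Python refuses to instantiate it until a subclass supplies that method. The sketch below mirrors that behavior with hypothetical names (`KnowledgeSketch`, `EchoKnowledge`) rather than importing agentscope, and it assumes `retrieve` is the only abstract method, which is the only one this page marks as abstract.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class KnowledgeSketch(ABC):
    """Hypothetical, trimmed-down stand-in for the Knowledge base class."""

    knowledge_type: str = "base_knowledge"

    def __init__(self, knowledge_id: str, **kwargs: Any) -> None:
        self.knowledge_id = knowledge_id

    @abstractmethod
    def retrieve(
        self,
        query: Any,
        similarity_top_k: Optional[int] = None,
        to_list_strs: bool = False,
        **kwargs: Any,
    ) -> list:
        """Concrete subclasses must implement retrieval."""


class EchoKnowledge(KnowledgeSketch):
    """Hypothetical subclass whose 'retrieval' just echoes the query."""

    def retrieve(self, query, similarity_top_k=None, to_list_strs=False, **kwargs):
        return [str(query)]


# Instantiating the abstract base raises TypeError because retrieve() is abstract.
try:
    KnowledgeSketch("kb")
    base_instantiated = True
except TypeError:
    base_instantiated = False

kb = EchoKnowledge("kb")
print(base_instantiated)     # False
print(kb.retrieve("hello"))  # ['hello']
```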

classmethod build_knowledge_instance(knowledge_id: str, knowledge_config: dict | None = None, **kwargs: Any) → Knowledge

A constructor to build a knowledge base instance.

Parameters:
  • knowledge_id (str) – The id of the knowledge instance.

  • knowledge_config (dict) – The configuration for the knowledge instance.

Returns:

a Knowledge instance

Return type:

Knowledge
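A classmethod constructor like `build_knowledge_instance` is usually written against `cls`, so each subclass gets a factory that returns its own type. The sketch below assumes that pattern. `SketchKnowledge`, `SubKnowledge`, the `docs_kb` id and the `chunk_size` key are all hypothetical, and real agentscope subclasses may do more work inside this method (for example, building an index).

```python
from typing import Any, Optional


class SketchKnowledge:
    """Hypothetical class that only illustrates the factory pattern."""

    def __init__(
        self,
        knowledge_id: str,
        knowledge_config: Optional[dict] = None,
        **kwargs: Any,
    ) -> None:
        self.knowledge_id = knowledge_id
        self.knowledge_config = knowledge_config or {}
        self.kwargs = kwargs

    @classmethod
    def build_knowledge_instance(
        cls,
        knowledge_id: str,
        knowledge_config: Optional[dict] = None,
        **kwargs: Any,
    ) -> "SketchKnowledge":
        # Using `cls` (not a hard-coded class name) means subclasses
        # inherit a factory that returns instances of the subclass.
        return cls(knowledge_id=knowledge_id, knowledge_config=knowledge_config, **kwargs)


class SubKnowledge(SketchKnowledge):
    """Hypothetical subclass; inherits the factory unchanged."""


kb = SubKnowledge.build_knowledge_instance("docs_kb", {"chunk_size": 512})
print(type(kb).__name__, kb.knowledge_id, kb.knowledge_config)
```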

classmethod default_config(**kwargs: Any) → dict

Return a default config for a knowledge class.

Parameters:

kwargs (Any) – Parameters used to customize the config

Returns:

a default config of the knowledge class

Return type:

dict
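A minimal sketch of a `default_config` classmethod, assuming it starts from class-level defaults and lets keyword arguments override them. The class name, the keys `chunk_size` and `similarity_top_k`, and their values are hypothetical; each real Knowledge subclass defines its own configuration schema.

```python
import copy
from typing import Any


class ConfigurableKnowledge:
    # Hypothetical defaults; real keys depend on the concrete subclass.
    _DEFAULTS: dict = {"chunk_size": 1024, "similarity_top_k": 5}

    @classmethod
    def default_config(cls, **kwargs: Any) -> dict:
        # Deep-copy so callers cannot mutate the class-level defaults.
        config = copy.deepcopy(cls._DEFAULTS)
        # Keyword arguments override (or extend) the defaults.
        config.update(kwargs)
        return config


cfg = ConfigurableKnowledge.default_config(similarity_top_k=3)
print(cfg)  # {'chunk_size': 1024, 'similarity_top_k': 3}
```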

post_processing(retrieved_docs: list[str], prompt: str, **kwargs: Any) → Any

A default post-processing implementation that generates an answer based on the retrieved documents.

Parameters:
  • retrieved_docs (list[str]) – List of retrieved documents

  • prompt (str) – Prompt for the LLM to generate an answer from the retrieved documents

Returns:

An answer synthesized by the LLM from the retrieved documents

Return type:

Any

Example

self.postprocessing_model(prompt.format(retrieved_docs))
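The one-line example above calls the post-processing model on the filled-in prompt. A self-contained sketch of that flow follows, assuming the prompt contains a single `{}` placeholder and the model is a plain callable. `post_processing_sketch` and `fake_llm` are hypothetical stand-ins for the method and for a `ModelWrapperBase` instance, and joining the documents with newlines is a choice of this sketch, not necessarily what the library does.

```python
from typing import Any, Callable


def post_processing_sketch(
    retrieved_docs: list[str],
    prompt: str,
    model: Callable[[str], Any],
) -> Any:
    """Hypothetical stand-in for Knowledge.post_processing."""
    # Joining with newlines puts each document on its own line instead of
    # formatting the raw Python list repr into the prompt.
    filled_prompt = prompt.format("\n".join(retrieved_docs))
    return model(filled_prompt)


def fake_llm(text: str) -> str:
    # Hypothetical model: reports how many lines of prompt it received.
    return f"received {len(text.splitlines())} lines"


docs = ["Paris is the capital of France.", "France is in Europe."]
template = "Use the context below to answer.\n{}\nQuestion: What is the capital of France?"
print(post_processing_sketch(docs, template, fake_llm))  # received 4 lines
```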

abstract retrieve(query: Any, similarity_top_k: int | None = None, to_list_strs: bool = False, **kwargs: Any) → list[RetrievedChunk | str]

Retrieve a list of content from the database (vector store index) into memory.

Parameters:
  • query (Any) – Query for retrieval

  • similarity_top_k (int) – The number of most similar items returned by the retriever.

  • to_list_strs (bool) – Whether to return a list of str instead of RetrievedChunk objects.

Returns:

A list of retrieved documents: plain strings when to_list_strs is True, RetrievedChunk objects otherwise
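To make the `similarity_top_k` and `to_list_strs` semantics concrete, here is a toy in-memory implementation of a `retrieve`-style method. It is a sketch under stated assumptions: `Chunk` is a hypothetical mirror of RetrievedChunk, bag-of-words cosine similarity stands in for a real embedding model and vector store, and `InMemoryKnowledge` is not an agentscope class.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class Chunk:
    """Hypothetical mirror of RetrievedChunk (score, content, metadata)."""
    score: float = 0.0
    content: Any = None
    metadata: Optional[dict] = None


def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts instead of a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryKnowledge:
    """Hypothetical in-memory knowledge base showing retrieve() semantics."""

    def __init__(self, docs: list[str], default_top_k: int = 2) -> None:
        self.docs = docs
        self.vectors = [embed(d) for d in docs]
        self.default_top_k = default_top_k

    def retrieve(
        self,
        query: str,
        similarity_top_k: Optional[int] = None,
        to_list_strs: bool = False,
    ) -> list[Union[Chunk, str]]:
        top_k = self.default_top_k if similarity_top_k is None else similarity_top_k
        q = embed(query)
        # Score every stored document, then keep the top_k highest scores.
        scored = [
            Chunk(score=cosine(q, v), content=d, metadata={"index": i})
            for i, (d, v) in enumerate(zip(self.docs, self.vectors))
        ]
        scored.sort(key=lambda c: c.score, reverse=True)
        top = scored[:top_k]
        return [c.content for c in top] if to_list_strs else top


kb = InMemoryKnowledge(["cats chase mice", "dogs chase cats", "birds sing songs"])
print(kb.retrieve("dogs chase", to_list_strs=True))  # ['dogs chase cats', 'cats chase mice']
```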

knowledge_type: str = 'base_knowledge'

A string to identify a knowledge base class
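A class-level identifier such as `knowledge_type` lets code pick a class from a plain string, for example one read from a configuration file. The registry below is purely hypothetical: it shows one way such an identifier can be used and does not describe how agentscope itself resolves knowledge classes. The `vector_knowledge` value is also made up.

```python
class BaseKB:
    knowledge_type: str = "base_knowledge"


class VectorKB(BaseKB):
    # Hypothetical subclass overriding the identifier.
    knowledge_type: str = "vector_knowledge"


# Hypothetical registry: identifier string -> class.
REGISTRY = {cls.knowledge_type: cls for cls in (BaseKB, VectorKB)}

# A hypothetical config entry naming the class by its identifier.
config = {"knowledge_type": "vector_knowledge", "knowledge_id": "kb1"}
kb_cls = REGISTRY[config["knowledge_type"]]
print(kb_cls.__name__)  # VectorKB
```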

class RetrievedChunk(score: float = 0.0, content: Any | None = None, metadata: dict | None = None, embedding: Any | None = None, hash: str | None = None)

Bases: object

Retrieved content with its score and metadata

score

Similarity score of this retrieved chunk

Type:

float

content

The retrieved content

Type:

Any

metadata

The metadata of this retrieved chunk, such as the file path

Type:

Optional[dict]

embedding

The embedding of the chunk

Type:

Optional[Any]

hash

The hash of the retrieved content

Type:

Optional[str]

to_dict() → dict

Convert the object to a dict.
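Since RetrievedChunk has only defaulted fields, a dataclass mirror is a convenient way to show what `to_dict` produces. `RetrievedChunkSketch` is hypothetical, implementing `to_dict` with `dataclasses.asdict` is an assumption of this sketch, and the file path in the metadata is invented for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class RetrievedChunkSketch:
    """Hypothetical mirror of RetrievedChunk with the same fields and defaults."""
    score: float = 0.0
    content: Any = None
    metadata: Optional[dict] = None
    embedding: Optional[Any] = None
    hash: Optional[str] = None

    def to_dict(self) -> dict:
        # asdict converts the dataclass fields (recursively) into a new dict.
        return asdict(self)


chunk = RetrievedChunkSketch(
    score=0.87,
    content="RAG combines retrieval with generation.",
    metadata={"file_path": "docs/rag.md"},  # hypothetical path
)
print(chunk.to_dict())
```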

content: Any = None
embedding: Any | None = None
hash: str | None = None
metadata: dict | None = None
score: float = 0.0