Pipeline Lifecycle

The Aris RAG framework exposes three synchronous hooks for customizing the retrieval and generation process. These hooks allow for query transformation, result filtering, and prompt safety checks.

Hook Definitions

onBeforeRetrieve(query: str, context: dict) -> str

Purpose: Modify the user’s raw query before it hits the vector database.

Use Cases:

Expanding acronyms (e.g., “RFP” -> “Request for Proposal”).
Correcting domain-specific spelling errors.
Injecting metadata filters based on user role.

onAfterRetrieve(chunks: list[Chunk]) -> list[Chunk]

Purpose: Filter or re-rank the raw results from the vector store.

Use Cases:

Removing chunks with low confidence scores (< 0.7).
Deduplicating identical content from different sources.
Redacting PII (Personally Identifiable Information) before context injection.
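
The first two use cases can be combined in a single onAfterRetrieve hook. The sketch below is illustrative only: the `Chunk` dataclass, its `score` and `source` fields, and the `filter_and_dedupe` name are assumptions made so the example runs on its own, not the framework's actual types.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for the framework's Chunk type."""
    content: str
    score: float
    source: str


def filter_and_dedupe(chunks: list[Chunk], min_score: float = 0.7) -> list[Chunk]:
    """Drop chunks scoring below min_score, then keep the first copy of each text."""
    seen: set[str] = set()
    kept: list[Chunk] = []
    for chunk in chunks:
        if chunk.score < min_score:
            continue
        # Normalize whitespace and case so trivially different copies match.
        key = " ".join(chunk.content.split()).lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append(chunk)
    return kept
```

Because the function keeps the first occurrence, the input order (typically the vector store's ranking) decides which duplicate survives.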

onBeforeGenerate(prompt: str, chunks: list[Chunk]) -> str

Purpose: Final inspection of the full prompt sent to the LLM.

Use Cases:

Checking for prompt injection attempts.
Formatting the context block with specific XML tags.
Truncating the prompt to fit strict token limits.
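
As a rough sketch of the formatting and truncation use cases, the hook below wraps each chunk in XML tags and stops adding chunks once a token budget is used up. Several things here are assumptions: the `Chunk` stand-in, the `build_prompt` name, the `<context>` and `<document>` tag names, the `max_tokens` default, and the whitespace-based token count (a real deployment would use the model's own tokenizer). It also assumes the incoming `prompt` holds only the instructions and question, not an already assembled context.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Chunk:
    """Hypothetical stand-in for the framework's Chunk type."""
    content: str


def build_prompt(prompt: str, chunks: list[Chunk], max_tokens: int = 2000) -> str:
    """Prepend an XML-tagged context block, keeping only chunks that fit the budget."""

    def count_tokens(text: str) -> int:
        # Assumption: whitespace-separated words approximate tokens.
        return len(text.split())

    budget = max_tokens - count_tokens(prompt)
    docs: list[str] = []
    for chunk in chunks:
        # Escape markup so chunk text cannot close the <document> tag early.
        doc = f"<document>{escape(chunk.content)}</document>"
        cost = count_tokens(doc)
        if cost > budget:
            break
        docs.append(doc)
        budget -= cost
    context = "<context>\n" + "\n".join(docs) + "\n</context>"
    return f"{context}\n\n{prompt}"
```

Dropping whole chunks rather than cutting text mid-sentence keeps every document the model sees intact, at the cost of leaving some budget unused.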

Implementation Example

import re

def expand_query(query: str, ctx: dict) -> str:
    """
    Middleware to expand common acronyms.
    """
    acronyms = {"RAG": "Retrieval Augmented Generation", "LLM": "Large Language Model"}
    for key, value in acronyms.items():
        # Match whole words only, so "RAG" inside "STORAGE" is left alone.
        query = re.sub(rf"\b{re.escape(key)}\b", value, query)
    return query

def redact_pii(chunks: list[Schema.Chunk]) -> list[Schema.Chunk]:
    """
    Middleware to scrub email addresses from retrieved content.
    """
    email_pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    for chunk in chunks:
        chunk.content = re.sub(email_pattern, "[REDACTED_EMAIL]", chunk.content)
    return chunks

# Register hooks
pipeline.hooks.register("onBeforeRetrieve", expand_query)
pipeline.hooks.register("onAfterRetrieve", redact_pii)
