RAG (Retrieval-Augmented Generation)

Phase 8c — Config-driven knowledge base retrieval integrated into the LLM pipeline.

Overview

EDDI's RAG system is a first-class workflow extension that adds contextual knowledge retrieval to LLM conversations. Knowledge bases are versioned configurations — just like behavior rules or httpCalls — managed via REST API and wired into workflows.

At execution time, the LlmTask discovers RAG configurations from the agent's workflow, performs vector similarity search against the user's query, and injects the retrieved context into the LLM system message — all automatically and transparently.

Architecture

User Query
    │
    ▼
┌─────────────────── LlmTask.executeTask() ───────────────────┐
│                                                              │
│  1. Extract user input from conversation memory              │
│  2. RagContextProvider.retrieveContext()                     │
│     ├── WorkflowTraversal.discoverConfigs() → find RAG steps │
│     ├── Match KBs (explicit refs or auto-discover all)       │
│     ├── EmbeddingModelFactory → cached embedding model       │
│     ├── EmbeddingStoreFactory → cached vector store          │
│     ├── EmbeddingStoreContentRetriever → similarity search   │
│     └── Store audit trace in conversation memory             │
│  3. Inject context: systemMessage += "## Relevant Context"   │
│  4. Build chat messages and call LLM                         │
│                                                              │
└──────────────────────────────────────────────────────────────┘
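Steps 2–3 above can be sketched as a toy, self-contained Python version of the similarity search and context injection. The real implementation is the Java RagContextProvider; the function names, sample chunks, and vectors here are purely illustrative:

```python
def retrieve_context(user_vec, store, max_results=5, min_score=0.6):
    """Rank stored chunks by cosine similarity and keep the top hits
    (mirrors the EmbeddingStoreContentRetriever step in the diagram)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    scored = sorted(((cosine(user_vec, vec), text) for text, vec in store),
                    reverse=True)
    return [text for score, text in scored[:max_results] if score >= min_score]

def inject_context(system_message, contexts):
    """Step 3 of the pipeline: append retrieved chunks to the system message."""
    if not contexts:
        return system_message
    return system_message + "\n## Relevant Context\n" + "\n".join(contexts)

# Toy store: (chunk text, embedding vector)
store = [("Returns are accepted within 30 days.", [1.0, 0.0]),
         ("Shipping takes 3-5 business days.", [0.6, 0.8]),
         ("Our office dog is named Rex.", [0.0, 1.0])]

hits = retrieve_context([1.0, 0.1], store, max_results=2, min_score=0.6)
msg = inject_context("You are a support agent.", hits)
```

The off-topic chunk scores below `min_score` and is dropped; the two relevant chunks are appended under the `## Relevant Context` heading.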

Configuration

RagConfiguration (Knowledge Base)

A RagConfiguration is a versioned resource at /ragstore/rags/. It defines:

| Field | Default | Description |
| --- | --- | --- |
| name | — | Display name / identifier for this knowledge base |
| embeddingProvider | openai | Provider (see Embedding Providers table below) |
| embeddingParameters | — | Provider-specific params (model, apiKey, baseUrl, etc.) |
| storeType | in-memory | Vector store (see Vector Stores table below) |
| storeParameters | — | Store-specific connection params |
| chunkStrategy | recursive | Document chunking strategy |
| chunkSize | 512 | Chunk size in characters |
| chunkOverlap | 64 | Chunk overlap in characters |
| maxResults | 5 | Default top-K results |
| minScore | 0.6 | Default minimum similarity score (0.0–1.0) |
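A hypothetical payload for creating a knowledge base, shown as a Python dict. The field names come from the table above; the concrete values (KB name, vault key) are placeholders:

```python
# Hypothetical RagConfiguration payload (POST /ragstore/rags).
kb_config = {
    "name": "product-faq",
    "embeddingProvider": "openai",
    "embeddingParameters": {
        "model": "text-embedding-3-small",    # provider default, may be omitted
        "apiKey": "${eddivault:openai-key}",  # vault reference, never a raw key
    },
    "storeType": "in-memory",                 # swap for a persistent store in prod
    "storeParameters": {},
    "chunkStrategy": "recursive",
    "chunkSize": 512,
    "chunkOverlap": 64,
    "maxResults": 5,
    "minScore": 0.6,
}
```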

LLM Task RAG Configuration

RAG is wired into LLM tasks via three fields on LlmConfiguration.Task:

Option 1: Explicit Knowledge Base References

Each reference names a KB from the workflow and optionally overrides retrieval parameters.

Option 2: Auto-Discovery

When enableWorkflowRag is true, the system discovers all RAG steps from the workflow automatically.

Option 3: httpCall RAG (Phase 8c-0)

Zero-infrastructure RAG: execute a named httpCall and inject its response into the system message under a "## Search Results:" heading. The user's input is available as {userInput} in httpCall templates. No vector store is needed, and httpCall RAG and vector RAG can be active simultaneously.
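The three wiring styles can be sketched as task-config fragments. Of the field names below, only `enableWorkflowRag` appears in this document; `ragReferences` and `ragHttpCalls` are hypothetical placeholders for the other two fields on LlmConfiguration.Task:

```python
explicit_task = {
    "ragReferences": [                 # Option 1: explicit KB references,
        {"name": "product-faq",        # each optionally overriding the KB's
         "maxResults": 3,              # default retrieval parameters
         "minScore": 0.7},
    ],
}
auto_task = {
    "enableWorkflowRag": True,         # Option 2: auto-discover all RAG steps
}
httpcall_task = {
    "ragHttpCalls": ["searchDocs"],    # Option 3: httpCall RAG; the named
}                                      # httpCall's template can use {userInput}
```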

REST API

Configuration Management

| Method | Path | Description |
| --- | --- | --- |
| GET | /ragstore/rags/jsonSchema | JSON Schema for validation |
| GET | /ragstore/rags/descriptors | List KB descriptors |
| GET | /ragstore/rags/{id}?version=N | Read a KB configuration |
| POST | /ragstore/rags | Create a new KB |
| PUT | /ragstore/rags/{id}?version=N | Update a KB |
| POST | /ragstore/rags/{id}?version=N | Duplicate a KB |
| DELETE | /ragstore/rags/{id}?version=N | Delete a KB |
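A minimal client-side helper for building the versioned URLs in the table above. The base address is an assumption (substitute your EDDI server's host and port); only the `/ragstore/rags` paths and `version` query parameter come from this document:

```python
from urllib.parse import urlencode

BASE = "http://localhost:7070"   # assumed server address

def rag_url(resource_id=None, version=None, suffix=""):
    """Build a versioned /ragstore/rags URL as listed in the table above."""
    url = f"{BASE}/ragstore/rags"
    if resource_id:
        url += f"/{resource_id}{suffix}"
    if version is not None:
        url += "?" + urlencode({"version": version})
    return url

# e.g. read version 2 of a knowledge base:
read_url = rag_url("kb123", version=2)
```

`rag_url("kb123", version=2)` yields the GET/PUT/DELETE target, while `rag_url()` alone yields the creation endpoint.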

Document Ingestion

| Method | Path | Description |
| --- | --- | --- |
| POST | /ragstore/rags/{id}/ingest?version=N&kbId=...&documentName=... | Ingest a text document (returns 202 + ingestion ID) |
| GET | /ragstore/rags/{id}/ingestion/{ingestionId}/status | Poll ingestion status |

Ingestion is asynchronous: the ingest endpoint responds with 202 Accepted and an ingestion ID, which is then polled via the status endpoint until processing finishes.

Status values: pending → processing → completed | failed: <error message>
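A polling sketch for the status endpoint. The pending/processing/completed/failed status strings come from this document; the helper functions and the simulated responses are illustrative:

```python
import time

def is_terminal(status: str) -> bool:
    """Ingestion is done once the status is completed or failed: <msg>."""
    return status == "completed" or status.startswith("failed:")

def poll_until_done(fetch_status, interval=0.0, max_polls=10):
    """Poll the .../ingestion/{ingestionId}/status endpoint until terminal.
    fetch_status is any callable returning the current status string."""
    for _ in range(max_polls):
        status = fetch_status()
        if is_terminal(status):
            return status
        time.sleep(interval)
    raise TimeoutError("ingestion did not finish")

# Simulated server responses for illustration:
responses = iter(["pending", "processing", "completed"])
final = poll_until_done(lambda: next(responses))
```

In real use, `fetch_status` would issue the GET request from the table above and return the status field of the response.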

Observability

RAG operations write audit traces to conversation memory:

| Memory Key | Content |
| --- | --- |
| rag:trace:{taskId} | Per-KB retrieval metadata (provider, storeType, maxResults, minScore, retrievedCount) |
| rag:context:{taskId} | Formatted context string injected into the LLM |
| rag:httpcall:trace:{taskId} | httpCall RAG execution metadata (httpCall name, context length) |
These are visible in the conversation memory snapshot and the audit ledger.
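The keys are namespaced per task, so a tool inspecting the memory snapshot can derive them from the task ID alone. A small illustrative helper (the key patterns come from the table above; the function itself is hypothetical):

```python
def rag_memory_keys(task_id: str) -> dict:
    """Build the audit-trace keys for one task, per the table above."""
    return {
        "trace": f"rag:trace:{task_id}",
        "context": f"rag:context:{task_id}",
        "httpcall_trace": f"rag:httpcall:trace:{task_id}",
    }

keys = rag_memory_keys("task-42")
```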

Embedding Providers

| Provider | Default Model | Required Parameters | Notes |
| --- | --- | --- | --- |
| openai | text-embedding-3-small | apiKey | Use ${eddivault:...} for keys |
| azure-openai | text-embedding-3-small | endpoint, apiKey, deploymentName | Azure-hosted OpenAI models |
| ollama | nomic-embed-text | baseUrl (default: localhost:11434) | |
| mistral | mistral-embed | apiKey | Mistral AI embedding model |
| bedrock | amazon.titan-embed-text-v2:0 | none (uses AWS credentials chain) | region (default: us-east-1) |
| cohere | embed-english-v3.0 | apiKey | Excellent multilingual support |
| vertex | text-embedding-005 | project | location (default: us-central1); uses GCP credentials |
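Two illustrative embeddingParameters fragments. The parameter names and defaults come from the table above; the vault key name is a placeholder:

```python
openai_params = {
    "model": "text-embedding-3-small",     # provider default, may be omitted
    "apiKey": "${eddivault:openai-key}",   # vault reference, never a raw key
}
ollama_params = {
    "baseUrl": "http://localhost:11434",   # default Ollama address
    "model": "nomic-embed-text",           # default Ollama embedding model
}
```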

Vector Stores

| Store Type | Required Parameters | Notes |
| --- | --- | --- |
| in-memory | none | Ephemeral, for dev/test only |
| pgvector | password | PostgreSQL + pgvector; host, port, database, user, table, dimension |
| mongodb-atlas | connectionString | MongoDB Atlas Vector Search; databaseName, collectionName, indexName |
| elasticsearch | serverUrl (default: localhost:9200) | Optional apiKey or userName+password; indexName |
| qdrant | host (default: localhost), port (default: 6334) | Optional apiKey, useTls; collectionName |
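An illustrative storeParameters fragment for pgvector. The parameter names come from the table above; the values are placeholders for a local PostgreSQL instance:

```python
pgvector_params = {
    "host": "localhost",
    "port": 5432,
    "database": "eddi",
    "user": "eddi",
    "password": "${eddivault:pg-password}",  # vault reference
    "table": "kb_product_faq",
    "dimension": 1536,  # must match the embedding model's output size
}
```

The dimension must agree with the configured embedding model, since pgvector columns are fixed-width.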

Status

  • Phase 8c: RAG Foundation — config-driven knowledge base retrieval

  • Phase 8c-0: httpCall-based RAG (zero infrastructure)

  • Phase 8c-β: Persistent vector stores (pgvector)

  • Phase 8c-γ: RAG provider expansion (7 embedding models + 5 vector stores)

  • Phase 8c-M: Manager UI — RAG editor with full provider parity + document ingestion

  • REST ingestion endpoint: POST /ragstore/rags/{id}/ingest

Future Enhancements

  • Advanced retrieval: re-ranking, hybrid search, metadata filtering

  • ONNX in-process embeddings (air-gapped / edge deployments)
