# RAG (Retrieval-Augmented Generation)

> **Phase 8c** — Config-driven knowledge base retrieval integrated into the LLM pipeline.

## Overview

EDDI's RAG system is a first-class workflow extension that adds contextual knowledge retrieval to LLM conversations. Knowledge bases are versioned configurations — just like behavior rules or httpCalls — managed via REST API and wired into workflows.

At execution time, the `LlmTask` discovers RAG configurations from the agent's workflow, performs vector similarity search against the user's query, and injects the retrieved context into the LLM system message — all automatically and transparently.

## Architecture

```
User Query
    │
    ▼
┌─────────────────── LlmTask.executeTask() ───────────────────┐
│                                                              │
│  1. Extract user input from conversation memory              │
│  2. RagContextProvider.retrieveContext()                     │
│     ├── WorkflowTraversal.discoverConfigs() → find RAG steps │
│     ├── Match KBs (explicit refs or auto-discover all)       │
│     ├── EmbeddingModelFactory → cached embedding model       │
│     ├── EmbeddingStoreFactory → cached vector store          │
│     ├── EmbeddingStoreContentRetriever → similarity search   │
│     └── Store audit trace in conversation memory             │
│  3. Inject context: systemMessage += "## Relevant Context"   │
│  4. Build chat messages and call LLM                         │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
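
For illustration, the system message produced by step 3 might look like this (a sketch: the snippet layout under the heading is an assumption; only the `## Relevant Context` heading appears in the diagram above):

```
You are a helpful assistant.

## Relevant Context
[product-docs] EDDI knowledge bases are versioned configurations managed via REST...
[faq] Documents are ingested as plain text and chunked before embedding...
```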

## Configuration

### RagConfiguration (Knowledge Base)

A `RagConfiguration` is a versioned resource at `/ragstore/rags/`. It defines:

```json
{
  "name": "product-docs",
  "embeddingProvider": "openai",
  "embeddingParameters": {
    "model": "text-embedding-3-small",
    "apiKey": "${eddivault:tenant/agent/openai-key}"
  },
  "storeType": "in-memory",
  "storeParameters": {},
  "chunkStrategy": "recursive",
  "chunkSize": 512,
  "chunkOverlap": 64,
  "maxResults": 5,
  "minScore": 0.6
}
```

| Field                 | Default     | Description                                             |
| --------------------- | ----------- | ------------------------------------------------------- |
| `name`                | —           | Display name / identifier for this knowledge base       |
| `embeddingProvider`   | `openai`    | Provider (see Embedding Providers table below)          |
| `embeddingParameters` | —           | Provider-specific params (model, apiKey, baseUrl, etc.) |
| `storeType`           | `in-memory` | Vector store (see Vector Stores table below)            |
| `storeParameters`     | —           | Store-specific connection params                        |
| `chunkStrategy`       | `recursive` | Document chunking strategy                              |
| `chunkSize`           | `512`       | Chunk size in characters                                |
| `chunkOverlap`        | `64`        | Chunk overlap in characters                             |
| `maxResults`          | `5`         | Default top-K results                                   |
| `minScore`            | `0.6`       | Default minimum similarity score (0.0–1.0)              |

### LLM Task RAG Configuration

RAG is wired into LLM tasks through fields on `LlmConfiguration.Task`, in one of three ways:

#### Option 1: Explicit Knowledge Base References

```json
{
  "tasks": [{
    "actions": ["*"],
    "type": "openai",
    "knowledgeBases": [
      { "name": "product-docs", "maxResults": 5, "minScore": 0.7 },
      { "name": "faq", "maxResults": 3 }
    ],
    "parameters": {
      "systemMessage": "You are a helpful assistant."
    }
  }]
}
```

Each reference names a KB from the workflow and optionally overrides retrieval parameters.

#### Option 2: Auto-Discovery

```json
{
  "tasks": [{
    "enableWorkflowRag": true,
    "ragDefaults": { "maxResults": 5, "minScore": 0.7 }
  }]
}
```

When `enableWorkflowRag` is `true`, the system discovers all RAG steps from the workflow automatically.

#### Option 3: httpCall RAG (Phase 8c-0)

```json
{
  "tasks": [{
    "httpCallRag": "search-api"
  }]
}
```

Zero-infrastructure RAG: the task executes the named httpCall and injects its response into the system message as `## Search Results:` context. The user's input is available as `{userInput}` in httpCall templates. No vector store is needed, and httpCall RAG and vector RAG can be active simultaneously.
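
As a sketch, the referenced httpCall might interpolate `{userInput}` into its request like this (field names here are illustrative rather than a verified httpCalls schema; consult the httpCalls documentation for the exact shape):

```json
{
  "targetServerUrl": "https://search.example.com",
  "httpCalls": [{
    "name": "search-api",
    "request": {
      "path": "/search?q={userInput}",
      "method": "get"
    }
  }]
}
```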

## REST API

### Configuration Management

| Method   | Path                            | Description                |
| -------- | ------------------------------- | -------------------------- |
| `GET`    | `/ragstore/rags/jsonSchema`     | JSON Schema for validation |
| `GET`    | `/ragstore/rags/descriptors`    | List KB descriptors        |
| `GET`    | `/ragstore/rags/{id}?version=N` | Read a KB configuration    |
| `POST`   | `/ragstore/rags`                | Create a new KB            |
| `PUT`    | `/ragstore/rags/{id}?version=N` | Update a KB                |
| `POST`   | `/ragstore/rags/{id}?version=N` | Duplicate a KB             |
| `DELETE` | `/ragstore/rags/{id}?version=N` | Delete a KB                |
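
For example, to create the `product-docs` KB from the configuration shown earlier (assuming a local instance on port 7070, as in the ingestion example below):

```bash
# Create a new KB (POST /ragstore/rags); subsequent requests
# address it by id + version, per the table above.
curl -X POST http://localhost:7070/ragstore/rags \
  -H "Content-Type: application/json" \
  -d '{
        "name": "product-docs",
        "embeddingProvider": "openai",
        "embeddingParameters": {
          "model": "text-embedding-3-small",
          "apiKey": "${eddivault:tenant/agent/openai-key}"
        },
        "storeType": "in-memory"
      }'
```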

### Document Ingestion

| Method | Path                                                             | Description                                         |
| ------ | ---------------------------------------------------------------- | --------------------------------------------------- |
| `POST` | `/ragstore/rags/{id}/ingest?version=N&kbId=...&documentName=...` | Ingest a text document (returns 202 + ingestion ID) |
| `GET`  | `/ragstore/rags/{id}/ingestion/{ingestionId}/status`             | Poll ingestion status                               |

**Example: Ingest a document**

```bash
curl -X POST "http://localhost:7070/ragstore/rags/abc123/ingest?version=1&documentName=readme.txt" \
  -H "Content-Type: text/plain" \
  -d "This is the document content to be chunked, embedded, and stored."
```

Response: `202 Accepted`

```json
{
  "ingestionId": "550e8400-e29b-41d4-a716-446655440000",
  "kbId": "product-docs",
  "status": "pending"
}
```

**Poll status:**

```bash
curl http://localhost:7070/ragstore/rags/abc123/ingestion/550e8400-e29b-41d4-a716-446655440000/status
```

Response:

```json
{
  "ingestionId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed"
}
```

Status values: `pending` → `processing` → `completed` | `failed: <error message>`
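
A minimal polling loop over these states might look like this (assumes `curl` and `jq` are installed; the URL and IDs are taken from the examples above):

```bash
#!/usr/bin/env bash
# Poll ingestion status until it leaves the pending/processing states.
BASE="http://localhost:7070/ragstore/rags/abc123"
INGESTION_ID="550e8400-e29b-41d4-a716-446655440000"

while :; do
  status=$(curl -s "$BASE/ingestion/$INGESTION_ID/status" | jq -r .status)
  echo "status: $status"
  case "$status" in
    pending|processing) sleep 2 ;;
    *) break ;;  # completed, or "failed: <error message>"
  esac
done
```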

## Observability

RAG operations write audit traces to conversation memory:

| Memory Key                    | Content                                                                               |
| ----------------------------- | ------------------------------------------------------------------------------------- |
| `rag:trace:{taskId}`          | Per-KB retrieval metadata (provider, storeType, maxResults, minScore, retrievedCount) |
| `rag:context:{taskId}`        | Formatted context string injected into the LLM                                        |
| `rag:httpcall:trace:{taskId}` | httpCall RAG execution metadata (httpCall name, context length)                       |

These are visible in the conversation memory snapshot and the audit ledger.
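
As an illustration, a `rag:trace:{taskId}` entry carries the metadata fields listed above (a sketch: the listed fields come from the table, while the `kb` key and the exact serialization are assumptions):

```json
{
  "kb": "product-docs",
  "provider": "openai",
  "storeType": "in-memory",
  "maxResults": 5,
  "minScore": 0.7,
  "retrievedCount": 3
}
```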

## Embedding Providers

| Provider       | Default Model                  | Required Parameters                    | Notes                                                       |
| -------------- | ------------------------------ | -------------------------------------- | ----------------------------------------------------------- |
| `openai`       | `text-embedding-3-small`       | `apiKey`                               | Use `${eddivault:...}` for keys                             |
| `azure-openai` | `text-embedding-3-small`       | `endpoint`, `apiKey`, `deploymentName` | Azure-hosted OpenAI models                                  |
| `ollama`       | `nomic-embed-text`             | —                                      | `baseUrl` (default: `localhost:11434`)                      |
| `mistral`      | `mistral-embed`                | `apiKey`                               | Mistral AI embedding model                                  |
| `bedrock`      | `amazon.titan-embed-text-v2:0` | —                                      | Uses AWS credentials chain; `region` (default: `us-east-1`) |
| `cohere`       | `embed-english-v3.0`           | `apiKey`                               | Excellent multilingual support                              |
| `vertex`       | `text-embedding-005`           | `project`                              | `location` (default: `us-central1`); uses GCP credentials   |
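
For example, an `azure-openai` KB carries its required parameters like this (a sketch based on the Required Parameters column; the endpoint and vault path are placeholders):

```json
{
  "embeddingProvider": "azure-openai",
  "embeddingParameters": {
    "endpoint": "https://my-resource.openai.azure.com",
    "apiKey": "${eddivault:tenant/agent/azure-openai-key}",
    "deploymentName": "text-embedding-3-small"
  }
}
```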

## Vector Stores

| Store Type      | Required Parameters | Notes                                                                                                  |
| --------------- | ------------------- | ------------------------------------------------------------------------------------------------------ |
| `in-memory`     | —                   | Ephemeral, for dev/test only                                                                           |
| `pgvector`      | `password`          | PostgreSQL + pgvector; `host`, `port`, `database`, `user`, `table`, `dimension`                        |
| `mongodb-atlas` | `connectionString`  | MongoDB Atlas Vector Search; `databaseName`, `collectionName`, `indexName`                             |
| `elasticsearch` | —                   | `serverUrl` (default: `localhost:9200`); optional `apiKey` or `userName`+`password`; `indexName`       |
| `qdrant`        | —                   | `host` (default: `localhost`), `port` (default: `6334`); optional `apiKey`, `useTls`; `collectionName` |
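
Putting the two tables together, a self-hosted setup might pair `ollama` embeddings with a `pgvector` store (a sketch; parameter names follow the tables above, while the value types and the 768-dimension output of `nomic-embed-text` are assumptions to verify against your deployment):

```json
{
  "name": "internal-docs",
  "embeddingProvider": "ollama",
  "embeddingParameters": { "baseUrl": "http://localhost:11434" },
  "storeType": "pgvector",
  "storeParameters": {
    "host": "localhost",
    "port": 5432,
    "database": "eddi",
    "user": "eddi",
    "password": "${eddivault:tenant/agent/pg-password}",
    "table": "rag_embeddings",
    "dimension": 768
  }
}
```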

## Status

* ✅ **Phase 8c**: RAG Foundation — config-driven knowledge base retrieval
* ✅ **Phase 8c-0**: httpCall-based RAG (zero infrastructure)
* ✅ **Phase 8c-β**: Persistent vector stores (pgvector)
* ✅ **Phase 8c-γ**: RAG provider expansion (7 embedding models + 5 vector stores)
* ✅ **Phase 8c-M**: Manager UI — RAG editor with full provider parity + document ingestion
* ✅ **REST ingestion endpoint**: `POST /ragstore/rags/{id}/ingest`

## Future Enhancements

* Advanced retrieval: re-ranking, hybrid search, metadata filtering
* ONNX in-process embeddings (air-gapped / edge deployments)
