RAG System Architecture¶
This document describes the Retrieval-Augmented Generation (RAG) pipeline used by erdos ask (and related retrieval used by erdos loop).
This is intentionally a “how it works today” doc. Wherever possible, it points at the source-of-truth (SSOT) implementation rather than duplicating large code blocks that can drift.
SSOT Pointers¶
- Ask pipeline:
src/erdos/core/ask/ - Search index (BM25/semantic/hybrid):
src/erdos/core/search/ - Search schema:
src/erdos/core/search/db.py(SCHEMA_SQL) - Research workspace paths:
src/erdos/core/research/paths.py - Configuration surface:
docs/developer/configuration.md
High-Level Flow (erdos ask)¶
question
|
v
retrieve_sources (BM25/FTS5, with fallback sources)
|
v
build_prompt (deterministic, citation-grounded)
|
v
execute_llm_if_enabled (external subprocess; optional)
|
v
answer (+ sources, retrieval metadata)
erdos ask currently uses BM25/FTS5 retrieval. Semantic/hybrid search are available via erdos search --semantic/--hybrid, but are not wired into erdos ask today.
Retrieval Layer¶
BM25 / FTS5 (Always Available)¶
- Implementation:
src/erdos/core/search/bm25.py(BM25Search) - Backing store: SQLite index (default:
index/erdos.sqlite) - FTS tokenizer:
porter unicode61(seesrc/erdos/core/search/db.py) - Ranking: SQLite
bm25(chunks_fts)(SQLite returns negative scores; code flips to positive)
Safety: user-entered queries are normalized via:
safe_fts5_query(query, allow_advanced_syntax=True)
This prevents FTS5 parse failures like sum-free sets → OperationalError: no such column: free while still supporting phrase/prefix queries (see integration tests in tests/integration/test_search_index.py).
Semantic Search (Optional)¶
- Implementation:
src/erdos/core/search/embeddings.py(EmbeddingModel) - Storage:
chunk_embeddingstable (BLOBs) +schema_meta(model/dimension metadata) - Similarity: cosine similarity over stored embeddings (brute-force scan; O(n) per query)
- Requires optional deps:
uv sync --extra embeddings
Hybrid Search (Optional)¶
- Implementation:
src/erdos/core/search/hybrid.py - Candidate generation: BM25 top
2 * limit - Re-ranking: cosine similarity over those candidates
- Score:
hybrid_score = (1 - alpha) * bm25_normalized + alpha * semantic_score
semantic_score = (cosine_similarity + 1) / 2
Ask Command Orchestration¶
Entry Points¶
- CLI adapter:
src/erdos/commands/ask.py - Core service:
src/erdos/core/ask/service.py(ask_question)
Source Retrieval (retrieve_sources)¶
Implementation: src/erdos/core/ask/retrieval.py
Behavior:
- If the search index has no chunks, return fallback sources only.
- Otherwise:
- Build a baseline list of fallback sources (statement/notes/synthesis).
- Run BM25 retrieval scoped to
problem_id. - Combine + de-duplicate by
chunk_id, preserving baseline-first order.
FTS Query Construction¶
Implementation: src/erdos/core/ask/retrieval.py (perform_retrieval)
- A “haystack” string is built from
problem.title+ the user question. - It is converted into a safe FTS query via:
safe_fts5_query(haystack, allow_advanced_syntax=False)
We explicitly disable advanced syntax here because the query is programmatically constructed from free text; the user may include stray quotes/operators that should not be treated as FTS5 syntax.
Fallback Sources¶
Fallback sources come from:
- Research synthesis (if present):
research/problems/{problem_id:04d}/SYNTHESIS.md - Problem statement
- Problem notes (if present)
Paths are derived via src/erdos/core/research/paths.py (do not hardcode).
Prompt Construction (build_prompt)¶
Implementation: src/erdos/core/ask/prompt.py
The prompt is deterministic and includes:
- Problem metadata (id/title)
- Full statement
- Notes (or
(none)) - Full retrieved sources, numbered for citation (
[1],[2], …) - The user question
- Explicit instructions to cite sources
Prompt Budgeting (Hard Cap)¶
To avoid downstream LLM context-limit failures when sources contain large blobs (e.g., PDF extractions or long synthesis), erdos ask enforces a hard prompt-size cap:
- Constant:
ASK_PROMPT_MAX_BYTES(src/erdos/core/constants.py) - Behavior: truncates statement/notes/source texts as needed (UTF‑8 byte budget) and appends
(...truncated...)when content is cut
LLM Execution¶
Implementation: src/erdos/core/ask/llm.py
- LLM is an external subprocess (shell-free execution via
shlex.split). - Prompt is passed via stdin; answer is read from stdout.
- Default timeout:
LLM_COMMAND_TIMEOUT = 300seconds (src/erdos/core/constants.py).
Routing:
- CLI layer resolves the command via SPEC-032 routing (
src/erdos/core/llm/). - Users can override with
--llm-cmdor disable with--no-llm.
Loop RAG (erdos loop)¶
erdos loop also supports lightweight RAG context:
- Implementation:
src/erdos/core/loop/service.py(_build_rag_chunks) - Source: per-problem synthesis at
research/problems/{problem_id:04d}/SYNTHESIS.md - The synthesis is split into multiple chunks (best-effort) and passed into the loop prompt.
Storage (Search Index Schema)¶
The schema SSOT is src/erdos/core/search/db.py (SCHEMA_SQL).
Key tables (high level):
problems: problem metadata for indexingchunks: the canonical text chunks (statement/notes/references/research records)chunks_fts: FTS5 virtual table overchunks.text(external content via_rowid)chunk_embeddings: optional embeddings keyed bychunk_idschema_meta: simple key/value metadata (e.g., embedding model + dimension)
Configuration¶
RAG behavior is mostly configuration-free, but these knobs matter:
ERDOS_DATA_PATH: dataset path resolutionERDOS_INDEX_PATH: search index locationERDOS_LLM_COMMAND*: task routing for LLM executionERDOS_LOAD_DOTENV:.envauto-loading toggle (CLI ergonomics)
See docs/developer/configuration.md for the full, authoritative list.
Tests¶
- Ask retrieval:
tests/unit/ask/test_retrieval.py - Search index integration (FTS5 syntax, including hyphen safety):
tests/integration/test_search_index.py - Semantic/hybrid search:
tests/integration/test_search_semantic.py - Embedding model + serialization:
tests/unit/search/test_embeddings.py