Configuration Reference
Complete reference for all AI Psychiatrist configuration options.
Overview
Configuration is managed via Pydantic Settings with three sources (in priority order):
- Environment variables (highest priority)
- .env file (recommended for development)
- Code defaults (baseline defaults)
# Copy template and customize
cp .env.example .env
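For illustration, a minimal sketch of the priority order (assumes Settings is importable as shown in Programmatic Access below):

import os
from ai_psychiatrist.config import Settings

# An exported environment variable beats the same key in .env,
# which in turn beats the code default (11434).
os.environ["OLLAMA_PORT"] = "11500"
settings = Settings()
print(settings.ollama.port)  # -> 11500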
Configuration Groups
LLM Backend Settings
Selects which runtime implementation is used for chat.
| Variable | Type | Default | Description |
|---|---|---|---|
| LLM_BACKEND | string | ollama | Backend: ollama (local HTTP) or huggingface (Transformers) |
| LLM_HF_DEVICE | string | auto | HuggingFace device: auto, cpu, cuda, mps |
| LLM_HF_QUANTIZATION | string | (unset) | Optional HuggingFace quantization: int4 or int8 |
| LLM_HF_CACHE_DIR | path | (unset) | Optional HuggingFace cache directory |
| LLM_HF_TOKEN | string | (unset) | Optional HuggingFace token (prefer huggingface-cli login) |
Notes:
- HuggingFace dependencies are optional; install with make dev (in the repo), uv sync --extra hf, or pip install 'ai-psychiatrist[hf]'.
- Canonical model names like gemma3:27b are resolved to backend-specific IDs when possible.
- Official MedGemma weights are HuggingFace-only; there is no official MedGemma in the Ollama library.
- The LLM_HF_* settings are used when HuggingFace is selected for either chat (LLM_BACKEND=huggingface) or embeddings (EMBEDDING_BACKEND=huggingface).
Example:
LLM_BACKEND=huggingface
LLM_HF_DEVICE=mps
MODEL_QUANTITATIVE_MODEL=medgemma:27b
Embedding Backend Settings
Selects which runtime implementation is used for embeddings (separate from LLM_BACKEND).
| Variable | Type | Default | Description |
|---|---|---|---|
| EMBEDDING_BACKEND | string | huggingface | Embedding backend: ollama (fast, local) or huggingface (FP16/BF16 precision) |
Ollama Settings
Connection settings for the Ollama LLM server.
| Variable | Type | Default | Description |
|---|---|---|---|
| OLLAMA_HOST | string | 127.0.0.1 | Ollama server hostname |
| OLLAMA_PORT | int | 11434 | Ollama server port |
| OLLAMA_TIMEOUT_SECONDS | int | 600 | Request timeout (min 10s). Recommend 3600 for slow GPU research runs. |
Derived properties:
- base_url: http://{host}:{port}
- chat_url: {base_url}/api/chat
- embeddings_url: {base_url}/api/embeddings
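A minimal sketch of how such derived properties can be defined (assumed shape; the real class lives in src/ai_psychiatrist/config.py):

from pydantic_settings import BaseSettings

class OllamaSettingsSketch(BaseSettings):
    host: str = "127.0.0.1"
    port: int = 11434

    @property
    def base_url(self) -> str:
        return f"http://{self.host}:{self.port}"

    @property
    def chat_url(self) -> str:
        return f"{self.base_url}/api/chat"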
Timeout Notes:
- Default 600s may still timeout on very slow GPUs / long transcripts; use 3600 for research runs.
- OLLAMA_TIMEOUT_SECONDS applies to the legacy Ollama client and (by default) syncs to the Pydantic AI path if PYDANTIC_AI_TIMEOUT_SECONDS is unset.
- Timeout sync is implemented in Settings.validate_consistency() in src/ai_psychiatrist/config.py.
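A simplified sketch of what that sync might look like (illustrative only; field names are flattened here and this is not the actual implementation):

from pydantic import BaseModel, model_validator

class TimeoutSettingsSketch(BaseModel):
    ollama_timeout_seconds: int = 600
    pydantic_ai_timeout_seconds: float | None = None

    @model_validator(mode="after")
    def validate_consistency(self) -> "TimeoutSettingsSketch":
        # If the Pydantic AI timeout is unset, inherit the Ollama timeout.
        if self.pydantic_ai_timeout_seconds is None:
            self.pydantic_ai_timeout_seconds = float(self.ollama_timeout_seconds)
        return self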
Example:
# Remote Ollama server with generous timeout
OLLAMA_HOST=192.168.1.100
OLLAMA_PORT=11434
OLLAMA_TIMEOUT_SECONDS=3600 # 1 hour for research runs
Model Settings
LLM model selection and sampling parameters.
| Variable | Type | Default | Paper Reference |
|---|---|---|---|
| MODEL_QUALITATIVE_MODEL | string | gemma3:27b | Section 2.2 |
| MODEL_JUDGE_MODEL | string | gemma3:27b | Section 2.2 |
| MODEL_META_REVIEW_MODEL | string | gemma3:27b | Section 2.2 |
| MODEL_QUANTITATIVE_MODEL | string | gemma3:27b | Section 2.2 (MedGemma in Appendix F) |
| MODEL_EMBEDDING_MODEL | string | qwen3-embedding:8b | Section 2.2 |
| MODEL_TEMPERATURE | float | 0.0 | Clinical AI best practice (Issue #46) |
Sampling Parameters (Evidence-Based):
All agents use temperature=0.0. We do NOT set top_k or top_p because:
1. At temperature 0 they are irrelevant (decoding is greedy)
2. Best practice is to "use temperature only, not both" (Anthropic)
3. Claude APIs error if you set both temperature and top_p
See the Agent Sampling Registry for the full rationale with citations.
Model Options:
| Model | Size | Use Case | Performance |
|---|---|---|---|
| gemma3:27b-it-qat | ~17GB | All agents (Ollama recommended) | QAT 4-bit variant (same size, better quality/speed vs standard Q4) |
| gemma3:27b | ~16GB | All agents (default) | Paper Section 2.2 |
| medgemma:27b | ~16GB | Quantitative (HuggingFace only) | Appendix F, 18% better MAE but more N/A |
| qwen3-embedding:8b | ~4GB | Embeddings | Paper standard |
Note: gemma3:27b-it-qat is an Ollama tag; use it only with LLM_BACKEND=ollama. For HuggingFace, use canonical gemma3:27b (resolved to google/gemma-3-27b-it).
Note: MedGemma is not available in Ollama officially. Use HuggingFace backend for official weights. See Model Registry for HuggingFace setup.
Precision Comparison (Ollama vs HuggingFace):
| Model | Ollama Precision | HuggingFace Precision | Impact |
|---|---|---|---|
| gemma3:27b | Q4_K_M (4-bit) | FP16/BF16 (16-bit) | Higher quality responses |
| qwen3-embedding:8b | Q4_K_M (4-bit) | FP16/BF16 (16-bit) | More accurate similarity matching |
For best chat quality, use LLM_BACKEND=huggingface.
For best embedding quality (similarity), use EMBEDDING_BACKEND=huggingface (default).
Example:
# Canonical names (recommended): resolved per backend
MODEL_QUALITATIVE_MODEL=gemma3:27b
MODEL_QUANTITATIVE_MODEL=gemma3:27b
# HuggingFace backend + MedGemma (Appendix F evaluation)
LLM_BACKEND=huggingface
MODEL_QUANTITATIVE_MODEL=medgemma:27b
# Clinical AI: temp=0 for reproducibility
MODEL_TEMPERATURE=0.0
Embedding Settings
Few-shot retrieval configuration.
| Variable | Type | Default | Paper Reference |
|---|---|---|---|
| EMBEDDING_DIMENSION | int | 4096 | Appendix D (optimal) |
| EMBEDDING_CHUNK_SIZE | int | 8 | Appendix D (optimal) |
| EMBEDDING_CHUNK_STEP | int | 2 | Section 2.4.2 |
| EMBEDDING_TOP_K_REFERENCES | int | 2 | Appendix D (optimal) |
| EMBEDDING_MIN_EVIDENCE_CHARS | int | 8 | Minimum text for embedding |
| EMBEDDING_EMBEDDINGS_FILE | string | huggingface_qwen3_8b_paper_train | Reference embeddings basename (no extension), resolved under {DATA_BASE_DIR}/embeddings/ |
| EMBEDDING_ENABLE_RETRIEVAL_AUDIT | bool | false | Spec 32 (retrieval audit logging) |
| EMBEDDING_ENABLE_BATCH_QUERY_EMBEDDING | bool | true | Spec 37 (batch query embedding; performance-only) |
| EMBEDDING_QUERY_EMBED_TIMEOUT_SECONDS | int | 300 | Spec 37 (query embedding timeout; stability-only) |
| EMBEDDING_MIN_REFERENCE_SIMILARITY | float | 0.0 | Spec 33 (drop low-similarity references; 0 disables) |
| EMBEDDING_MAX_REFERENCE_CHARS_PER_ITEM | int | 0 | Spec 33 (per-item reference context budget; 0 disables) |
| EMBEDDING_ENABLE_ITEM_TAG_FILTER | bool | false | Spec 34 (filter refs by item tags; requires {name}.tags.json) |
| EMBEDDING_REFERENCE_SCORE_SOURCE | string | participant | Spec 35: participant (legacy baseline; participant-level scores on chunks) or chunk (recommended; requires .chunk_scores.json) |
| EMBEDDING_ALLOW_CHUNK_SCORES_PROMPT_HASH_MISMATCH | bool | false | Spec 35 circularity control bypass (unsafe) |
| EMBEDDING_ENABLE_REFERENCE_VALIDATION | bool | false | Spec 36 (CRAG-style runtime validation; adds LLM calls) |
| EMBEDDING_VALIDATION_MODEL | string | (unset) | Spec 36 validation model (if unset, runners fall back to MODEL_JUDGE_MODEL) |
| EMBEDDING_VALIDATION_MAX_REFS_PER_ITEM | int | 2 | Spec 36 max accepted refs per item after validation |
Note on artifact naming: scripts/generate_embeddings.py defaults to writing a namespaced artifact like
data/embeddings/{backend}_{model_slug}_{split}.npz. After generating, set EMBEDDING_EMBEDDINGS_FILE to that basename
(or pass --output to write to paper_reference_embeddings.npz).
Recommended (participant-only pipeline): Use transcript-variant-stamped artifact names to avoid collisions:
DATA_TRANSCRIPTS_DIR=data/transcripts_participant_only
EMBEDDING_EMBEDDINGS_FILE=huggingface_qwen3_8b_paper_train_participant_only
EMBEDDING_REFERENCE_SCORE_SOURCE=chunk
Optional item tags (Spec 34): scripts/generate_embeddings.py --write-item-tags writes a sibling {name}.tags.json
sidecar. At runtime, enable tag-based filtering with EMBEDDING_ENABLE_ITEM_TAG_FILTER=true.
Chunk-level scoring (Spec 35): By default, retrieved chunks carry the participant's overall PHQ-8 score. Set
EMBEDDING_REFERENCE_SCORE_SOURCE=chunk to use per-chunk scores (requires scripts/score_reference_chunks.py output).
This is the recommended configuration for research-honest retrieval; participant is retained as a legacy baseline only.
CRAG validation (Spec 36): Set EMBEDDING_ENABLE_REFERENCE_VALIDATION=true to have the LLM validate each retrieved
reference at runtime (CRAG-style). Adds latency but filters irrelevant references.
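A rough sketch of that validation pass (hypothetical helper; is_relevant stands in for the extra LLM call made per reference):

from typing import Callable

def validate_references(item: str, refs: list[str], max_refs: int,
                        is_relevant: Callable[[str, str], bool]) -> list[str]:
    # Keep at most max_refs references that the validation model accepts.
    accepted: list[str] = []
    for ref in refs:
        if len(accepted) >= max_refs:
            break
        if is_relevant(item, ref):
            accepted.append(ref)
    return accepted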
Paper optimization results (Appendix D):
- Embedding dimension 4096 performed best among the tested dimensions (64, 256, 1024, 4096)
- Chunk size 8 optimal for clinical interviews
- Top-k=2 references balances context and noise
Example:
# More references for difficult cases
EMBEDDING_TOP_K_REFERENCES=3
# Larger chunks for longer utterances
EMBEDDING_CHUNK_SIZE=10
EMBEDDING_CHUNK_STEP=3
Feedback Loop Settings
Iterative refinement configuration.
| Variable | Type | Default | Paper Reference |
|---|---|---|---|
| FEEDBACK_ENABLED | bool | true | Enable/disable refinement |
| FEEDBACK_MAX_ITERATIONS | int | 10 | Section 2.3.1 |
| FEEDBACK_SCORE_THRESHOLD | int | 3 | Scores ≤3 trigger refinement |
| FEEDBACK_TARGET_SCORE | int | 4 | Minimum acceptable score |
Threshold logic:
- Score ≤ threshold (default 3) → needs improvement
- Score ≥ target (default 4) → acceptable
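Expressed as code, the per-score decision looks roughly like this (illustrative helpers, not project API):

def needs_refinement(judge_score: int, threshold: int = 3) -> bool:
    # Scores at or below the threshold send the assessment back for refinement.
    return judge_score <= threshold

def is_acceptable(judge_score: int, target: int = 4) -> bool:
    # Scores at or above the target stop the loop before max iterations.
    return judge_score >= target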
Example:
# Disable feedback loop for faster inference
FEEDBACK_ENABLED=false
# Stricter quality requirements
FEEDBACK_SCORE_THRESHOLD=3
FEEDBACK_MAX_ITERATIONS=15
Data Settings
File path configuration.
| Variable | Type | Default | Description |
|---|---|---|---|
| DATA_BASE_DIR | path | data | Base data directory |
| DATA_TRANSCRIPTS_DIR | path | data/transcripts | Transcript files (raw or preprocessed variants) |
| DATA_EMBEDDINGS_PATH | path | data/embeddings/huggingface_qwen3_8b_paper_train.npz | Full-path override for reference embeddings (takes precedence over EMBEDDING_EMBEDDINGS_FILE) |
| DATA_TRAIN_CSV | path | data/train_split_Depression_AVEC2017.csv | Training ground truth |
| DATA_DEV_CSV | path | data/dev_split_Depression_AVEC2017.csv | Development ground truth |
Directory structure expected:
data/
├── transcripts/
│ ├── 300_P/
│ │ └── 300_TRANSCRIPT.csv
│ └── .../
├── transcripts_participant_only/ # optional (recommended for retrieval/embeddings)
│ ├── 300_P/300_TRANSCRIPT.csv
│ └── ...
├── embeddings/
│ ├── huggingface_qwen3_8b_paper_train_participant_only.npz # participant-only reference KB (paper-train)
│ ├── huggingface_qwen3_8b_paper_train_participant_only.json
│ ├── huggingface_qwen3_8b_paper_train_participant_only.meta.json # provenance metadata (backend/model/dim/chunking)
│ ├── huggingface_qwen3_8b_paper_train_participant_only.tags.json # optional per-chunk PHQ-8 item tags (Spec 34)
│ ├── huggingface_qwen3_8b_paper_train_participant_only.chunk_scores.json
│ ├── huggingface_qwen3_8b_paper_train_participant_only.chunk_scores.meta.json
│ ├── paper_reference_embeddings.npz # legacy/compat filename (paper-train)
│ ├── paper_reference_embeddings.json
│ └── paper_reference_embeddings.meta.json # provenance metadata (legacy/compat)
├── train_split_Depression_AVEC2017.csv
└── dev_split_Depression_AVEC2017.csv
Example:
# Custom data location
DATA_BASE_DIR=/mnt/datasets/daic-woz
DATA_TRANSCRIPTS_DIR=/mnt/datasets/daic-woz/transcripts
Logging Settings
Structured logging configuration.
| Variable | Type | Default | Options |
|---|---|---|---|
| LOG_LEVEL | string | INFO | DEBUG, INFO, WARNING, ERROR, CRITICAL |
| LOG_FORMAT | string | json | json, console |
| LOG_INCLUDE_TIMESTAMP | bool | true | Add timestamp to logs |
| LOG_INCLUDE_CALLER | bool | true | Add file:line info |
Formats:
- json: Structured JSON for production/parsing
- console: Human-readable for development
Example:
# Debug mode with readable output
LOG_LEVEL=DEBUG
LOG_FORMAT=console
Sample output:
{"event": "Starting qualitative assessment", "participant_id": 300, "word_count": 1234, "level": "info", "timestamp": "2025-12-21T10:00:00Z"}
API Settings
HTTP server configuration.
| Variable | Type | Default | Description |
|---|---|---|---|
| API_HOST | string | 0.0.0.0 | Bind address |
| API_PORT | int | 8000 | Server port |
| API_RELOAD | bool | false | Hot reload (dev only) |
| API_WORKERS | int | 1 | Worker processes (1-16) |
| API_CORS_ORIGINS | list | ["*"] | Allowed CORS origins |
API_CORS_ORIGINS exists in configuration, but server.py does not currently install
FastAPI/Starlette CORSMiddleware. If you need CORS today, configure it at a reverse proxy
(recommended) or add CORSMiddleware in server.py.
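If you take the in-app route, a minimal sketch of that addition to server.py could look like this (assumes the API settings group exposes cors_origins; verify against the actual Settings class):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from ai_psychiatrist.config import get_settings

app = FastAPI()
settings = get_settings()

app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.api.cors_origins,  # e.g. ["https://myapp.com"]
    allow_methods=["*"],
    allow_headers=["*"],
)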
Example:
# Production settings
API_HOST=0.0.0.0
API_PORT=8080
API_WORKERS=4
API_CORS_ORIGINS=["https://myapp.com"]
# Development settings
API_RELOAD=true
API_WORKERS=1
Quantitative Assessment Settings
These settings control the quantitative assessment behavior (evidence extraction + scoring):
| Variable | Type | Default | Description |
|---|---|---|---|
| QUANTITATIVE_TRACK_NA_REASONS | bool | true | Track why items return N/A |
| QUANTITATIVE_EVIDENCE_QUOTE_VALIDATION_ENABLED | bool | true | Enable evidence grounding validation (Spec 053) |
| QUANTITATIVE_EVIDENCE_QUOTE_VALIDATION_MODE | string | substring | Validation mode: substring (exact) or fuzzy (requires rapidfuzz) |
| QUANTITATIVE_EVIDENCE_QUOTE_FUZZY_THRESHOLD | float | 0.85 | Fuzzy matching threshold (0.0-1.0) |
| QUANTITATIVE_EVIDENCE_QUOTE_FAIL_ON_ALL_REJECTED | bool | false | Fail participant if ALL quotes rejected (strict mode) |
| QUANTITATIVE_EVIDENCE_QUOTE_LOG_REJECTIONS | bool | true | Log rejected quotes for debugging |
Evidence Grounding (Spec 053): Validates that LLM-extracted evidence quotes actually appear in the source transcript. Prevents hallucinated quotes from contaminating few-shot retrieval.
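Conceptually, the two modes reduce to a check like this (illustrative function, not the project's implementation; fuzzy mode uses rapidfuzz):

from rapidfuzz import fuzz  # only needed for fuzzy mode

def quote_is_grounded(quote: str, transcript: str,
                      mode: str = "substring", threshold: float = 0.85) -> bool:
    if mode == "substring":
        return quote in transcript  # exact containment
    # Fuzzy: best partial alignment, scaled from 0-100 down to 0.0-1.0.
    return fuzz.partial_ratio(quote, transcript) / 100.0 >= threshold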
Example:
# Enable fuzzy matching for better recall (requires rapidfuzz)
QUANTITATIVE_EVIDENCE_QUOTE_VALIDATION_MODE="fuzzy"
QUANTITATIVE_EVIDENCE_QUOTE_FUZZY_THRESHOLD=0.85
Consistency Sampling Settings
Multi-sample scoring for agreement-based confidence signals (Spec 050).
| Variable | Type | Default | Description |
|---|---|---|---|
| CONSISTENCY_ENABLED | bool | false | Enable multi-sample consistency scoring |
| CONSISTENCY_N_SAMPLES | int | 5 | Number of samples per item |
| CONSISTENCY_TEMPERATURE | float | 0.2 | Sampling temperature for consistency (must be >0 for variance) |
Temperature Rationale (BUG-027):
| Temperature | Purpose | Use Case |
|---|---|---|
| 0.0 | Deterministic | Primary inference (all agents) |
| 0.2 | Low-variance | Consistency sampling (clinical best practice) |
| 0.3+ | Higher-variance | Not recommended for clinical tasks |
Research Evidence:
- 2025 clinical studies define 0.2 as "low" temperature threshold
- GPT-4 depression study notes performance becomes "unpredictable" at ≥0.3
- Self-consistency requires non-zero temperature for sample diversity
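One plausible way to turn the samples into a confidence signal (hypothetical helper; the project's aggregation may differ):

from collections import Counter

def consistency_signal(scores: list[int]) -> tuple[int, float]:
    # Modal score across samples, plus the fraction of samples that agreed.
    score, count = Counter(scores).most_common(1)[0]
    return score, count / len(scores)

print(consistency_signal([2, 2, 2, 3, 2]))  # -> (2, 0.8)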
Example:
# Enable consistency scoring (recommended for confidence calibration)
CONSISTENCY_ENABLED=true
CONSISTENCY_N_SAMPLES=5
CONSISTENCY_TEMPERATURE=0.2 # Clinical best practice (BUG-027)
Disabling:
# Disable for faster baseline runs (no confidence signals)
CONSISTENCY_ENABLED=false
See Also: Agent Sampling Registry for full temperature rationale.
Feature Flags
System-wide toggles.
| Variable | Type | Default | Description |
|---|---|---|---|
| ENABLE_FEW_SHOT | bool | true | Use embedding-based few-shot |
Note: ENABLE_FEW_SHOT=true requires pre-computed embeddings (resolved from DATA_EMBEDDINGS_PATH or EMBEDDING_EMBEDDINGS_FILE).
Pydantic AI Settings
Structured validation + automatic retries for agent outputs (Spec 13).
| Variable | Type | Default | Description |
|---|---|---|---|
| PYDANTIC_AI_ENABLED | bool | true | Enable Pydantic AI TextOutput validation + retry loop |
| PYDANTIC_AI_RETRIES | int | 5 | Retry count when validation fails (0 disables retries); increased from 3 per Spec 058 |
| PYDANTIC_AI_TIMEOUT_SECONDS | float | (unset) | Timeout override for Pydantic AI calls (unset = library default) |
Notes:
- This preserves existing prompt formats (e.g., <thinking>...</thinking> + <answer>...</answer>) and adds validation after generation.
- Legacy parsing fallbacks are disabled (fail-fast research behavior). If PYDANTIC_AI_ENABLED=false, agents will raise because no legacy path exists.
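The validation step is conceptually a parse-or-retry loop over the tagged output (sketch only; the real integration goes through Pydantic AI, and this helper is hypothetical):

import re

def parse_answer(raw: str) -> str:
    # Extract the <answer> block; a ValueError here is what the retry
    # loop reacts to, re-prompting up to PYDANTIC_AI_RETRIES times.
    match = re.search(r"<answer>(.*?)</answer>", raw, re.DOTALL)
    if match is None:
        raise ValueError("missing <answer>...</answer> block")
    return match.group(1).strip()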
Timeout Notes (BUG-027):
- If PYDANTIC_AI_TIMEOUT_SECONDS is unset, the pydantic_ai library default (600s) applies.
- Set PYDANTIC_AI_TIMEOUT_SECONDS=3600 for 1-hour research runs on throttled GPUs.
- If only one of {PYDANTIC_AI_TIMEOUT_SECONDS, OLLAMA_TIMEOUT_SECONDS} is set, Settings syncs the other to match; if both are set and differ, a warning is emitted.
Nested Delimiter
Most configuration uses the explicit group prefixes shown above (e.g., MODEL_TEMPERATURE,
OLLAMA_HOST). For advanced settings management, Pydantic also supports nested environment
variables using double underscores:
# Set nested values
MODEL__TEMPERATURE=0.5
EMBEDDING__TOP_K_REFERENCES=3
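This works because pydantic-settings supports a nested delimiter on the settings class; a minimal sketch (illustrative, not the project's actual config class):

from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class ModelGroup(BaseModel):
    temperature: float = 0.0

class NestedExample(BaseSettings):
    model_config = SettingsConfigDict(env_nested_delimiter="__")
    model: ModelGroup = ModelGroup()

# With MODEL__TEMPERATURE=0.5 exported, NestedExample().model.temperature == 0.5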
.env.example
See the repo-root .env.example for an up-to-date template, including:
- Separate LLM_BACKEND (chat) and EMBEDDING_BACKEND (embeddings)
- Reference embeddings selection via EMBEDDING_EMBEDDINGS_FILE / DATA_EMBEDDINGS_PATH
Programmatic Access
from ai_psychiatrist.config import ModelSettings, OllamaSettings, Settings, get_settings
# Get singleton settings
settings = get_settings()
# Access nested groups
print(settings.ollama.base_url)
print(settings.model.quantitative_model)
print(settings.embedding.dimension)
print(settings.feedback.max_iterations)
# Direct instantiation (for testing)
custom = Settings(
ollama=OllamaSettings(host="custom-host"),
model=ModelSettings(temperature=0.0),
)
Validation
Settings are validated on load:
# Port range validation
OLLAMA_PORT=99999 # Error: ge=1, le=65535
# Temperature validation
MODEL_TEMPERATURE=3.0 # Error: ge=0.0, le=2.0
# Chunk size validation
EMBEDDING_CHUNK_SIZE=1 # Error: ge=2, le=20
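Those constraints correspond to standard Pydantic Field bounds; a small sketch of the pattern (illustrative, mirroring the port rule above):

from pydantic import BaseModel, Field, ValidationError

class PortExample(BaseModel):
    port: int = Field(default=11434, ge=1, le=65535)

try:
    PortExample(port=99999)
except ValidationError as exc:
    print(exc)  # input should be less than or equal to 65535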
Warnings:
- Missing data directories log warnings but don't fail
- Few-shot enabled without embeddings logs a warning
Environment-Specific Configs
Development
LOG_LEVEL=DEBUG
LOG_FORMAT=console
API_RELOAD=true
FEEDBACK_MAX_ITERATIONS=3 # Faster iteration
Testing
# Tests automatically set TESTING=1 which skips .env loading
# Use code defaults for reproducibility
Production
LOG_LEVEL=INFO
LOG_FORMAT=json
API_WORKERS=4
API_CORS_ORIGINS=["https://production-domain.com"]
OLLAMA_TIMEOUT_SECONDS=600
See Also
- Quickstart - Initial setup
- Architecture - How settings are used
- .env.example (repository root) - Environment template