Configuration Reference
Complete reference for all AI Psychiatrist configuration options.
Overview
Configuration is managed via Pydantic Settings with three sources (in priority order):
- Environment variables (highest priority)
- .env file (recommended for development)
- Code defaults (baseline defaults)
# Copy template and customize
cp .env.example .env
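For illustration, a minimal sketch of the priority order (assumes Settings is importable as shown in Programmatic Access below):

import os
from ai_psychiatrist.config import Settings

# An exported environment variable beats the same key in .env,
# which in turn beats the code default (11434).
os.environ["OLLAMA_PORT"] = "11500"
settings = Settings()
print(settings.ollama.port)  # -> 11500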
Configuration Groups
LLM Backend Settings
Selects which runtime implementation is used for chat.
| Variable | Type | Default | Description |
|---|---|---|---|
| LLM_BACKEND | string | ollama | Backend: ollama (local HTTP) or huggingface (Transformers) |
| LLM_HF_DEVICE | string | auto | HuggingFace device: auto, cpu, cuda, mps |
| LLM_HF_QUANTIZATION | string | (unset) | Optional HuggingFace quantization: int4 or int8 |
| LLM_HF_CACHE_DIR | path | (unset) | Optional HuggingFace cache directory |
| LLM_HF_TOKEN | string | (unset) | Optional HuggingFace token (prefer huggingface-cli login) |
Notes:
- HuggingFace dependencies are optional; install with make dev (in the repo), uv sync --extra hf, or pip install 'ai-psychiatrist[hf]'.
- Canonical model names like gemma3:27b are resolved to backend-specific IDs when possible.
- Official MedGemma weights are HuggingFace-only; there is no official MedGemma in the Ollama library.
- The LLM_HF_* settings are used when HuggingFace is selected for either chat (LLM_BACKEND=huggingface) or embeddings (EMBEDDING_BACKEND=huggingface).
Example:
LLM_BACKEND=huggingface
LLM_HF_DEVICE=mps
MODEL_QUANTITATIVE_MODEL=medgemma:27b
Embedding Backend Settings
Selects which runtime implementation is used for embeddings (separate from LLM_BACKEND).
| Variable | Type | Default | Description |
|---|---|---|---|
| EMBEDDING_BACKEND | string | huggingface | Embedding backend: ollama (fast, local) or huggingface (FP16/BF16 precision) |
Ollama Settings
Connection settings for the Ollama LLM server.
| Variable | Type | Default | Description |
|---|---|---|---|
| OLLAMA_HOST | string | 127.0.0.1 | Ollama server hostname |
| OLLAMA_PORT | int | 11434 | Ollama server port |
| OLLAMA_TIMEOUT_SECONDS | int | 600 | Request timeout (min 10s). Recommend 3600 for slow GPU research runs. |
Derived properties:
- base_url: http://{host}:{port}
- chat_url: {base_url}/api/chat
- embeddings_url: {base_url}/api/embeddings
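A minimal sketch of how such derived properties can be defined (assumed shape; the real class lives in src/ai_psychiatrist/config.py):

from pydantic_settings import BaseSettings

class OllamaSettingsSketch(BaseSettings):
    host: str = "127.0.0.1"
    port: int = 11434

    @property
    def base_url(self) -> str:
        return f"http://{self.host}:{self.port}"

    @property
    def chat_url(self) -> str:
        return f"{self.base_url}/api/chat"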
Timeout Notes:
- Default 600s may still timeout on very slow GPUs / long transcripts; use 3600 for research runs.
- OLLAMA_TIMEOUT_SECONDS applies to the legacy Ollama client and (by default) syncs to the Pydantic AI path if PYDANTIC_AI_TIMEOUT_SECONDS is unset.
- Timeout sync is implemented in Settings.validate_consistency() in src/ai_psychiatrist/config.py.
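A simplified sketch of what that sync might look like (illustrative only; field names are flattened here and this is not the actual implementation):

from pydantic import BaseModel, model_validator

class TimeoutSettingsSketch(BaseModel):
    ollama_timeout_seconds: int = 600
    pydantic_ai_timeout_seconds: float | None = None

    @model_validator(mode="after")
    def validate_consistency(self) -> "TimeoutSettingsSketch":
        # If the Pydantic AI timeout is unset, inherit the Ollama timeout.
        if self.pydantic_ai_timeout_seconds is None:
            self.pydantic_ai_timeout_seconds = float(self.ollama_timeout_seconds)
        return self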
Example:
# Remote Ollama server with generous timeout
OLLAMA_HOST=192.168.1.100
OLLAMA_PORT=11434
OLLAMA_TIMEOUT_SECONDS=3600 # 1 hour for research runs
Model Settings
LLM model selection and sampling parameters.
| Variable | Type | Default | Paper Reference |
|---|---|---|---|
| MODEL_QUALITATIVE_MODEL | string | gemma3:27b | Section 2.2 |
| MODEL_JUDGE_MODEL | string | gemma3:27b | Section 2.2 |
| MODEL_META_REVIEW_MODEL | string | gemma3:27b | Section 2.2 |
| MODEL_QUANTITATIVE_MODEL | string | gemma3:27b | Section 2.2 (MedGemma in Appendix F) |
| MODEL_EMBEDDING_MODEL | string | qwen3-embedding:8b | Section 2.2 |
| MODEL_TEMPERATURE | float | 0.0 | Clinical AI best practice (Issue #46) |
Sampling Parameters (Evidence-Based):
All agents use temperature=0.0. We do NOT set top_k or top_p because:
1. At temperature 0 they are irrelevant (decoding is greedy)
2. Best practice is to "use temperature only, not both" (Anthropic)
3. Claude APIs error if you set both temperature and top_p
See the Agent Sampling Registry for the full rationale with citations.
Model Options:
| Model | Size | Use Case | Performance |
|---|---|---|---|
| gemma3:27b-it-qat | ~17GB | All agents (Ollama recommended) | QAT 4-bit variant (same size, better quality/speed vs standard Q4) |
| gemma3:27b | ~16GB | All agents (default) | Paper Section 2.2 |
| medgemma:27b | ~16GB | Quantitative (HuggingFace only) | Appendix F, 18% better MAE but more N/A |
| qwen3-embedding:8b | ~4GB | Embeddings | Paper standard |
Note: gemma3:27b-it-qat is an Ollama tag; use it only with LLM_BACKEND=ollama. For HuggingFace, use canonical gemma3:27b (resolved to google/gemma-3-27b-it).
Note: MedGemma is not available in Ollama officially. Use HuggingFace backend for official weights. See Model Registry for HuggingFace setup.
Precision Comparison (Ollama vs HuggingFace):
| Model | Ollama Precision | HuggingFace Precision | Impact |
|---|---|---|---|
| gemma3:27b | Q4_K_M (4-bit) | FP16/BF16 (16-bit) | Higher quality responses |
| qwen3-embedding:8b | Q4_K_M (4-bit) | FP16/BF16 (16-bit) | More accurate similarity matching |
For best chat quality, use LLM_BACKEND=huggingface.
For best embedding quality (similarity), use EMBEDDING_BACKEND=huggingface (default).
Example:
# Canonical names (recommended): resolved per backend
MODEL_QUALITATIVE_MODEL=gemma3:27b
MODEL_QUANTITATIVE_MODEL=gemma3:27b
# HuggingFace backend + MedGemma (Appendix F evaluation)
LLM_BACKEND=huggingface
MODEL_QUANTITATIVE_MODEL=medgemma:27b
# Clinical AI: temp=0 for reproducibility
MODEL_TEMPERATURE=0.0
Embedding Settings
Few-shot retrieval configuration.
| Variable | Type | Default | Paper Reference |
|---|---|---|---|
| EMBEDDING_DIMENSION | int | 4096 | Appendix D (optimal) |
| EMBEDDING_CHUNK_SIZE | int | 8 | Appendix D (optimal) |
| EMBEDDING_CHUNK_STEP | int | 2 | Section 2.4.2 |
| EMBEDDING_TOP_K_REFERENCES | int | 2 | Appendix D (optimal) |
| EMBEDDING_MIN_EVIDENCE_CHARS | int | 8 | Minimum text for embedding |
| EMBEDDING_EMBEDDINGS_FILE | string | huggingface_qwen3_8b_paper_train | Reference embeddings basename (no extension), resolved under {DATA_BASE_DIR}/embeddings/ |
| EMBEDDING_ENABLE_RETRIEVAL_AUDIT | bool | false | Spec 32 (retrieval audit logging) |
| EMBEDDING_ENABLE_BATCH_QUERY_EMBEDDING | bool | true | Spec 37 (batch query embedding; performance-only) |
| EMBEDDING_QUERY_EMBED_TIMEOUT_SECONDS | int | 300 | Spec 37 (query embedding timeout; stability-only) |
| EMBEDDING_MIN_REFERENCE_SIMILARITY | float | 0.0 | Spec 33 (drop low-similarity references; 0 disables) |
| EMBEDDING_MAX_REFERENCE_CHARS_PER_ITEM | int | 0 | Spec 33 (per-item reference context budget; 0 disables) |
| EMBEDDING_ENABLE_ITEM_TAG_FILTER | bool | false | Spec 34 (filter refs by item tags; requires {name}.tags.json) |
| EMBEDDING_REFERENCE_SCORE_SOURCE | string | participant | Spec 35: participant (legacy baseline; participant-level scores on chunks) or chunk (recommended; requires .chunk_scores.json) |
| EMBEDDING_ALLOW_CHUNK_SCORES_PROMPT_HASH_MISMATCH | bool | false | Spec 35 circularity control bypass (unsafe) |
| EMBEDDING_ENABLE_REFERENCE_VALIDATION | bool | false | Spec 36 (CRAG-style runtime validation; adds LLM calls) |
| EMBEDDING_VALIDATION_MODEL | string | (unset) | Spec 36 validation model (if unset, runners fall back to MODEL_JUDGE_MODEL) |
| EMBEDDING_VALIDATION_MAX_REFS_PER_ITEM | int | 2 | Spec 36 max accepted refs per item after validation |
Note on artifact naming: scripts/generate_embeddings.py defaults to writing a namespaced artifact like
data/embeddings/{backend}_{model_slug}_{split}.npz. After generating, set EMBEDDING_EMBEDDINGS_FILE to that basename
(or pass --output to write to paper_reference_embeddings.npz).
Recommended (participant-only pipeline): Use transcript-variant-stamped artifact names to avoid collisions:
DATA_TRANSCRIPTS_DIR=data/transcripts_participant_only
EMBEDDING_EMBEDDINGS_FILE=huggingface_qwen3_8b_paper_train_participant_only
EMBEDDING_REFERENCE_SCORE_SOURCE=chunk
Optional item tags (Spec 34): scripts/generate_embeddings.py --write-item-tags writes a sibling {name}.tags.json
sidecar. At runtime, enable tag-based filtering with EMBEDDING_ENABLE_ITEM_TAG_FILTER=true.
Chunk-level scoring (Spec 35): By default, retrieved chunks carry the participant's overall PHQ-8 score. Set
EMBEDDING_REFERENCE_SCORE_SOURCE=chunk to use per-chunk scores (requires scripts/score_reference_chunks.py output).
This is the recommended configuration for research-honest retrieval; participant is retained as a legacy baseline only.
CRAG validation (Spec 36): Set EMBEDDING_ENABLE_REFERENCE_VALIDATION=true to have the LLM validate each retrieved
reference at runtime (CRAG-style). Adds latency but filters irrelevant references.
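A rough sketch of that validation pass (hypothetical helper; is_relevant stands in for the extra LLM call made per reference):

from typing import Callable

def validate_references(item: str, refs: list[str], max_refs: int,
                        is_relevant: Callable[[str, str], bool]) -> list[str]:
    # Keep at most max_refs references that the validation model accepts.
    accepted: list[str] = []
    for ref in refs:
        if len(accepted) >= max_refs:
            break
        if is_relevant(item, ref):
            accepted.append(ref)
    return accepted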
Paper optimization results (Appendix D):
- Embedding dimension 4096 performed best among the tested dimensions (64, 256, 1024, 4096)
- Chunk size 8 optimal for clinical interviews
- Top-k=2 references balances context and noise
Example:
# More references for difficult cases
EMBEDDING_TOP_K_REFERENCES=3
# Larger chunks for longer utterances
EMBEDDING_CHUNK_SIZE=10
EMBEDDING_CHUNK_STEP=3
Feedback Loop Settings
Iterative refinement configuration.
| Variable | Type | Default | Paper Reference |
|---|---|---|---|
| FEEDBACK_ENABLED | bool | true | Enable/disable refinement |
| FEEDBACK_MAX_ITERATIONS | int | 10 | Section 2.3.1 |
| FEEDBACK_SCORE_THRESHOLD | int | 3 | Scores ≤3 trigger refinement |
| FEEDBACK_TARGET_SCORE | int | 4 | Minimum acceptable score |
Threshold logic:
- Score ≤ threshold (default 3) → needs improvement
- Score ≥ target (default 4) → acceptable
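Expressed as code, the per-score decision looks roughly like this (illustrative helpers, not project API):

def needs_refinement(judge_score: int, threshold: int = 3) -> bool:
    # Scores at or below the threshold send the assessment back for refinement.
    return judge_score <= threshold

def is_acceptable(judge_score: int, target: int = 4) -> bool:
    # Scores at or above the target stop the loop before max iterations.
    return judge_score >= target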
Example:
# Disable feedback loop for faster inference
FEEDBACK_ENABLED=false
# Stricter quality requirements
FEEDBACK_SCORE_THRESHOLD=3
FEEDBACK_MAX_ITERATIONS=15
Data Settings
File path configuration.
| Variable | Type | Default | Description |
|---|---|---|---|
| DATA_BASE_DIR | path | data | Base data directory |
| DATA_TRANSCRIPTS_DIR | path | data/transcripts | Transcript files (raw or preprocessed variants) |
| DATA_EMBEDDINGS_PATH | path | data/embeddings/huggingface_qwen3_8b_paper_train.npz | Full-path override for reference embeddings (takes precedence over EMBEDDING_EMBEDDINGS_FILE) |
| DATA_TRAIN_CSV | path | data/train_split_Depression_AVEC2017.csv | Training ground truth |
| DATA_DEV_CSV | path | data/dev_split_Depression_AVEC2017.csv | Development ground truth |
Directory structure expected:
data/
├── transcripts/
│ ├── 300_P/
│ │ └── 300_TRANSCRIPT.csv
│ └── .../
├── transcripts_participant_only/ # optional (recommended for retrieval/embeddings)
│ ├── 300_P/300_TRANSCRIPT.csv
│ └── ...
├── embeddings/
│ ├── huggingface_qwen3_8b_paper_train_participant_only.npz # participant-only reference KB (paper-train)
│ ├── huggingface_qwen3_8b_paper_train_participant_only.json
│ ├── huggingface_qwen3_8b_paper_train_participant_only.meta.json # provenance metadata (backend/model/dim/chunking)
│ ├── huggingface_qwen3_8b_paper_train_participant_only.tags.json # optional per-chunk PHQ-8 item tags (Spec 34)
│ ├── huggingface_qwen3_8b_paper_train_participant_only.chunk_scores.json
│ ├── huggingface_qwen3_8b_paper_train_participant_only.chunk_scores.meta.json
│ ├── paper_reference_embeddings.npz # legacy/compat filename (paper-train)
│ ├── paper_reference_embeddings.json
│ └── paper_reference_embeddings.meta.json # provenance metadata (legacy/compat)
├── train_split_Depression_AVEC2017.csv
└── dev_split_Depression_AVEC2017.csv
Example:
# Custom data location
DATA_BASE_DIR=/mnt/datasets/daic-woz
DATA_TRANSCRIPTS_DIR=/mnt/datasets/daic-woz/transcripts
Logging Settings
Structured logging configuration.
| Variable | Type | Default | Options |
|---|---|---|---|
| LOG_LEVEL | string | INFO | DEBUG, INFO, WARNING, ERROR, CRITICAL |
| LOG_FORMAT | string | json | json, console |
| LOG_INCLUDE_TIMESTAMP | bool | true | Add timestamp to logs |
| LOG_INCLUDE_CALLER | bool | true | Add file:line info |
Formats:
- json: Structured JSON for production/parsing
- console: Human-readable for development
Example:
# Debug mode with readable output
LOG_LEVEL=DEBUG
LOG_FORMAT=console
Sample output:
{"event": "Starting qualitative assessment", "participant_id": 300, "word_count": 1234, "level": "info", "timestamp": "2025-12-21T10:00:00Z"}
API Settings
HTTP server configuration.
| Variable | Type | Default | Description |
|---|---|---|---|
| API_HOST | string | 0.0.0.0 | Bind address |
| API_PORT | int | 8000 | Server port |
| API_RELOAD | bool | false | Hot reload (dev only) |
| API_WORKERS | int | 1 | Worker processes (1-16) |
| API_CORS_ORIGINS | list | ["*"] | Allowed CORS origins |
API_CORS_ORIGINS exists in configuration, but server.py does not currently install
FastAPI/Starlette CORSMiddleware. If you need CORS today, configure it at a reverse proxy
(recommended) or add CORSMiddleware in server.py.
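If you take the in-app route, a minimal sketch of that addition to server.py could look like this (assumes the API settings group exposes cors_origins; verify against the actual Settings class):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from ai_psychiatrist.config import get_settings

app = FastAPI()
settings = get_settings()

app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.api.cors_origins,  # e.g. ["https://myapp.com"]
    allow_methods=["*"],
    allow_headers=["*"],
)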
Example:
# Production settings
API_HOST=0.0.0.0
API_PORT=8080
API_WORKERS=4
API_CORS_ORIGINS=["https://myapp.com"]
# Development settings
API_RELOAD=true
API_WORKERS=1
Quantitative Assessment Settings
These settings control the quantitative assessment behavior (evidence extraction + scoring):
| Variable | Type | Default | Description |
|---|---|---|---|
| QUANTITATIVE_TRACK_NA_REASONS | bool | true | Track why items return N/A |
| QUANTITATIVE_EVIDENCE_QUOTE_VALIDATION_ENABLED | bool | true | Enable evidence grounding validation (Spec 053) |
| QUANTITATIVE_EVIDENCE_QUOTE_VALIDATION_MODE | string | substring | Validation mode: substring (exact) or fuzzy (requires rapidfuzz) |
| QUANTITATIVE_EVIDENCE_QUOTE_FUZZY_THRESHOLD | float | 0.85 | Fuzzy matching threshold (0.0-1.0) |
| QUANTITATIVE_EVIDENCE_QUOTE_FAIL_ON_ALL_REJECTED | bool | false | Fail participant if ALL quotes rejected (strict mode) |
| QUANTITATIVE_EVIDENCE_QUOTE_LOG_REJECTIONS | bool | true | Log rejected quotes for debugging |
Evidence Grounding (Spec 053): Validates that LLM-extracted evidence quotes actually appear in the source transcript. Prevents hallucinated quotes from contaminating few-shot retrieval.
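Conceptually, the two modes reduce to a check like this (illustrative function, not the project's implementation; fuzzy mode uses rapidfuzz):

from rapidfuzz import fuzz  # only needed for fuzzy mode

def quote_is_grounded(quote: str, transcript: str,
                      mode: str = "substring", threshold: float = 0.85) -> bool:
    if mode == "substring":
        return quote in transcript  # exact containment
    # Fuzzy: best partial alignment, scaled from 0-100 down to 0.0-1.0.
    return fuzz.partial_ratio(quote, transcript) / 100.0 >= threshold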
Example:
# Enable fuzzy matching for better recall (requires rapidfuzz)
QUANTITATIVE_EVIDENCE_QUOTE_VALIDATION_MODE="fuzzy"
QUANTITATIVE_EVIDENCE_QUOTE_FUZZY_THRESHOLD=0.85
Consistency Sampling Settings
Multi-sample scoring for agreement-based confidence signals (Spec 050).
| Variable | Type | Default | Description |
|---|---|---|---|
| CONSISTENCY_ENABLED | bool | false | Enable multi-sample consistency scoring |
| CONSISTENCY_N_SAMPLES | int | 5 | Number of samples per item |
| CONSISTENCY_TEMPERATURE | float | 0.2 | Sampling temperature for consistency (must be >0 for variance) |
Temperature Rationale (BUG-027):
| Temperature | Purpose | Use Case |
|---|---|---|
| 0.0 | Deterministic | Primary inference (all agents) |
| 0.2 | Low-variance | Consistency sampling (clinical best practice) |
| 0.3+ | Higher-variance | Not recommended for clinical tasks |
Research Evidence:
- 2025 clinical studies define 0.2 as "low" temperature threshold
- GPT-4 depression study notes performance becomes "unpredictable" at ≥0.3
- Self-consistency requires non-zero temperature for sample diversity
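One plausible way to turn the samples into a confidence signal (hypothetical helper; the project's aggregation may differ):

from collections import Counter

def consistency_signal(scores: list[int]) -> tuple[int, float]:
    # Modal score across samples, plus the fraction of samples that agreed.
    score, count = Counter(scores).most_common(1)[0]
    return score, count / len(scores)

print(consistency_signal([2, 2, 2, 3, 2]))  # -> (2, 0.8)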
Example:
# Enable consistency scoring (recommended for confidence calibration)
CONSISTENCY_ENABLED=true
CONSISTENCY_N_SAMPLES=5
CONSISTENCY_TEMPERATURE=0.2 # Clinical best practice (BUG-027)
Disabling:
# Disable for faster baseline runs (no confidence signals)
CONSISTENCY_ENABLED=false
See Also: Agent Sampling Registry for full temperature rationale.
Feature Flags
System-wide toggles.
| Variable | Type | Default | Description |
|---|---|---|---|
| ENABLE_FEW_SHOT | bool | true | Use embedding-based few-shot |
Note: ENABLE_FEW_SHOT=true requires pre-computed embeddings (resolved from DATA_EMBEDDINGS_PATH or EMBEDDING_EMBEDDINGS_FILE).
Pydantic AI Settings
Structured validation + automatic retries for agent outputs (Spec 13).
| Variable | Type | Default | Description |
|---|---|---|---|
| PYDANTIC_AI_ENABLED | bool | true | Enable Pydantic AI TextOutput validation + retry loop |
| PYDANTIC_AI_RETRIES | int | 5 | Retry count when validation fails (0 disables retries); increased from 3 per Spec 058 |
| PYDANTIC_AI_TIMEOUT_SECONDS | float | (unset) | Timeout override for Pydantic AI calls (unset = library default) |
Notes:
- This preserves existing prompt formats (e.g., <thinking>...</thinking> + <answer>...</answer>) and adds validation after generation.
- Legacy parsing fallbacks are disabled (fail-fast research behavior). If PYDANTIC_AI_ENABLED=false, agents will raise because no legacy path exists.
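The validation step is conceptually a parse-or-retry loop over the tagged output (sketch only; the real integration goes through Pydantic AI, and this helper is hypothetical):

import re

def parse_answer(raw: str) -> str:
    # Extract the <answer> block; a ValueError here is what the retry
    # loop reacts to, re-prompting up to PYDANTIC_AI_RETRIES times.
    match = re.search(r"<answer>(.*?)</answer>", raw, re.DOTALL)
    if match is None:
        raise ValueError("missing <answer>...</answer> block")
    return match.group(1).strip()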
Timeout Notes (BUG-027):
- If PYDANTIC_AI_TIMEOUT_SECONDS is unset, the pydantic_ai library default (600s) applies.
- Set PYDANTIC_AI_TIMEOUT_SECONDS=3600 for 1-hour research runs on throttled GPUs.
- If only one of {PYDANTIC_AI_TIMEOUT_SECONDS, OLLAMA_TIMEOUT_SECONDS} is set, Settings syncs the other to match; if both are set and differ, a warning is emitted.
Nested Delimiter
Most configuration uses the explicit group prefixes shown above (e.g., MODEL_TEMPERATURE,
OLLAMA_HOST). For advanced settings management, Pydantic also supports nested environment
variables using double underscores:
# Set nested values
MODEL__TEMPERATURE=0.5
EMBEDDING__TOP_K_REFERENCES=3
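This works because pydantic-settings supports a nested delimiter on the settings class; a minimal sketch (illustrative, not the project's actual config class):

from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class ModelGroup(BaseModel):
    temperature: float = 0.0

class NestedExample(BaseSettings):
    model_config = SettingsConfigDict(env_nested_delimiter="__")
    model: ModelGroup = ModelGroup()

# With MODEL__TEMPERATURE=0.5 exported, NestedExample().model.temperature == 0.5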
.env.example
See the repo-root .env.example for an up-to-date template, including:
- Separate LLM_BACKEND (chat) and EMBEDDING_BACKEND (embeddings)
- Reference embeddings selection via EMBEDDING_EMBEDDINGS_FILE / DATA_EMBEDDINGS_PATH
Programmatic Access
from ai_psychiatrist.config import ModelSettings, OllamaSettings, Settings, get_settings
# Get singleton settings
settings = get_settings()
# Access nested groups
print(settings.ollama.base_url)
print(settings.model.quantitative_model)
print(settings.embedding.dimension)
print(settings.feedback.max_iterations)
# Direct instantiation (for testing)
custom = Settings(
ollama=OllamaSettings(host="custom-host"),
model=ModelSettings(temperature=0.0),
)
Validation
Settings are validated on load:
# Port range validation
OLLAMA_PORT=99999 # Error: ge=1, le=65535
# Temperature validation
MODEL_TEMPERATURE=3.0 # Error: ge=0.0, le=2.0
# Chunk size validation
EMBEDDING_CHUNK_SIZE=1 # Error: ge=2, le=20
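Those constraints correspond to standard Pydantic Field bounds; a small sketch of the pattern (illustrative, mirroring the port rule above):

from pydantic import BaseModel, Field, ValidationError

class PortExample(BaseModel):
    port: int = Field(default=11434, ge=1, le=65535)

try:
    PortExample(port=99999)
except ValidationError as exc:
    print(exc)  # input should be less than or equal to 65535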
Warnings:
- Missing data directories log warnings but don't fail
- Few-shot enabled without embeddings logs a warning
Environment-Specific Configs
Development
LOG_LEVEL=DEBUG
LOG_FORMAT=console
API_RELOAD=true
FEEDBACK_MAX_ITERATIONS=3 # Faster iteration
Testing
# Tests automatically set TESTING=1 which skips .env loading
# Use code defaults for reproducibility
Production
LOG_LEVEL=INFO
LOG_FORMAT=json
API_WORKERS=4
API_CORS_ORIGINS=["https://production-domain.com"]
OLLAMA_TIMEOUT_SECONDS=600
See Also
- Quickstart - Initial setup
- Architecture - How settings are used
- .env.example (repository root) - Environment template