AI Psychiatrist Model Registry
Last Updated: 2026-01-02
Purpose: Validated, reproducible model configuration for this repo.
Quick Reference: Which Setup Should I Use?
Chat and embeddings can be configured separately via:
- LLM_BACKEND (chat models for agents)
- EMBEDDING_BACKEND (embeddings only)
| Setup | Chat Backend (LLM_BACKEND) | Embedding Backend (EMBEDDING_BACKEND) | Quality | Hardware Needed | Use Case |
|---|---|---|---|---|---|
| Default (Recommended) | Ollama | HuggingFace | Better similarity | 16GB+ RAM + HF deps | Validated configuration (recommended) |
| Legacy Baseline (Pure Ollama) | Ollama | Ollama | Good | Any Mac/Linux | No HF deps; lower-quality similarity |
| High Quality (Full HF) | HuggingFace | HuggingFace | Best | 32GB+ RAM, CUDA/MPS | Best possible MAE |
| Development | Ollama | Ollama | Fast | Any | Quick iteration |
Note: The codebase intentionally fails fast when a configured backend can’t run; there is no automatic fallback (see model-wiring.md).
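The fail-fast behavior in the note above can be pictured with a small sketch. This is not the repo's actual wiring (see model-wiring.md for that): `resolve_backends` and `SUPPORTED_BACKENDS` are hypothetical names, and the fallback values simply mirror the Default (Recommended) row.

```python
import os

# Assumption: these are the only two backend names, as used in the table above.
SUPPORTED_BACKENDS = {"ollama", "huggingface"}


def resolve_backends(env=None):
    """Return (chat_backend, embedding_backend), raising on unknown values."""
    env = os.environ if env is None else env
    chat = env.get("LLM_BACKEND", "ollama").strip().lower()
    embedding = env.get("EMBEDDING_BACKEND", "huggingface").strip().lower()
    for name, value in (("LLM_BACKEND", chat), ("EMBEDDING_BACKEND", embedding)):
        if value not in SUPPORTED_BACKENDS:
            # Fail fast: report the bad setting instead of silently switching backends.
            raise ValueError(
                f"{name}={value!r} is not one of {sorted(SUPPORTED_BACKENDS)}"
            )
    return chat, embedding
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in isolation, for example `resolve_backends({"EMBEDDING_BACKEND": "ollama"})`.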
Baseline Models (Paper-Referenced)
These models are referenced by the paper and are the default starting point in this repo.
We recommend the QAT variant (gemma3:27b-it-qat) for faster local runs.
| Role | Model family | Params | Ollama tag | Paper reference | Notes |
|---|---|---|---|---|---|
| Qualitative Agent | Gemma 3 | 27B | gemma3:27b or gemma3:27b-it-qat | Section 2.2 | Used for qualitative assessment |
| Judge Agent | Gemma 3 | 27B | gemma3:27b or gemma3:27b-it-qat | Section 2.2 | Used for feedback loop |
| Meta-Review Agent | Gemma 3 | 27B | gemma3:27b or gemma3:27b-it-qat | Section 2.2 | Used for final review |
| Quantitative Agent | Gemma 3 | 27B | gemma3:27b or gemma3:27b-it-qat | Section 2.2 | Default (see MedGemma note below) |
| Embedding | Qwen3 Embedding | 8B | qwen3-embedding:8b | Section 2.2 | 4096-dim embeddings (Appendix D) |
Quantization Note
The paper authors likely used unquantized BF16 weights. Both Ollama variants are quantized:
- gemma3:27b - Standard Ollama GGUF quantization (Q4_K_M)
- gemma3:27b-it-qat - QAT (Quantization-Aware Training) optimized, faster inference
Both are acceptable for reproduction. Use -it-qat for faster runs, or 27b for closer naming parity with the paper.
Approximate disk for baseline pulls: ~32 GB.
MedGemma Note (Appendix F)
The paper's Appendix F evaluates MedGemma 27B as an alternative for the quantitative agent:
- Better item-level MAE: 0.505 vs 0.619 (18% improvement)
- BUT produces more N/A: "fewer predictions overall" - conservative on uncertain evidence
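The 18% figure quoted above is the relative reduction in MAE. A short check of the arithmetic, using only the two values cited from Appendix F (the helper name `relative_improvement` is illustrative):

```python
def relative_improvement(baseline, candidate):
    """Fractional reduction in error when moving from baseline to candidate."""
    return (baseline - candidate) / baseline


gemma_mae = 0.619     # Gemma 3 27B item-level MAE, as quoted above
medgemma_mae = 0.505  # MedGemma 27B item-level MAE, as quoted above

improvement = relative_improvement(gemma_mae, medgemma_mae)
print(f"{improvement:.1%}")  # 18.4%
```

Note that this number says nothing about coverage: a model that answers N/A more often is scored on fewer items.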
⚠️ Warning: There is NO official MedGemma in Ollama. The alibayram/medgemma:27b is a community upload with Q4_K_M quantization that may behave differently from official weights.
For official MedGemma, use HuggingFace (see below).
Ollama Compatibility Notes
- qwen3-embedding:8b supports /api/embeddings and returns 4096 dimensions.
- The legacy tag dengcao/Qwen3-Embedding-8B:Q8_0 does not support /api/embeddings in current Ollama. Avoid it for production.
- If you switch embedding models, update EMBEDDING_DIMENSION to match the model output.
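The dimension requirement described above can be checked before any retrieval runs. The sketch below is an illustration only: `KNOWN_DIMENSIONS` and `check_embedding_dimension` are hypothetical names, and the dimensions are the ones listed in this document. The optional `vector` argument stands in for whatever the configured backend returns.

```python
# Dimensions as listed in this document; other models would need to be added.
KNOWN_DIMENSIONS = {
    "qwen3-embedding:8b": 4096,
    "mxbai-embed-large": 1024,
    "nomic-embed-text": 768,
}


def check_embedding_dimension(model, configured_dimension, vector=None):
    """Raise ValueError if the configured dimension disagrees with the model."""
    known = KNOWN_DIMENSIONS.get(model)
    if known is not None and known != configured_dimension:
        raise ValueError(
            f"EMBEDDING_DIMENSION={configured_dimension} but {model} returns {known}"
        )
    # If a real embedding is available, its length is the ground truth.
    if vector is not None and len(vector) != configured_dimension:
        raise ValueError(
            f"{model} returned {len(vector)} values, expected {configured_dimension}"
        )
```

For example, `check_embedding_dimension("nomic-embed-text", 4096)` raises, because that model is listed with 768 dimensions.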
Development / Local Alternatives (Optional)
Use these for fast local testing only. They do not reproduce paper metrics.
| Role | Model | Params | Ollama tag | Embedding dim |
|---|---|---|---|---|
| All Agents (chat) | Gemma 2 | 9B | gemma2:9b | - |
| Embedding (fast) | mxbai-embed-large | 335M | mxbai-embed-large | 1024 |
| Embedding (small) | Nomic Embed Text | 137M | nomic-embed-text | 768 |
Installation Commands
Ollama (Baseline)
# Recommended (QAT-optimized, faster):
ollama pull gemma3:27b-it-qat
ollama pull qwen3-embedding:8b
# Alternative (standard quantization):
ollama pull gemma3:27b
Ollama (Development - smaller/faster)
ollama pull gemma2:9b
ollama pull mxbai-embed-large
ollama pull nomic-embed-text
HuggingFace Backend (Official Models)
For accessing official Google models (including MedGemma), use HuggingFace Transformers.
Official Model IDs
| Canonical Name | HuggingFace Model ID | Access | Notes |
|---|---|---|---|
| gemma3:27b | google/gemma-3-27b-it | Open | Instruction-tuned; loaded via Transformers AutoModelForCausalLM in this repo |
| medgemma:27b | google/medgemma-27b-text-it | Gated | Text-only, use AutoModelForCausalLM |
| qwen3-embedding:8b | Qwen/Qwen3-Embedding-8B | Open | Use SentenceTransformer (see model card for evaluation details) |
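The table above amounts to a lookup from canonical name to HuggingFace model ID. A minimal sketch of such a lookup follows; `HF_MODEL_IDS`, `GATED_MODELS`, and `resolve_hf_model_id` are hypothetical names and may not match how the repo implements the mapping.

```python
# Canonical name -> HuggingFace model ID, copied from the table above.
HF_MODEL_IDS = {
    "gemma3:27b": "google/gemma-3-27b-it",
    "medgemma:27b": "google/medgemma-27b-text-it",
    "qwen3-embedding:8b": "Qwen/Qwen3-Embedding-8B",
}

# Models whose terms must be accepted on HuggingFace before download.
GATED_MODELS = {"medgemma:27b"}


def resolve_hf_model_id(canonical_name):
    """Return the HuggingFace ID for a canonical name, or raise KeyError."""
    try:
        return HF_MODEL_IDS[canonical_name]
    except KeyError:
        raise KeyError(
            f"No HuggingFace model ID registered for {canonical_name!r}"
        ) from None
```

Raising on an unknown name, rather than passing it through unchanged, keeps a typo in the .env file from turning into a confusing download error later.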
HuggingFace Installation
# Install the optional HuggingFace backend dependencies:
make dev
# Or, if installing via pip:
pip install "ai-psychiatrist[hf]"
Optional (quantization):
- int8 quantization requires bitsandbytes support on your platform.
MedGemma Access (Gated Model)
MedGemma requires accepting Google's Health AI Developer Foundations terms:
- Go to: https://huggingface.co/google/medgemma-27b-text-it
- Log in to HuggingFace
- Click "Accept" on the terms (instant approval)
- Log in via CLI:
huggingface-cli login
HuggingFace Usage Examples
Chat Model (MedGemma/Gemma):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/medgemma-27b-text-it"  # gated: accept the terms first (see above)
# Load weights in bfloat16; device_map="auto" places layers on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
Embedding Model (Qwen3):
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Qwen/Qwen3-Embedding-8B")
embeddings = model.encode(["Your text here"])
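Embeddings like the ones produced above are usually compared by similarity. The sketch below assumes cosine similarity, which this document does not name explicitly, and uses plain Python lists so it runs without any model download. `cosine_similarity` is an illustrative helper, not a function from this repo.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Parallel vectors score 1.0 and orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

The length check matters in practice: vectors from models with different output dimensions (for example 4096 and 1024) cannot be compared.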
Configuration (.env)
Few-shot Embeddings Artifact Selection
Few-shot retrieval loads a precomputed artifact from {DATA_BASE_DIR}/embeddings/:
- EMBEDDING_EMBEDDINGS_FILE selects {name}.npz + {name}.json (+ optional {name}.meta.json).
- DATA_EMBEDDINGS_PATH overrides with a full .npz path.
If {name}.meta.json exists (all newly generated artifacts have it), the server validates backend/model/dimension/chunking against current config and fails fast on mismatch.
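The file layout and the metadata check described above can be sketched as follows. This is an illustration under stated assumptions, not the server's code: `artifact_paths`, `load_meta`, and `validate_meta` are hypothetical helpers, and the metadata keys used in the example (`backend`, `model`, `dimension`) are guesses at what a meta file might contain.

```python
import json
from pathlib import Path


def artifact_paths(data_base_dir, name):
    """Build the three artifact paths for a given EMBEDDING_EMBEDDINGS_FILE name."""
    base = Path(data_base_dir) / "embeddings"
    return {
        "npz": base / f"{name}.npz",
        "json": base / f"{name}.json",
        "meta": base / f"{name}.meta.json",  # optional
    }


def load_meta(meta_path):
    """Return the parsed metadata, or None when the optional file is absent."""
    path = Path(meta_path)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def validate_meta(meta, expected):
    """Raise ValueError listing every key where the artifact and config disagree."""
    mismatches = {
        key: (meta.get(key), value)
        for key, value in expected.items()
        if meta.get(key) != value
    }
    if mismatches:
        raise ValueError(f"Embeddings artifact does not match config: {mismatches}")
```

A typical call would be `validate_meta(meta, {"backend": "huggingface", "dimension": 4096})`, run only when `load_meta` returns something other than `None`.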
Default (Recommended)
# Backend selection (defaults to Ollama chat + HuggingFace embeddings)
LLM_BACKEND=ollama
EMBEDDING_BACKEND=huggingface
# Models (all default to gemma3:27b for chat, qwen3-embedding:8b for embeddings)
MODEL_QUALITATIVE_MODEL=gemma3:27b-it-qat
MODEL_JUDGE_MODEL=gemma3:27b-it-qat
MODEL_META_REVIEW_MODEL=gemma3:27b-it-qat
MODEL_QUANTITATIVE_MODEL=gemma3:27b-it-qat
MODEL_EMBEDDING_MODEL=qwen3-embedding:8b
EMBEDDING_DIMENSION=4096
# Embeddings artifact (recommended: participant-only)
# DATA_TRANSCRIPTS_DIR=data/transcripts_participant_only
# EMBEDDING_EMBEDDINGS_FILE=huggingface_qwen3_8b_paper_train_participant_only
#
# Only set if you want to override the default HF embeddings
# EMBEDDING_EMBEDDINGS_FILE=ollama_qwen3_8b_paper_train_participant_only
Legacy Baseline (Pure Ollama)
LLM_BACKEND=ollama
EMBEDDING_BACKEND=ollama
EMBEDDING_EMBEDDINGS_FILE=ollama_qwen3_8b_paper_train_participant_only
With MedGemma (Appendix F - HuggingFace backend required)
# Use the HuggingFace backend to access official MedGemma weights.
LLM_BACKEND=huggingface
MODEL_QUANTITATIVE_MODEL=medgemma:27b
High-Quality Setup (Recommended for Production)
For users with capable hardware (32GB+ RAM, Apple Silicon or NVIDIA GPU), use HuggingFace for best quality:
Why HuggingFace is Better
| Component | Ollama | HuggingFace | Improvement |
|---|---|---|---|
| Chat (Quantitative) | gemma3:27b (Q4_K_M) | google/medgemma-27b-text-it (FP16) | 18% better MAE (Appendix F) |
| Embeddings | qwen3-embedding:8b (Q4_K_M) | Qwen/Qwen3-Embedding-8B (FP16) | Higher precision similarity |
Key insight: Ollama models are quantized (typically 4-bit GGUF; e.g., Q4_K_M for gemma3:27b and qwen3-embedding:8b, and QAT for gemma3:27b-it-qat). HuggingFace provides FP16/BF16 (16-bit) weights, which use four times as many bits per weight as a 4-bit quantization.
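The bit-width difference translates directly into weight storage. The estimate below is a rough sketch: it counts weights only, treats 4-bit formats as exactly 4 bits per weight (GGUF formats such as Q4_K_M use somewhat more), and ignores activations and the KV cache. `weight_memory_gb` is an illustrative helper.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return params_billions * bits_per_weight / 8


# A 27B-parameter model at nominal 4-bit, int8, and 16-bit widths.
for bits in (4, 8, 16):
    print(bits, weight_memory_gb(27, bits))
# 4 13.5
# 8 27.0
# 16 54.0
```

These figures are a lower bound on real memory use, so they are best treated as a first check on whether a given width can fit the available hardware.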
High-Quality Configuration
# Option A: FP16 embeddings (keep chat on Ollama)
LLM_BACKEND=ollama
EMBEDDING_BACKEND=huggingface
MODEL_EMBEDDING_MODEL=qwen3-embedding:8b # → Qwen/Qwen3-Embedding-8B
# Option B: Full HuggingFace (chat + embeddings)
# LLM_BACKEND=huggingface
# EMBEDDING_BACKEND=huggingface
# MODEL_QUANTITATIVE_MODEL=medgemma:27b # → google/medgemma-27b-text-it (18% better MAE)
Requirements
- Hardware: 32GB+ unified memory (Apple Silicon) or 24GB+ VRAM (NVIDIA)
- Dependencies: pip install 'ai-psychiatrist[hf]'
- MedGemma access: Accept terms at HuggingFace
Pending: Graceful Fallback
Issue #42 will add automatic fallback to Ollama if HuggingFace fails (missing deps, OOM, etc.).
Sources
Paper
_literature/markdown/ai_psychiatrist/ai_psychiatrist.md
Ollama
- https://ollama.com/library/gemma3
- https://ollama.com/library/qwen3-embedding
- https://ollama.com/library/mxbai-embed-large
- https://ollama.com/library/nomic-embed-text
HuggingFace (Official)
- https://huggingface.co/google/gemma-3-27b-it
- https://huggingface.co/google/medgemma-27b-text-it (Gated)
- https://huggingface.co/Qwen/Qwen3-Embedding-8B