AI Psychiatrist Model Registry

Last Updated: 2026-01-02
Purpose: Validated, reproducible model configuration for this repo.

Quick Reference: Which Setup Should I Use?

Chat and embeddings can be configured separately via:

- LLM_BACKEND (chat models for agents)
- EMBEDDING_BACKEND (embeddings only)

Setup	Chat Backend (`LLM_BACKEND`)	Embedding Backend (`EMBEDDING_BACKEND`)	Quality	Hardware Needed	Use Case
Default (Recommended)	Ollama	HuggingFace	Better similarity	16GB+ RAM + HF deps	Validated configuration (recommended)
Legacy Baseline (Pure Ollama)	Ollama	Ollama	Good	Any Mac/Linux	No HF deps; lower-quality similarity
High Quality (Full HF)	HuggingFace	HuggingFace	Best	32GB+ RAM, CUDA/MPS	Best possible MAE
Development	Ollama	Ollama	Fast	Any	Quick iteration

Note: The codebase intentionally fails fast when a configured backend can’t run; there is no automatic fallback (see model-wiring.md).
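
The fail-fast behaviour can be pictured with a small sketch. This is not the repo's actual wiring (model-wiring.md covers that): `resolve_backends` and `SUPPORTED_BACKENDS` are hypothetical names, and the fallback values for unset variables are an assumption taken from the Default row of the table above. Only the variable names and the two backend values come from this document.

```python
import os

# Backend values named in this document; treating them as the full set is an assumption.
SUPPORTED_BACKENDS = {"ollama", "huggingface"}


def resolve_backends(env=None):
    """Return (chat_backend, embedding_backend), raising on an unknown value.

    Hypothetical helper: it raises instead of silently switching backends.
    """
    env = os.environ if env is None else env
    chat = env.get("LLM_BACKEND", "ollama").strip().lower()
    embedding = env.get("EMBEDDING_BACKEND", "huggingface").strip().lower()
    for name, value in (("LLM_BACKEND", chat), ("EMBEDDING_BACKEND", embedding)):
        if value not in SUPPORTED_BACKENDS:
            raise ValueError(
                f"{name}={value!r} is not one of {sorted(SUPPORTED_BACKENDS)}"
            )
    return chat, embedding
```

Calling `resolve_backends({"LLM_BACKEND": "vllm"})` raises `ValueError` immediately, which mirrors the "no automatic fallback" rule: a misconfigured backend is reported, not replaced.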

Baseline Models (Paper-Referenced)

These models are referenced by the paper and are the default starting point in this repo. We recommend the QAT variant (gemma3:27b-it-qat) for faster local runs.

Role	Model family	Params	Ollama tag	Paper reference	Notes
Qualitative Agent	Gemma 3	27B	`gemma3:27b` or `gemma3:27b-it-qat`	Section 2.2	Used for qualitative assessment
Judge Agent	Gemma 3	27B	`gemma3:27b` or `gemma3:27b-it-qat`	Section 2.2	Used for feedback loop
Meta-Review Agent	Gemma 3	27B	`gemma3:27b` or `gemma3:27b-it-qat`	Section 2.2	Used for final review
Quantitative Agent	Gemma 3	27B	`gemma3:27b` or `gemma3:27b-it-qat`	Section 2.2	Default (see MedGemma note below)
Embedding	Qwen3 Embedding	8B	`qwen3-embedding:8b`	Section 2.2	4096-dim embeddings (Appendix D)

Quantization Note

The paper authors likely used unquantized BF16 weights. Both Ollama variants are quantized:

- gemma3:27b: standard Ollama GGUF quantization (Q4_K_M)
- gemma3:27b-it-qat: QAT (Quantization-Aware Training) checkpoint, faster inference

Both are acceptable for reproduction. Use -it-qat for faster runs, or 27b for closer naming parity with the paper.

Approximate disk usage for baseline pulls: 32 GB.

MedGemma Note (Appendix F)

The paper's Appendix F evaluates MedGemma 27B as an alternative for the quantitative agent:

- Better item-level MAE: 0.505 vs 0.619 (18% improvement)
- BUT produces more N/A: "fewer predictions overall", conservative on uncertain evidence
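
The 18% figure follows directly from the two MAE values. A short check, assuming the 0.619 baseline belongs to the default Gemma 3 27B quantitative agent (the function name is illustrative):

```python
def relative_improvement(baseline: float, candidate: float) -> float:
    """Fractional reduction in error when moving from baseline to candidate."""
    return (baseline - candidate) / baseline


gemma_mae = 0.619     # default quantitative agent, item-level MAE (Appendix F)
medgemma_mae = 0.505  # MedGemma 27B, item-level MAE (Appendix F)

improvement = relative_improvement(gemma_mae, medgemma_mae)
print(f"{improvement:.1%}")  # prints 18.4%
```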

⚠️ Warning: There is NO official MedGemma in Ollama. The alibayram/medgemma:27b tag is a community upload with Q4_K_M quantization that may behave differently from the official weights.

For official MedGemma, use HuggingFace (see below).

Ollama Compatibility Notes

- qwen3-embedding:8b supports /api/embeddings and returns 4096 dimensions.
- The legacy tag dengcao/Qwen3-Embedding-8B:Q8_0 does not support /api/embeddings in current Ollama. Avoid it for production.
- If you switch embedding models, update EMBEDDING_DIMENSION to match the model output.
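
One way to confirm that a model's output matches EMBEDDING_DIMENSION is to request a single embedding and compare lengths. The sketch below is illustrative, not repo code: `check_dimension` and `fetch_embedding` are hypothetical helpers, and the host `http://localhost:11434` assumes an Ollama server on its default local port.

```python
import json
import urllib.request

EXPECTED_DIMENSION = 4096  # EMBEDDING_DIMENSION for qwen3-embedding:8b


def check_dimension(vector, expected=EXPECTED_DIMENSION):
    """Raise if the embedding length differs from the configured dimension."""
    if len(vector) != expected:
        raise ValueError(f"got {len(vector)} dimensions, expected {expected}")
    return len(vector)


def fetch_embedding(text, model="qwen3-embedding:8b", host="http://localhost:11434"):
    """Request one embedding from an Ollama server via /api/embeddings.

    The host default is an assumption; the server must be running and the
    model must already be pulled.
    """
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["embedding"]
```

With the server running, `check_dimension(fetch_embedding("hello"))` returns the dimension, or raises if the model and the configured value disagree.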

Development / Local Alternatives (Optional)

Use these for fast local testing only. They do not reproduce paper metrics.

Role	Model	Params	Ollama tag	Embedding dim
All Agents (chat)	Gemma 2	9B	`gemma2:9b`	-
Embedding (fast)	mxbai-embed-large	335M	`mxbai-embed-large`	1024
Embedding (small)	Nomic Embed Text	137M	`nomic-embed-text`	768

Installation Commands

Ollama (Baseline)

# Recommended (QAT-optimized, faster):
ollama pull gemma3:27b-it-qat
ollama pull qwen3-embedding:8b

# Alternative (standard quantization):
ollama pull gemma3:27b

Ollama (Development - smaller/faster)

ollama pull gemma2:9b
ollama pull mxbai-embed-large
ollama pull nomic-embed-text

HuggingFace Backend (Official Models)

For accessing official Google models (including MedGemma), use HuggingFace Transformers.

Official Model IDs

Canonical Name	HuggingFace Model ID	Access	Notes
`gemma3:27b`	`google/gemma-3-27b-it`	Open	Instruction-tuned; loaded via Transformers `AutoModelForCausalLM` in this repo
`medgemma:27b`	`google/medgemma-27b-text-it`	Gated	Text-only, use `AutoModelForCausalLM`
`qwen3-embedding:8b`	`Qwen/Qwen3-Embedding-8B`	Open	Use `SentenceTransformer` (see model card for evaluation details)
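
The table above amounts to a lookup from canonical name to HuggingFace model ID. A minimal sketch of that lookup follows; the dict and function names are hypothetical, and the repo's real resolution logic may differ.

```python
# Entries copied from the table above; nothing else is mapped on purpose.
CANONICAL_TO_HF = {
    "gemma3:27b": "google/gemma-3-27b-it",
    "medgemma:27b": "google/medgemma-27b-text-it",
    "qwen3-embedding:8b": "Qwen/Qwen3-Embedding-8B",
}


def to_hf_model_id(canonical_name: str) -> str:
    """Translate a canonical model name into its HuggingFace model ID."""
    try:
        return CANONICAL_TO_HF[canonical_name]
    except KeyError:
        raise KeyError(
            f"no HuggingFace model ID listed for {canonical_name!r}"
        ) from None
```

Names without a row in the table, such as gemma3:27b-it-qat, raise `KeyError` in this sketch so that an unmapped model is never guessed.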

HuggingFace Installation

# Install the optional HuggingFace backend dependencies:
make dev
# Or, if installing via pip:
pip install "ai-psychiatrist[hf]"

Optional (quantization): int8 quantization requires bitsandbytes support on your platform.

MedGemma Access (Gated Model)

MedGemma requires accepting Google's Health AI Developer Foundations terms:

1. Go to: https://huggingface.co/google/medgemma-27b-text-it
2. Log in to HuggingFace
3. Click "Accept" on the terms (instant approval)
4. Log in via CLI: huggingface-cli login

HuggingFace Usage Examples

Chat Model (MedGemma/Gemma):

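from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the gated MedGemma weights in bfloat16 (accept the terms and log in via CLI first).
model = AutoModelForCausalLM.from_pretrained(
    "google/medgemma-27b-text-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place layers on the available devices automatically
)
tokenizer = AutoTokenizer.from_pretrained("google/medgemma-27b-text-it")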

Embedding Model (Qwen3):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-8B")
embeddings = model.encode(["Your text here"])
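
Vectors like these are usually compared with a similarity score. The source does not state which metric the repo uses; the sketch below assumes cosine similarity, a common choice, and uses toy 3-dimensional vectors in place of real 4096-dimensional ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]  # scaled copy of the query
orthogonal = [0.0, 3.0, 0.0]      # shares no component with the query

score_same = cosine_similarity(query, same_direction)   # close to 1.0
score_orthogonal = cosine_similarity(query, orthogonal)  # 0.0
```

Because the score depends only on direction, small numerical differences between quantized and 16-bit embeddings can change how near-ties are ranked, which is one plausible reason the table rates the backends differently.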

Configuration (.env)

Few-shot Embeddings Artifact Selection

Few-shot retrieval loads a precomputed artifact from {DATA_BASE_DIR}/embeddings/:

- EMBEDDING_EMBEDDINGS_FILE selects {name}.npz + {name}.json (+ optional {name}.meta.json).
- DATA_EMBEDDINGS_PATH overrides with a full .npz path.

If {name}.meta.json exists (all newly generated artifacts have it), the server validates backend/model/dimension/chunking against current config and fails fast on mismatch.
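
The selection and validation rules above can be sketched as follows. Both helpers are hypothetical: the metadata key names (`backend`, `model`, `dimension`, `chunking`) are assumed from the description, and placing the sidecar files next to an overridden .npz path is also an assumption.

```python
from pathlib import Path


def artifact_paths(data_base_dir, name, embeddings_path_override=None):
    """Return the .npz, .json, and .meta.json paths for an embeddings artifact."""
    if embeddings_path_override is not None:
        # DATA_EMBEDDINGS_PATH case: a full .npz path wins over the name.
        npz = Path(embeddings_path_override)
    else:
        # EMBEDDING_EMBEDDINGS_FILE case: resolve under {DATA_BASE_DIR}/embeddings/.
        npz = Path(data_base_dir) / "embeddings" / f"{name}.npz"
    return {
        "npz": npz,
        "json": npz.with_suffix(".json"),
        "meta": npz.with_suffix(".meta.json"),
    }


def metadata_mismatches(meta, current):
    """List the fields whose stored value differs from the current config."""
    keys = ("backend", "model", "dimension", "chunking")
    return [key for key in keys if meta.get(key) != current.get(key)]
```

A non-empty result from `metadata_mismatches` corresponds to the case where the server refuses to start, for example when an artifact built with the Ollama backend is loaded while EMBEDDING_BACKEND=huggingface.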

Default (Recommended)

# Backend selection (defaults to Ollama chat + HuggingFace embeddings)
LLM_BACKEND=ollama
EMBEDDING_BACKEND=huggingface

# Models (all default to gemma3:27b for chat, qwen3-embedding:8b for embeddings)
MODEL_QUALITATIVE_MODEL=gemma3:27b-it-qat
MODEL_JUDGE_MODEL=gemma3:27b-it-qat
MODEL_META_REVIEW_MODEL=gemma3:27b-it-qat
MODEL_QUANTITATIVE_MODEL=gemma3:27b-it-qat
MODEL_EMBEDDING_MODEL=qwen3-embedding:8b
EMBEDDING_DIMENSION=4096

# Embeddings artifact (recommended: participant-only)
# DATA_TRANSCRIPTS_DIR=data/transcripts_participant_only
# EMBEDDING_EMBEDDINGS_FILE=huggingface_qwen3_8b_paper_train_participant_only
#
# Only set if you want to override the default HF embeddings
# EMBEDDING_EMBEDDINGS_FILE=ollama_qwen3_8b_paper_train_participant_only

Legacy Baseline (Pure Ollama)

LLM_BACKEND=ollama
EMBEDDING_BACKEND=ollama
EMBEDDING_EMBEDDINGS_FILE=ollama_qwen3_8b_paper_train_participant_only

With MedGemma (Appendix F - HuggingFace backend required)

# Use the HuggingFace backend to access official MedGemma weights.
LLM_BACKEND=huggingface
MODEL_QUANTITATIVE_MODEL=medgemma:27b

High-Quality Setup (Recommended for Production)

For users with capable hardware (32GB+ RAM, Apple Silicon or NVIDIA GPU), the HuggingFace backend gives the best quality.

Why HuggingFace is Better

Component	Ollama	HuggingFace	Improvement
Chat (Quantitative)	`gemma3:27b` (Q4_K_M)	`google/medgemma-27b-text-it` (FP16)	18% better MAE (Appendix F)
Embeddings	`qwen3-embedding:8b` (Q4_K_M)	`Qwen/Qwen3-Embedding-8B` (FP16)	Higher precision similarity

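Key insight: Ollama models are quantized (typically 4-bit GGUF; e.g., Q4_K_M for gemma3:27b and qwen3-embedding:8b, and QAT for gemma3:27b-it-qat). HuggingFace provides FP16/BF16 (16-bit) weights, roughly four times as many bits per weight. That means more numerical precision, not a guaranteed 4x gain in accuracy.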

High-Quality Configuration

# Option A: FP16 embeddings (keep chat on Ollama)
LLM_BACKEND=ollama
EMBEDDING_BACKEND=huggingface
MODEL_EMBEDDING_MODEL=qwen3-embedding:8b  # → Qwen/Qwen3-Embedding-8B

# Option B: Full HuggingFace (chat + embeddings)
# LLM_BACKEND=huggingface
# EMBEDDING_BACKEND=huggingface
# MODEL_QUANTITATIVE_MODEL=medgemma:27b    # → google/medgemma-27b-text-it (18% better MAE)

Requirements

- Hardware: 32GB+ unified memory (Apple Silicon) or 24GB+ VRAM (NVIDIA)
- Dependencies: pip install 'ai-psychiatrist[hf]'
- MedGemma access: Accept terms at HuggingFace

Pending: Graceful Fallback

Issue #42 will add automatic fallback to Ollama if HuggingFace fails (missing deps, OOM, etc.). Until it lands, a failing backend raises an error instead of falling back.

Sources

Paper

_literature/markdown/ai_psychiatrist/ai_psychiatrist.md

Ollama

https://ollama.com/library/gemma3
https://ollama.com/library/qwen3-embedding
https://ollama.com/library/mxbai-embed-large
https://ollama.com/library/nomic-embed-text

HuggingFace (Official)

https://huggingface.co/google/gemma-3-27b-it
https://huggingface.co/google/medgemma-27b-text-it (Gated)
https://huggingface.co/Qwen/Qwen3-Embedding-8B