Spec 057: Embedding Dimension Invariants (Fail Fast)
Status: Implemented (PR #92, 2026-01-03)
Priority: Medium
Complexity: Low
Related: PIPELINE-BRITTLENESS.md, Spec 055
SSOT (Implemented)
- Code: `src/ai_psychiatrist/config.py` (`EmbeddingSettings.allow_insufficient_dimension_embeddings`)
- Wire-up (load-time): `src/ai_psychiatrist/services/reference_store.py` (`ReferenceStore._combine_and_normalize()`)
- Wire-up (generation-time): `scripts/generate_embeddings.py` (strict `len(embedding) == dimension`, skip reasons in `--allow-partial`)
- Tests: `tests/unit/services/test_reference_store.py`, `tests/unit/scripts/test_generate_embeddings_fail_fast.py`, `tests/unit/services/test_embedding.py`
Problem Statement
When an embedding backend returns vectors with fewer dimensions than the configured
EMBEDDING_DIMENSION (default: 4096), few-shot retrieval can degrade in ways that are
hard to diagnose:
- Reference chunks may be skipped (reducing the reference corpus)
- Similarity rankings may become unstable across runs
- Few-shot can “quietly” behave like zero-shot on affected items (fewer usable references)
This spec enforces dimension invariants so these failures become explicit and actionable.
Previous Behavior (Fixed)
Generation-time (scripts/generate_embeddings.py)
- The script requested a target dimension via `EmbeddingRequest(dimension=...)`.
- Backends truncate with slicing (e.g., `embedding = embedding[:dimension]`); if the backend returns fewer dims than requested, the slice simply returns the shorter vector.
- The script did not assert `len(embedding) == dimension` before writing the `.npz`.
- The `.meta.json` stored `"dimension": config.dimension` even if a backend returned fewer dims.
Net effect: it was possible to generate an artifact whose metadata said “4096” while some or all vectors were shorter.
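The truncation pattern above hides the mismatch because Python slicing caps length but never pads. A minimal sketch of the failure mode (illustrative values, not repo code):

```python
dimension = 4096              # configured EMBEDDING_DIMENSION
backend_vec = [0.1] * 1024    # backend silently returned fewer dims

# Slicing a short vector returns it unchanged — no error is raised,
# so the short vector flows straight into the .npz artifact.
embedding = backend_vec[:dimension]

print(len(embedding))  # 1024, not 4096
```

This is why the fix asserts the length explicitly rather than relying on the slice.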
Load-time (src/ai_psychiatrist/services/reference_store.py)
In ReferenceStore._combine_and_normalize():
- If `embedding_len < expected_dim`:
  - If alignment is required (tag filtering enabled, or the chunk-score source is `chunk`): raise `EmbeddingDimensionMismatchError(expected, actual)`.
  - Otherwise: log a warning, skip the chunk, and later log an error summary if any chunks were skipped.
- If all chunks were skipped, it raises (BUG-009 safeguard).
Net effect: partial dimension mismatches can reduce the reference corpus without failing.
Implemented Solution
Enforce these invariants:
- Artifacts generated by our scripts must contain vectors of exactly `EMBEDDING_DIMENSION`.
- Runtime loading must fail if any reference chunk has `embedding_len < EMBEDDING_DIMENSION`, unless the user explicitly opts into a debugging escape hatch.
This is consistent with the repo’s “fail loudly over silent corruption” posture (ANALYSIS-026).
Implementation
1) Add a config escape hatch (EmbeddingSettings)
Add to `EmbeddingSettings` in `src/ai_psychiatrist/config.py`:
`allow_insufficient_dimension_embeddings: bool = Field(default=False, ...)`
Semantics:
- False (default): raise on any embedding_len < expected_dim.
- True: allow “skip chunk with warning” behavior for debugging/forensics only.
Environment variable:
- EMBEDDING_ALLOW_INSUFFICIENT_DIMENSION_EMBEDDINGS=false
2) Enforce invariant at generation time (scripts/generate_embeddings.py)
After each embedding generation call:
- If `len(embedding) != config.dimension`:
  - In strict mode (default): raise `EmbeddingGenerationError(...)` and abort without writing artifacts.
  - In `--allow-partial` mode: skip that chunk, increment skip counters, and record the skip reason in the `.partial.json` manifest as `dimension_mismatch`.
Also add metadata diagnostics:
- actual_dimension_min
- actual_dimension_max
- dimension_mismatch_count
This makes it impossible to produce a “dimension-lied” artifact without explicitly opting in to partial mode.
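The per-embedding decision can be sketched as a small helper (function name and signature are illustrative; only the exception name, the `--allow-partial` semantics, and the `dimension_mismatch` reason code come from this spec):

```python
class EmbeddingGenerationError(RuntimeError):
    """Raised when a backend returns a vector of the wrong dimension."""


def check_embedding(
    embedding: list[float],
    expected_dim: int,
    allow_partial: bool,
    skip_reasons: dict[str, int],
) -> bool:
    """Return True if the vector may be written; False means 'skip this chunk'.

    Strict mode (allow_partial=False) raises before any artifact is written.
    """
    if len(embedding) == expected_dim:
        return True
    if not allow_partial:
        raise EmbeddingGenerationError(
            f"expected {expected_dim} dims, got {len(embedding)}"
        )
    # --allow-partial: count the skip so it lands in the .partial.json manifest.
    skip_reasons["dimension_mismatch"] = skip_reasons.get("dimension_mismatch", 0) + 1
    return False
```

Raising before the write (rather than filtering afterward) is what guarantees no `.npz` with lying metadata can exist in strict mode.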
3) Enforce invariant at load time (ReferenceStore)
In ReferenceStore._combine_and_normalize():
- Keep the existing behavior that raises immediately when alignment is required.
- When alignment is not required, change the behavior:
- If `allow_insufficient_dimension_embeddings` is false: raise `EmbeddingDimensionMismatchError`.
- If true: keep the existing skip-with-warning behavior.
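The combined load-time decision table can be sketched as follows (helper name and exact signature are illustrative; the exception name and the alignment-required rule come from this spec):

```python
import logging

logger = logging.getLogger(__name__)


class EmbeddingDimensionMismatchError(ValueError):
    def __init__(self, expected: int, actual: int) -> None:
        super().__init__(f"embedding dimension {actual} < expected {expected}")


def should_keep_chunk(
    embedding_len: int,
    expected_dim: int,
    alignment_required: bool,
    allow_insufficient: bool,
) -> bool:
    """Sketch of the dimension check inside _combine_and_normalize()."""
    if embedding_len >= expected_dim:
        return True
    if alignment_required or not allow_insufficient:
        # Default posture: fail fast rather than quietly shrink the corpus.
        raise EmbeddingDimensionMismatchError(expected_dim, embedding_len)
    # Escape hatch: skip with a warning (no chunk text is ever logged).
    logger.warning(
        "Skipping chunk: expected_dim=%d actual_dim=%d", expected_dim, embedding_len
    )
    return False
```

Note the alignment-required branch still raises even with the escape hatch on, preserving the pre-existing stricter behavior for tag filtering and chunk-level scoring.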
4) Logging / Privacy
- Never log chunk text or transcript content.
- Log only:
`participant_id`, `chunk_index`, `expected_dim`, `actual_dim`, and artifact identifiers/paths.
Testing (TDD)
Unit: generation-time enforcement
Add tests around scripts/generate_embeddings.py helpers:
- A mock embedding client returning vectors shorter than `dimension`:
  - Strict mode: the script errors and no final `.npz` / `.json` / `.meta.json` is produced.
  - `--allow-partial`: produces a `.partial.json` listing `dimension_mismatch` skips and exits with code 2.
Unit: load-time enforcement
Add tests for ReferenceStore._combine_and_normalize() using a temp .npz + .json fixture:
- One correct vector and one short vector:
  - Default config: raises `EmbeddingDimensionMismatchError`.
  - With `allow_insufficient_dimension_embeddings=true`: loads only the valid chunk and logs a warning.
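The shape of that load-time test, sketched dependency-free (the real tests use pytest with a temp `.npz` + `.json` fixture; `combine_chunks` here is a hypothetical stand-in for the `ReferenceStore` path):

```python
class EmbeddingDimensionMismatchError(ValueError):
    pass


def combine_chunks(vectors, expected_dim, allow_insufficient=False):
    """Stand-in for the dimension handling in _combine_and_normalize()."""
    kept = []
    for vec in vectors:
        if len(vec) == expected_dim:
            kept.append(vec)
        elif not allow_insufficient:
            raise EmbeddingDimensionMismatchError(
                f"expected {expected_dim}, got {len(vec)}"
            )
        # else: skip-with-warning path (warning elided in this sketch)
    return kept


# One correct vector and one short vector:
good, short = [0.0] * 8, [0.0] * 4

# Default config: raises.
try:
    combine_chunks([good, short], expected_dim=8)
    raise AssertionError("expected a dimension mismatch error")
except EmbeddingDimensionMismatchError:
    pass

# Escape hatch: loads only the valid chunk.
assert combine_chunks([good, short], expected_dim=8, allow_insufficient=True) == [good]
```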
Regression
- Fully matching artifacts load normally (no behavior change).
Migration Guide
If you hit this failure:
- Confirm you are using the intended embedding backend/model (HuggingFace FP16 vs Ollama).
- Regenerate embeddings:
  `uv run python scripts/generate_embeddings.py --split paper-train --backend huggingface`
- Only if you are debugging legacy artifacts: set `EMBEDDING_ALLOW_INSUFFICIENT_DIMENSION_EMBEDDINGS=true` temporarily.
Success Criteria
- No reference chunks are silently skipped due to insufficient embedding dimension in default configuration.
- Dimension mismatches fail fast at artifact generation time (before writing the `.npz`).
- When the escape hatch is enabled, skips are explicit (warnings + `.partial.json` reason codes).