Spec 057: Embedding Dimension Invariants (Fail Fast)
Status: Implemented (PR #92, 2026-01-03)
Priority: Medium
Complexity: Low
Related: PIPELINE-BRITTLENESS.md, Spec 055
SSOT (Implemented)
- Code: `src/ai_psychiatrist/config.py` (`EmbeddingSettings.allow_insufficient_dimension_embeddings`)
- Wire-up (load-time): `src/ai_psychiatrist/services/reference_store.py` (`ReferenceStore._combine_and_normalize()`)
- Wire-up (generation-time): `scripts/generate_embeddings.py` (strict `len(embedding) == dimension`, skip reasons in `--allow-partial`)
- Tests: `tests/unit/services/test_reference_store.py`, `tests/unit/scripts/test_generate_embeddings_fail_fast.py`, `tests/unit/services/test_embedding.py`
Problem Statement
When an embedding backend returns vectors with fewer dimensions than the configured
EMBEDDING_DIMENSION (default: 4096), few-shot retrieval can degrade in ways that are
hard to diagnose:
- Reference chunks may be skipped (reducing the reference corpus)
- Similarity rankings may become unstable across runs
- Few-shot can “quietly” behave like zero-shot on affected items (fewer usable references)
This spec enforces dimension invariants so these failures become explicit and actionable.
Previous Behavior (Fixed)
Generation-time (scripts/generate_embeddings.py)
- The script requested a target dimension via `EmbeddingRequest(dimension=...)`.
- Backends truncate with slicing (e.g., `embedding = embedding[:dimension]`); if the backend returns fewer dims than requested, the slice simply returns the shorter vector.
- The script did not assert `len(embedding) == dimension` before writing the `.npz`.
- The `.meta.json` stored `"dimension": config.dimension` even if a backend returned fewer dims.
Net effect: it was possible to generate an artifact whose metadata said “4096” while some or all vectors were shorter.
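The truncation pattern above hides the mismatch because Python slicing caps length but never pads. A minimal sketch of the failure mode (illustrative values, not repo code):

```python
dimension = 4096              # configured EMBEDDING_DIMENSION
backend_vec = [0.1] * 1024    # backend silently returned fewer dims

# Slicing a short vector returns it unchanged — no error is raised,
# so the short vector flows straight into the .npz artifact.
embedding = backend_vec[:dimension]

print(len(embedding))  # 1024, not 4096
```

This is why the fix asserts the length explicitly rather than relying on the slice.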
Load-time (src/ai_psychiatrist/services/reference_store.py)
In ReferenceStore._combine_and_normalize():
- If `embedding_len < expected_dim`:
  - If alignment is required (tag filtering enabled, or the chunk-score source is `chunk`): raise `EmbeddingDimensionMismatchError(expected, actual)`.
  - Otherwise: log a warning, skip the chunk, and later log an error summary if any chunks were skipped.
- If all chunks were skipped, it raises (BUG-009 safeguard).
Net effect: partial dimension mismatches can reduce the reference corpus without failing.
Implemented Solution
Enforce these invariants:
- Artifacts generated by our scripts must contain vectors of exactly `EMBEDDING_DIMENSION`.
- Runtime loading must fail if any reference chunk has `embedding_len < EMBEDDING_DIMENSION`, unless the user explicitly opts into a debugging escape hatch.
This is consistent with the repo’s “fail loudly over silent corruption” posture (ANALYSIS-026).
Implementation
1) Add a config escape hatch (EmbeddingSettings)
Add to `EmbeddingSettings` in `src/ai_psychiatrist/config.py`:
`allow_insufficient_dimension_embeddings: bool = Field(default=False, ...)`
Semantics:
- False (default): raise on any embedding_len < expected_dim.
- True: allow “skip chunk with warning” behavior for debugging/forensics only.
Environment variable:
- EMBEDDING_ALLOW_INSUFFICIENT_DIMENSION_EMBEDDINGS=false
2) Enforce invariant at generation time (scripts/generate_embeddings.py)
After each embedding generation call:
- If `len(embedding) != config.dimension`:
  - In strict mode (default): raise `EmbeddingGenerationError(...)` and abort without writing artifacts.
  - In `--allow-partial` mode: skip that chunk, increment skip counters, and record the skip reason in the `.partial.json` manifest as `dimension_mismatch`.
Also add metadata diagnostics:
- actual_dimension_min
- actual_dimension_max
- dimension_mismatch_count
This makes it impossible to produce a “dimension-lied” artifact without explicitly opting in to partial mode.
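The per-embedding decision can be sketched as a small helper (function name and signature are illustrative; only the exception name, the `--allow-partial` semantics, and the `dimension_mismatch` reason code come from this spec):

```python
class EmbeddingGenerationError(RuntimeError):
    """Raised when a backend returns a vector of the wrong dimension."""


def check_embedding(
    embedding: list[float],
    expected_dim: int,
    allow_partial: bool,
    skip_reasons: dict[str, int],
) -> bool:
    """Return True if the vector may be written; False means 'skip this chunk'.

    Strict mode (allow_partial=False) raises before any artifact is written.
    """
    if len(embedding) == expected_dim:
        return True
    if not allow_partial:
        raise EmbeddingGenerationError(
            f"expected {expected_dim} dims, got {len(embedding)}"
        )
    # --allow-partial: count the skip so it lands in the .partial.json manifest.
    skip_reasons["dimension_mismatch"] = skip_reasons.get("dimension_mismatch", 0) + 1
    return False
```

Raising before the write (rather than filtering afterward) is what guarantees no `.npz` with lying metadata can exist in strict mode.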
3) Enforce invariant at load time (ReferenceStore)
In ReferenceStore._combine_and_normalize():
- Keep the existing behavior that raises immediately when alignment is required.
- When alignment is not required, change the behavior:
- If `allow_insufficient_dimension_embeddings` is false: raise `EmbeddingDimensionMismatchError`.
- If true: keep the existing skip-with-warning behavior.
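The combined load-time decision table can be sketched as follows (helper name and exact signature are illustrative; the exception name and the alignment-required rule come from this spec):

```python
import logging

logger = logging.getLogger(__name__)


class EmbeddingDimensionMismatchError(ValueError):
    def __init__(self, expected: int, actual: int) -> None:
        super().__init__(f"embedding dimension {actual} < expected {expected}")


def should_keep_chunk(
    embedding_len: int,
    expected_dim: int,
    alignment_required: bool,
    allow_insufficient: bool,
) -> bool:
    """Sketch of the dimension check inside _combine_and_normalize()."""
    if embedding_len >= expected_dim:
        return True
    if alignment_required or not allow_insufficient:
        # Default posture: fail fast rather than quietly shrink the corpus.
        raise EmbeddingDimensionMismatchError(expected_dim, embedding_len)
    # Escape hatch: skip with a warning (no chunk text is ever logged).
    logger.warning(
        "Skipping chunk: expected_dim=%d actual_dim=%d", expected_dim, embedding_len
    )
    return False
```

Note the alignment-required branch still raises even with the escape hatch on, preserving the pre-existing stricter behavior for tag filtering and chunk-level scoring.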
4) Logging / Privacy
- Never log chunk text or transcript content.
- Log only:
`participant_id`, `chunk_index`, `expected_dim`, `actual_dim`, and artifact identifiers/paths.
Testing (TDD)
Unit: generation-time enforcement
Add tests around scripts/generate_embeddings.py helpers:
- A mock embedding client returning vectors shorter than `dimension`:
  - Strict mode: the script errors and no final `.npz` / `.json` / `.meta.json` is produced.
  - `--allow-partial`: produces a `.partial.json` listing `dimension_mismatch` skips and exits with code 2.
Unit: load-time enforcement
Add tests for ReferenceStore._combine_and_normalize() using a temp .npz + .json fixture:
- One correct vector and one short vector:
  - Default config: raises `EmbeddingDimensionMismatchError`.
  - With `allow_insufficient_dimension_embeddings=true`: loads only the valid chunk and logs a warning.
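The shape of that load-time test, sketched dependency-free (the real tests use pytest with a temp `.npz` + `.json` fixture; `combine_chunks` here is a hypothetical stand-in for the `ReferenceStore` path):

```python
class EmbeddingDimensionMismatchError(ValueError):
    pass


def combine_chunks(vectors, expected_dim, allow_insufficient=False):
    """Stand-in for the dimension handling in _combine_and_normalize()."""
    kept = []
    for vec in vectors:
        if len(vec) == expected_dim:
            kept.append(vec)
        elif not allow_insufficient:
            raise EmbeddingDimensionMismatchError(
                f"expected {expected_dim}, got {len(vec)}"
            )
        # else: skip-with-warning path (warning elided in this sketch)
    return kept


# One correct vector and one short vector:
good, short = [0.0] * 8, [0.0] * 4

# Default config: raises.
try:
    combine_chunks([good, short], expected_dim=8)
    raise AssertionError("expected a dimension mismatch error")
except EmbeddingDimensionMismatchError:
    pass

# Escape hatch: loads only the valid chunk.
assert combine_chunks([good, short], expected_dim=8, allow_insufficient=True) == [good]
```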
Regression
- Fully matching artifacts load normally (no behavior change).
Migration Guide
If you hit this failure:
- Confirm you are using the intended embedding backend/model (HuggingFace FP16 vs Ollama).
- Regenerate embeddings:
  `uv run python scripts/generate_embeddings.py --split paper-train --backend huggingface`
- Only if you are debugging legacy artifacts: set `EMBEDDING_ALLOW_INSUFFICIENT_DIMENSION_EMBEDDINGS=true` temporarily.
Success Criteria
- No reference chunks are silently skipped due to insufficient embedding dimension in default configuration.
- Dimension mismatches fail fast at artifact generation time (before writing the `.npz`).
- When the escape hatch is enabled, skips are explicit (warnings + `.partial.json` reason codes).