
Error Handling and Fail-Fast Philosophy

Audience: Maintainers and researchers
Last Updated: 2026-01-04

This repo prioritizes research-honest behavior:

- broken features must not silently degrade
- failures must be diagnosable
- optional features must be truly optional (no hidden I/O)


Core Principles

1) Skip If Disabled, Crash If Broken (Spec 38)

If a feature is disabled:

- do not read its files
- do not validate its artifacts
- do not warn about missing artifacts (because the feature is off)

If a feature is enabled:

- missing artifacts → crash with a clear error
- invalid artifacts → crash with a clear error

This prevents runs that appear successful while silently using a different method than the one configured.

2) Preserve Exception Types (Spec 39)

Do not catch Exception and rethrow ValueError(...). That masks whether a failure was:

- a timeout
- invalid JSON
- a missing file
- a schema mismatch

Instead:

- log the error and error_type
- re-raise the original exception
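For example, a bare `raise` preserves both the exception type and the traceback (the wrapper below is a hypothetical illustration, not a function from this repo):

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_model_output(raw: str) -> dict:
    """Parse model JSON; log error_type but preserve the original exception."""
    try:
        return json.loads(raw)
    except Exception as exc:
        # Record enough context to group failures later...
        logger.error(
            "parse failed", extra={"error_type": type(exc).__name__}
        )
        # ...then re-raise the ORIGINAL exception (type + traceback intact).
        raise
```

A caller can now still distinguish `json.JSONDecodeError` from a timeout or a missing file, which a rethrown `ValueError` would have erased.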

3) Fail-Fast Artifact Generation (Spec 40)

Embedding artifacts must be complete or the run is scientifically corrupted. Therefore:

- embedding generation is strict by default
- “partial output” is an explicit debug mode only
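A sketch of the strict-by-default shape, with hypothetical names (the real script exposes this as an explicit flag such as --allow-partial):

```python
def finalize_embeddings(
    vectors: dict[str, list[float]],
    expected_ids: list[str],
    allow_partial: bool = False,
) -> dict:
    """Refuse to emit an incomplete embedding artifact unless explicitly allowed."""
    missing = [i for i in expected_ids if i not in vectors]
    if missing and not allow_partial:
        # Strict default: an incomplete artifact is a hard failure.
        raise RuntimeError(
            f"Embedding artifact incomplete: {len(missing)} missing "
            f"(e.g. {missing[:3]})"
        )
    # Debug mode only: partial output is allowed, but the gap is recorded.
    return {"vectors": vectors, "missing": missing}
```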


Pipeline Robustness (Specs 053-057)

These specs enforce fail-fast behavior at critical pipeline stages:

| Spec | What It Validates | Where | Failure Mode |
|------|-------------------|-------|--------------|
| 053 | Evidence grounding | _extract_evidence() | EvidenceGroundingError if all quotes ungrounded |
| 054 | Evidence schema | _extract_evidence() | EvidenceSchemaError on wrong types |
| 055 | Embedding validity | Query/reference generation, similarity | EmbeddingValidationError on NaN/Inf/zero |
| 056 | Failure observability | Per-run failures_{run_id}.json artifact | n/a |
| 057 | Dimension invariants | Reference store load | EmbeddingDimensionMismatchError by default |

SSOT:

- Evidence validation: src/ai_psychiatrist/services/evidence_validation.py
- Embedding validation: src/ai_psychiatrist/infrastructure/validation.py
- Failure registry: src/ai_psychiatrist/infrastructure/observability.py


Failure Pattern Observability (Spec 056)

The FailureRegistry captures all failures with:

- consistent taxonomy (by category, severity, stage)
- per-run JSON artifacts (data/outputs/failures_{run_id}.json)
- privacy-safe context (hashes + counts, never transcript text)

Initialization:

```python
from ai_psychiatrist.infrastructure.observability import init_failure_registry

registry = init_failure_registry(run_id)
```

At end of run:

```python
registry.print_summary()
registry.save(Path("data/outputs"))
```

Retry Telemetry (Spec 060)

The failure registry captures terminal failures (e.g., retry exhaustion), but runs can still be brittle even when they succeed.

Spec 060 adds a privacy-safe per-run telemetry artifact:

- data/outputs/telemetry_{run_id}.json

It records:

- PydanticAI retry triggers (ModelRetry) by extractor (extract_quantitative, etc.)
- JSON repair path usage (tolerant_json_fixups, python-literal fallback, json-repair)

The telemetry file includes a capped event list (default cap: 5,000) and reports dropped_events if the cap is exceeded.

Telemetry must not include transcript text or raw LLM outputs (hashes + counts only).
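The cap-with-dropped-count behavior can be sketched as follows; TelemetryBuffer is a hypothetical name, not the Spec 060 implementation:

```python
class TelemetryBuffer:
    """Capped event buffer: keep the first `cap` events, count the rest."""

    def __init__(self, cap: int = 5000) -> None:
        self.cap = cap
        self.events: list[dict] = []
        self.dropped_events = 0

    def add(self, event: dict) -> None:
        if len(self.events) < self.cap:
            self.events.append(event)
        else:
            # Over the cap: drop the event but keep the run diagnosable
            # by reporting how many were lost.
            self.dropped_events += 1

    def to_artifact(self) -> dict:
        return {"events": self.events, "dropped_events": self.dropped_events}
```

Reporting dropped_events keeps the artifact honest: a consumer can tell the difference between "few retries happened" and "many retries happened but the buffer was full".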


Where Silent Fallbacks Are Allowed

Silent fallbacks are generally treated as research corruption.

The only allowed exceptions should be:

- explicit debug modes (e.g., scripts/generate_embeddings.py --allow-partial)
- explicitly documented, narrow "best-effort" helpers that cannot affect evaluation outputs

If a fallback changes an experiment's method, it must not be silent.


Practical Debugging Guidance

When a run fails:

1. Identify the highest-level failure boundary (script vs service vs agent).
2. Group by error_type in logs.
3. Check whether the failure is “enabled feature broken” (should crash) vs “disabled feature” (should not touch files).
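Step 2 can be done directly against a failures artifact; the error_type field name below is an assumption about the artifact schema:

```python
from collections import Counter


def group_failures(events: list[dict]) -> Counter:
    """Count failure events by their error_type field (field name assumed)."""
    return Counter(e.get("error_type", "unknown") for e in events)
```

A spike of a single error_type usually points at one broken boundary; a uniform spread across types suggests an infrastructure problem (e.g., timeouts) rather than a logic bug.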

See: RAG Debugging.