Spec 064: Retrieval Audit Redaction (No Transcript Text in Logs)
Status: IMPLEMENTED
Created: 2026-01-06
Implemented: 2026-01-06
Priority: P1 (privacy/compliance + shareable artifacts)
Problem
When retrieval audit logging is enabled (EMBEDDING_ENABLE_RETRIEVAL_AUDIT=true), the pipeline
currently logs a chunk_preview field derived from the reference chunk text. If those references
come from DAIC-WOZ, this can leak restricted transcript content into logs and run artifacts.
This is an observability feature, but it must be privacy-safe by construction.
Requirements
- No raw transcript text in retrieval audit logs
  - Remove chunk_preview (or any equivalent preview) from the retrieved_reference log event.
  - Never emit any field containing raw chunk text.
- Keep audit usefulness via safe identifiers
  - Log chunk_hash (stable short SHA-256 prefix of the chunk text).
  - Keep chunk_chars (length only).
  - Keep existing metadata: participant_id, item, rank, similarity, reference_score.
- Backwards compatibility
  - The log event name (retrieved_reference) stays the same.
  - Downstream tooling/docs updated to reference the new fields.
- Deterministic and idempotent
  - Hashing must be stable across runs and machines (same text → same hash).
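The determinism requirement can be met with a pure function of the text's bytes. The following is a minimal sketch, not the project's actual helper: the spec names stable_text_hash but does not state its signature, encoding, or prefix length, so the UTF-8 encoding and the 12-character default below are assumptions.

```python
import hashlib


def stable_text_hash(text: str, length: int = 12) -> str:
    """Return a short hex prefix of the SHA-256 digest of `text`.

    Assumptions (not stated in the spec): UTF-8 encoding and a default
    prefix length of 12 hex characters.
    """
    # SHA-256 via hashlib is unsalted, so the digest depends only on the
    # input bytes. The built-in hash() is unsuitable here because it is
    # randomized per process for str values by default.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    # Same text always yields the same prefix, on any run or machine.
    print(stable_text_hash("abc"))  # ba7816bf8f01
```

A shorter prefix is easier to read in logs but collides more often, so the length is a trade-off to settle per deployment.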
Implementation Plan (TDD)
Step 1: Unit test (RED)
Update tests/unit/services/test_embedding.py:
- TestEmbeddingService::test_build_reference_bundle_logs_audit_when_enabled
  - Assert chunk_hash / chunk_chars are present and chunk_preview is absent.
  - Assert raw chunk text does not appear in structured log fields.
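The two assertions can be expressed as a small check over a captured log event. This is a hedged sketch: the helper name assert_event_is_redacted and the 32-character probe are hypothetical, and it assumes the test can obtain each retrieved_reference event as a dict of structured fields, however the project's logging setup exposes them.

```python
from typing import Any, Mapping


def assert_event_is_redacted(event: Mapping[str, Any], raw_text: str) -> None:
    """Check one captured retrieved_reference event (hypothetical helper)."""
    assert raw_text, "use a non-empty synthetic chunk text in the test"

    # The safe identifiers must be present and the preview must be gone.
    assert "chunk_hash" in event
    assert "chunk_chars" in event
    assert "chunk_preview" not in event

    # A truncated preview would contain the start of the text rather than
    # all of it, so also probe for a leading fragment (32 is an assumption).
    probe = raw_text[:32]
    for key, value in event.items():
        if isinstance(value, str):
            assert probe not in value, f"raw chunk text leaked via {key!r}"
```

Tests using this check should feed it synthetic chunk text, so the test suite itself never contains restricted transcript content.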
Step 2: Code change (GREEN)
In src/ai_psychiatrist/services/embedding.py:
- Replace chunk_preview=match.chunk.text[:160] with:
  - chunk_hash=stable_text_hash(match.chunk.text)
- Keep chunk_chars=len(match.chunk.text)
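For illustration, the redacted field set might be assembled as below. Only match.chunk.text and the field names come from this spec. The Chunk and Match classes, their other attributes, the audit_fields function, and the inline hash stand-in are hypothetical, and the real code in embedding.py may be organized differently.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Chunk:  # hypothetical stand-in for the project's chunk type
    text: str
    participant_id: int


@dataclass(frozen=True)
class Match:  # hypothetical stand-in for a retrieval match
    chunk: Chunk
    similarity: float
    reference_score: Optional[int]


def audit_fields(match: Match, item: str, rank: int) -> Dict[str, Any]:
    """Build the fields for a retrieved_reference event without raw text."""
    text = match.chunk.text
    return {
        "participant_id": match.chunk.participant_id,
        "item": item,
        "rank": rank,
        "similarity": match.similarity,
        "reference_score": match.reference_score,
        # Stand-in for stable_text_hash; the 12-char prefix is an assumption.
        "chunk_hash": hashlib.sha256(text.encode("utf-8")).hexdigest()[:12],
        "chunk_chars": len(text),
    }
```

Because the text is read only to compute the hash and the length, no code path in this function can place it in the returned fields.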
Step 3: Doc updates
Update any non-archive docs that mention chunk_preview to match the new safe fields:
- docs/rag/debugging.md
- docs/configs/configuration-philosophy.md (if it enumerates audit fields)
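To catch mentions beyond the two files listed, a short scan of the docs tree can help. This is a sketch under assumptions: the function name find_stale_mentions is hypothetical, and it assumes archived docs live under a directory named archive, which this spec does not state.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def find_stale_mentions(
    docs_root: Path,
    needle: str = "chunk_preview",
    skip_dirs: Sequence[str] = ("archive",),  # assumed archive directory name
) -> List[Tuple[Path, int]]:
    """Return (path, line number) pairs for Markdown lines mentioning `needle`."""
    hits: List[Tuple[Path, int]] = []
    for path in sorted(docs_root.rglob("*.md")):
        # Skip anything under a directory listed in skip_dirs.
        if any(part in skip_dirs for part in path.relative_to(docs_root).parts):
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle in line:
                hits.append((path, lineno))
    return hits


if __name__ == "__main__":
    for path, lineno in find_stale_mentions(Path("docs")):
        print(f"{path}:{lineno}")
```

An empty result for the non-archive docs is one way to confirm this step is complete.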
Step 4: Verification
- make ci
- uv run mkdocs build --strict
Definition of Done
- Retrieval audit logs contain no raw chunk text.
- chunk_hash is present and stable (SHA-256 prefix via stable_text_hash).
- All tests pass; MkDocs strict build produces no new warnings in non-archive docs.