LLM-based Multi-Agent System for Depression Assessment from Clinical Interviews
What is AI Psychiatrist?
AI Psychiatrist is an engineering-focused, reproducible implementation of a research paper that uses large language models (LLMs) in a multi-agent architecture to assess depression severity from clinical interview transcripts. A four-agent pipeline analyzes each transcript and predicts a PHQ-8 item score (0–3) only when the transcript provides supporting evidence; otherwise it abstains and returns N/A for that item.
Clinical disclaimer: This repository is intended for paper reproduction and experimentation. It is not a medical device and should not be used for clinical diagnosis or treatment decisions.
Task validity note: PHQ-8 is a 2-week frequency self-report instrument, while DAIC-WOZ transcripts are not structured as PHQ administration. Transcript-only item-level scoring is often underdetermined; the system may return N/A and must be evaluated with coverage-aware metrics (AURC/AUGRC). See: Task Validity.
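To make "coverage-aware" concrete, here is a minimal sketch of an empirical risk-coverage curve and its area (AURC). It assumes each scored item carries a confidence value and a loss such as absolute error against the ground-truth item score. The function names and inputs are illustrative assumptions, not this repository's API, and the project's own metric code may normalize losses or handle N/A predictions differently.

```python
from typing import List, Sequence, Tuple


def risk_coverage_curve(
    confidences: Sequence[float], losses: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (coverage, risk) points, keeping the most confident items first.

    Hypothetical helper: `confidences` and `losses` are parallel sequences,
    one entry per predicted PHQ-8 item.
    """
    if len(confidences) != len(losses):
        raise ValueError("confidences and losses must have the same length")
    n = len(losses)
    # Rank items from most to least confident.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    curve = []
    running_loss = 0.0
    for k, idx in enumerate(order, start=1):
        running_loss += losses[idx]
        # Coverage is the fraction of items kept; risk is their mean loss.
        curve.append((k / n, running_loss / k))
    return curve


def aurc(confidences: Sequence[float], losses: Sequence[float]) -> float:
    """Average the risk over all coverage levels (lower is better)."""
    curve = risk_coverage_curve(confidences, losses)
    if not curve:
        return 0.0
    return sum(risk for _, risk in curve) / len(curve)


# Toy data (made up for illustration): the one wrong prediction has low confidence.
well_ranked = aurc([0.9, 0.8, 0.1], [0.0, 0.0, 3.0])
# Same losses, but the wrong prediction is the most confident one.
badly_ranked = aurc([0.1, 0.8, 0.9], [0.0, 0.0, 3.0])
```

On this toy data the well-ranked system gets the lower AURC. A system that abstains on, or assigns low confidence to, items it would score incorrectly is rewarded, which plain accuracy over all items would not capture.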
Key Features
Four-Agent Pipeline: Qualitative, Judge, Quantitative, and Meta-Review agents collaborate for comprehensive assessment
Embedding-Based Few-Shot Retrieval: Optional few-shot references; retrieval quality is controlled by guardrails, item-tag filtering, chunk-level score attachment, and CRAG validation (see results docs)
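As a rough illustration of the retrieval feature, the sketch below ranks reference chunks by cosine similarity after filtering on an item tag and applying a minimum-similarity guardrail. Everything here is an assumption made for illustration: the `ReferenceChunk` fields, the `retrieve` function, and the threshold value are hypothetical, and the sketch omits CRAG validation. It is not the repository's actual retrieval code.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ReferenceChunk:
    """Hypothetical reference chunk with a chunk-level score attached."""
    text: str
    embedding: Sequence[float]
    item_tag: str            # which PHQ-8 item the chunk is tagged with
    score: Optional[int]     # 0-3, or None when no score is attached


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_embedding: Sequence[float],
    chunks: Sequence[ReferenceChunk],
    item_tag: str,
    top_k: int = 2,
    min_similarity: float = 0.5,  # assumed guardrail threshold
) -> List[Tuple[float, ReferenceChunk]]:
    """Return up to top_k (similarity, chunk) pairs for one PHQ-8 item."""
    # Item-tag filtering: only consider chunks tagged with the requested item.
    scored = [
        (cosine(query_embedding, c.embedding), c)
        for c in chunks
        if c.item_tag == item_tag
    ]
    # Guardrail: drop weak matches instead of padding the prompt with them.
    kept = [(sim, c) for sim, c in scored if sim >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


# Toy two-dimensional embeddings, made up for illustration.
chunks = [
    ReferenceChunk("chunk A", [1.0, 0.0], "sleep", 2),
    ReferenceChunk("chunk B", [0.0, 1.0], "sleep", 0),
    ReferenceChunk("chunk C", [1.0, 0.0], "appetite", 1),
    ReferenceChunk("chunk D", [1.0, 1.0], "sleep", 1),
]
results = retrieve([1.0, 0.0], chunks, item_tag="sleep")
```

In this toy run, chunk C is excluded by the tag filter and chunk B by the similarity threshold, so the query may receive fewer than `top_k` references. Returning fewer examples, or none, is a deliberate choice in this sketch: weak references are withheld rather than shown to the model.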
See CLAUDE.md in the repository root for development guidelines and commands.
# Quick development setup
make dev   # Install dependencies + pre-commit hooks
make test  # Run all tests with coverage
make ci    # Full CI pipeline (format, lint, typecheck, test)
License
Licensed under Apache 2.0. See LICENSE and NOTICE in the repository root for details and attribution.
This project is a clean-room reimplementation based on research from Georgia State University. See the paper for academic citation.