Keywords: depression screening, PHQ-8, zero-shot learning, large language models, RISEN framework, DAIC-WOZ, digital mental health
Keynote: Zero-shot large language models can infer PHQ-8 item scores from interview transcripts with encouraging accuracy, offering a practical complement to traditional self-report screening.
Mental health screening often begins with self-report questionnaires such as the PHQ-8. These tools are widely used in clinics; however, responses can be influenced by stigma, recall issues, and varying levels of self-awareness. Large language models (LLMs) introduce a different route: analysing natural language from clinical interviews to estimate symptom patterns in a way that is scalable and easy to deploy across digital platforms.
In this study, seven out-of-the-box models from the GPT, Llama, Cohere, and Gemini families were asked to predict PHQ-8 items directly from semi-structured interview transcripts. A structured prompt design (RISEN: Role, Instruction, Steps, End goal, Narrowing) guided each model to produce either Likert-scale predictions (0–3) or binary decisions indicating the presence or absence of each symptom. Crucially, no task-specific fine-tuning was used, keeping the workflow closer to how services might adopt LLMs in practice.
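For illustration only, the sketch below shows how a RISEN-structured zero-shot prompt for a single PHQ-8 item might be assembled and sent to an LLM. The prompt wording, helper names, and OpenAI client usage are assumptions made for this sketch, not the authors' exact implementation.

```python
# Illustrative sketch of a RISEN-style zero-shot prompt for one PHQ-8 item.
# Prompt wording, model choice, and client usage are assumptions, not the
# setup reported by Teferra et al. (2025).
from openai import OpenAI

PHQ8_ITEMS = [
    "little interest or pleasure in doing things",
    "feeling down, depressed, or hopeless",
    "trouble falling or staying asleep, or sleeping too much",
    "feeling tired or having little energy",
    "poor appetite or overeating",
    "feeling bad about yourself or feeling like a failure",
    "trouble concentrating on things",
    "moving or speaking slowly, or being fidgety or restless",
]

def build_risen_prompt(transcript: str, item: str) -> str:
    """Assemble a prompt following the RISEN structure (Role, Instruction, Steps, End goal, Narrowing)."""
    return (
        "Role: You are a clinical-language analyst assisting with depression screening.\n"
        f"Instruction: Read the interview transcript and rate the PHQ-8 item '{item}'.\n"
        "Steps: (1) Identify statements relevant to the item; "
        "(2) judge how often the symptom occurred over the past two weeks; "
        "(3) map that frequency to a score.\n"
        "End goal: Output a single integer from 0 (not at all) to 3 (nearly every day).\n"
        "Narrowing: Respond with the integer only, no explanation.\n\n"
        f"Transcript:\n{transcript}"
    )

def score_item(client: OpenAI, transcript: str, item: str) -> int:
    """Query the model zero-shot (no fine-tuning) and parse the Likert score."""
    response = client.chat.completions.create(
        model="gpt-4o",  # one of several model families compared in the study
        messages=[{"role": "user", "content": build_risen_prompt(transcript, item)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())
```

In practice the same call would be repeated for each of the eight items, and a binary (present/absent) variant would simply change the End goal and Narrowing lines.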
Across 100 interview sessions, GPT-4o delivered the strongest overall results, with a particularly solid performance on items reflecting emotional and cognitive states, such as low mood, sleep disturbance, tiredness, and difficulties concentrating. Llama 3 showed a relative advantage for anhedonia, while Cohere stood out on psychomotor changes. This pattern hints that a model ensemble could capture a broader picture of depressive features than any single system alone.
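As a purely hypothetical illustration of that ensemble idea, the snippet below combines one item's scores from several models by majority vote, falling back to the median when no clear majority exists; this combination rule is an assumption, not a method reported in the study.

```python
# Hypothetical sketch of ensembling per-item scores from multiple models:
# majority vote, with a median fallback when the models disagree.
from collections import Counter
from statistics import median

def ensemble_item_score(scores: list[int]) -> int:
    """Combine one PHQ-8 item's scores (0-3) from several models."""
    value, freq = Counter(scores).most_common(1)[0]
    if freq > len(scores) // 2:
        return value            # clear majority across models
    return int(median(scores))  # otherwise take the median score

# Example: one item's scores from three hypothetical models
print(ensemble_item_score([2, 1, 2]))  # -> 2
```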
Evaluation went beyond raw accuracy to include the F1 score and the Matthews correlation coefficient (MCC), metrics that are more informative under the class imbalance typical of real-world mental health data. The findings suggest that zero-shot LLMs can approximate item-level screening with reasonable fidelity, which could aid in triaging cases for follow-up and reduce patient burden when repeated questionnaires are impractical.
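For readers less familiar with these metrics, a minimal sketch of how they can be computed with scikit-learn follows; the labels shown are illustrative stand-ins, not data from the study.

```python
# Minimal sketch of the evaluation metrics named above, using scikit-learn.
# y_true/y_pred are stand-in binary labels (symptom present/absent), not study data.
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = [1, 0, 0, 1, 0, 0, 0, 1]   # illustrative ground-truth labels
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]   # illustrative model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1:      ", f1_score(y_true, y_pred))
print("MCC:     ", matthews_corrcoef(y_true, y_pred))
```

Unlike raw accuracy, F1 and MCC penalise a model that simply predicts the majority class, which matters when positive screens are relatively rare.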
Caveats remain. Interview transcripts are subjective, and the PHQ-8 itself is a self-report scale; therefore, model outputs should be treated as decision support, rather than a diagnosis. Performance can vary across demographics and contexts, so bias monitoring, privacy safeguards and clinician oversight are essential. Explainability methods—such as attention heatmaps or feature attributions—would help practitioners understand what cues drive predictions and increase trust in deployment.
Bottom line: LLMs show promise as a lightweight, scalable aid for automated depression screening from text. Used thoughtfully within a human-in-the-loop pathway, they could speed access to care, prioritise follow-up, and complement existing assessments while keeping clinical judgement at the centre.
Reference: Teferra, B.G., Perivolaris, A., Hsiang, W.-N., Sidharta, C.K., Rueda, A., Parkington, K., Wu, Y., Soni, A., Samavi, R., Jetly, R., Zhang, Y., Cao, B., Rambhatla, S., Krishnan, S., & Bhat, V. (2025). Leveraging large language models for automated depression screening. PLOS Digital Health, 4(7), e0000943. https://doi.org/10.1371/journal.pdig.0000943
Disclaimer
The information presented in this article is for research communication and educational purposes only. It summarises findings from the study Leveraging large language models for automated depression screening (Teferra et al., 2025) and is not intended to provide medical advice, diagnosis, or treatment. Large language model (LLM) outputs described here are experimental and should not be relied upon as a substitute for professional mental health assessment or care. Any depression screening—whether via self-report questionnaires such as the PHQ-8, AI-based methods, or other tools—must be interpreted by qualified clinicians within the context of a comprehensive evaluation. Individuals who are experiencing low mood, distress, or other mental health concerns should seek help from a licensed healthcare provider. If you are in crisis, contact your local emergency number or a suicide prevention helpline immediately.