Objective To update a previous individual participant data meta-analysis and determine the accuracy of the Patient Health Questionnaire-9 (PHQ-9), the most commonly used depression screening tool in general practice, for detecting major depression overall and by study or participant subgroups.
Design Systematic review and individual participant data meta-analysis.
Data sources Medline, Medline In-Process and Other Non-Indexed Citations via Ovid, PsycINFO, and Web of Science, searched through 9 May 2018.
Review methods Eligible studies administered the PHQ-9 and classified current major depression status using a validated semistructured diagnostic interview (designed for clinician administration), fully structured interview (designed for lay administration), or the Mini International Neuropsychiatric Interview (MINI; a brief interview designed for lay administration). A bivariate random effects meta-analytic model was used to obtain point and interval estimates of pooled PHQ-9 sensitivity and specificity at cut-off values 5-15, separately, among studies that used semistructured diagnostic interviews (eg, Structured Clinical Interview for Diagnostic and Statistical Manual), fully structured interviews (eg, Composite International Diagnostic Interview), and the MINI. Meta-regression was used to investigate whether PHQ-9 accuracy correlated with reference standard categories and participant characteristics.
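As an illustration of the per-study quantities that feed such a model, the sketch below tabulates sensitivity and specificity at each cut-off value from 5 to 15 for a single study, treating a PHQ-9 score at or above the cut-off as a positive screen. It is not the bivariate random effects model itself, which pools these estimates across studies. The function name `accuracy_by_cutoff` and the toy scores are hypothetical and are not taken from the review's data.

```python
from typing import Dict, Iterable, List, Tuple


def accuracy_by_cutoff(
    scores: List[int],
    depressed: List[bool],
    cutoffs: Iterable[int] = range(5, 16),
) -> Dict[int, Tuple[float, float]]:
    """Return {cutoff: (sensitivity, specificity)} for one study.

    A participant screens positive when their PHQ-9 score is >= cutoff.
    `depressed` holds the reference standard classification.
    """
    if len(scores) != len(depressed):
        raise ValueError("scores and depressed must have the same length")
    n_cases = sum(depressed)
    n_noncases = len(depressed) - n_cases
    if n_cases == 0 or n_noncases == 0:
        raise ValueError("need at least one case and one non-case")

    results = {}
    for c in cutoffs:
        # True positives: cases scoring at or above the cut-off.
        tp = sum(1 for s, d in zip(scores, depressed) if d and s >= c)
        # True negatives: non-cases scoring below the cut-off.
        tn = sum(1 for s, d in zip(scores, depressed) if not d and s < c)
        results[c] = (tp / n_cases, tn / n_noncases)
    return results


# Hypothetical toy data (illustration only): 6 non-cases, then 4 cases.
toy_scores = [2, 4, 7, 9, 11, 13, 8, 12, 16, 20]
toy_depressed = [False] * 6 + [True] * 4
toy_results = accuracy_by_cutoff(toy_scores, toy_depressed)
```

In an individual participant data meta-analysis, a table like `toy_results` would be produced for every included study, and the study-level estimates at a given cut-off value would then be pooled.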
Results Data from 44 503 total participants (27 146 additional from the update) were obtained from 100 of 127 eligible studies (42 additional studies; 79% of eligible studies; 86% of eligible participants). Among studies with a semistructured interview reference standard, pooled PHQ-9 sensitivity and specificity (95% confidence interval) at the standard cut-off value of ≥10, which maximised combined sensitivity and specificity, were 0.85 (0.79 to 0.89) and 0.85 (0.82 to 0.87), respectively. Specificity was similar across reference standards, but sensitivity in studies with semistructured interviews was 7-24% (median 21%) higher than with fully structured reference standards and 2-14% (median 11%) higher than with the MINI across cut-off values. Across reference standards and cut-off values, specificity was 0-10% (median 3%) higher for men and 0-12% (median 5%) higher for people aged 60 or older.
Conclusions Researchers and clinicians could use results to determine outcomes, such as total number of positive screens and false positive screens, at different PHQ-9 cut-off values for different clinical settings using the knowledge translation tool at www.depressionscreening100.com/phq.
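The arithmetic behind such outcome estimates can be sketched as follows. This is a minimal illustration, not the knowledge translation tool's implementation: the function name, the 1000 people screened, and the 10% prevalence are hypothetical assumptions, while the sensitivity and specificity of 0.85 are the pooled estimates reported above for a cut-off value of ≥10 with a semistructured reference standard.

```python
from typing import Dict


def expected_screening_outcomes(
    n_screened: float,
    prevalence: float,
    sensitivity: float,
    specificity: float,
) -> Dict[str, float]:
    """Expected counts of screening outcomes for a given setting."""
    for name, p in (("prevalence", prevalence),
                    ("sensitivity", sensitivity),
                    ("specificity", specificity)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    cases = n_screened * prevalence        # people with major depression
    noncases = n_screened - cases          # people without major depression
    true_pos = sensitivity * cases
    false_pos = (1.0 - specificity) * noncases
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "total_positives": true_pos + false_pos,
        "false_negatives": cases - true_pos,
        "true_negatives": noncases - false_pos,
    }


# Assumed setting: 1000 people screened, 10% prevalence (hypothetical values).
example = expected_screening_outcomes(1000, 0.10, 0.85, 0.85)
```

Under these assumed inputs, about 220 of 1000 people would screen positive, of whom about 135 would be false positives. Changing the prevalence or substituting the sensitivity and specificity for another cut-off value shows how the balance shifts across clinical settings.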