
Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy

Deeks, Jonathan J. / Bossuyt, Patrick M. / Leeflang, Mariska M. / Takwoingi, Yemisi (Editors)

Wiley Cochrane Series


1st Edition, July 2023
432 Pages, Hardcover
John Wiley & Sons Ltd

ISBN: 978-1-119-75616-3


Price: 71,90 €

Price incl. VAT, excl. Shipping

Further versions

ePub, Mobi, PDF

Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy

A guide to conducting systematic reviews of test accuracy

In Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, a team of distinguished researchers delivers the official guide to preparing and maintaining systematic reviews of test accuracy in healthcare. This first edition of the Handbook provides guidance on understanding test accuracy measures, searching for and selecting studies, assessing risk of bias and applicability, undertaking meta-analysis, presenting findings, and drawing conclusions.

Readers will also find:
* An introduction to test evaluation, including the purposes of medical testing, test accuracy and the impact of tests on patient outcomes
* Comprehensive explorations of the design of test accuracy studies, including discussions of reference standards and comparative test accuracy studies
* Considerations of the methods and presentation of systematic reviews of test accuracy
* Elaboration of study selection, data collection, and undertaking risk of bias and applicability assessments

Perfect for medical practitioners and clinicians, Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy will also benefit professionals in epidemiology and students in related fields.

Contributors xv

Preface xix

Part One About Cochrane Reviews of diagnostic test accuracy 1

1 Planning a Cochrane Review of diagnostic test accuracy 3

1.1 Introduction 4

1.2 Why do a systematic review of test accuracy? 4

1.3 Undertaking a Cochrane Review of diagnostic test accuracy 5

1.3.1 The role of the Diagnostic Test Accuracy Editorial Team 5

1.3.2 Expectations for the conduct and reporting of Cochrane Reviews of diagnostic test accuracy 5

1.3.3 Data management and quality assurance 6

1.3.4 Keeping the Review up to date 6

1.4 Proposing a new Cochrane Review of diagnostic test accuracy 6

1.5 Cochrane Protocols 7

1.6 The author team 11

1.6.1 The importance of the team 11

1.6.2 Criteria for authorship 12

1.6.3 Incorporating relevant perspectives and stakeholder involvement 12

1.7 Resources and support 13

1.7.1 Identifying resources and support 13

1.7.2 Funding and conflicts of interest 14

1.7.3 Training 14

1.7.4 Software 15

1.8 Chapter information 15

1.9 References 16

Part Two Introducing test accuracy 19

2 Evaluating medical tests 21

2.1 Introduction 21

2.2 Types of medical tests 22

2.3 Test accuracy 23

2.4 How do diagnostic tests affect patient outcomes? 24

2.4.1 Direct test effects 25

2.4.2 Altering clinical decisions and actions 25

2.4.3 Changes to time frames and populations 25

2.4.4 Influencing patient and clinician perceptions 26

2.5 Evaluations of test accuracy during test development 26

2.5.1 Evaluations of accuracy during biomarker discovery 26

2.5.2 Early evaluations of test accuracy 27

2.5.3 Clinical evaluations of test accuracy 28

2.6 Other purposes of medical testing 28

2.6.1 Predisposition 29

2.6.2 Risk stratification 29

2.6.3 Screening 29

2.6.4 Staging 29

2.6.5 Prognosis 30

2.6.6 Treatment selection 30

2.6.7 Treatment efficacy 31

2.6.8 Therapeutic monitoring 31

2.6.9 Surveillance for progression or recurrence 31

2.7 Chapter information 32

2.8 References 32

3 Understanding the design of test accuracy studies 35

3.1 Introduction 35

3.2 The basic design for a test accuracy study 36

3.3 Multiple groups of participants 39

3.4 Multiple reference standards 42

3.5 More on reference standards 44

3.5.1 Delayed verification 44

3.5.2 Composite reference standard 44

3.5.3 Panel-based reference 44

3.5.4 Latent class analysis 45

3.5.5 Gold standard 45

3.5.6 Clinical reference standard 45

3.6 Comparative test accuracy studies 45

3.6.1 Paired comparative accuracy study 46

3.6.2 Randomized comparative accuracy study 46

3.6.3 Non-randomized comparative accuracy study 47

3.7 Additional aspects of study designs 47

3.7.1 Prospective versus retrospective 48

3.7.2 Pragmatic versus explanatory 48

3.8 Concluding remarks 49

3.9 Chapter information 49

3.10 References 50

4 Understanding test accuracy measures 53

4.1 Introduction 53

4.2 Types of test data 54

4.3 Inconclusive index test results 55

4.4 Target condition 56

4.5 Analysis of a primary test accuracy study 56

4.5.1 Sensitivity and specificity 57

4.5.2 Predictive values 58

4.5.3 Proportion with the target condition 58

4.5.4 Pre-test and post-test probabilities 59

4.5.5 Interpretation of sensitivity, specificity and predictive values 59

4.5.6 Confidence intervals 60

4.5.7 Other test accuracy measures 61

4.6 Positivity thresholds 64

4.7 Receiver operating characteristic curves 66

4.8 Analysis of a comparative accuracy study 68

4.9 Chapter information 71

4.10 References 72

Part Three Methods and presentation of systematic reviews of test accuracy 73

5 Defining the review question 75

5.1 Introduction 75

5.2 Aims of systematic reviews of test accuracy 76

5.2.1 Investigations of heterogeneity 77

5.3 Identifying the clinical problem 77

5.3.1 Role of a new test 77

5.3.2 Defining the clinical pathway 80

5.3.3 Unclear and multiple clinical pathways 83

5.4 Defining the review question 84

5.4.1 Population 84

5.4.2 Index test(s) 85

5.4.3 Target condition 85

5.4.4 The review question: PIT 86

5.4.5 From review question to objectives 86

5.4.6 Broad versus narrow questions 87

5.5 Defining eligibility criteria 88

5.5.1 Types of studies 88

5.5.2 Participants 89

5.5.3 Index test(s) 90

5.5.4 Target condition 91

5.5.5 Reference standard 92

5.6 Chapter information 93

5.7 References 93

6 Searching for and selecting studies 97

6.1 Introduction 98

6.2 Searching for studies 98

6.2.1 Working in partnership 100

6.2.2 Advice for review teams that do not include an information specialist 101

6.3 Sources to search 101

6.3.1 Bibliographic databases 101

6.3.1.1 MEDLINE, PubMed and Embase 102

6.3.1.2 National and regional databases 103

6.3.1.3 Subject-specific databases 103

6.3.1.4 Dissertations and theses databases 104

6.3.2 Additional sources to search 104

6.3.2.1 Related reviews, guidelines and reference lists as sources of studies 105

6.3.2.2 Handsearching 105

6.3.2.3 Forward citation searching and co-citation searching 105

6.3.2.4 Web searching 106

6.3.2.5 Grey literature databases 107

6.3.2.6 Trial registries 107

6.3.2.7 Contacting colleagues, study authors and manufacturers 108

6.4 Designing search strategies 108

6.4.1 Structuring the search strategy 109

6.4.2 Controlled vocabulary and text words 110

6.4.3 Text word or keyword searching 112

6.4.4 Search filters 113

6.4.5 Language, date and type of document restrictions 113

6.4.6 Identifying fraudulent studies, other retracted publications, errata and comments 114

6.4.7 Minimizing the risk of bias through search methods 114

6.5 Documenting and reporting the search process 115

6.5.1 Documenting the search process 116

6.5.2 Reporting the search process 116

6.5.2.1 Reporting the search process in the protocol 116

6.5.2.2 Reporting the search process in the review 117

6.6 Selecting relevant studies 119

6.6.1 Examine full-text reports for compliance of studies with eligibility criteria 120

6.7 Future developments in literature searching and selection 121

6.8 Chapter information 121

6.9 References 122

7 Collecting data 131

7.1 Introduction 132

7.2 Sources of data 132

7.2.1 Studies (not reports) as the unit of interest 133

7.2.2 Correspondence with investigators 134

7.3 What data to collect 135

7.3.1 What are data? 135

7.3.2 Study methods (participant recruitment and sampling) 137

7.3.3 Participant characteristics and setting 138

7.3.4 Index test(s) 139

7.3.5 Target condition and reference standard 140

7.3.6 Flow and timing 140

7.3.7 Extracting study results and converting to the desired format 141

7.3.7.1 Obtaining 2×2 data from accuracy measures 141

7.3.7.2 Using global measures 144

7.3.7.3 Challenges defining reference standard positive and negative: strategies when there are more than two categories 145

7.3.7.4 Challenges defining index test positive and negative: inconclusive results 145

7.3.7.5 Challenges defining index test positive and negative: test failures 147

7.3.7.6 Challenges defining index test positive and negative: dealing with multiple thresholds and extracting data from ROC curves or other graphics 147

7.3.7.7 Extracting data from figures with software 148

7.3.7.8 Corrections for missing data: adjusting for partial verification bias 148

7.3.7.9 Multiple index tests from the same study 148

7.3.7.10 Subgroups of patients 150

7.3.7.11 Individual patient data 150

7.3.7.12 Extracting covariates 151

7.3.8 Other information to collect 151

7.4 Data collection tools 152

7.4.1 Rationale for data collection forms 152

7.4.2 Considerations in selecting data collection tools 152

7.4.3 Design of a data collection form 154

7.5 Extracting data from reports 157

7.5.1 Introduction 157

7.5.2 Who should extract data? 157

7.5.3 Training data extractors 158

7.5.4 Extracting data from multiple reports of the same study 158

7.5.5 Reliability and reaching consensus 159

7.5.6 Suspicions of scientific misconduct 159

7.5.7 Key points in planning and reporting data extraction 160

7.6 Managing and sharing data and tools 160

7.7 Chapter information 163

7.8 References 164

8 Assessing risk of bias and applicability 169

8.1 Introduction 170

8.2 Understanding bias and applicability 171

8.2.1 Bias and imprecision 171

8.2.2 Bias versus applicability 171

8.2.3 Biases in test accuracy studies: empirical evidence 172

8.3 QUADAS-2 173

8.3.1 Background 173

8.3.2 Risk-of-bias assessment 173

8.3.3 Applicability assessment 174

8.3.4 Using and tailoring QUADAS-2 174

8.3.5 Flow diagram 174

8.3.6 Performing the QUADAS-2 assessment 175

8.4 Domain 1: Participant selection 176

8.4.1 Participant selection: risk-of-bias signalling questions (QUADAS-2) 176

8.4.2 Participant selection: additional signalling questions for comparative accuracy studies (QUADAS-C) 178

8.4.3 Participant selection: concerns regarding applicability 181

8.5 Domain 2: Index test 182

8.5.1 Index test: risk-of-bias signalling questions (QUADAS-2) 182

8.5.2 Index test: additional signalling questions for comparative accuracy studies (QUADAS-C) 183

8.5.3 Index test: concerns regarding applicability 186

8.6 Domain 3: Reference standard 187

8.6.1 Reference standard: risk-of-bias signalling questions (QUADAS-2) 187

8.6.2 Reference standard: additional signalling questions for comparative accuracy studies (QUADAS-C) 188

8.6.3 Reference standard: concerns regarding applicability 189

8.7 Domain 4: Flow and timing 191

8.7.1 Flow and timing: risk-of-bias signalling questions (QUADAS-2) 191

8.7.2 Flow and timing: additional signalling questions for comparative accuracy studies (QUADAS-C) 193

8.8 Presentation of risk-of-bias and applicability assessments 196

8.9 Narrative summary of risk-of-bias and applicability assessments 197

8.10 Chapter information 197

8.11 References 198

9 Understanding meta-analysis 203

9.1 Introduction 203

9.1.1 Aims of meta-analysis for systematic reviews of test accuracy 204

9.1.2 When not to use a meta-analysis in a review 204

9.1.3 How does meta-analysis of diagnostic test accuracy differ from meta-analysis of interventions? 205

9.1.4 Questions that can be addressed in test accuracy analyses 206

9.1.4.1 What is the accuracy of a test? 206

9.1.4.2 How does the accuracy vary with clinical and methodological characteristics? 206

9.1.4.3 How does the accuracy of two or more tests compare? 206

9.1.5 Planning the analysis 207

9.2 Graphical and tabular presentation 208

9.2.1 Coupled forest plots 208

9.2.2 Summary ROC plots 208

9.2.3 Linked SROC plots 210

9.2.3.1 Example 1: Anti-CCP for the diagnosis of rheumatoid arthritis - descriptive plots 210

9.2.4 Tables of results 211

9.3 Meta- analytical summaries 211

9.3.1 Should I estimate an SROC curve or a summary point? 212

9.3.2 Heterogeneity 214

9.4 Fitting hierarchical models 215

9.4.1 Bivariate model 216

9.4.2 Example 1 continued: anti-CCP for the diagnosis of rheumatoid arthritis 217

9.4.3 The Rutter and Gatsonis HSROC model 219

9.4.4 Example 2: Rheumatoid factor as a marker for rheumatoid arthritis 220

9.4.5 Data reported at multiple thresholds per study 221

9.4.6 Investigating heterogeneity 222

9.4.6.1 Criteria for model selection 223

9.4.6.2 Heterogeneity and regression analysis using the bivariate model 223

9.4.6.3 Example 1 continued: Investigation of heterogeneity in diagnostic performance of anti-CCP 224

9.4.6.4 Heterogeneity and regression analysis using the Rutter and Gatsonis HSROC model 227

9.4.6.5 Example 2 continued: Investigating heterogeneity in diagnostic accuracy of rheumatoid factor (RF) 228

9.4.7 Comparing index tests 230

9.4.7.1 Test comparisons based on all available studies 230

9.4.7.2 Test comparisons using the bivariate model 231

9.4.7.3 Example 3: CT versus MRI for the diagnosis of coronary artery disease 232

9.4.7.4 Test comparisons using the Rutter and Gatsonis HSROC model 234

9.4.7.5 Test comparison based on studies that directly compare tests 235

9.4.7.6 Example 3 continued: CT versus MRI for the diagnosis of coronary artery disease 236

9.4.8 Approaches to analysis with small numbers of studies 238

9.4.9 Sensitivity analysis 239

9.5 Special topics 241

9.5.1 Imperfect reference standard 241

9.5.2 Investigating and handling verification bias 241

9.5.3 Investigating and handling publication bias 242

9.5.4 Developments in meta-analysis for systematic reviews of test accuracy 243

9.6 Chapter information 243

9.7 References 244

10 Undertaking meta-analysis 249

10.1 Introduction 249

10.2 Estimation of a summary point 251

10.2.1 Fitting the bivariate model using SAS 251

10.2.2 Fitting the bivariate model using Stata 253

10.2.3 Fitting the bivariate model using R 256

10.2.4 Bayesian estimation of the bivariate model 261

10.2.4.1 Specification of the bivariate model in rjags 261

10.2.4.2 Monitoring convergence 263

10.2.4.3 Summary statistics 264

10.2.4.4 Generating an SROC plot 265

10.2.4.5 Sensitivity analyses 266

10.3 Estimation of a summary curve 266

10.3.1 Fitting the HSROC model using SAS 268

10.3.2 Bayesian estimation of the HSROC model 268

10.3.2.1 Specification of the HSROC model in rjags 268

10.3.2.2 Monitoring convergence 270

10.3.2.3 Summary statistics and SROC plot 271

10.3.2.4 Sensitivity analyses 272

10.4 Comparison of summary points 272

10.4.1 Fitting the bivariate model in SAS to compare summary points 274

10.4.2 Fitting the bivariate model in Stata to compare summary points 280

10.4.3 Fitting the bivariate model in R to compare summary points 284

10.4.4 Bayesian inference for comparing summary points 287

10.4.4.1 Summary statistics 289

10.5 Comparison of summary curves 291

10.5.1 Fitting the HSROC model in SAS to compare summary curves 292

10.5.2 Bayesian estimation of the HSROC model for comparing summary curves 294

10.5.2.1 Monitoring convergence 295

10.5.2.2 Summary statistics 295

10.6 Meta-analysis of sparse data and atypical data sets 296

10.6.1 Facilitating convergence 297

10.6.2 Simplifying hierarchical models 301

10.7 Meta-analysis with multiple thresholds per study 305

10.7.1 Meta-analysis of multiple thresholds with R 306

10.7.2 Meta-analysis of multiple thresholds with rjags 311

10.8 Meta-analysis with imperfect reference standard: latent class meta-analysis 316

10.8.1 Specification of the latent class bivariate meta-analysis model in rjags 316

10.8.2 Monitoring convergence 317

10.8.3 Summary statistics and summary ROC plot 317

10.8.4 Sensitivity analyses 320

10.9 Concluding remarks 321

10.10 Chapter information 321

10.11 References 322

11 Presenting findings 327

11.1 Introduction 327

11.2 Results of the search 328

11.3 Description of included studies 328

11.4 Methodological quality of included studies 329

11.5 Individual and summary estimates of test accuracy 329

11.5.1 Presenting results from included studies 330

11.5.2 Presenting summary estimates of sensitivity and specificity 330

11.5.3 Presenting SROC curves 330

11.5.4 Describing uncertainty in summary statistics 332

11.5.5 Describing heterogeneity in summary statistics 333

11.6 Comparisons of test accuracy 333

11.6.1 Comparing tests using summary points 333

11.6.2 Comparing tests using SROC curves 334

11.6.3 Interpretation of confidence intervals for differences in test accuracy 336

11.7 Investigations of sources of heterogeneity 336

11.8 Re- expressing summary estimates numerically 340

11.8.1 Frequencies 340

11.8.2 Predictive values 341

11.8.3 Likelihood ratios 344

11.9 Presenting findings when meta- analysis cannot be performed 344

11.10 Chapter information 346

11.11 References 347

12 Drawing conclusions 349

12.1 Introduction 349

12.2 'Summary of findings' tables 350

12.3 Assessing the strength of the evidence 352

12.3.1 Key issues to consider when assessing the strength of the evidence 352

12.3.1.1 How valid are the summary estimates? 359

12.3.1.2 How applicable are the summary estimates? 359

12.3.1.3 How heterogeneous are the individual study estimates? 359

12.3.1.4 How precise are the summary estimates? 360

12.3.1.5 How complete is the body of evidence? 361

12.3.1.6 Were index test comparisons made between or within primary studies? 362

12.4 GRADE approach for assessing the certainty of evidence 362

12.4.1 GRADE domains for assessing certainty of evidence for test accuracy 363

12.4.1.1 Risk of bias 363

12.4.1.2 Indirectness (applicability) 363

12.4.1.3 Inconsistency (heterogeneity) 364

12.4.1.4 Imprecision 365

12.4.1.5 Publication bias 365

12.5 Summary of main results in the Discussion section 365

12.6 Strengths and weaknesses of the review 366

12.6.1 Strengths and weaknesses of included studies 366

12.6.2 Strengths and weaknesses of the review 367

12.6.2.1 Strengths and weaknesses due to the search and selection process 367

12.6.2.2 Strengths and weaknesses due to methodological quality assessment and data extraction 367

12.6.2.3 Weaknesses due to the review analyses 368

12.6.2.4 Direct and indirect comparisons 368

12.6.3 Comparisons with previous research 369

12.7 Applicability of findings to the review question 369

12.8 Drawing conclusions 369

12.8.1 Implications for practice 370

12.8.2 Implications for research 373

12.9 Chapter information 374

12.10 References 374

13 Writing a plain language summary 377

13.1 Introduction 377

13.2 Audience and writing style 378

13.3 Contents and structure of a plain language summary 379

13.3.1 Title 380

13.3.2 Key messages 380

13.3.3 'Why is improving [ ] diagnosis important?' 381

13.3.4 'What is the [ ] test?' 382

13.3.5 What did we want to find out? 382

13.3.6 What did we do? 383

13.3.7 What did we find? 383

13.3.7.1 Describing the included studies 383

13.3.7.2 Presenting information on test accuracy 384

13.3.7.3 Presenting single estimates of accuracy 385

13.3.7.4 Presenting multiple estimates of accuracy: two index tests 386

13.3.7.5 Presenting multiple estimates of accuracy: more than two index tests 387

13.3.7.6 When presenting a numerical summary of test accuracy is not appropriate 387

13.3.7.7 Graphical illustration of test accuracy results 388

13.3.8 What are the limitations of the evidence? 391

13.3.9 How up to date is this evidence? 392

13.4 Chapter information 392

13.5 References 393

13.6 Appendix: Additional example plain language summary 394

Index 399

Jonathan J. Deeks is Professor at the Institute of Applied Health Research at the University of Birmingham in the United Kingdom. He is a member of Cochrane's Diagnostic Test Accuracy Editorial Team.

Patrick M. Bossuyt is Professor at the Amsterdam UMC, University of Amsterdam, Department of Epidemiology and Data Science in the Netherlands.

Mariska M. Leeflang is Associate Professor at the Amsterdam UMC, University of Amsterdam, Department of Epidemiology and Data Science in the Netherlands. She is a Convenor of Cochrane's Screening and Diagnostic Tests Methods Group.

Yemisi Takwoingi is Professor at the Institute of Applied Health Research at the University of Birmingham, UK, and a Convenor of Cochrane's Screening and Diagnostic Tests Methods Group.

J. J. Deeks, University of Birmingham, UK; P. M. Bossuyt, University of Amsterdam, Netherlands; M. M. Leeflang, University of Amsterdam, Netherlands; Y. Takwoingi, University of Birmingham, UK