Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy
Wiley Cochrane Series

1st edition, August 2023
432 pages, hardcover
Wiley & Sons Ltd
A guide to conducting systematic reviews of test accuracy
In Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, a team of distinguished researchers delivers the official guide to preparing and maintaining systematic reviews of test accuracy in healthcare. This first edition of the Handbook contains guidance on understanding test accuracy measures, search strategies and study selection, understanding meta-analysis, risk of bias and applicability assessments, presentation of findings, and drawing conclusions.
Readers will also find:
* An introduction to test evaluation, including the purposes of medical testing, test accuracy and the impact of tests on patient outcomes
* Comprehensive explorations of the design of test accuracy studies, including discussions of reference standards and comparative test accuracy studies
* Considerations of the methods and presentation of systematic reviews of test accuracy
* Detailed guidance on study selection, data collection, and risk of bias and applicability assessments
Perfect for medical practitioners and clinicians, Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy will also benefit professionals in epidemiology and students in related fields.
Preface
Part 1: About Cochrane Reviews of diagnostic test accuracy
1 Planning a Cochrane Review of diagnostic test accuracy
1.1 Introduction
1.2 Why do a systematic review of test accuracy?
1.3 Undertaking a Cochrane Review of diagnostic test accuracy
1.3.1 The role of the Diagnostic Test Accuracy Editorial Team
1.3.2 Expectations for the conduct and reporting of Cochrane Reviews of diagnostic test accuracy
1.3.3 Data management and quality assurance
1.3.4 Keeping the Review up to date
1.4 Proposing a new Cochrane Review of diagnostic test accuracy
1.5 Cochrane Protocols
1.6 The author team
1.6.1 The importance of the team
1.6.2 Criteria for authorship
1.6.3 Incorporating relevant perspectives and stakeholder involvement
1.7 Resources and support
1.7.1 Identifying resources and support
1.7.2 Funding and conflicts of interest
1.7.3 Training
1.7.4 Software
1.8 Chapter information
1.9 References
Part 2: Introducing test accuracy
2 Evaluating medical tests
2.1 Introduction
2.2 Types of medical tests
2.3 Test accuracy
2.4 How do diagnostic tests affect patient outcomes?
2.4.1 Direct test effects
2.4.2 Altering clinical decisions and actions
2.4.3 Changes to time frames and populations
2.4.4 Influencing patient and clinician perceptions
2.5 Evaluations of test accuracy during test development
2.5.1 Evaluations of accuracy during biomarker discovery
2.5.2 Early evaluations of test accuracy
2.5.3 Clinical evaluations of test accuracy
2.6 Other purposes of medical testing
2.6.1 Predisposition
2.6.2 Risk stratification
2.6.3 Screening
2.6.4 Staging
2.6.5 Prognosis
2.6.6 Treatment selection
2.6.7 Treatment efficacy
2.6.8 Therapeutic monitoring
2.6.9 Surveillance for progression or recurrence
2.7 Chapter information
2.8 References
3 Understanding the design of test accuracy studies
3.1 Introduction
3.2 The basic design for a test accuracy study
3.3 Multiple groups of participants
3.4 Multiple reference standards
3.5 More on reference standards
3.5.1 Delayed verification
3.5.2 Composite reference standard
3.5.3 Panel-based reference
3.5.4 Latent class analysis
3.5.5 Gold standard
3.5.6 Clinical reference standard
3.6 Comparative test accuracy studies
3.6.1 Paired comparative accuracy study
3.6.2 Randomized comparative accuracy study
3.6.3 Non-randomized comparative accuracy study
3.7 Additional aspects of study designs
3.7.1 Prospective versus retrospective
3.7.2 Pragmatic versus explanatory
3.8 Concluding remarks
3.9 Chapter information
3.10 References
4 Understanding test accuracy measures
4.1 Introduction
4.2 Types of test data
4.3 Inconclusive index test results
4.4 Target condition
4.5 Analysis of a primary test accuracy study
4.5.1 Sensitivity and specificity
4.5.2 Predictive values
4.5.3 Proportion with the target condition
4.5.4 Pre-test and post-test probabilities
4.5.5 Interpretation of sensitivity, specificity and predictive values
4.5.6 Confidence intervals
4.5.7 Other test accuracy measures
4.6 Positivity thresholds
4.7 Receiver operating characteristic curves
4.8 Analysis of a comparative accuracy study
4.9 Chapter information
4.10 References
Part 3: Methods and presentation of systematic reviews of test accuracy
5 Defining the review question
5.1 Introduction
5.2 Aims of systematic reviews of test accuracy
5.2.1 Investigations of heterogeneity
5.3 Identifying the clinical problem
5.3.1 Role of a new test
5.3.2 Defining the clinical pathway
5.3.3 Unclear and multiple clinical pathways
5.4 Defining the review question
5.4.1 Population
5.4.2 Index test(s)
5.4.3 Target condition
5.4.4 The review question: PIT
5.4.5 From review question to objectives
5.4.6 Broad versus narrow questions
5.5 Defining eligibility criteria
5.5.1 Types of studies
5.5.2 Participants
5.5.3 Index test(s)
5.5.4 Target condition
5.5.5 Reference standard
5.6 Chapter information
5.7 References
6 Searching for and selecting studies
6.1 Introduction
6.2 Searching for studies
6.2.1 Working in partnership
6.2.2 Advice for review teams that do not include an information specialist
6.3 Sources to search
6.3.1 Bibliographic databases
6.3.1.1 MEDLINE, PubMed and Embase
6.3.1.2 National and regional databases
6.3.1.3 Subject-specific databases
6.3.1.4 Dissertations and theses databases
6.3.2 Additional sources to search
6.3.2.1 Related reviews, guidelines and reference lists as sources of studies
6.3.2.2 Handsearching
6.3.2.3 Forward citation searching and co-citation searching
6.3.2.4 Web searching
6.3.2.5 Grey literature databases
6.3.2.6 Trial registries
6.3.2.7 Contacting colleagues, study authors, and manufacturers
6.4 Designing search strategies
6.4.1 Structuring the search strategy
6.4.2 Controlled vocabulary and text words
6.4.3 Text word or keyword searching
6.4.4 Search filters
6.4.5 Language, date and type of document restrictions
6.4.6 Identifying fraudulent studies, other retracted publications, errata and comments
6.4.7 Minimizing the risk of bias through search methods
6.5 Documenting and reporting the search process
6.5.1 Documenting the search process
6.5.2 Reporting the search process
6.5.2.1 Reporting the search process in the protocol
6.5.2.2 Reporting the search process in the review
6.6 Selecting relevant studies
6.6.1 Examine full-text reports for compliance of studies with eligibility criteria
6.7 Future developments in literature searching and selection
6.8 Chapter information
6.9 References
7 Collecting data
7.1 Introduction
7.2 Sources of data
7.2.1 Studies (not reports) as the unit of interest
7.2.2 Correspondence with investigators
7.3 What data to collect
7.3.1 What are data?
7.3.2 Study methods (participant recruitment and sampling)
7.3.3 Participant characteristics and setting
7.3.4 Index test(s)
7.3.5 Target condition and reference standard
7.3.6 Flow and timing
7.3.7 Extracting study results and converting to the desired format
7.3.7.1 Obtaining 2×2 data from accuracy measures
7.3.7.2 Using global measures
7.3.7.3 Challenges defining reference standard positive and negative: strategies when there are more than two categories
7.3.7.4 Challenges defining index test positive and negative: inconclusive results
7.3.7.5 Challenges defining index test positive and negative: failures
7.3.7.6 Challenges defining index test positive and negative: dealing with multiple thresholds and extracting data from ROC curves or other graphics
7.3.7.7 Extracting data from figures with software
7.3.7.8 Corrections for missing data: adjusting for partial verification bias
7.3.7.9 Multiple index tests from the same study
7.3.7.10 Subgroups of patients
7.3.7.11 Individual patient data
7.3.7.12 Extracting covariates
7.3.8 Other information to collect
7.4 Data collection tools
7.4.1 Rationale for data collection forms
7.4.2 Considerations in selecting data collection tools
7.4.3 Design of a data collection form
7.5 Extracting data from reports
7.5.1 Introduction
7.5.2 Who should extract data?
7.5.3 Training data extractors
7.5.4 Extracting data from multiple reports of the same study
7.5.5 Reliability and reaching consensus
7.5.6 Suspicions of scientific misconduct
7.5.7 Key points in planning and reporting data extraction
7.6 Managing and sharing data and tools
7.7 Chapter information
7.8 References
8 Assessing risk of bias and applicability
8.1 Introduction
8.2 Understanding bias and applicability
8.2.1 Bias and imprecision
8.2.2 Bias versus applicability
8.2.3 Biases in test accuracy studies: empirical evidence
8.3 QUADAS-2
8.3.1 Background
8.3.2 Risk-of-bias assessment
8.3.3 Applicability assessment
8.3.4 Using and tailoring QUADAS-2
8.3.5 Flow diagram
8.3.6 Performing the QUADAS-2 assessment
8.4 Domain 1: Participant selection
8.4.1 Participant selection: risk-of-bias signalling questions (QUADAS-2)
8.4.2 Participant selection: additional signalling questions for comparative accuracy studies (QUADAS-C)
8.4.3 Participant selection: concerns regarding applicability
8.5 Domain 2: Index test
8.5.1 Index test: risk-of-bias signalling questions (QUADAS-2)
8.5.2 Index test: additional signalling questions for comparative accuracy studies (QUADAS-C)
8.5.3 Index test: concerns regarding applicability
8.6 Domain 3: Reference standard
8.6.1 Reference standard: risk-of-bias signalling questions (QUADAS-2)
8.6.2 Reference standard: additional signalling questions for comparative accuracy studies (QUADAS-C)
8.6.3 Reference standard: concerns regarding applicability
8.7 Domain 4: Flow and timing
8.7.1 Flow and timing: risk-of-bias signalling questions (QUADAS-2)
8.7.2 Flow and timing: additional signalling questions for comparative accuracy studies (QUADAS-C)
8.8 Presentation of risk-of-bias and applicability assessments
8.9 Narrative summary of risk-of-bias and applicability assessments
8.10 Chapter information
8.11 References
9 Understanding meta-analysis
9.1 Introduction
9.1.1 Aims of meta-analysis for systematic reviews of test accuracy
9.1.2 When not to use a meta-analysis in a review
9.1.3 How does meta-analysis of diagnostic test accuracy differ from meta-analysis of interventions?
9.1.4 Questions that can be addressed in test accuracy analyses
9.1.4.1 What is the accuracy of a test?
9.1.4.2 How does the accuracy vary with clinical and methodological characteristics?
9.1.4.3 How does the accuracy of two or more tests compare?
9.1.5 Planning the analysis
9.2 Graphical and tabular presentation
9.2.1 Coupled forest plots
9.2.2 Summary ROC plots
9.2.3 Linked SROC plots
9.2.3.1 Example 1: Anti-CCP for the diagnosis of rheumatoid arthritis - descriptive plots
9.2.4 Tables of results
9.3 Meta-analytical summaries
9.3.1 Should I estimate a SROC curve or a summary point?
9.3.2 Heterogeneity
9.4 Fitting hierarchical models
9.4.1 Bivariate model
9.4.2 Example 1 continued: Anti-CCP for the diagnosis of rheumatoid arthritis
9.4.3 The Rutter and Gatsonis HSROC model
9.4.4 Example 2: Rheumatoid factor as a marker for rheumatoid arthritis
9.4.5 Data reported at multiple thresholds per study
9.4.6 Investigating heterogeneity
9.4.6.1 Criteria for model selection
9.4.6.2 Heterogeneity and regression analysis using the bivariate model
9.4.6.3 Example 1 continued: Investigation of heterogeneity in diagnostic performance of anti-CCP
9.4.6.4 Heterogeneity and regression analysis using the Rutter and Gatsonis HSROC model
9.4.6.5 Example 2 continued: Investigating heterogeneity in diagnostic accuracy of rheumatoid factor (RF)
9.4.7 Comparing index tests
9.4.7.1 Test comparisons based on all available studies
9.4.7.2 Test comparisons using the bivariate model
9.4.7.3 Example 3: CT versus MRI for the diagnosis of coronary artery disease
9.4.7.4 Test comparisons using the Rutter and Gatsonis HSROC model
9.4.7.5 Test comparison based on studies that directly compare tests
9.4.7.6 Example 3 continued: CT versus MRI for the diagnosis of coronary artery disease
9.4.8 Approaches to analysis with small numbers of studies
9.4.9 Sensitivity analysis
9.5 Special topics
9.5.1 Imperfect reference standard
9.5.2 Investigating and handling verification bias
9.5.3 Investigating and handling publication bias
9.5.4 Developments in meta-analysis for systematic reviews of test accuracy
9.6 Chapter information
9.7 References
10 Undertaking meta-analysis
10.1 Introduction
10.2 Estimation of a summary point
10.2.1 Fitting the bivariate model using SAS
10.2.2 Fitting the bivariate model using Stata
10.2.3 Fitting the bivariate model using R
10.2.4 Bayesian estimation of the bivariate model
10.2.4.1 Specification of the bivariate model in rjags
10.2.4.2 Monitoring convergence
10.2.4.3 Summary statistics
10.2.4.4 Generating a SROC plot
10.2.4.5 Sensitivity analyses
10.3 Estimation of a summary curve
10.3.1 Fitting the HSROC model using SAS
10.3.2 Bayesian estimation of the HSROC model
10.3.2.1 Specification of the HSROC model in rjags
10.3.2.2 Monitoring convergence
10.3.2.3 Summary statistics and HSROC plot
10.3.2.4 Sensitivity analyses
10.4 Comparison of summary points
10.4.1 Fitting the bivariate model in SAS to compare summary points
10.4.2 Fitting the bivariate model in Stata to compare summary points
10.4.3 Fitting the bivariate model in R to compare summary points
10.4.4 Bayesian inference for comparing summary points
10.4.4.1 Summary statistics
10.5 Comparison of summary curves
10.5.1 Fitting the HSROC model in SAS to compare summary curves
10.5.2 Bayesian estimation of the HSROC model for comparing summary curves
10.5.2.1 Monitoring convergence
10.5.2.2 Summary statistics
10.6 Meta-analysis of sparse data and atypical datasets
10.6.1 Facilitating convergence
10.6.2 Simplifying hierarchical models
10.7 Meta-analysis with multiple thresholds per study
10.7.1 Meta-analysis of multiple thresholds with R
10.7.2 Meta-analysis of multiple thresholds with rjags
10.8 Meta-analysis with imperfect reference standard: latent class meta-analysis
10.8.1 Specification of the latent class bivariate meta-analysis model in rjags
10.8.2 Monitoring convergence
10.8.3 Summary statistics and summary receiver-operating characteristic plot
10.8.4 Sensitivity analyses
10.9 Concluding remarks
10.10 Chapter information
10.11 References
11 Presenting findings
11.1 Introduction
11.2 Results of the search
11.3 Description of included studies
11.4 Methodological quality of included studies
11.5 Individual and summary estimates of test accuracy
11.5.1 Presenting results from included studies
11.5.2 Presenting summary estimates of sensitivity and specificity
11.5.3 Presenting SROC curves
11.5.4 Describing uncertainty in summary statistics
11.5.5 Describing heterogeneity in summary statistics
11.6 Comparisons of test accuracy
11.6.1 Comparing tests using summary points
11.6.2 Comparing tests using SROC curves
11.6.3 Interpretation of confidence intervals for differences in test accuracy
11.7 Investigations of sources of heterogeneity
11.8 Re-expressing summary estimates numerically
11.8.1 Frequencies
11.8.2 Predictive values
11.8.3 Likelihood ratios
11.9 Presenting findings when meta-analysis cannot be performed
11.10 Chapter information
11.11 References
12 Drawing conclusions
12.1 Introduction
12.2 'Summary of findings' tables
12.3 Assessing the strength of the evidence
12.3.1 Key issues to consider when assessing the strength of the evidence
12.3.1.1 How valid are the summary estimates?
12.3.1.2 How applicable are the summary estimates?
12.3.1.3 How heterogeneous are the individual study estimates?
12.3.1.4 How precise are the summary estimates?
12.3.1.5 How complete is the body of evidence?
12.3.1.6 Were index test comparisons made between or within primary studies?
12.4 GRADE approach for assessing the certainty of evidence
12.4.1 GRADE domains for assessing certainty of evidence for test accuracy
12.4.1.1 Risk of bias
12.4.1.2 Indirectness (applicability)
12.4.1.3 Inconsistency (heterogeneity)
12.4.1.4 Imprecision
12.4.1.5 Publication bias
12.5 Summary of main results in the Discussion section
12.6 Strengths and weaknesses of the review
12.6.1 Strengths and weaknesses of included studies
12.6.2 Strengths and weaknesses of the review
12.6.2.1 Strengths and weaknesses due to the search and selection process
12.6.2.2 Strengths and weaknesses due to methodological quality assessment and data extraction
12.6.2.3 Weaknesses due to the review analyses
12.6.2.4 Direct and indirect comparisons
12.6.3 Comparisons with previous research
12.7 Applicability of findings to the review question
12.8 Drawing conclusions
12.8.1 Implications for practice
12.8.2 Implications for research
12.9 Chapter information
12.10 References
13 Writing a plain language summary
13.1 Introduction
13.2 Audience and writing style
13.3 Contents and structure of a plain language summary
13.3.1 Title
13.3.2 Key messages
13.3.3 Why is improving [...] diagnosis important?
13.3.4 What is the [...] test?
13.3.5 What did we want to find out?
13.3.6 What did we do?
13.3.7 What did we find?
13.3.7.1 Describing the included studies
13.3.7.2 Presenting information on test accuracy
13.3.7.3 Presenting single estimates of accuracy
13.3.7.4 Presenting multiple estimates of accuracy: two index tests
13.3.7.5 Presenting multiple estimates of accuracy: more than two index tests
13.3.7.6 When presenting a numerical summary of test accuracy is not appropriate
13.3.7.7 Graphical illustration of test accuracy results
13.3.8 What are the limitations of the evidence?
13.3.9 How up to date is this evidence?
13.4 Chapter information
13.5 References
13.6 Appendix: Additional example plain language summary
Patrick M. Bossuyt is Professor at the Amsterdam UMC, University of Amsterdam, Department of Epidemiology and Data Science in the Netherlands.
Mariska M. Leeflang is Associate Professor at the Amsterdam UMC, University of Amsterdam, Department of Epidemiology and Data Science in the Netherlands. She is a Convenor of Cochrane's Screening and Diagnostic Tests Methods Group.
Yemisi Takwoingi is Professor at the Institute of Applied Health Research at the University of Birmingham, UK and a Convenor of Cochrane's Screening and Diagnostic Tests Methods Group.