Multiple Imputation and its Application

Carpenter, James R. / Bartlett, Jonathan W. / Morris, Tim P. / Wood, Angela M. / Quartagno, Matteo / Kenward, Michael G.

Statistics in Practice (Volume No. 1)

2nd edition, August 2023
464 pages, hardcover
Practitioner book

ISBN: 978-1-119-75608-8
John Wiley & Sons

Price: €79.90

Price incl. VAT, plus shipping

Other versions

epub, mobi, pdf

Multiple Imputation and its Application

The most up-to-date edition of a bestselling guide to analyzing partially observed data

In this comprehensively revised Second Edition of Multiple Imputation and its Application, a team of distinguished statisticians delivers an overview of the issues raised by missing data, the rationale for multiple imputation as a solution, and the practicalities of applying it in a multitude of settings.

With an accessible and carefully structured presentation aimed at quantitative researchers, Multiple Imputation and its Application is illustrated with a range of examples and offers key mathematical details. The book includes a wide range of theoretical and computer-based exercises, tested in the classroom, which are especially useful for users of R or Stata. Readers will find:
* A comprehensive overview of one of the most effective and popular methodologies for dealing with incomplete data sets
* Careful discussion of key concepts
* A range of examples illustrating the key ideas
* Practical advice on using multiple imputation
* Exercises and examples designed for use in the classroom and/or private study

Written for applied researchers looking to use multiple imputation with confidence, and for methods researchers seeking an accessible overview of the topic, Multiple Imputation and its Application will also earn a place in the libraries of graduate students undertaking quantitative analyses.

Preface to the second edition

Data acknowledgements

Acknowledgements

Glossary

Part I Foundations

1 Introduction

1.1 Reasons for missing data

1.2 Examples

1.3 Patterns of missing data

1.3.1 Consequences of missing data

1.4 Inferential framework and notation

1.4.1 Missing completely at random (MCAR)

1.4.2 Missing at random (MAR)

1.4.3 Missing not at random (MNAR)

1.4.4 Ignorability

1.5 Using observed data to inform assumptions about the missingness mechanism

1.6 Implications of missing data mechanisms for regression analyses

1.6.1 Partially observed response

1.6.2 Missing covariates

1.6.3 Missing covariates and response

1.6.4 Subtle issues I: the odds ratio

1.6.5 Implication for linear regression

1.6.6 Subtle issues II: sub-sample ignorability

1.6.7 Summary: when restricting to complete records is valid

1.7 Summary

Exercises

2 The Multiple Imputation Procedure and Its Justification

2.1 Introduction

2.2 Intuitive outline of the MI procedure

2.3 The generic MI procedure

2.4 Bayesian justification of MI

2.5 Frequentist inference

2.5.1 Large number of imputations

2.5.2 Small number of imputations

2.5.3 Inference for vector β

2.5.4 Combining likelihood ratio tests

2.6 Choosing the number of imputations

2.7 Some simple examples

2.7.1 Estimating the mean with σ² known by the imputer and analyst

2.7.2 Estimating the mean with σ² known only by the imputer

2.7.3 Estimating the mean with σ² unknown

2.7.4 General linear regression with σ² known

2.8 MI in more general settings

2.8.1 Proper imputation

2.8.2 Congenial imputation and substantive model

2.8.3 Uncongenial imputation and substantive models

2.8.4 Survey sample settings

2.9 Constructing congenial imputation models

2.10 Discussion

Exercises

Part II Multiple Imputation for Simple Data Structures

3 Multiple Imputation of Quantitative Data

3.1 Regression imputation with a monotone missingness pattern

3.1.1 MAR mechanisms consistent with a monotone pattern

3.1.2 Justification

3.2 Joint modelling

3.2.1 Fitting the imputation model

3.2.2 Adding covariates

3.3 Full conditional specification

3.3.1 Justification

3.4 Full conditional specification versus joint modelling

3.5 Software for multivariate normal imputation

3.6 Discussion

Exercises

4 Multiple Imputation of Binary and Ordinal Data

4.1 Sequential imputation with monotone missingness pattern

4.2 Joint modelling with the multivariate normal distribution

4.3 Modelling binary data using latent normal variables

4.3.1 Latent normal model for ordinal data

4.4 General location model

4.5 Full conditional specification

4.5.1 Justification

4.6 Issues with over-fitting

4.7 Pros and cons of the various approaches

4.8 Software

4.9 Discussion

Exercises

5 Imputation of Unordered Categorical Data

5.1 Monotone missing data

5.2 Multivariate normal imputation for categorical data

5.3 Maximum indicant model

5.3.1 Continuous and categorical variable

5.3.2 Imputing missing data

5.4 General location model

5.5 FCS with categorical data

5.6 Perfect prediction issues with categorical data

5.7 Software

5.8 Discussion

Exercises

Part III Multiple Imputation in Practice

6 Non-linear Relationships, Interactions, and Other Derived Variables

6.1 Introduction

6.1.1 Interactions

6.1.2 Squares

6.1.3 Ratios

6.1.4 Sum scores

6.1.5 Composite endpoints

6.2 No missing data in derived variables

6.3 Simple methods

6.3.1 Impute then transform

6.3.2 Transform then impute/just another variable

6.3.3 Adapting standard imputation models and passive imputation

6.3.4 Predictive mean matching

6.3.5 Imputation separately by groups for interactions

6.4 Substantive-model-compatible imputation

6.4.1 The basic idea

6.4.2 Latent-normal joint model SMC imputation

6.4.3 Factorised conditional model SMC imputation

6.4.4 Substantive model compatible fully conditional specification

6.4.5 Auxiliary variables

6.4.6 Missing outcome values

6.4.7 Congeniality versus compatibility

6.4.8 Discussion of SMC imputation

6.5 Returning to the problems

6.5.1 Ratios

6.5.2 Splines

6.5.3 Fractional polynomials

6.5.4 Multiple imputation with conditional questions or 'skips'

Exercises

7 Survival Data

7.1 Missing covariates in time-to-event data

7.1.1 Approximately compatible approaches

7.1.2 Substantive model compatible approaches

7.2 Imputing censored event times

7.3 Non-parametric, or 'hot deck' imputation

7.3.1 Non-parametric imputation for time-to-event data

7.4 Case-cohort designs

7.4.1 Standard analysis of case-cohort studies

7.4.2 Multiple imputation for case-cohort studies

7.4.3 Full cohort

7.4.4 Intermediate approaches

7.4.5 Sub-study approach

7.5 Discussion

Exercises

8 Prognostic Models, Missing Data, and Multiple Imputation

8.1 Introduction

8.2 Motivating example

8.3 Missing data at model implementation

8.4 Multiple imputation for prognostic modelling

8.5 Model building

8.5.1 Model building with missing data

8.5.2 Imputing predictors when model building is to be performed

8.6 Model performance

8.6.1 How should we pool MI results for estimation of performance?

8.6.2 Calibration

8.6.3 Discrimination

8.6.4 Model performance measures with clinical interpretability

8.7 Model validation

8.7.1 Internal model validation

8.7.2 External model validation

8.8 Incomplete data at implementation

8.8.1 MI for incomplete data at implementation

8.8.2 Alternatives to multiple imputation

Exercises

9 Multi-level Multiple Imputation

9.1 Multi-level imputation model

9.1.1 Imputation of level-1 variables

9.1.2 Imputation of level 2 variables

9.1.3 Accommodating the substantive model

9.2 MCMC algorithm for imputation model

9.2.1 Ordered and unordered categorical data

9.2.2 Imputing missing values

9.2.3 Substantive model compatible imputation

9.2.4 Checking model convergence

9.3 Extensions

9.3.1 Cross-classification and three-level data

9.3.2 Random level 1 covariance matrices

9.3.3 Model fit

9.4 Other imputation methods

9.4.1 One-step and two-step FCS

9.4.2 Substantive model compatible imputation

9.4.3 Non-parametric methods

9.4.4 Comparisons of different methods

9.5 Individual participant data meta-analysis

9.5.1 Different measurement scales

9.5.2 When to apply Rubin's rules

9.5.3 Homoscedastic versus heteroscedastic imputation model

9.6 Software

9.7 Discussion

Exercises

10 Sensitivity Analysis: MI Unleashed

10.1 Review of MNAR modelling

10.2 Framing sensitivity analysis: estimands

10.2.1 Definition of the estimand

10.2.2 Two common estimands

10.3 Pattern mixture modelling with MI

10.3.1 Missing covariates

10.3.2 Sensitivity with multiple variables: the NAR FCS procedure

10.3.3 Application to survival analysis

10.4 Pattern mixture approach with longitudinal data via MI

10.4.1 Change in slope post-deviation

10.5 Reference based imputation

10.5.1 Constructing joint distributions of pre- and post-intercurrent event data

10.5.2 Technical details

10.5.3 Software

10.5.4 Information anchoring

10.6 Approximating a selection model by importance weighting

10.6.1 Weighting the imputations

10.6.2 Stacking the imputations and applying the weights

10.7 Discussion

Exercises

11 Multiple Imputation for Measurement Error and Misclassification

11.1 Introduction

11.2 Multiple imputation with validation data

11.2.1 Measurement error

11.2.2 Misclassification

11.2.3 Imputing assuming error is non-differential

11.2.4 Non-linear outcome models

11.3 Multiple imputation with replication data

11.3.1 Measurement error

11.3.2 Misclassification

11.4 External information on the measurement process

11.5 Discussion

Exercises

12 Multiple Imputation with Weights

12.1 Using model-based predictions in strata

12.2 Bias in the MI variance estimator

12.3 MI with weights

12.3.1 Conditions for the consistency of θ_MI

12.3.2 Conditions for the consistency of V_MI

12.4 A multi-level approach

12.4.1 Evaluation of the multi-level multiple imputation approach for handling survey weights

12.4.2 Results

12.5 Further topics

12.5.1 Estimation in domains

12.5.2 Two-stage analysis

12.5.3 Missing values in the weight model

12.6 Discussion

Exercises

13 Multiple Imputation for Causal Inference

13.1 Multiple imputation for causal inference in point exposure studies

13.1.1 Randomised trials

13.1.2 Observational studies

13.2 Multiple imputation and propensity scores

13.2.1 Propensity scores for confounder adjustment

13.2.2 Multiple imputation of confounders

13.2.3 Imputation model specification

13.3 Principal stratification via multiple imputation

13.3.1 Principal strata effects

13.3.2 Estimation

13.4 Multiple imputation for IV analysis

13.4.1 Instrumental variable analysis for non-adherence

13.4.2 Instrumental variable analysis via multiple imputation

13.5 Discussion

Exercises

14 Using Multiple Imputation in Practice

14.1 A general approach

14.1.1 Explore the proportions and patterns of missing data

14.1.2 Consider plausible missing data mechanisms

14.1.3 Consider whether missing at random is plausible

14.1.4 Choose the variables for the imputation model

14.1.5 Choose an appropriate imputation strategy and model/s

14.1.6 Set and record the seed of the pseudo-random number generator

14.1.7 Fit the imputation model

14.1.8 Iterate and revise the imputation model if necessary

14.1.9 Estimate Monte Carlo error

14.1.10 Sensitivity analysis

14.2 Objections to multiple imputation

14.3 Reporting of analyses with incomplete data

14.4 Presenting incomplete baseline data

14.5 Model diagnostics

14.6 How many imputations?

14.6.1 Using the jack-knife estimate of the Monte-Carlo standard error

14.7 Multiple imputation for each substantive model, project, or dataset?

14.8 Large datasets

14.8.1 Large datasets and joint modelling

14.8.2 Shrinkage by constraining parameters

14.8.3 Comparison of the two approaches

14.9 Multiple imputation and record linkage

14.10 Setting random number seeds for multiple imputation analyses

14.11 Simulation studies including multiple imputation

14.11.1 Random number seeds for simulation studies including multiple imputation

14.11.2 Repeated simulation of all data or only the missingness mechanism?

14.11.3 How many imputations for simulation studies?

14.11.4 Multiple imputation for data simulation

14.12 Discussion

Exercises

Appendix A Markov Chain Monte Carlo

A.1 Metropolis Hastings sampler

A.2 Gibbs sampler

A.3 Missing data

Appendix B Probability Distributions

B.1 Posterior for the multivariate normal distribution

Appendix C Overview of Multiple Imputation in R, Stata

C.1 Basic multiple imputation using R

C.2 Basic MI using Stata

References

Author Index

Index of Examples

Subject Index

JAMES R. CARPENTER is Professor of Medical Statistics at the London School of Hygiene & Tropical Medicine and Programme Leader in Methodology at the MRC Clinical Trials Unit at UCL, UK.

JONATHAN W. BARTLETT is a Professor of Medical Statistics at the London School of Hygiene & Tropical Medicine, UK.

TIM P. MORRIS is Principal Research Fellow in Medical Statistics at the MRC Clinical Trials Unit at UCL, UK.

ANGELA M. WOOD is Professor of Health Data Science in the Department of Public Health and Primary Care, University of Cambridge, UK.

MATTEO QUARTAGNO is Senior Research Fellow in Medical Statistics at the MRC Clinical Trials Unit at UCL, UK.

MICHAEL G. KENWARD retired in 2016 after sixteen years as GlaxoSmithKline Professor of Biostatistics at the London School of Hygiene & Tropical Medicine, UK.