# The Statistical Analysis of Doubly Truncated Data

## With Applications in R

Wiley Series in Probability and Statistics

1st Edition, December 2021

192 Pages, Hardcover, *Wiley & Sons Ltd*

**978-1-119-95137-7**

A thorough treatment of the statistical methods used to analyze doubly truncated data

In *The Statistical Analysis of Doubly Truncated Data*, an expert team of statisticians delivers an up-to-date review of existing methods used to deal with randomly truncated data, with a focus on the challenging problem of random double truncation. The authors comprehensively introduce doubly truncated data before moving on to discussions of the latest developments in the field.

The book offers readers examples with R code, along with real data from astronomy, engineering, and the biomedical sciences, to illustrate and highlight the methods described within. It presents linear regression models for doubly truncated responses and explores the influence of the bandwidth on the performance of kernel-type estimators, together with guidelines for selecting the smoothing parameter.

Fully nonparametric and semiparametric estimators are presented and illustrated with real data, and R code for reproducing the data examples is provided. The book also offers:

* A thorough introduction to the existing methods that deal with randomly truncated data

* Comprehensive explorations of linear regression models for doubly truncated responses

* Practical discussions of the influence of the bandwidth on the performance of kernel-type estimators and guidelines for the selection of the smoothing parameter

* In-depth examinations of nonparametric and semiparametric estimators

Perfect for statistical professionals with some background in mathematical statistics, biostatisticians, and mathematicians with an interest in survival analysis and epidemiology, *The Statistical Analysis of Doubly Truncated Data* is also an invaluable addition to the libraries of biomedical scientists and practitioners, as well as postgraduate students studying survival analysis.

## Table of Contents

List of Abbreviations 1

Notation 3

1 Introduction 7

1.1 Random Truncation 7

1.2 One-Sided Truncation and Double Truncation 8

1.2.1 Left-Truncation 8

1.2.2 Right-Truncation 8

1.2.3 Truncation vs. Censoring 9

1.3 Double Truncation 9

1.4 Real Data Examples 11

1.4.1 Childhood Cancer Data 11

1.4.2 AIDS Blood Transfusion Data 12

1.4.3 Equipment-S Rounded Failure Time Data 13

1.4.4 Quasars Data 13

1.4.5 Parkinson's Disease Data 14

1.4.6 Acute Coronary Syndrome Data 15

References 16

2 One Sample Problems 19

2.1 Nonparametric Estimation of a Distribution Function 19

2.1.1 The NPMLE 20

2.1.2 Numerical Algorithms for Computing the NPMLE 26

2.1.3 Theoretical Properties of the NPMLE 31

2.1.4 Standard Errors and Confidence Limits 43

2.2 Semiparametric and Parametric Approaches 49

2.2.1 Semiparametric Approach 51

2.2.2 Parametric Approach 61

2.3 R Code for the Examples 64

2.3.1 Code for Example 2.1.8 64

2.3.2 Code for Examples 2.1.11 and 2.1.13 64

2.3.3 Code for Example 2.1.14 66

2.3.4 Code for Example 2.1.15 67

2.3.5 Code for Example 2.1.22 67

2.3.6 Code for Example 2.2.6 68

2.3.7 Code for Example 2.2.8 69

References 71

3 Smoothing Methods 73

3.1 Some Background in Kernel Estimation 73

3.2 Estimating the Density Function 75

3.3 Asymptotic Properties 75

3.4 Data-Driven Bandwidth Selection 81

3.4.1 Normal Reference Bandwidth Selection 82

3.4.2 Plug-in Bandwidth Selection 83

3.4.3 Least-Squares Cross-Validation Bandwidth Selection 85

3.4.4 Smoothed Bootstrap Bandwidth Selection 86

3.4.5 Bandwidth Selectors in Practice 86

3.5 Further Issues in Kernel Density Estimation 93

3.6 Estimating the Hazard Function 96

3.7 R Code for the Examples 103

3.7.1 Code for Example 3.2.1 104

3.7.2 Code for Examples 3.3.4 and 3.3.5 105

3.7.3 Code for Examples 3.4.2 and 3.4.3 106

3.7.4 Code for Example 3.5.1 107

3.7.5 Code for Example 3.6.4 109

3.7.6 Code for Example 3.6.5 110

References 111

4 Regression Analysis 113

4.1 Observational Bias in Regression 113

4.2 Proportional Hazards Regression 119

4.3 Accelerated Failure Time Regression 122

4.4 Nonparametric Regression 125

4.5 R Code for the Examples 130

4.5.1 Code for Example 4.1.1 131

4.5.2 Code for Example 4.1.4 131

4.5.3 Code for Example 4.2.4 132

4.5.4 Code for Example 4.3.2 132

4.5.5 Code for Example 4.4.2 132

References 133

5 Further Topics 135

5.1 Two Sample Problems 135

5.2 Competing Risks 140

5.2.1 Cumulative Incidences 143

5.2.2 Regression Models for Competing Risks 146

5.3 Testing for Quasi-Independence 150

5.4 Dependent Truncation 153

5.5 R Code for the Examples 161

5.5.1 Code for Example 5.1.3 161

5.5.2 Code for Example 5.2.4 164

5.5.3 Code for Example 5.2.6 165

5.5.4 Code for Example 5.3.1 166

5.5.5 Code for Example 5.4.3 166

References 167

A Packages and Functions in R 169

A.1 Computing the NPMLE and Standard Errors 170

A.2 Assessing the Existence and Uniqueness of the NPMLE 171

A.3 Semiparametric and Parametric Estimation 171

A.4 Kernel Estimation 172

A.5 Regression Analysis 172

A.6 Competing Risks 172

A.7 Simulating Data 173

A.8 Testing Quasi-Independence 173

A.9 Dependent Truncation 173

References 173

Index 175

Carla Moreira is Associate Researcher at the Centre of Mathematics, School of Sciences, University of Minho in Portugal. She is also affiliated with the Statistical Inference, Decision and Operations Research group, University of Vigo, Spain, and with the Epidemiology Research Unit, Institute of Public Health, University of Porto, Portugal.

Rosa M. Crujeiras is Associate Professor at the Department of Statistics, Mathematical Analysis and Optimization, University of Santiago de Compostela, Spain.