
Predictive Analytics

Parametric Models for Regression and Classification Using R

Tamhane, Ajit C. / Malthouse, Edward C.

Wiley Series in Probability and Statistics


1st edition, January 2021
384 pages, hardcover
Wiley & Sons Ltd

ISBN: 978-1-118-94889-7
John Wiley & Sons


Price: €122.00

Approximate price

Price includes VAT, excluding shipping

Other versions

ePub, Mobi, PDF

Provides a foundation in classical parametric methods of regression and classification essential for pursuing advanced topics in predictive analytics and statistical learning.

This book covers a broad range of topics in parametric regression and classification including multiple regression, logistic regression (binary and multinomial), discriminant analysis, Bayesian classification, generalized linear models and Cox regression for survival data. The book also gives brief introductions to some modern computer-intensive methods such as classification and regression trees (CART), neural networks and support vector machines.

The book is organized so that it can be used both by advanced undergraduate or master's students with applied interests and by doctoral students who also want to learn the underlying theory. This is done by devoting the main body of each chapter to basic statistical methodology illustrated by real data examples; derivations, proofs, and extensions are relegated to the Technical Notes section of each chapter. Exercises are likewise divided into theoretical and applied. Answers to selected exercises are provided, and a solutions manual is available to instructors who adopt the text.

Data sets of moderate to large sizes are used in the examples and exercises. They come from a variety of disciplines, including business (finance, marketing, and sales), economics, education, engineering, and the sciences (biological, health, physical, and social). All data sets are available at the book's web site. The open-source software R is used for all data analyses, and R code and output are provided for most examples. The R code is also available at the book's web site.

Predictive Analytics: Parametric Models for Regression and Classification Using R is ideal for a one-semester upper-level undergraduate or beginning graduate course in regression for students in business, economics, finance, marketing, engineering, and computer science. It is also an excellent resource for practitioners in these fields.

Preface xiii

Acknowledgments xvii

1 Introduction 1

1.1 Supervised Versus Unsupervised Learning 2

1.2 Parametric Versus Nonparametric Models 2

1.3 Types of Data 3

1.4 Overview of Parametric Predictive Analytics 4

2 Simple Linear Regression and Correlation 7

2.1 Fitting a Straight Line 8

2.1.1 Least Squares Method 8

2.1.2 Linearizing Transformations 10

2.1.3 Fitted Values and Residuals 12

2.1.4 Assessing Goodness of Fit 13

2.2 Statistical Inferences for Simple Linear Regression 15

2.2.1 Simple Linear Regression Model 15

2.2.2 Inferences on β₀ and β₁ 17

2.2.3 Analysis of Variance for Simple Linear Regression 18

2.2.4 Pure Error versus Model Error 20

2.2.5 Prediction of Future Observations 20

2.3 Correlation Analysis 22

2.3.1 Bivariate Normal Distribution 24

2.3.2 Inferences on the Correlation Coefficient 25

2.4 Modern Extensions 26

2.5 Technical Notes 27

2.5.1 Derivation of the LS Estimators 27

2.5.2 Sums of Squares 28

2.5.3 Distribution of the LS Estimators 28

2.5.4 Prediction Interval 29

Exercises 29

3 Multiple Linear Regression: Basics 33

3.1 Multiple Linear Regression Model 34

3.1.1 Model in Scalar Notation 34

3.1.2 Model in Matrix Notation 35

3.2 Fitting a Multiple Regression Model 36

3.2.1 Least Squares (LS) Method 36

3.2.2 Interpretation of Regression Coefficients 39

3.2.3 Fitted Values and Residuals 40

3.2.4 Measures of Goodness of Fit 41

3.2.5 Linearizing Transformations 42

3.3 Statistical Inferences for Multiple Regression 42

3.3.1 Analysis of Variance for Multiple Regression 42

3.3.2 Inferences on Regression Coefficients 44

3.3.3 Confidence Ellipsoid for the β Vector 45

3.3.4 Extra Sum of Squares Method 47

3.3.5 Prediction of Future Observations 50

3.4 Weighted and Generalized Least Squares 51

3.4.1 Weighted Least Squares 51

3.4.2 Generalized Least Squares 53

3.4.3 Statistical Inference on GLS Estimator 53

3.5 Partial Correlation Coefficients 53

3.6 Special Topics 56

3.6.1 Dummy Variables 56

3.6.2 Interactions 58

3.6.3 Standardized Regression 62

3.7 Modern Extensions 63

3.7.1 Regression Trees 63

3.7.2 Neural Nets 66

3.8 Technical Notes 68

3.8.1 Derivation of the LS Estimators 68

3.8.2 Distribution of the LS Estimators 68

3.8.3 Gauss-Markov Theorem 69

3.8.4 Properties of Fitted Values and Residuals 69

3.8.5 Geometric Interpretation of Least Squares 70

3.8.6 Confidence Ellipsoid for β 71

3.8.7 Population Partial Correlation Coefficient 71

Exercises 72

4 Multiple Linear Regression: Model Diagnostics 79

4.1 Model Assumptions and Distribution of Residuals 79

4.2 Checking Normality 80

4.3 Checking Homoscedasticity 82

4.3.1 Variance Stabilizing Transformations 83

4.3.2 Box-Cox Transformation 85

4.4 Detecting Outliers 87

4.5 Checking Model Misspecification 89

4.6 Checking Independence 90

4.6.1 Runs Test 90

4.6.2 Durbin-Watson Test 93

4.7 Checking Influential Observations 94

4.7.1 Leverage 94

4.7.2 Cook's Distance 95

4.8 Checking Multicollinearity 97

4.8.1 Multicollinearity: Causes and Consequences 97

4.8.2 Multicollinearity Diagnostics 98

Exercises 101

5 Multiple Linear Regression: Shrinkage and Dimension Reduction Methods 107

5.1 Ridge Regression 108

5.2 Lasso Regression 110

5.3 Principal Components Analysis and Regression 114

5.3.1 Principal Components Analysis (PCA) 114

5.3.2 Principal Components Regression (PCR) 122

5.4 Partial Least Squares (PLS) 126

5.5 Technical Notes 132

5.5.1 Properties of Ridge Estimator 132

5.5.2 Derivation of Principal Components 133

Exercises 134

6 Multiple Linear Regression: Variable Selection and Model Building 137

6.1 Best Subset Selection 138

6.1.1 Model Selection Criteria 138

6.2 Stepwise Regression 142

6.3 Model Building 149

6.4 Technical Notes 151

6.4.1 Derivation of the Cp Statistic 151

Exercises 152

7 Logistic Regression and Classification 155

7.1 Simple Logistic Regression 157

7.1.1 Model 157

7.1.2 Parameter Estimation 159

7.1.3 Inferences on Parameters 163

7.2 Multiple Logistic Regression 163

7.2.1 Model and Inference 163

7.3 Likelihood Ratio (LR) Test 166

7.3.1 Deviance 167

7.3.2 Akaike Information Criterion (AIC) 169

7.3.3 Model Selection and Diagnostics 169

7.4 Binary Classification Using Logistic Regression 172

7.4.1 Measures of Correct Classification 172

7.4.2 Receiver Operating Characteristic (ROC) Curve 175

7.5 Polytomous Logistic Regression 178

7.5.1 Nominal Logistic Regression 178

7.5.2 Ordinal Logistic Regression 181

7.6 Modern Extensions 184

7.6.1 Classification Trees 184

7.6.2 Support Vector Machines 186

7.7 Technical Notes 190

Exercises 192

8 Discriminant Analysis 199

8.1 Linear Discriminant Analysis (LDA) Based on Mahalanobis Distance 200

8.1.1 Mahalanobis Distance 200

8.1.2 Bayesian Classification 201

8.2 Fisher's Linear Discriminant Function (LD) 203

8.2.1 Two Groups 203

8.2.2 Multiple Groups 205

8.3 Naive Bayes 207

8.4 Technical Notes 208

8.4.1 Calculation of Pooled Sample Covariance Matrix 208

8.4.2 Derivation of Fisher's Linear Discriminant Functions 208

8.4.3 Bayes Rule 209

Exercises 210

9 Generalized Linear Models 213

9.1 Exponential Family and Link Function 214

9.1.1 Exponential Family 214

9.1.2 Link Function 216

9.2 Estimation of Parameters of GLM 216

9.2.1 Maximum Likelihood Estimation 216

9.2.2 Iteratively Reweighted Least Squares (IRWLS) Algorithm 217

9.3 Deviance and AIC 218

9.4 Poisson Regression 222

9.4.1 Poisson Regression for Rates 226

9.5 Gamma Regression 228

9.6 Technical Notes 232

9.6.1 Mean and Variance of the Exponential Family of Distributions 232

9.6.2 MLE of β and Its Evaluation Using the IRWLS Algorithm 232

Exercises 234

10 Survival Analysis 239

10.1 Hazard Rate and Survival Distribution 240

10.2 Kaplan-Meier Estimator 241

10.3 Log Rank Test 243

10.4 Cox's Proportional Hazards Model 246

10.4.1 Estimation 246

10.4.2 Examples 247

10.4.3 Time-Dependent Covariates 251

10.5 Technical Notes 255

10.5.1 ML Estimation of the Cox Proportional Hazards Model 255

Exercises 256

A Primer on Matrix Algebra and Multivariate Distributions 261

A.1 Review of Matrix Algebra 261

A.2 Review of Multivariate Distributions 263

A.3 Multivariate Normal Distribution 264

B Primer on Maximum Likelihood Estimation 267

B.1 Maximum Likelihood Estimation 267

B.2 Large Sample Inference on MLEs 269

B.3 Newton-Raphson and Fisher Scoring Algorithms 270

B.4 Technical Notes 271

C Projects 275

C.1 Project 1 278

C.2 Project 2 279

C.3 Project 3 280

D References 283

E Statistical Tables 287

Answers to Selected Exercises 289

Ajit C. Tamhane, PhD, is Professor of Industrial Engineering & Management Sciences, with a courtesy appointment in Statistics, at Northwestern University. He is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the American Association for the Advancement of Science, and an elected member of the International Statistical Institute.

A. C. Tamhane, Northwestern University