Narsky, Ilya / Porter, Frank C.
Statistical Analysis Techniques in Particle Physics
Fits, Density Estimation and Supervised Learning
1st Edition, November 2013
2013. 459 Pages, Softcover
100 Fig., 70 Tab.
- Practical Approach Book -
ISBN 978-3-527-41086-6 - Wiley-VCH, Berlin
E-books are also available from all major e-book shops.
Modern analysis of HEP data requires advanced statistical tools to separate signal from background. This is the first book that focuses on machine learning techniques. It is of interest to almost every high-energy physicist and, owing to its coverage, is also suitable for students.
From the contents
1 Why We Wrote This Book and How You Should Read It
2 Parametric Likelihood Fits
  2.1 Preliminaries
  2.2 Parametric Likelihood Fits
  2.3 Fits for Small Statistics
  2.4 Results Near the Boundary of a Physical Region
  2.5 Likelihood Ratio Test for Presence of Signal
  2.6 sPlots
  2.7 Exercises
3 Goodness of Fit
  3.1 Binned Goodness of Fit Tests
  3.2 Statistics Converging to Chi-Square
  3.3 Univariate Unbinned Goodness of Fit Tests
  3.4 Multivariate Tests
  3.5 Exercises
4 Resampling Techniques
  4.1 Permutation Sampling
  4.2 Bootstrap
  4.3 Jackknife
  4.4 BCa Confidence Intervals
  4.5 Cross-Validation
  4.6 Resampling Weighted Observations
  4.7 Exercises
5 Density Estimation
  5.1 Empirical Density Estimate
  5.2 Histograms
  5.3 Kernel Estimation
  5.4 Ideogram
  5.5 Parametric vs. Nonparametric Density Estimation
  5.6 Optimization
  5.7 Estimating Errors
  5.8 The Curse of Dimensionality
  5.9 Adaptive Kernel Estimation
  5.10 Naive Bayes Classification
  5.11 Multivariate Kernel Estimation
  5.12 Estimation Using Orthogonal Series
  5.13 Using Monte Carlo Models
  5.14 Unfolding
    5.14.1 Unfolding: Regularization
6 Basic Concepts and Definitions of Machine Learning
  6.1 Supervised, Unsupervised, and Semi-Supervised
  6.2 Tall and Wide Data
  6.3 Batch and Online Learning
  6.4 Parallel Learning
  6.5 Classification and Regression
7 Data Preprocessing
  7.1 Categorical Variables
  7.2 Missing Values
  7.3 Outliers
  7.4 Exercises
8 Linear Transformations and Dimensionality Reduction
  8.1 Centering, Scaling, Reflection and Rotation
  8.2 Rotation and Dimensionality Reduction
  8.3 Principal Component Analysis (PCA)
  8.4 Independent Component Analysis (ICA)
    8.4.1 Theory
  8.5 Exercises
9 Introduction to Classification
  9.1 Loss Functions: Hard Labels and Soft Scores
  9.2 Bias, Variance, and Noise
  9.3 Training, Validating and Testing: The Optimal Splitting Rule
  9.4 Resampling Techniques: Cross-Validation and Bootstrap
  9.5 Data with Unbalanced Classes
  9.6 Learning with Cost
  9.7 Exercises
10 Assessing Classifier Performance
  10.1 Classification Error and Other Measures of Predictive Power
  10.2 Receiver Operating Characteristic (ROC) and Other Curves
  10.3 Testing Equivalence of Two Classification Models
  10.4 Comparing Several Classifiers
  10.5 Exercises
11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression
  11.1 Discriminant Analysis
  11.2 Logistic Regression
  11.3 Classification by Linear Regression
  11.4 Partial Least Squares Regression
  11.5 Example: Linear Models for MAGIC Telescope Data
  11.6 Choosing a Linear Classifier for Your Analysis
  11.7 Exercises
12 Neural Networks
  12.1 Perceptrons
  12.2 The Feed-Forward Neural Network
  12.3 Backpropagation
  12.4 Bayes Neural Networks
  12.5 Genetic Algorithms
  12.6 Exercises
13 Local Learning and Kernel Expansion
  13.1 From Input Variables to the Feature Space
  13.2 Regularization
  13.3 Making and Choosing Kernels
  13.4 Radial Basis Functions
  13.5 Support Vector Machines (SVM)
  13.6 Empirical Local Methods
  13.7 Kernel Methods: The Good, the Bad and the Curse of Dimensionality
  13.8 Exercises
14 Decision Trees
  14.1 Growing Trees
  14.2 Predicting by Decision Trees
  14.3 Stopping Rules
  14.4 Pruning Trees
  14.5 Trees for Multiple Classes
  14.6 Splits on Categorical Variables
  14.7 Surrogate Splits
  14.8 Missing Values
  14.9 Variable Importance
  14.10 Why Are Decision Trees Good (or Bad)?
  14.11 Exercises
15 Ensemble Learning
  15.1 Boosting
  15.2 Diversifying the Weak Learner: Bagging, Random Subspace and Random Forest
  15.3 Choosing an Ensemble for Your Analysis
  15.4 Exercises
16 Reducing Multiclass to Binary
  16.1 Encoding
  16.2 Decoding
  16.3 Summary: Choosing the Right Design
17 How to Choose the Right Classifier for Your Analysis and Apply It Correctly
  17.1 Predictive Performance and Interpretability
  17.2 Matching Classifiers and Variables
  17.3 Using Classifier Predictions
  17.4 Optimizing Accuracy
  17.5 CPU and Memory Requirements
18 Methods for Variable Ranking and Selection
  18.1 Definitions
  18.2 Variable Ranking
    [...] Elimination (SBE), and Feature-based Sensitivity of Posterior Probabilities (FSPP)
  18.3 Variable Selection (BECM)
  18.4 Exercises
19 Bump Hunting in Multivariate Data
  19.1 Voronoi Tessellation and SLEUTH Algorithm
  19.2 Identifying Box Regions by PRIM and Other Algorithms
  19.3 Bump Hunting Through Supervised Learning
20 Software Packages for Machine Learning
  20.1 Tools Developed in HEP
  20.2 R
  20.3 MATLAB
  20.4 Tools for Java and Python
  20.5 What Software Tool Is Right for You?
Appendix A: Optimization Algorithms
  A.1 Line Search
  A.2 Linear Programming (LP)