# An Introduction to Optimization

## With Applications to Machine Learning

5. Edition September 2023

672 Pages, Hardcover*Textbook*

**978-1-119-87763-9**

An Introduction to Optimization

Accessible introductory textbook on optimization theory and methods, with an emphasis on engineering design, featuring MATLAB(r) exercises and worked examples

Fully updated to reflect modern developments in the field, the Fifth Edition of An Introduction to Optimization fills the need for an accessible, yet rigorous, introduction to optimization theory and methods, featuring innovative coverage and a straightforward approach. The book begins with a review of basic definitions and notations while also providing the related fundamental background of linear algebra, geometry, and calculus.

With this foundation, the authors explore the essential topics of unconstrained optimization problems, linear programming problems, and nonlinear constrained optimization. In addition, the book includes an introduction to artificial neural networks, convex optimization, multi-objective optimization, and applications of optimization in machine learning.

Numerous diagrams and figures found throughout the book complement the written presentation of key concepts, and each chapter is followed by MATLAB(r) exercises and practice problems that reinforce the discussed theory and algorithms.

The Fifth Edition features a new chapter on Lagrangian (nonlinear) duality, expanded coverage on matrix games, projected gradient algorithms, machine learning, and numerous new exercises at the end of each chapter.

An Introduction to Optimization includes information on:

* The mathematical definitions, notations, and relations from linear algebra, geometry, and calculus used in optimization

* Optimization algorithms, covering one-dimensional search, randomized search, and gradient, Newton, conjugate direction, and quasi-Newton methods

* Linear programming methods, covering the simplex algorithm, interior point methods, and duality

* Nonlinear constrained optimization, covering theory and algorithms, convex optimization, and Lagrangian duality

* Applications of optimization in machine learning, including neural network training, classification, stochastic gradient descent, linear regression, logistic regression, support vector machines, and clustering.

An Introduction to Optimization is an ideal textbook for a one- or two-semester senior undergraduate or beginning graduate course in optimization theory and methods. The text is also of value for researchers and professionals in mathematics, operations research, electrical engineering, economics, statistics, and business.

About the Companion Website xviii

Part I Mathematical Review 1

1 Methods of Proof and Some Notation 3

1.1 Methods of Proof 3

1.2 Notation 5

Exercises 5

2 Vector Spaces and Matrices 7

2.1 Vector and Matrix 7

2.2 Rank of a Matrix 11

2.3 Linear Equations 16

2.4 Inner Products and Norms 18

Exercises 20

3 Transformations 23

3.1 Linear Transformations 23

3.2 Eigenvalues and Eigenvectors 24

3.3 Orthogonal Projections 26

3.4 Quadratic Forms 27

3.5 Matrix Norms 32

Exercises 35

4 Concepts from Geometry 39

4.1 Line Segments 39

4.2 Hyperplanes and Linear Varieties 39

4.3 Convex Sets 41

4.4 Neighborhoods 43

4.5 Polytopes and Polyhedra 44

Exercises 45

5 Elements of Calculus 47

5.1 Sequences and Limits 47

5.2 Differentiability 52

5.3 The Derivative Matrix 54

5.4 Differentiation Rules 57

5.5 Level Sets and Gradients 58

5.6 Taylor Series 61

Exercises 65

Part II Unconstrained Optimization 67

6 Basics of Set-Constrained and Unconstrained Optimization 69

6.1 Introduction 69

6.2 Conditions for Local Minimizers 70

Exercises 78

7 One-Dimensional Search Methods 87

7.1 Introduction 87

7.2 Golden Section Search 87

7.3 Fibonacci Method 91

7.4 Bisection Method 97

7.5 Newton's Method 98

7.6 Secant Method 101

7.7 Bracketing 103

7.8 Line Search in Multidimensional Optimization 103

Exercises 105

8 Gradient Methods 109

8.1 Introduction 109

8.2 Steepest Descent Method 110

8.3 Analysis of Gradient Methods 117

Exercises 126

9 Newton's Method 133

9.1 Introduction 133

9.2 Analysis of Newton's Method 135

9.3 Levenberg-Marquardt Modification 138

9.4 Newton's Method for Nonlinear Least Squares 139

Exercises 142

10 Conjugate Direction Methods 145

10.1 Introduction 145

10.2 Conjugate Direction Algorithm 146

10.2.1 Basic Conjugate Direction Algorithm 146

10.3 Conjugate Gradient Algorithm 151

10.4 Conjugate Gradient Algorithm for Nonquadratic Problems 154

Exercises 156

11 Quasi-Newton Methods 159

11.1 Introduction 159

11.2 Approximating the Inverse Hessian 160

11.3 Rank One Correction Formula 162

11.4 DFP Algorithm 166

11.5 BFGS Algorithm 170

Exercises 173

12 Solving Linear Equations 179

12.1 Least-Squares Analysis 179

12.2 Recursive Least-Squares Algorithm 187

12.3 Solution to a Linear Equation with Minimum Norm 190

12.4 Kaczmarz's Algorithm 191

12.5 Solving Linear Equations in General 194

Exercises 201

13 Unconstrained Optimization and Neural Networks 209

13.1 Introduction 209

13.2 Single-Neuron Training 211

13.3 Backpropagation Algorithm 213

Exercises 222

14 Global Search Algorithms 225

14.1 Introduction 225

14.2 Nelder-Mead Simplex Algorithm 225

14.3 Simulated Annealing 229

14.3.1 Randomized Search 229

14.3.2 Simulated Annealing Algorithm 229

14.4 Particle Swarm Optimization 231

14.4.1 Basic PSO Algorithm 232

14.4.2 Variations 233

14.5 Genetic Algorithms 233

14.5.1 Basic Description 233

14.5.1.1 Chromosomes and Representation Schemes 234

14.5.1.2 Selection and Evolution 234

14.5.2 Analysis of Genetic Algorithms 238

14.5.3 Real-Number Genetic Algorithms 243

Exercises 244

Part III Linear Programming 247

15 Introduction to Linear Programming 249

15.1 Brief History of Linear Programming 249

15.2 Simple Examples of Linear Programs 250

15.3 Two-Dimensional Linear Programs 256

15.4 Convex Polyhedra and Linear Programming 258

15.5 Standard Form Linear Programs 260

15.6 Basic Solutions 264

15.7 Properties of Basic Solutions 267

15.8 Geometric View of Linear Programs 269

Exercises 273

16 Simplex Method 277

16.1 Solving Linear Equations Using Row Operations 277

16.2 The Canonical Augmented Matrix 283

16.3 Updating the Augmented Matrix 284

16.4 The Simplex Algorithm 285

16.5 Matrix Form of the Simplex Method 291

16.6 Two-Phase Simplex Method 294

16.7 Revised Simplex Method 297

Exercises 301

17 Duality 309

17.1 Dual Linear Programs 309

17.2 Properties of Dual Problems 316

17.3 Matrix Games 321

Exercises 324

18 Nonsimplex Methods 331

18.1 Introduction 331

18.2 Khachiyan's Method 332

18.3 Affine Scaling Method 334

18.3.1 Basic Algorithm 334

18.3.2 Two-Phase Method 337

18.4 Karmarkar's Method 339

18.4.1 Basic Ideas 339

18.4.2 Karmarkar's Canonical Form 339

18.4.3 Karmarkar's Restricted Problem 341

18.4.4 From General Form to Karmarkar's Canonical Form 342

18.4.5 The Algorithm 345

Exercises 349

19 Integer Linear Programming 351

19.1 Introduction 351

19.2 Unimodular Matrices 351

19.3 The Gomory Cutting-Plane Method 358

Exercises 366

Part IV Nonlinear Constrained Optimization 369

20 Problems with Equality Constraints 371

20.1 Introduction 371

20.2 Problem Formulation 373

20.3 Tangent and Normal Spaces 374

20.4 Lagrange Condition 379

20.5 Second-Order Conditions 387

20.6 Minimizing Quadratics Subject to Linear Constraints 390

Exercises 394

21 Problems with Inequality Constraints 399

21.1 Karush-Kuhn-Tucker Condition 399

21.2 Second-Order Conditions 406

Exercises 410

22 Convex Optimization Problems 417

22.1 Introduction 417

22.2 Convex Functions 419

22.3 Convex Optimization Problems 426

22.4 Semidefinite Programming 431

22.4.1 Linear Matrix Inequalities and Their Properties 431

22.4.2 LMI Solvers 435

22.4.2.1 Finding a Feasible Solution Under LMI Constraints 436

22.4.2.2 Minimizing a Linear Objective Under LMI Constraints 438

22.4.2.3 Minimizing a Generalized Eigenvalue Under LMI Constraints 440

Exercises 442

23 Lagrangian Duality 449

23.1 Overview 449

23.2 Notation 449

23.3 Primal-Dual Pair 450

23.4 General Duality Properties 451

23.4.1 Convexity of Dual Problem 451

23.4.2 Primal Objective in Terms of Lagrangian 451

23.4.3 Minimax Inequality Chain 452

23.4.4 Optimality of Saddle Point 452

23.4.5 Weak Duality 453

23.4.6 Duality Gap 453

23.5 Strong Duality 454

23.5.1 Strong Duality <==> Minimax Equals Maximin 454

23.5.2 Strong Duality ==> Primal Unconstrained Minimization 455

23.5.3 Strong Duality ==> Optimality 455

23.5.4 Strong Duality ==> KKT (Including Complementary Slackness) 455

23.5.5 Strong Duality ==> Saddle Point 456

23.6 Convex Case 456

23.6.1 Convex Case: KKT ==> Strong Duality 456

23.6.2 Convex Case: Regular Optimal Primal ==> Strong Duality 457

23.6.3 Convex Case: Slater's Condition ==> Strong Duality 457

23.7 Summary of Key Results 457

Exercises 458

24 Algorithms for Constrained Optimization 459

24.1 Introduction 459

24.2 Projections 459

24.3 Projected Gradient Methods with Linear Constraints 462

24.4 Convergence of Projected Gradient Algorithms 465

24.4.1 Fixed Points and First-Order Necessary Conditions 466

24.4.2 Convergence with Fixed Step Size 468

24.4.3 Some Properties of Projections 469

24.4.4 Armijo Condition 470

24.4.5 Accumulation Points 471

24.4.6 Projections in the Convex Case 472

24.4.7 Armijo Condition in the Convex Case 474

24.4.8 Convergence in the Convex Case 480

24.4.9 Convergence Rate with Line-Search Step Size 481

24.5 Lagrangian Algorithms 483

24.5.1 Lagrangian Algorithm for Equality Constraints 484

24.5.2 Lagrangian Algorithm for Inequality Constraints 486

24.6 Penalty Methods 489

Exercises 495

25 Multiobjective Optimization 499

25.1 Introduction 499

25.2 Pareto Solutions 499

25.3 Computing the Pareto Front 501

25.4 From Multiobjective to Single-Objective Optimization 505

25.5 Uncertain Linear Programming Problems 508

25.5.1 Uncertain Constraints 508

25.5.2 Uncertain Objective Function Coefficients 511

25.5.3 Uncertain Constraint Coefficients 513

25.5.4 General Uncertainties 513

Exercises 513

Part V Optimization in Machine Learning 517

26 Machine Learning Problems and Feature Engineering 519

26.1 Machine Learning Problems 519

26.1.1 Data with Labels and Supervised Learning 519

26.1.2 Data Without Labels and Unsupervised Learning 521

26.2 Data Normalization 522

26.3 Histogram of Oriented Gradients 524

26.4 Principal Component Analysis and Linear Autoencoder 526

26.4.1 Singular Value Decomposition 526

26.4.2 Principal Axes and Principal Components of a Data Set 527

26.4.3 Linear Autoencoder 529

Exercises 530

27 Stochastic Gradient Descent Algorithms 537

27.1 Stochastic Gradient Descent Algorithm 537

27.2 Stochastic Variance Reduced Gradient Algorithm 540

27.3 Distributed Stochastic Variance Reduced Gradient 542

27.3.1 Distributed Learning Environment 542

27.3.2 SVRG in Distributed Optimization 543

27.3.3 Communication Versus Computation 545

27.3.4 Data Security 545

Exercises 546

28 Linear Regression and Its Variants 553

28.1 Least-Squares Linear Regression 553

28.1.1 A Linear Model for Prediction 553

28.1.2 Training the Model 554

28.1.3 Computing Optimal w 554

28.1.4 Optimal Predictor and Performance Evaluation 555

28.1.5 Least-Squares Linear Regression for Data Sets with Vector Labels 556

28.2 Model Selection by Cross-Validation 559

28.3 Model Selection by Regularization 562

Exercises 564

29 Logistic Regression for Classification 569

29.1 Logistic Regression for Binary Classification 569

29.1.1 Least-Squares Linear Regression for Binary Classification 569

29.1.2 Logistic Regression for Binary Classification 570

29.1.3 Interpreting Logistic Regression by Log Error 572

29.1.4 Confusion Matrix for Binary Classification 573

29.2 Nonlinear Decision Boundary via Linear Regression 575

29.2.1 Least-Squares Linear Regression with Nonlinear Transformation 576

29.2.2 Logistic Regression with Nonlinear Transformation 578

29.3 Multicategory Classification 580

29.3.1 One-Versus-All Multicategory Classification 580

29.3.2 Softmax Regression for Multicategory Classification 581

Exercises 584

30 Support Vector Machines 589

30.1 Hinge-Loss Functions 589

30.1.1 Geometric Interpretation of the Linear Model 589

30.1.2 Hinge Loss for Binary Data Sets 590

30.1.3 Hinge Loss for Multicategory Data Sets 592

30.2 Classification by Minimizing Hinge Loss 593

30.2.1 Binary Classification by Minimizing Average Hinge Loss 593

30.2.2 Multicategory Classification by Minimizing E hww or E hcs 594

30.3 Support Vector Machines for Binary Classification 596

30.3.1 Hard-Margin Support Vector Machines 596

30.3.2 Support Vectors 598

30.3.3 Soft-Margin Support Vector Machines 599

30.3.4 Connection to Hinge-Loss Minimization 602

30.4 Support Vector Machines for Multicategory Classification 602

30.5 Kernel Trick 603

30.5.1 Kernels 603

30.5.2 Kernel Trick 604

30.5.3 Learning with Kernels 605

30.5.3.1 Regularized Logistic Regression with Nonlinear Transformation for Binary Classification 605

30.5.3.2 Regularized Hinge-Loss Minimization for Binary Classification 606

Exercises 607

31 K-Means Clustering 611

31.1 K-Means Clustering 611

31.2 K-Means++ forCenterInitialization 615

31.3 Variants of K-Means Clustering 617

31.3.1 K-Means Clustering Based on 1-Norm Regularization 617

31.3.2 PCA-Guided K-Means Clustering 619

31.4 Image Compression by Vector Quantization and K-Means Clustering 622

Exercises 623

References 627

Index 635

Wu-Sheng Lu, PhD, is Professor Emeritus of Electrical and Computer Engineering at the University of Victoria, Canada. He is a Fellow of the IEEE and former Associate Editor of the IEEE Transactions on Circuits and??Systems.

Stanislaw H. {ak, PhD, is Professor in the School of Electrical and Computer Engineering at Purdue University. He is former Associate Editor of Dynamics and Control and the IEEE Transactions on Neural Networks.