2  Optimization

The optimization plays an important role in statistical computing, especially in the context of maximum likelihood estimation (MLE) and other statistical inference methods. This chapter will cover various optimization techniques used in statistical computing.

For instance, for a linear regression \[ y = X\boldsymbol{\beta}+ \varepsilon. \] From regression class, we know that the (ordinary) least-squares estimation (OLE) for \(\boldsymbol{\beta}\) is given by \(\hat{\boldsymbol{\beta}}=(X^\top X)^{-1} X^\top y\). It is convenient as the solution is in the closed-form! However, in the most case, the closed-form solutions will not be available.

For GLMs or non-linear regression, we need to do this iterativelly!

2.1 Speed comparison

set.seed(2017-07-13)
X <- matrix(rnorm(5000 * 100), 5000, 100)
y <- rnorm(5000)
library(microbenchmark)
microbenchmark(solve(t(X) %*% X) %*% t(X) %*% y)
Warning in microbenchmark(solve(t(X) %*% X) %*% t(X) %*% y): less accurate
nanosecond times to avoid potential integer overflows
Unit: milliseconds
                             expr      min       lq     mean   median       uq
 solve(t(X) %*% X) %*% t(X) %*% y 28.57097 29.23575 30.04752 29.87713 30.62448
      max neval
 32.21641   100
microbenchmark(solve(t(X) %*% X) %*% t(X) %*% y,
               solve(crossprod(X), crossprod(X, y)))
Unit: milliseconds
                                 expr      min       lq     mean   median
     solve(t(X) %*% X) %*% t(X) %*% y 28.39886 29.03231 29.73657 29.41512
 solve(crossprod(X), crossprod(X, y)) 24.96937 25.01504 25.25663 25.05457
       uq      max neval
 30.23379 32.94231   100
 25.16457 28.50168   100

2.2 Type of Optimization Algorithms

There are in general two types of the optimization algorithms: (1). deterministic and (2). metaheuristic. Deterministic and metaheuristic algorithms represent two distinct paradigms in optimization. Deterministic methods, such as gradient descent, produce the same solution for a given input and follow a predictable path toward an optimum. In contrast, metaheuristic approaches—like genetic algorithms—incorporate randomness and do not guarantee the best possible solution. However, they are often more effective at avoiding local optima and exploring complex search spaces.

2.3 Heuristic Algorithms

Many of the heuristic algorithms are inspired by the nature, such as the genetic algorithm (GA) and particle swarm optimization (PSO). These algorithms are often used for complex optimization problems where traditional methods may struggle to find a solution. Some of the popular heuristic algorithms include:

  • Genetic Algorithm (GA)
  • Particle Swarm Optimization (PSO)
  • Simulated Annealing (SA)
  • Ant Colony Optimization (ACO)

2.4 Deterministic Algorithms

Numerical approximation, what you learned in the mathmeatical optimization course. Some of the algorithms include:

  • Gradient Descent
  • Newton’s Method
  • Conjugate Gradient Method
  • Quasi-Newton Methods (e.g., BFGS)
  • Interior Point Methods

They often reply on the KKT conditions.


Examples are borrowed from the following sources: