Linear problems like , with
the unknown solution, appear in many applications. In theory, one needs to invert matrix
such that
. When
is very small, for example a 3×3 matrix, one can compute the inverse matrix if its determinant is non-zero. When the matrix is large, a direct inversion of the matrix is in general not a good idea, so other methods must be used.
Since we are interested in , rather than the inverse of
, it is possible to search for
through the minimization of a quadratic function
, for which the solution
of the linear problem, is a minimizer of the quadratic function. To see this, the solution of the minimization problem is located where its its gradient at
becomes zero, provided that the minimization is unique. Since the gradient of the minimization problem is exactly our linear problem, the minimizer solves the linear problem.
A method for solving this minimization problem is for example the Gradient Descent method. Please note that Gradient Descent is in general used for non-linear problems, but can also used for linear problems. Although the method is not very efficient for solving linear problems, it gives a good insight in other methods like the Conjugate Gradient method.
Derivation of Gradient Descent for solving linear problems
As the name of the method suggests, the approximation of solution descents along the gradient of the minimization problem, until it reaches the minimum of the quadratic function. The gradient of
at
is just
. Suppose this gradient direction at
is known as
, one takes a step from
in direction
until a minimum is found, i.e.,
, with
the unknown step-size. At this minimum, the gradient of
and
are perpendicular. Using this observation, the step-size
can be computed as follows:
Update rule for the approximation, with the unknown step-size.
Left-multiply by .
Add .
Please note that residual is the negative gradient of
, so
is computed as follows by using the property that the gradient and
are perpendicular.
Since at the minimum, we have:
Given , both
and
can be computed. In the Gradient Descent method,
is set using
. By repeating this procedure, eventually
reaches the global minimizer for
, and so a solution for the linear problem is obtained. When solving a non-linear problem,
is set small enough to ensure that
, and
is set to
.