Nonlinear Equations

There are many applications in which the goal is to find values for the variables that satisfy a set of given constraints without the need to optimize a particular objective function. When there are \(n\) variables and \(n\) equality constraints, the problem is one of solving a system of nonlinear equations. Mathematically, the problem is
\[f(x) = 0,\] where \(f : \cal{R}^n \rightarrow \cal{R}^n\) is a vector function,
\[f(x) = \left[ \begin{array}{c}
f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x)
\end{array}
\right], \] where each component \(f_i : \cal{R}^n \rightarrow \cal{R}\), \(i=1, 2, \ldots, n\), is smooth. A vector \(x^*\) satisfying \(f(x^*)=0\) is called a solution or a root of the nonlinear equations. In general, a system of nonlinear equations may have no solution, a unique solution, or many solutions.
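For example, the two-variable system
\[x_1^2 + x_2^2 - 2 = 0, \qquad x_1 - x_2 = 0\] has exactly two solutions, \(x^* = (1, 1)\) and \(x^* = (-1, -1)\), while changing the constant \(2\) to \(-2\) in the first equation yields a system with no solution.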

Many algorithms for nonlinear equations are related to algorithms for unconstrained optimization and nonlinear least-squares. There are close connections to the nonlinear least-squares problem since a number of algorithms for nonlinear equations proceed by minimizing the sum of squares of the equations:
\[\min_x \sum_{i=1}^n f_i^2(x).\]
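As a rough illustration of this connection, the sketch below minimizes the sum of squares for the small example system above using SciPy; the residual function, starting point, and printed quantities are illustrative choices, not part of the original text.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(x):
    # f_1(x) = x_1^2 + x_2^2 - 2 and f_2(x) = x_1 - x_2 (illustrative example)
    return np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])

# least_squares minimizes (1/2) * sum(residuals(x)**2)
result = least_squares(residuals, x0=np.array([2.0, 0.5]))
print(result.x)     # close to [1, 1], a root of the system
print(result.cost)  # close to 0, so all equations are (nearly) satisfied
```

Note that a minimizer of the sum of squares is a root of the original equations only when the minimum value is zero.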

Despite the similarities, there are important differences between algorithms for the two problems. For nonlinear equations, the number of equations is equal to the number of variables, and all of the equations must be satisfied at a solution point.

Algorithms

Newton’s method forms the basis for many of the algorithms to solve systems of nonlinear equations. We give a brief overview of Newton’s method and outline some of the related algorithms. For more details, see the sources in the references, for example, Chapter 11 in Nocedal and Wright (1999).

Newton’s method for nonlinear equations uses a linear approximation of the function \(f\) around the current iterate. Given an initial point \(x_0\),
\[f(x) \approx f(x_0) + J(x_0)(x - x_0),\] where \(J(x_0)\) is the Jacobian matrix, the matrix of all first-order partial derivatives of the components of \(f\). To find the vector \(x\) that makes \(f(x) = 0\), we choose the next iterate \(x_1\) so that
\[f(x_0) + J(x_0)(x_1 - x_0) = 0.\] Let \(p_0 = x_1 - x_0\). To find \(p_0\), we solve the system of linear equations
\[J(x_0)\, p_0 = -f(x_0)\] and then set \(x_1 = x_0 + p_0\).

Newton’s Method for Nonlinear Equations

Given an iterate \(x_k\), Newton’s method computes \(f(x_k) \,\) and its Jacobian matrix and finds a step \(p_k\) by solving the system of linear equations
\[J(x_k) \, p_k = - f(x_k) \quad (1)\] Then, the new iterate is \(x_{k+1} = x_k + p_k\).
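A minimal sketch of this iteration in Python, assuming the user supplies callables for \(f\) and its Jacobian (the function names, tolerances, and test problem below are illustrative, not from the original text):

```python
import numpy as np

def newton_system(f, jac, x0, tol=1e-10, max_iter=50):
    """Basic Newton iteration for f(x) = 0 (no globalization safeguards)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        # Solve J(x_k) p_k = -f(x_k), equation (1), for the Newton step p_k
        p = np.linalg.solve(jac(x), -fx)
        x = x + p
    return x

# Illustrative test problem: f_1 = x_1^2 + x_2^2 - 2, f_2 = x_1 - x_2
f = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
jac = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
print(newton_system(f, jac, [2.0, 0.5]))  # converges to [1. 1.]
```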

Most of the computational cost of Newton's method is associated with two operations: evaluating the function \(f\) and its Jacobian matrix, and solving the linear system of equations (1). For the Jacobian, the \(i^{th}\) column requires the partial derivative of each component of \(f\) with respect to \(x_i\). Solving the linear system (1) requires on the order of \(n^3\) operations when the Jacobian is dense.
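When analytic partial derivatives are inconvenient to supply, the Jacobian is often approximated column by column with finite differences, at the cost of one extra evaluation of \(f\) per column. A forward-difference sketch follows; the step size and function names are illustrative assumptions.

```python
import numpy as np

def fd_jacobian(f, x, eps=1e-7):
    """Forward-difference approximation to the Jacobian of f at x.

    Column i approximates the partial derivatives of the components of f
    with respect to x_i, so the full matrix costs n extra evaluations of f.
    """
    x = np.asarray(x, dtype=float)
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += eps
        J[:, i] = (f(x_step) - fx) / eps
    return J
```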

Newton’s method is guaranteed to converge if the starting point is sufficiently close to the solution and the Jacobian is nonsingular at the solution. Under these conditions, the rate of convergence is quadratic:
\[\| x_{k+1} - x^* \| \leq \beta \| x_k - x^* \|^2,\] for some positive constant \(\beta\). This rapid local convergence is the main advantage of Newton's method. The disadvantages include the calculation of the Jacobian matrix and the potentially erratic behavior when the starting point is not close to a solution (no guaranteed global convergence). Modifications and enhancements to Newton's method have been developed to deal with these two difficulties.
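The quadratic rate is easy to observe numerically even in one dimension; the short sketch below applies Newton's method to \(f(x) = x^2 - 2\) (an illustrative problem, not from the original text) and prints errors that roughly square at each step until they reach machine precision.

```python
import math

# Newton's method for f(x) = x**2 - 2, whose positive root is sqrt(2)
x = 1.5
for k in range(5):
    x = x - (x**2 - 2.0) / (2.0 * x)   # x_{k+1} = x_k - f(x_k) / f'(x_k)
    print(k, abs(x - math.sqrt(2.0)))  # error roughly squares each step
```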

References
  • Nocedal, J. and Wright, S. J. 1999. Numerical Optimization. Springer-Verlag, New York.