Systems of non-linear equations#

General form#

The general form of a system of \(N\) non-linear equations is

\[\begin{split} f_1(x_1,\ldots,x_N) = 0,\\ f_2(x_1,\ldots,x_N) = 0,\\ \ldots \\ f_N(x_1,\ldots,x_N) = 0 \end{split}\]

Denoting \({\bf f} = (f_1,\ldots,f_N)\) and \({\bf x} = (x_1,\ldots,x_N)\) this can be written in compact form using vector notation

\[ {\bf f}({\bf x}) = 0~. \]

As an example of a system of non-linear equations, consider finding the intersection point \((x,y)\) of the curves \(y = x + \exp(-x) - 2\) and \(y = x^3 - x - 3\). This can be cast as the solution of a system of two non-linear equations

\[\begin{split} x + \exp(-x) - 2 - y = 0,\\ x^3 - x - 3 - y = 0. \end{split}\]

Identifying \((x,y)\) with \((x_1,x_2)\), this corresponds to

\[ f_1(x_1,x_2) = x_1 + \exp(-x_1) - 2 - x_2 \]

and

\[ f_2(x_1,x_2) = x_1^3 - x_1 - 3 - x_2. \]

For illustration, let us cast these equations in the form

\[\begin{align*} x_2 &= \tilde f_1(x_1)\\ x_2 &= \tilde f_2(x_1) \end{align*}\]

where \(\tilde f_1(x_1) = x_1 + \exp(-x_1) - 2\) and \(\tilde f_2(x_1) = x_1^3 - x_1 - 3\).

import numpy as np
import matplotlib.pyplot as plt

def y1f(x):
    return x + np.exp(-x) - 2.

def y2f(x):
    return x**3 - x - 3.

xref = np.linspace(-1, 2, 100)
y1ref = y1f(xref)
y2ref = y2f(xref)

plt.xlabel("${x_1}$")
plt.ylabel("${x_2}$")
plt.plot(xref, y1ref, label='$x_2 = {\\tilde f_1(x_1)}$')
plt.plot(xref, y2ref, label='$x_2 = {\\tilde f_2(x_1)}$')
plt.plot([1.64998819], [-0.15795963], 'ro')  # intersection point found below
plt.legend()
plt.show()
(Figure: the curves \(x_2 = \tilde f_1(x_1)\) and \(x_2 = \tilde f_2(x_1)\) with their intersection point marked in red.)

Let us implement a vector function \(\mathbf{f}(\mathbf{x})\).

In addition, we can also introduce a scalar-valued objective function

\[ \tilde{f}(\mathbf{x}) = \frac{\mathbf{f}(\mathbf{x}) \cdot \mathbf{f}(\mathbf{x})}{2} \]

which is equal to zero at a root and positive elsewhere. Finding a root is thus closely related to minimizing this function.

# Vector function f(x) for the two-equation example
def f(x):
    return np.array([x[0] + np.exp(-x[0]) - 2. - x[1], x[0]**3 - x[0] - 3. - x[1]])

# Scalar objective f(x).f(x)/2, equal to zero at a root
def ftil(fval):
    return np.dot(fval, fval) / 2.
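
As a quick check, evaluating these functions at the initial guess \(\mathbf{x}_0 = (0, 0)\) reproduces the starting values printed by the solvers below:

x0 = np.array([0., 0.])
print("f(x0) = ", f(x0))           # [-1. -3.]
print("ftil(x0) = ", ftil(f(x0)))  # 5.0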

The Newton-Raphson method can be readily generalized to multi-dimensional systems of equations. There is also a multi-dimensional analogue of the secant method known as Broyden's method.

Let us explore these methods.

Newton method#

Recall the Taylor expansion of a function \(f\) around the root \(x^*\) in the one-dimensional case:

\[ f(x^*) \approx f(x) + f'(x) (x^* - x) \]

The multi-dimensional version of this expansion reads

\[ {\bf f} ({\bf x^*}) \approx {\bf f}({\bf x}) + J(\bf{x}) ({\bf x^*} - {\bf x}) \]

Here \(J(\mathbf{x})\) is the Jacobian, i.e., an \(N \times N\) matrix of derivatives evaluated at \(\mathbf{x}\)

\[ J_{ij}(\mathbf{x}) = \frac{\partial f_i}{\partial x_j}~. \]
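
For the two-equation example above, differentiating \(f_1\) and \(f_2\) gives

\[ J(\mathbf{x}) = \begin{pmatrix} 1 - e^{-x_1} & -1 \\ 3 x_1^2 - 1 & -1 \end{pmatrix}. \]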

Given that \(\mathbf{f}(\mathbf{x^*}) = 0\), we have

\[ J(\mathbf{x}) (\mathbf{x^*} - \mathbf{x}) \approx -\mathbf{f}(\mathbf{x}), \]

which is a system of linear equations for \(\mathbf{x^*} - \mathbf{x}\). Solving this system yields

\[ \mathbf{x^*} \approx \mathbf{x} - J^{-1}(\mathbf{x}) \, \mathbf{f}(\mathbf{x})~. \]

Here \(J^{-1}(\mathbf{x})\) is the inverse Jacobian matrix.

The multi-dimensional Newton’s method is an iterative procedure

\[ \mathbf{x}_{n+1} = \mathbf{x}_n - J^{-1}(\mathbf{x}_n) \, \mathbf{f}(\mathbf{x}_n)~. \]
import numpy as np

last_newton_iterations = 0
newton_verbose = True

def newton_method_multi(
    f, 
    jacobian, 
    x0, 
    accuracy=1e-8, 
    max_iterations=100):
    """Multi-dimensional Newton's method for solving f(x) = 0."""
    x = x0
    global last_newton_iterations
    last_newton_iterations = 0
    
    if newton_verbose:
        print("Iteration: ", last_newton_iterations)
        print("x = ", x0)
        print("f = ", f(x0))
        print("|f| = ", ftil(f(x0)))
        
    for i in range(max_iterations):
        last_newton_iterations += 1
        f_val = f(x)
        jac = jacobian(x)
        # Newton step: x_{n+1} = x_n - J^{-1}(x_n) f(x_n)
        jinv = np.linalg.inv(jac)
        delta = np.dot(jinv, -f_val)
        x = x + delta
        
        if newton_verbose:
            print("Iteration: ", last_newton_iterations)
            print("x = ", x)
            print("f = ", f(x))
            print("|f| = ", ftil(f(x)))
            
        # Stop when the step size falls below the requested accuracy
        if np.linalg.norm(delta, ord=2) < accuracy:
            return x
    return x

Let us perform the calculation for our example.

%%time

def f(x):
    return np.array([x[0] + np.exp(-x[0]) - 2. - x[1], x[0]**3 - x[0] - 3. - x[1]])

# Analytic Jacobian J_ij = df_i/dx_j of the example system
def jacobian(x):
    return np.array([[1. - np.exp(-x[0]), -1.], [3*x[0]**2 - 1., -1.]])

x0 = np.array([0., 0.])
root = newton_method_multi(f, jacobian, x0)
Iteration:  0
x =  [0. 0.]
f =  [-1. -3.]
|f| =  5.0
Iteration:  1
x =  [-2. -1.]
f =  [ 4.3890561 -8.       ]
|f| =  41.63190671978017
Iteration:  2
x =  [-1.28753717 -1.16290888]
f =  [ 1.49922234 -2.68397122]
|f| =  4.725684585141539
Iteration:  3
x =  [-0.65344197 -1.32745762]
f =  [ 0.59616108 -1.29811125]
|f| =  1.0202504224353612
Iteration:  4
x =  [ 0.92104475 -2.18320228]
f =  [ 1.50234993 -0.95649863]
|f| =  1.5859724775723376
Iteration:  5
x =  [3.52831703 0.88845726]
f =  [ 0.66921404 36.50731858]
|f| =  666.6160786805304
Iteration:  6
x =  [2.51526632 0.57435797]
f =  [0.02174973 9.82337075]
|f| =  48.249542955733645
Iteration:  7
x =  [1.94074619 0.06803257]
f =  [0.01631038 2.30103357]
|f| =  2.647510761586109
Iteration:  8
x =  [ 1.69879947 -0.122861  ]
f =  [0.00456345 0.32666032]
|f| =  0.053363893975197814
Iteration:  9
x =  [ 1.65171385 -0.15677108]
f =  [0.00020597 0.01119461]
|f| =  6.268087547408287e-05
Iteration:  10
x =  [ 1.64999046 -0.15795808]
f =  [2.84874536e-07 1.47119183e-05]
|f| =  1.0826084693606538e-10
Iteration:  11
x =  [ 1.64998819 -0.15795963]
f =  [4.94243535e-13 2.54747057e-11]
|f| =  3.2460245303301063e-22
Iteration:  12
x =  [ 1.64998819 -0.15795963]
f =  [ 0.00000000e+00 -6.66133815e-16]
|f| =  2.2186712959340957e-31
CPU times: user 5.36 ms, sys: 107 μs, total: 5.47 ms
Wall time: 1.26 ms

The method converged to the root in 12 iterations. The properties of the multivariate Newton method are similar to those of the univariate case. There are a couple of drawbacks:

  • The Jacobian matrix must be computed at each iteration

  • The inversion of the Jacobian matrix is required at each iteration

Both operations can be expensive for large systems.
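
The second drawback can be reduced by solving the linear system \(J(\mathbf{x}_n)\,\boldsymbol{\delta}_n = -\mathbf{f}(\mathbf{x}_n)\) instead of forming the inverse explicitly. As a minimal sketch (using the functions f and jacobian defined above), a single Newton step written this way reads:

# One Newton step without forming the inverse Jacobian:
# solve J(x) delta = -f(x) for delta, then update x
x = np.array([0., 0.])
delta = np.linalg.solve(jacobian(x), -f(x))
x = x + delta
print(x)  # first Newton iterate, [-2. -1.] for this starting point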

Broyden method#

The Broyden method is a generalization of the secant method to multiple dimensions. It is a quasi-Newton method that approximates the Jacobian by finite differences (see lecture notes). With each iteration, it updates the Jacobian approximation using the secant update rule. The method can also be formulated to update the inverse of the Jacobian directly, avoiding the need to invert the Jacobian at each iteration.
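
For reference, here is a minimal sketch of what a finite-difference (forward-difference) approximation of the Jacobian looks like; the helper name numerical_jacobian and the step size h are choices made here for illustration, not part of the lecture code:

import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    # Forward-difference approximation of the Jacobian J_ij = df_i/dx_j
    n = x.shape[0]
    f0 = f(x)
    J = np.zeros((n, n))
    for j in range(n):
        xh = x.copy()
        xh[j] += h
        J[:, j] = (f(xh) - f0) / h  # column j approximates df/dx_j
    return J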

Secant Method (1D)#

The secant method for a single-variable function uses the Newton-type update:

\[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\]

where the derivative is approximated by the finite difference:

\[f'(x_n) \approx \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}}\]
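
Substituting this finite-difference slope into the update gives the secant iteration. A minimal one-dimensional sketch (the function secant_1d and the test equation below are chosen here for illustration):

import numpy as np

def secant_1d(g, x0, x1, accuracy=1e-8, max_iterations=100):
    # Secant method for a scalar equation g(x) = 0
    for _ in range(max_iterations):
        dg = (g(x1) - g(x0)) / (x1 - x0)  # finite-difference slope
        x0, x1 = x1, x1 - g(x1) / dg      # secant update
        if abs(x1 - x0) < accuracy:
            break
    return x1

# Example: solve cos(x) = x; the root is approximately 0.739
print(secant_1d(lambda x: np.cos(x) - x, 0.0, 1.0))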

Broyden’s Method (Multi-dimensional)#

Broyden’s method extends this concept to vector functions:

\[\mathbf{x}_{n+1} = \mathbf{x}_n - J^{-1}(\mathbf{x}_n)\mathbf{f}(\mathbf{x}_n)\]

where the Jacobian approximation satisfies:

\[J(\mathbf{x}_n)(\mathbf{x}_n - \mathbf{x}_{n-1}) \approx \mathbf{f}(\mathbf{x}_n) - \mathbf{f}(\mathbf{x}_{n-1})\]

Jacobian Update#

The key insight of Broyden’s method is how it updates the Jacobian approximation. Rather than recalculating the entire Jacobian matrix at each step (which would be computationally expensive), Broyden’s method uses a rank-one update:

\[J_n = J_{n-1} + \frac{\Delta \mathbf{f}_n - J_{n-1}\Delta \mathbf{x}_n}{\|\Delta \mathbf{x}_n\|^2}\Delta \mathbf{x}_n^T\]

Where:

  • \(\Delta \mathbf{x}_n = \mathbf{x}_n - \mathbf{x}_{n-1}\) (the step in x)

  • \(\Delta \mathbf{f}_n = \mathbf{f}_n - \mathbf{f}_{n-1}\) (the change in function values)

  • \(J_n\) is the approximation of the Jacobian at step \(n\)

This update formula ensures that the new Jacobian approximation \(J_n\) satisfies the secant equation:

\[J_n \Delta \mathbf{x}_n = \Delta \mathbf{f}_n\]
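
As a quick numerical check (a small sketch with arbitrary test vectors, not taken from the example above), the updated matrix indeed maps \(\Delta \mathbf{x}_n\) to \(\Delta \mathbf{f}_n\):

import numpy as np

J_prev = np.array([[2., 1.], [0., 3.]])  # arbitrary previous approximation
dx = np.array([0.5, -0.2])               # arbitrary step Delta x
df = np.array([1.1, 0.4])                # arbitrary change Delta f

# Broyden rank-one update
J_new = J_prev + np.outer(df - J_prev.dot(dx), dx) / np.dot(dx, dx)

print(J_new.dot(dx), df)  # the two vectors agree: the secant equation holds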

Initial Jacobian#

For the initial Jacobian \(J_0\), there are two common approaches:

  1. Calculate the exact Jacobian at the initial point \(J(\mathbf{x}_0)\) - This requires derivatives but provides more accurate initial steps and potentially faster convergence.

  2. Initialize with the identity matrix \(J(\mathbf{x}_0) = I\) - This requires no derivative calculations but may converge more slowly.

Algorithm Steps#

  1. Choose an initial guess \(\mathbf{x}_0\) and initialize the Jacobian \(J_0\) (either by calculation or using the identity matrix)

  2. For each iteration until convergence:

    • Compute \(\mathbf{f}(\mathbf{x}_n)\)

    • Update \(\mathbf{x}_{n+1} = \mathbf{x}_n - J_n^{-1}\,\mathbf{f}(\mathbf{x}_n)\), where \(J_n\) is the current Jacobian approximation

    • Compute \(\Delta \mathbf{x}_n = \mathbf{x}_{n+1} - \mathbf{x}_n\) and \(\Delta \mathbf{f}_n = \mathbf{f}(\mathbf{x}_{n+1}) - \mathbf{f}(\mathbf{x}_n)\)

    • Update the Jacobian using the Broyden formula

    • Check for convergence

Advantages and Limitations#

Advantages:

  • Requires only one Jacobian evaluation (or none if using identity matrix initialization)

  • Converges superlinearly under appropriate conditions

  • Significantly reduces computational cost compared to Newton’s method for large systems

Limitations:

  • The secant equation does not determine \(J_n\) uniquely; the rank-one update is only one possible choice

  • Convergence is typically slower than Newton’s method

  • The direct version of the method requires matrix inversion at each step

import numpy as np

last_broyden_iterations = 0
broyden_verbose = True

# Implementation of direct Broyden's method 
# (performs matrix inversion at each step)
def broyden_method_direct(
    f, 
    x0, 
    jacobian=None,
    accuracy=1e-8, 
    max_iterations=100):
    """Broyden's method that stores J and inverts it at every step."""
    global last_broyden_iterations
    last_broyden_iterations = 0
    x = x0
    n = x0.shape[0]
    # Initial Jacobian: identity matrix if no jacobian function is provided
    if jacobian is None:
        J = np.eye(n)
    else:
        J = jacobian(x)

    for i in range(max_iterations):
        last_broyden_iterations += 1
        f_val = f(x)
        # Quasi-Newton step using the current Jacobian approximation
        Jinv = np.linalg.inv(J)
        delta = np.dot(Jinv, -f_val)
        x = x + delta
        if np.linalg.norm(delta, ord=2) < accuracy:
            return x
        # Broyden rank-one update of the Jacobian approximation
        f_new = f(x)
        u = f_new - f_val
        v = delta
        J = J + np.outer(u - J.dot(v), v) / np.dot(v, v)
            
    return x

Let us apply the direct Broyden’s method to the same system of equations as in the previous example.

First, we do not compute the initial Jacobian but instead initialize it to the identity matrix.

%%time

x0 = np.array([0., 0.])
root = broyden_method_direct(f, x0, jacobian=None)  # By providing None as the jacobian, we initialize it to the identity matrix
print("Number of iterations: ", last_broyden_iterations)
print("Root: ", root)
Number of iterations:  55
Root:  [ 1.64998819 -0.15795963]
CPU times: user 5.31 ms, sys: 611 μs, total: 5.92 ms
Wall time: 1.5 ms

The method converged to the root, but it took 55 iterations to reach the desired accuracy compared to 12 iterations for the Newton method.

Let us now apply the method with the initial Jacobian computed explicitly from the analytic derivatives.

%%time

x0 = np.array([0., 0.])
root = broyden_method_direct(f, x0, jacobian=jacobian) # By providing the jacobian, we compute it in the first iteration explicitly
print("Number of iterations: ", last_broyden_iterations)
print("Root: ", root)
Number of iterations:  16
Root:  [ 1.64998819 -0.15795963]
CPU times: user 2.14 ms, sys: 333 μs, total: 2.47 ms
Wall time: 599 μs

Now the method has converged much faster.

Avoiding Matrix Inversion in Broyden’s Method#

One of the key implementation considerations in Broyden’s method is to avoid explicitly computing the inverse of the Jacobian matrix. The Sherman-Morrison formula provides a solution to this problem by directly updating the inverse Jacobian.

Instead of computing \(J_n\) and then inverting it, we can directly update the inverse Jacobian \(J_n^{-1}\) using the Sherman-Morrison formula:

\[J_n^{-1} = J_{n-1}^{-1} + \frac{\Delta \mathbf{x}_n - J_{n-1}^{-1}\Delta \mathbf{f}_n}{\Delta \mathbf{x}_n^T J_{n-1}^{-1} \Delta \mathbf{f}_n}\Delta \mathbf{x}_n^T J_{n-1}^{-1}\]

This formula allows us to update the inverse Jacobian directly, avoiding the computationally expensive matrix inversion operation at each iteration. This improves the efficiency of the algorithm by reducing the number of operations required to compute the inverse Jacobian: instead of \(O(n^3)\) operations for matrix inversion, we only need \(O(n^2)\) operations for the Sherman-Morrison update.
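
As a sanity check (a small sketch with an arbitrary test matrix and vectors), the Sherman-Morrison update of the inverse agrees with inverting the directly updated Jacobian:

import numpy as np

J_prev = np.array([[2., 1.], [0., 3.]])  # arbitrary previous Jacobian approximation
Jinv_prev = np.linalg.inv(J_prev)
dx = np.array([0.5, -0.2])
df = np.array([1.1, 0.4])

# Rank-one update of J followed by an explicit inversion ...
J_new = J_prev + np.outer(df - J_prev.dot(dx), dx) / np.dot(dx, dx)
direct = np.linalg.inv(J_new)

# ... versus the Sherman-Morrison update of the inverse
sm = Jinv_prev + np.outer(dx - Jinv_prev.dot(df), dx.dot(Jinv_prev)) / np.dot(dx, Jinv_prev.dot(df))

print(np.allclose(direct, sm))  # True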

This results in the following implementation of the full Broyden’s method:

import numpy as np

last_broyden_iterations = 0
broyden_verbose = True


# Implementation of Broyden's method using Sherman-Morrison formula
def broyden_method(
    f, 
    x0, 
    jacobian=None,
    accuracy=1e-8, 
    max_iterations=100):
    """Broyden's method that updates the inverse Jacobian directly."""
    global last_broyden_iterations
    last_broyden_iterations = 0
    x = x0
    n = x0.shape[0]
    # Initial inverse Jacobian: identity matrix, or the inverse of the supplied Jacobian
    if jacobian is None:
        Jinv = np.eye(n)
    else:
        Jinv = np.linalg.inv(jacobian(x))
    
    if broyden_verbose:
        print("Iteration: ", last_broyden_iterations)
        print("x = ", x0)
        print("f = ", f(x0))
        print("|f| = ", ftil(f(x0)))
        
    for i in range(max_iterations):
        last_broyden_iterations += 1
        f_val = f(x)
        # Quasi-Newton step using the current inverse Jacobian approximation
        delta = -Jinv.dot(f_val)
        x = x + delta
        if np.linalg.norm(delta, ord=2) < accuracy:
            return x
        f_new = f(x)
        df = f_new - f_val
        dx = delta
        # Sherman-Morrison update of the inverse Jacobian (no matrix inversion needed)
        Jinv = Jinv + np.outer(dx - Jinv.dot(df), dx.dot(Jinv)) / np.dot(dx, Jinv.dot(df))
        
        if broyden_verbose:
            print("Iteration: ", last_broyden_iterations)
            print("x = ", x)
            print("f = ", f(x))
            print("|f| = ", ftil(f(x)))
            
    return x
%%time

x0 = np.array([0., 0.])
broyden_verbose = False
root = broyden_method(f, x0, jacobian=None, accuracy=1.e-8, max_iterations=5000)
print("Number of iterations: ", last_broyden_iterations)
print("Root: ", root)
Number of iterations:  55
Root:  [ 1.64998819 -0.15795963]
CPU times: user 5.05 ms, sys: 1.74 ms, total: 6.79 ms
Wall time: 1.83 ms

The method again took 55 iterations. Avoiding the explicit matrix inversion reduces the cost per iteration from \(O(n^3)\) to \(O(n^2)\); for this small \(2 \times 2\) system the difference in run time is negligible, but the savings become significant for large systems.

Similarly, we can supply the Jacobian explicitly to obtain faster convergence.

%%time

x0 = np.array([0., 0.])
broyden_verbose = False
root = broyden_method(f, x0, jacobian=jacobian, accuracy=1.e-8, max_iterations=5000)
print("Number of iterations: ", last_broyden_iterations)
print("Root: ", root)
Number of iterations:  16
Root:  [ 1.64998819 -0.15795963]
CPU times: user 3.18 ms, sys: 513 μs, total: 3.69 ms
Wall time: 842 μs
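
Finally, as an independent cross-check (assuming SciPy is available; this is not part of the lecture code), the same root can be obtained with scipy.optimize.root:

from scipy.optimize import root

sol = root(f, np.array([0., 0.]), jac=jacobian)
print(sol.x)  # expected to be close to [ 1.64998819 -0.15795963]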