FNC Sections 2.8 to 2.9

Andrew Pua

2023-09-07

Learning objectives

  • Work out an intuitive measure of stability called the condition number (applied to systems of linear equations)
  • Explore the effect of additional structure on the implementation and computational time of LU factorization

Why should we care about the condition number?

  • A way to quantify stability of outputs with respect to small changes in inputs
  • Plays a role in error bounds
  • Can indicate whether your square matrix is effectively singular from a numerical point of view
  • A way to signal that we may have to go back to the drawing board

Key theory: Conditioning

  • Definition 1.2 contains a working definition of the condition number but it does not apply to the case of vectors and matrices.

    • Section 1.2 also shows that in one-dimensional cases, the condition number is an elasticity.
    • Relative (or percentage) change in output for every relative change in input
  • Once again the context is solving systems of linear equations \(\mathbf{Ax}=\mathbf{b}\).
  • Section 2.7 about vector/matrix norms + the elasticity interpretation gives error bounds for the effects of perturbations of \(\mathbf{b}\) or \(\mathbf{A}\) on \(\mathbf{x}\).
  • If \(\mathbf{b}\) is perturbed to \(\mathbf{b}+\Delta\mathbf{b}\), then \[\frac{||\Delta \mathbf{x}||}{||\mathbf{x}||} \leq \kappa\left(\mathbf{A}\right) \frac{||\Delta\mathbf{b}||}{||\mathbf{b}||}.\]
  • If \(\mathbf{A}\) is perturbed to \(\mathbf{A}+\Delta\mathbf{A}\), then \[\frac{||\Delta \mathbf{x}||}{||\mathbf{x}||} \leq \kappa\left(\mathbf{A}\right) \frac{||\Delta\mathbf{A}||}{||\mathbf{A}||},\] in the limit \(||\Delta\mathbf{A}||\to 0\).
  • Four things to observe / note:

    • The factor multiplying the relative perturbation in each upper bound depends only on \(\mathbf{A}\).
    • What is \(\kappa\left(\mathbf{A}\right)\)?
    • Observation 2.8.5: When solving a linear system, all that can be expected is that the backward error, not the error, is small. (We really do not get to see the exact solution \(\mathbf{x}\).)
    • The residual (Definition 2.8.4) and backward error (Definition 1.4.4) are the same thing for a linear system.
  • \(\kappa\left(\mathbf{A}\right)\) is then called the condition number. It has a closed form: \[\kappa\left(\mathbf{A}\right)=||\mathbf{A}^{-1}|| \cdot ||\mathbf{A}||\]

    • Depends on the norm used
    • Goes to \(\infty\) if \(\mathbf{A}\) is singular
    • \(\log_{10} \kappa\left(\mathbf{A}\right)\) rounded up is a rough measure of how many digits of accuracy were lost
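
A quick numerical check of these facts (a sketch; the 2×2 matrix below is just a made-up example, and opnorm and cond come from LinearAlgebra):

using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]          # made-up example matrix
κ = opnorm(A) * opnorm(inv(A))  # closed form ‖A⁻¹‖⋅‖A‖ in the matrix 2-norm
κ ≈ cond(A)                     # cond(A) computes the same quantity (2-norm by default)
cond(A, 1), cond(A, Inf)        # the value depends on the chosen norm
ceil(log10(κ))                  # rough count of digits of accuracy lost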

Key theory: Structured matrices

  • Special matrix structures can help reduce computational time, but that structure has to be incorporated into the algorithm.

    • Let \(\mathbf{A}\) be a symmetric matrix, i.e. \(\mathbf{A}^T=\mathbf{A}\).
    • The restrictions implied by \(\mathbf{A}^T=\mathbf{A}\) are not exploited if you simply proceed with \(\mathbf{A}=\mathbf{LU}\).
    • You need a factorization which preserves symmetry.
  • Let \(\mathbf{L}\) be a unit lower triangular matrix. It seems that \(\mathbf{L}\mathbf{L}^T\) is a good candidate to preserve symmetry.

    • But a unit lower triangular \(\mathbf{L}\) has only \(n(n-1)/2\) “independent pieces of information”, too few to match the \(n(n+1)/2\) distinct entries of a symmetric \(\mathbf{A}\).
    • Removing the “unit” requirement would supply enough free entries, but it brings in extra restrictions not implied by symmetry. For example, the main diagonal entries of \(\mathbf{A}=\mathbf{L}\mathbf{L}^T\) are sums of squares, so they are forced to be positive and are not “free to vary”.
  • So an option is to use \(\mathbf{LDL}^T\) instead, where \(\mathbf{D}\) is a diagonal matrix whose \(n\) diagonal entries supply exactly the missing parameters.

  • If you have a symmetric positive definite (SPD) matrix, you can use a similar argument as earlier.

    • Idea is to impose the restrictions implied by positive definiteness to an \(\mathbf{LDL}^T\) factorization.
    • Need \(\mathbf{x}^T\mathbf{LDL}^T\mathbf{x} >0\) for all \(\mathbf{x}\neq \mathbf{0}\).
    • For that to happen, need \(\mathbf{D}\) to have positive main diagonal entries.
    • You obtain a Cholesky factorization \(\mathbf{A}=\mathbf{R}^T\mathbf{R}\), with \(\mathbf{R}=\mathbf{D}^{1/2}\mathbf{L}^T\) upper triangular, as a result (see the short check after this list).
  • Contrast computing times (\(n\) being the order of a square matrix):

    • LU factorization for tridiagonal matrices: \(O\left(n\right)\) flops
    • LU factorization: \(\sim 2n^3/3\) flops
    • Factorization for symmetric matrices: \(\sim n^3/3\) flops
    • Cholesky factorization for SPD matrices: \(\sim n^3/3\) flops
  • For the last three: on a log10–log10 plot of computational time vs \(n\), the slope is the same (all are \(O(n^3)\)); they differ only by a parallel intercept shift
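
A short check of the argument above, using the standard-library cholesky (a sketch; the SPD matrix is a made-up example, and the Cholesky factor is rescaled to recover a unit lower triangular L and a positive diagonal D):

using LinearAlgebra

B = rand(4, 4)
A = Symmetric(B'B + I)      # a made-up symmetric positive definite matrix
R = cholesky(A).U           # Cholesky factor: A = RᵀR with R upper triangular
S = Diagonal(diag(R))
L = (S \ R)'                # unit lower triangular (rows of R rescaled, then transposed)
D = S^2                     # diagonal factor; its entries are squares, hence positive
norm(L * D * L' - A)        # ≈ 0, i.e. A = LDLᵀ
all(diag(D) .> 0)           # true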

Special matrix structures encountered in Chapter 2

  • “Numerically nice”: diagonally dominant, symmetric positive definite

  • Sparse: diagonal, lower triangular, upper triangular, tridiagonal, banded

  • “Deadly”: Vandermonde, Hilbert (different from Hilbertian, refer also to Exercise 2.8.1)

Load Julia for Rmd/qmd

if (!require('JuliaCall')) stop("You do not have JuliaCall. It can be installed using the following command: devtools::install_github('Non-Contradiction/JuliaCall')")
# change this for your situation
julia_setup(JULIA_HOME = "/home/apua/julia-1.9.2/bin")

Julia commands

  • Generating some random matrices: rand(), randn()
  • Calculating norms: norm(), opnorm(), normalize()
  • Extract diagonal entries, with options for diagonals above/below the main diagonal: diag()
  • Sparse banded matrices: FNC.spdiagm()
  • Cholesky factorization: FNC.cholesky()
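
A few one-line illustrations of these commands (a sketch; the functions live in the standard LinearAlgebra and SparseArrays libraries, on which the FNC package builds, and the matrices are made-up examples):

using LinearAlgebra, SparseArrays

A = randn(4, 4); v = rand(4)                    # random normal matrix, random uniform vector
norm(v), normalize(v)                           # vector 2-norm and the corresponding unit vector
opnorm(A), opnorm(A, 1)                         # induced matrix 2-norm and 1-norm
diag(A), diag(A, 1), diag(A, -1)                # main diagonal, first superdiagonal, first subdiagonal
T = spdiagm(0 => ones(5), 1 => fill(-1.0, 4))   # sparse banded matrix built from its diagonals
C = cholesky(Symmetric(A'A + I))                # Cholesky factorization of an SPD matrix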

Learn Julia by revisiting Exercise 2.3.7

Problem: Consider the following system of linear equations: \[\underbrace{\begin{bmatrix}1 & -1 & 0 & \alpha-\beta & \beta \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 &0 &1 & -1 \\ 0 & 0 & 0 & 0 & 1\end{bmatrix}}_{\mathbf{A}}\underbrace{\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\end{bmatrix}}_{\mathbf{x}}=\underbrace{\begin{bmatrix}\alpha \\0 \\ 0 \\ 0 \\ 1\end{bmatrix}}_{\mathbf{b}}\]

Recall:

  • \(x_1=x_2=\cdots=x_5=1\) is the analytical solution using backward substitution.
  • This is an example of subtractive cancellation.
  • Code exactly the same as before, but now use condition numbers as a “summary”.
using FundamentalsNumericalComputation
AbsAcc = [];
CondNum = [];
α = 0.1;
β = 10 .^ (1:12);
for β in β
  A = FNC.diagm(0 => ones(5), 1 => [-1,-1,-1,-1], 3 => [α-β,0], 4 => [β])  # build A from its diagonals
  x = FNC.backsub(A, [α, 0, 0, 0, 1])  # A is upper triangular, so backward substitution applies
  push!(AbsAcc, abs(x[1] - 1.))
  push!(CondNum, FNC.cond(A))
end
FNC.pretty_table((β=β,AbsAcc=AbsAcc, CondNum=CondNum),["β" "|x₁-1|"  "κ(A)"])
┌───────────────┬─────────────┬────────────┐
│             β │      |x₁-1| │       κ(A) │
├───────────────┼─────────────┼────────────┤
│            10 │ 4.44089e-16 │    160.421 │
│           100 │ 5.77316e-15 │    14268.9 │
│          1000 │ 2.27596e-14 │  1.41542e6 │
│         10000 │  3.6382e-13 │  1.41433e8 │
│        100000 │ 5.82079e-12 │ 1.41423e10 │
│       1000000 │ 2.32832e-11 │ 1.41421e12 │
│      10000000 │ 3.72529e-10 │ 1.41421e14 │
│     100000000 │  5.96046e-9 │ 1.41421e16 │
│    1000000000 │  2.38419e-8 │ 1.41421e18 │
│   10000000000 │   3.8147e-7 │ 1.41421e20 │
│  100000000000 │  6.10352e-6 │ 1.41421e22 │
│ 1000000000000 │  2.44141e-5 │ 1.41421e24 │
└───────────────┴─────────────┴────────────┘

Exercise 2.8.5: unpivoted LU factorization is unstable

Problem: Let \[\mathbf{A}=\begin{bmatrix}-\epsilon & 1 \\ 1 & -1\end{bmatrix}.\] Without pivoting, an LU factorization (done by hand) is given by \[\mathbf{L}=\begin{bmatrix}1 & 0 \\ -\epsilon^{-1} & 1\end{bmatrix},\ \ \mathbf{U}=\begin{bmatrix}-\epsilon & 1 \\ 0 & \epsilon^{-1}-1\end{bmatrix}.\]

For \(\epsilon = 10^{-2},10^{-4},\ldots,10^{-10}\), make a table with columns for \(\epsilon\), \(\kappa(\mathbf{A})\), \(\kappa(\mathbf{L})\), and \(\kappa(\mathbf{U})\).

using FundamentalsNumericalComputation
CondNumA = []; CondNumL = []; CondNumU = [];
ϵ = 10. .^ (-2:-2:-10);
for ϵ in ϵ
  A = [-ϵ 1; 1 -1];
  L, U = FNC.lufact(A);
  push!(CondNumA, FNC.cond(A));
  push!(CondNumL, FNC.cond(L));
  push!(CondNumU, FNC.cond(U));
end
FNC.pretty_table((ϵ=ϵ,CondNumA=CondNumA, CondNumL=CondNumL,CondNumU=CondNumU),["ϵ" "κ(A)" "κ(L)" "κ(U)"])
┌─────────┬─────────┬─────────┬────────────┐
│       ϵ │    κ(A) │    κ(L) │       κ(U) │
├─────────┼─────────┼─────────┼────────────┤
│    0.01 │ 2.65355 │ 10002.0 │    9901.01 │
│  0.0001 │ 2.61839 │   1.0e8 │    9.999e7 │
│  1.0e-6 │ 2.61804 │  1.0e12 │ 9.99999e11 │
│  1.0e-8 │ 2.61803 │  1.0e16 │     1.0e16 │
│ 1.0e-10 │ 2.61803 │  1.0e20 │     1.0e20 │
└─────────┴─────────┴─────────┴────────────┘
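
To see why the blow-up in κ(L) and κ(U) matters, here is a sketch that solves one system with the unpivoted factors and with Julia's (pivoted) backslash; the right-hand side is an arbitrary choice, and FNC.forwardsub is assumed to be the forward-substitution companion of FNC.backsub used earlier:

using FundamentalsNumericalComputation
ϵ = 1e-10;
A = [-ϵ 1; 1 -1];
b = [1.0, 0.0];                                      # arbitrary right-hand side
L, U = FNC.lufact(A);                                # unpivoted LU, as above
x_unpivoted = FNC.backsub(U, FNC.forwardsub(L, b));  # forward then backward substitution, no pivoting
x_pivoted = A \ b;                                   # backslash pivots rows; compare the first components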

Reproducing the last part of Demo 2.8.3

  • Not sure about your experience, but the last part of Demo 2.8.3 states:

    • relative_error = norm(Δx) / norm(x) = 2.3271868502658917
    • “As anticipated, the solution has zero accurate digits in the 2-norm.”
  • These are incompatible statements!

# Hilbert matrix
A = [ 1/(i+j) for i=1:14, j=1:14 ]
14×14 Matrix{Float64}:
 0.5        0.333333   0.25       …  0.0769231  0.0714286  0.0666667
 0.333333   0.25       0.2           0.0714286  0.0666667  0.0625
 0.25       0.2        0.166667      0.0666667  0.0625     0.0588235
 0.2        0.166667   0.142857      0.0625     0.0588235  0.0555556
 0.166667   0.142857   0.125         0.0588235  0.0555556  0.0526316
 0.142857   0.125      0.111111   …  0.0555556  0.0526316  0.05
 0.125      0.111111   0.1           0.0526316  0.05       0.047619
 0.111111   0.1        0.0909091     0.05       0.047619   0.0454545
 0.1        0.0909091  0.0833333     0.047619   0.0454545  0.0434783
 0.0909091  0.0833333  0.0769231     0.0454545  0.0434783  0.0416667
 0.0833333  0.0769231  0.0714286  …  0.0434783  0.0416667  0.04
 0.0769231  0.0714286  0.0666667     0.0416667  0.04       0.0384615
 0.0714286  0.0666667  0.0625        0.04       0.0384615  0.037037
 0.0666667  0.0625     0.0588235     0.0384615  0.037037   0.0357143
# Exact known solution
x = 1:14; b = A*x; Δx = A\b - x; 
# Relative error
norm(Δx) / norm(x)
16.957912978061536
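
For reference, a quick conditioning check for the matrix above, run in the same session (a sketch using FNC.cond as in the earlier blocks):

κ = FNC.cond(A)   # condition number of the 14×14 matrix above
log10(κ)          # rough count of digits at risk
κ * eps()         # error bound for machine-precision perturbations (cf. Exercise 2.8.2 below)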

Exercise 2.8.2: Verifying error bound

Problem. Demonstrate the inequality \[\frac{||\Delta \mathbf{x}||}{||\mathbf{x}||} \leq \kappa\left(\mathbf{A}\right) \frac{||\Delta\mathbf{b}||}{||\mathbf{b}||}\] using

  • a specific type of matrix \(\mathbf{A}\)
  • machine-precision \(||\Delta\mathbf{b}|| / ||\mathbf{b}||\)
  • known solution \(\mathbf{x}\)

What the matrix \(\mathbf{A}\) looks like for this problem:

FNC.matrixdepot("prolate", 2, 0.4)
2×2 Matrix{Float64}:
  0.8       -0.605461
 -0.605461   0.8
FNC.matrixdepot("prolate", 4, 0.4)
4×4 Matrix{Float64}:
  0.8       -0.605461   0.908192  -0.748391
 -0.605461   0.8       -0.605461   0.908192
  0.908192  -0.605461   0.8       -0.605461
 -0.748391   0.908192  -0.605461   0.8
using FundamentalsNumericalComputation
n = 10:10:70;
RelErr = []; RHS = [];
for n in n
  A = FNC.matrixdepot("prolate", n, 0.4);
  x = (1:n)/n;                                # known exact solution
  b = A*x;
  xtilde = A \ b;                             # computed solution
  push!(RelErr, norm(x - xtilde) / norm(x));  # relative error in x
  push!(RHS, FNC.cond(A)*eps());              # error bound κ(A)⋅(machine epsilon)
end
FNC.pretty_table((n=n,RelErr=RelErr,RHS=RHS),["n" "Relative error of x" "Error bound"])
┌────┬─────────────────────┬─────────────┐
│  n │ Relative error of x │ Error bound │
├────┼─────────────────────┼─────────────┤
│ 10 │         1.25186e-15 │ 2.42709e-14 │
│ 20 │         3.89761e-13 │ 1.98397e-11 │
│ 30 │          6.3531e-15 │ 2.29054e-13 │
│ 40 │         1.28347e-14 │ 5.56033e-13 │
│ 50 │         4.65616e-14 │ 8.94772e-13 │
│ 60 │         5.84773e-14 │ 6.93715e-12 │
│ 70 │         8.35051e-14 │ 4.13638e-12 │
└────┴─────────────────────┴─────────────┘

Exercise 2.9.5: Running times for tridiagonal matrices

using FundamentalsNumericalComputation;
n = 1000:200:3000;
t1 = []; t2 = [];
for n in n
    A = triu( tril(rand(n,n),1), -1);   # tridiagonal entries, stored as a dense matrix
    b = ones(n);
    AltA = FNC.Tridiagonal(rand(n,n))   # tridiagonal part, stored in the specialized Tridiagonal type
    time1 = @elapsed for j in 1:50; x = A \ b; end      # 50 solves with the dense matrix
    time2 = @elapsed for j in 1:50; x = AltA \ b; end   # 50 solves with the Tridiagonal matrix
    push!(t1,time1);
    push!(t2,time2);
end
scatter(n,t1,label="triu/tril",legend=:topleft,
    xaxis=(:log10,L"n"), yaxis=(:log10,"elapsed time"))
plot!(n,t1[end]*(n/n[end]).^3,l=:dash,label=L"O(n^3)")
scatter!(n,t2,label="Tridiagonal",legend=:topleft,
    xaxis=(:log10,L"n"), yaxis=(:log10,"elapsed time"))
plot!(n,t2[end]*(n/n[end]),l=:dash,label=L"O(n)")
  • One point may look like an outlier, but it is not: a throwaway run to force compilation, so that the timings exclude JIT compilation, is important when comparing computational times
  • Specialized types like Tridiagonal really help in reducing computational time.
  • It is not enough to know that there is sparsity. You have to modify the algorithm to exploit the sparsity! (See the related Exercise 2.9.4 and the sketch below.)
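
A sketch of that last point: the same tridiagonal data stored three ways, where only the storage type tells the backslash solver which algorithm it can use (sizes and entries are arbitrary, and each solve should be run once beforehand so that the timings exclude compilation):

using LinearAlgebra, SparseArrays
n = 2000;
dl, d, du = rand(n-1), rand(n) .+ 2, rand(n-1);   # sub-, main, and superdiagonals (diagonally dominant)
T_dense  = diagm(-1 => dl, 0 => d, 1 => du);      # dense storage: generic O(n³) LU
T_sparse = spdiagm(-1 => dl, 0 => d, 1 => du);    # sparse storage: general sparse LU
T_tri    = Tridiagonal(dl, d, du);                # specialized type: O(n) tridiagonal solve
b = ones(n);
t_dense  = @elapsed T_dense \ b
t_sparse = @elapsed T_sparse \ b
t_tri    = @elapsed T_tri \ b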