Functions featured in this section: ConvergenceHistory from the IterativeSolvers package; gmres(), minres(), and cg() and their underlying options; reshape(), clamp01(), vec(), unvec(), and LinearMap(); ilu() and DiagonalPreconditioner().
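For orientation, a minimal usage sketch of how gmres and ConvergenceHistory fit together (the diagonally dominant test matrix below is arbitrary and only for illustration):
# Minimal sketch: solve a small test system and inspect the ConvergenceHistory.
using LinearAlgebra, IterativeSolvers
A = Matrix(4.0I, 50, 50) + randn(50, 50)/10;   # arbitrary diagonally dominant test matrix
b = ones(50);
x, hist = gmres(A, b; reltol=1e-8, log=true);  # log=true returns (x, ConvergenceHistory)
resnorm = hist[:resnorm];                      # residual norm at each iteration
@show length(resnorm) resnorm[end];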
Krylov method performance may improve depending on the type of \(\mathbf{A}\):
The task is still to solve \(\mathbf{Ax}=\mathbf{b}\).
Instead of working directly with \(\mathbf{K}_m\), we work with \(\mathbf{Q}_m\) from the Arnoldi algorithm.
Set \(\mathbf{x}=\mathbf{Q}_m\mathbf{z}\); then by the fundamental identity behind Arnoldi iteration, \(\mathbf{A}\mathbf{Q}_m=\mathbf{Q}_{m+1}\mathbf{H}_m\), \[\arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{A} \mathbf{Q}_m \mathbf{z}-\mathbf{b} \bigr\|=\arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} \mathbf{H}_m\mathbf{z}-\mathbf{b} \bigr\| \]
The Arnoldi iteration uses \(\mathbf{b}\) as a “natural” seed vector, so \(\mathbf{q}_1=\mathbf{b}/\|\mathbf{b}\|\).
As a result, \[\begin{eqnarray*}\arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{A} \mathbf{Q}_m \mathbf{z}-\mathbf{b} \bigr\| &=& \arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} \mathbf{H}_m\mathbf{z}-\mathbf{b} \bigr\| \\ &=& \arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} (\mathbf{H}_m\mathbf{z}-\|\mathbf{b}\|\mathbf{e}_1) \bigr\| \end{eqnarray*}\]
The key to the dimension reduction is that \(\mathbf{Q}_{m+1}\) has orthonormal columns, so for any \(\mathbf{w}\in\mathbb{C}^{m+1}\), \[\|\mathbf{Q}_{m+1}\mathbf{w}\|^2 = \mathbf{w}^*\mathbf{Q}_{m+1}^*\mathbf{Q}_{m+1}\mathbf{w} = \mathbf{w}^*\mathbf{w} = \|\mathbf{w}\|^2.\]
Putting everything together: since multiplication by \(\mathbf{Q}_{m+1}\) preserves the 2-norm, \[\arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{A} \mathbf{Q}_m \mathbf{z}-\mathbf{b} \bigr\| = \arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{H}_m\mathbf{z}-\|\mathbf{b}\|\,\mathbf{e}_1 \bigr\|.\]
At every iteration, solve the small least-squares problem for \(\mathbf{z}\) first, then recover \(\mathbf{x}=\mathbf{Q}_m\mathbf{z}\).
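To make the reduction concrete, here is a minimal sketch (not the library implementation, and unoptimized): build the Arnoldi basis starting from \(\mathbf{q}_1=\mathbf{b}/\|\mathbf{b}\|\), then solve the small least-squares problem in \(\mathbf{H}_m\). The name gmres_sketch and the random test matrix are illustrative only.
# GMRES sketch: Arnoldi (modified Gram–Schmidt) plus an (m+1)×m least-squares solve.
using LinearAlgebra
function gmres_sketch(A, b, m)
    n = length(b)
    Q = zeros(n, m+1)
    H = zeros(m+1, m)
    Q[:,1] = b / norm(b)
    for j in 1:m
        v = A * Q[:,j]                     # next Krylov direction
        for i in 1:j                       # orthogonalize against q_1,...,q_j
            H[i,j] = dot(Q[:,i], v)
            v -= H[i,j] * Q[:,i]
        end
        H[j+1,j] = norm(v)
        Q[:,j+1] = v / H[j+1,j]
    end
    z = H \ (norm(b) * [1; zeros(m)])      # least squares: H_m z ≈ ‖b‖ e_1
    return Q[:,1:m] * z                    # recover x = Q_m z
end
Atest = rand(100,100) + 10I;  btest = rand(100);
x20 = gmres_sketch(Atest, btest, 20);
@show norm(Atest*x20 - btest) / norm(btest);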
No convergence result for GMRES was established.
When \(\mathbf{A}\) is Hermitian, Arnoldi becomes Lanczos and GMRES becomes MINRES.
When \(\mathbf{A}\) is Hermitian and positive definite (HPD), use CG.
Convergence results are available for both MINRES and CG.
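For reference (a standard result, stated here without proof): if \(\mathbf{A}\) is HPD with 2-norm condition number \(\kappa\), the CG iterates satisfy \[\|\mathbf{x}-\mathbf{x}_m\|_{\mathbf{A}} \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{m} \|\mathbf{x}-\mathbf{x}_0\|_{\mathbf{A}}.\]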
This exercise illustrates the strong effect that the eigenvalues of the matrix may have on GMRES convergence.
Let \[\mathbf{B}= \begin{bmatrix} 1 & & & \\ & 2 & & \\ & & \ddots & \\ & & & 100 \end{bmatrix}, \] let \(\mathbf{I}\) be a \(100\times 100\) identity, and let \(\mathbf{Z}\) be a \(100\times 100\) matrix of zeros. Also let \(\mathbf{b}\) be a \(200\times 1\) vector of ones.
Let \(\mathbf{A} = \begin{bmatrix}
\mathbf{B} & \mathbf{I} \\ \mathbf{Z} & \mathbf{B}
\end{bmatrix}.\) What are its eigenvalues (no computer required
here)? Apply gmres
with tolerance \(10^{-10}\) for 100 iterations without
restarts, and plot the residual convergence.
Repeat with restarts every 20 iterations.
Now let \(\mathbf{A} = \begin{bmatrix} \mathbf{B} & \mathbf{I} \\ \mathbf{Z} & -\mathbf{B} \end{bmatrix}.\) What are its eigenvalues? Which matrix is more difficult for GMRES?
# Exercise (a)
using LinearAlgebra, IterativeSolvers, Plots, LaTeXStrings
A = [diagm(1:100) diagm(fill(1,100)); diagm(fill(0,100)) diagm(1:100)];
# restart=100 keeps GMRES from restarting within the 100 allowed iterations
x, hist = IterativeSolvers.gmres(A, fill(1,200); restart=100, reltol=1e-10, maxiter=100, log=true);
resnorm = hist[:resnorm];
plot(resnorm,m=:o,
xaxis=(L"m"),yaxis=(:log10,"norm of mth residual"),
title="Residual for GMRES",leg=:none)
# Exercise (b)
x, hist = IterativeSolvers.gmres(A, fill(1,200); restart=20, reltol=1e-10, maxiter=100, log=true);
resnorm = hist[:resnorm];
plot(resnorm,m=:o,
xaxis=(L"m"),yaxis=(:log10,"norm of mth residual"),
title="Residual for GMRES",leg=:none)
# Exercise (c)
A = [diagm(1:100) diagm(fill(1,100)); diagm(fill(0,100)) -diagm(1:100)];
x, hist = IterativeSolvers.gmres(A, fill(1,200); restart=100, reltol=1e-10, maxiter=100, log=true);
resnorm = hist[:resnorm];
plot(resnorm,m=:o,
xaxis=(L"m"),yaxis=(:log10,"norm of mth residual"),
title="Residual for GMRES",leg=:none)
The point of this exercise is to reformulate the minimization problem that underlies CG. Given a real \(n\times n\) symmetric matrix \(\mathbf{A}\) and a vector \(\mathbf{b}=\mathbf{A}\mathbf{x}\), define the scalar-valued function \[\varphi(\mathbf{u}) = \mathbf{u}^T \mathbf{A} \mathbf{u} - 2 \mathbf{u}^T \mathbf{b}, \qquad \mathbf{u}\in\mathbb{R}^n.\]
Expand and simplify the expression \(\varphi(\mathbf{x}+\mathbf{v})-\varphi(\mathbf{x})\), keeping in mind that \(\mathbf{A}\mathbf{x}=\mathbf{b}\).
Prove that if \(\mathbf{A}\) is an SPD matrix, \(\varphi\) has a global minimum at \(\mathbf{x}\).
Show that for any vector \(\mathbf{u}\), \(\|\mathbf{u}-\mathbf{x}\|_{\mathbf{A}}^2-\varphi(\mathbf{u})\) is constant.
Prove that CG, which minimizes the error \(\|\mathbf{u}-\mathbf{x}\|_{\mathbf{A}}\) over the Krylov subspace \(\mathcal{K}_m\), therefore also minimizes \(\varphi(\mathbf{u})\) over \(\mathcal{K}_m\): \[\begin{eqnarray*}\arg\min_{\mathbf{u}\in\mathcal{K}_m} \|\mathbf{u}-\mathbf{x}\|_{\mathbf{A}}^2 &=& \arg\min_{\mathbf{u}\in\mathcal{K}_m} \left(\varphi(\mathbf{u})+ \mathbf{x}^T \mathbf{A}\mathbf{x}\right) \\ &=& \arg\min_{\mathbf{u}\in\mathcal{K}_m} \varphi(\mathbf{u})\end{eqnarray*}\]
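A quick numerical sanity check of the constant-difference claim (a sketch with a random SPD matrix; it is not part of the requested proof):
# Check: ‖u - x‖_A^2 - φ(u) equals xᵀAx for any u, so the difference is constant in u.
using LinearAlgebra
S = randn(40, 40);  A = S'*S + 40I          # a random SPD matrix
x = randn(40);  b = A*x
φ(u) = u'*A*u - 2*u'*b
u = randn(40)                               # an arbitrary test vector
@show (u - x)'*A*(u - x) - φ(u)
@show x'*A*x                                # the two displayed values should agree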
Use A = FNC.poisson(n) - k^2*I and b = -ones(n^2).
# Exercise (a)  (assumes FNC = FundamentalsNumericalComputation, IterativeSolvers, LinearAlgebra, and Plots are loaded)
n = 50; k = 1.3;
A = FNC.poisson(n) - k^2*I;
b = -ones(n^2);
x1, hist1 = minres(A,b,reltol=1e-5, log=true);
x2, hist2 = cg(A,b,reltol=1e-5, log=true);
relres1 = hist1[:resnorm] / norm(b);
relres2 = hist2[:resnorm] / norm(b);
plot(relres1,label="MINRES",leg=:left,
xaxis=L"m",yaxis=(:log10,"relative residual"),
title=("Convergence of MINRES and CG") )
plot!(relres2,label="CG")
# Exercise (b)
k = 8;
A = FNC.poisson(n) - k^2*I;
x1, hist1 = minres(A,b,reltol=1e-5, log=true);
x2, hist2 = cg(A,b,reltol=1e-5, log=true);
relres1 = hist1[:resnorm] / norm(b);
relres2 = hist2[:resnorm] / norm(b);
plot(relres1,label="MINRES",leg=:left,
xaxis=L"m",yaxis=(:log10,"relative residual"),
title=("Convergence of MINRES and CG") )
plot!(relres2,label="CG")
# Exercise (c): examine the extreme eigenvalues (eigs is from Arpack)
evs, _ = eigs(A, which=:SR);   # eigenvalues with smallest real part
evl, _ = eigs(A, which=:LR);   # eigenvalues with largest real part
(minimum(real(evs)) < 0, maximum(real(evl)) > 0)
## (true, true)
With \(k=8\), \(\mathbf{A}\) has eigenvalues of both signs, i.e., it is indefinite, so the convergence guarantees for CG do not apply.
Deal with incomplete knowledge of \(\mathbf{A}\) (e.g., when only matrix-vector products are available)
Deal with ill-conditioned \(\mathbf{A}\)’s
Specific examples: DiagonalPreconditioner() and ilu(), passed directly in the Pl option of IterativeSolvers.gmres. Also used below: clamp01() to clip image values to [0, 1].
# assumes TestImages, Images, SparseArrays, LinearMaps, IterativeSolvers, and Plots are loaded
img = testimage("lighthouse");
m,n = size(img);
X = @. Float64(Gray(img));
B = spdiagm(0=>fill(0.5,m),
1=>fill(0.25,m-1),-1=>fill(0.25,m-1));
C = spdiagm(0=>fill(0.5,n),
1=>fill(0.25,n-1),-1=>fill(0.25,n-1));
blur = X -> B^12 * X * C^12;
Z = blur(X);
unvec = z -> reshape(z,m,n); # convert vector to matrix
T = LinearMap(x -> vec(blur(unvec(x))),m*n);
# change this part for the exercise
y = cg(T,vec(Z),maxiter=50,reltol=1e-5);
# clamp01 clips each value to the interval [0, 1]
Y = unvec(clamp01.(y));
plot(Gray.(X),layout=2,title="Original", frame=:none);
plot!(Gray.(Y),subplot=2,title="Deblurred", frame=:none)
# Exercise (a)
m = 50;
B = spdiagm(0=>fill(0.5,m), 1=>fill(0.25,m-1),-1=>fill(0.25,m-1));
cholesky(B)
## SparseArrays.CHOLMOD.Factor{Float64}
## type: LLt
## method: simplicial
## maxnnz: 99
## nnz: 99
## success: true
cond(Matrix(B))
## 1053.4789912000572
cumsum is to be used as a linear map \(f\). Use gmres to find \(\mathbf{x}=\mathbf{f}^{-1}(\mathbf{b})\).
b = @. ((1:100) / 100)^2
## 100-element Vector{Float64}:
## 0.0001
## 0.0004
## 0.0009
## 0.0016
## 0.0025000000000000005
## 0.0036
## 0.004900000000000001
## 0.0064
## 0.0081
## 0.010000000000000002
## ⋮
## 0.8464
## 0.8649000000000001
## 0.8835999999999999
## 0.9025
## 0.9216
## 0.9409
## 0.9603999999999999
## 0.9801
## 1.0
T = LinearMap(x -> cumsum(x), 100);
x, hist = IterativeSolvers.gmres(T, b; reltol=1e-8, maxiter=100, log=true);
plot(x)
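As a quick check (not part of the exercise statement): the inverse of cumulative summation is a backward difference, so if gmres converged, x should be close to [b[1]; diff(b)].
# Verify the matrix-free solve against the exact inverse of cumsum.
xref = [b[1]; diff(b)];
@show norm(T*x - b) / norm(b)       # relative residual actually achieved
@show norm(x - xref) / norm(xref)   # error relative to the exact inverse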
Theory of how an SPD preconditioner works:
Suppose \(\mathbf{M}=\mathbf{R}^T\mathbf{R}\). Show that the eigenvalues of \(\mathbf{R}^{-T}\mathbf{A}\mathbf{R}^{-1}\) are the same as the eigenvalues of \(\mathbf{M}^{-1}\mathbf{A}\).
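A numerical spot check of this claim (a small random example; not the requested proof, and the names Atest, Rtest, Mtest are illustrative):
# Spot check: R⁻ᵀAR⁻¹ and M⁻¹A have the same eigenvalues when M = RᵀR.
using LinearAlgebra
S = randn(6, 6);  Atest = S'*S + 6I                 # random SPD matrix A
Rtest = UpperTriangular(rand(6, 6) + 6I);  Mtest = Rtest'*Rtest
Rinv = inv(Matrix(Rtest))
λ1 = eigvals(Symmetric(Rinv'*Atest*Rinv))           # eigenvalues of R⁻ᵀAR⁻¹
λ2 = eigvals(Mtest \ Atest)                         # eigenvalues of M⁻¹A
@show norm(sort(λ1) - sort(real(λ2)))               # should be ≈ 0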
# Exercise (a)  (assumes SparseArrays, IncompleteLU, LinearAlgebra, and Plots are loaded)
A = 1.5I + sprand(800,800,0.005);
iLU = ilu(A,τ=0.3);
L, U = I+iLU.L, iLU.U';
M = L*U;
precA = inv(Matrix(M))*A;
eigA = eigvals(Matrix(A));
eigprecA = eigvals(Matrix(precA));
l = @layout [a b];
p1 = plot(eigA,seriestype=:scatter,legend=:none);
p2 = plot(eigprecA, seriestype=:scatter,legend=:none);
plot(p1, p2, layout = l)
# Exercise (b): repeat with a smaller drop tolerance τ
A = 1.5I + sprand(800,800,0.005);
iLU = ilu(A,τ=0.03);
L, U = I+iLU.L, iLU.U';
M = L*U;
precA = inv(Matrix(M))*A;
eigA = eigvals(Matrix(A));
eigprecA = eigvals(Matrix(precA));
l = @layout [a b];
p1 = plot(eigA,seriestype=:scatter,legend=:none);
p2 = plot(eigprecA, seriestype=:scatter,legend=:none);
plot(p1, p2, layout = l)
Apply gmres without restarts using this preconditioner and a tolerance of \(10^{-10}\) for 100 iterations. Plot the convergence curve. Then, taking the diagonal of \(\mathbf{A}\) as the preconditioner, apply gmres again. How many iterations are apparently needed for convergence?
# Exercise (a)
A = [diagm(1:100) diagm(fill(0,100)); diagm(fill(1,100)) -diagm(1:100)];
M = DiagonalPreconditioner(diagm([fill(1,100);fill(-1,100)]));
# does not converge in 100 iterations
x, hist = IterativeSolvers.gmres(A, fill(1,200); Pl=M, reltol=1e-10, maxiter=100, log=true);
resnorm = hist[:resnorm];
@show resnorm[end] / resnorm[1];
err = @. resnorm/resnorm[1];
plot(err, yaxis=:log10)
# Exercise (b)
A = [diagm(1:100) diagm(fill(0,100)); diagm(fill(1,100)) -diagm(1:100)];
M = DiagonalPreconditioner(diagm([1:100; -1:-1:-100]));
# with this preconditioner, 2 iterations suffice
x, hist = IterativeSolvers.gmres(A, fill(1,200); Pl=M, reltol=1e-10, maxiter=2, log=true);
resnorm = hist[:resnorm];
@show resnorm[end] / resnorm[1]
## resnorm[end] / resnorm[1] = 3.708814968504319e-16
## 3.708814968504319e-16
Let A = matrixdepot("Bai/rdb2048") and let b be a vector of 2048 ones. Find a drop tolerance τ for ilu such that the preconditioned method transitions from effective and faster than without preconditioning to ineffective.
# Exercise (a)  (assumes MatrixDepot is loaded)
A = matrixdepot("Bai/rdb2048");
b = fill(1, 2048);
# throwaway to force compilation
IterativeSolvers.gmres(A,b,maxiter=300,reltol=1e-4,log=true);
# actual timing
t = @elapsed x,history = IterativeSolvers.gmres(A,b,maxiter=300,reltol=1e-5,log=true);
resnorm = history[:resnorm];
@show resnorm[end] / resnorm[1]
## resnorm[end] / resnorm[1] = 2.092076472043198e-5
## 2.092076472043198e-5
@show t
## t = 0.010560976
## 0.010560976
# Exercise (b)
M = DiagonalPreconditioner(diag(A));
x,history = IterativeSolvers.gmres(A,b,maxiter=300,Pl=M,reltol=1e-5,log=true);
resnorm = history[:resnorm];
@show resnorm[end] / resnorm[1]
## resnorm[end] / resnorm[1] = 0.9601892916173383
## 0.9601892916173383
# Exercise (c)
# factorize the current A once, and do a throwaway solve to force compilation
iLU = ilu(A; τ=2);
IterativeSolvers.gmres(A,b,Pl=iLU,maxiter=300,reltol=1e-4,log=true);
err=[]; times=[];
for τ in [2,1,0.25,0.11, 0.1, 0.09, 0.08, 0.07]
iLU = ilu(A;τ);
t = @elapsed _,history = IterativeSolvers.gmres(A,b,Pl=iLU,maxiter=300,reltol=1e-5,log=true);
resnorm = history[:resnorm];
push!(err, resnorm[end] / resnorm[1]);
push!(times, t);
end
[err times]  # columns: final relative residual, elapsed time (s)
## 8×2 Matrix{Float64}:
## 0.00623124 0.0523113
## 1.72954e-5 0.0637916
## 4.96225e-5 0.00142432
## 6.32084e-5 0.00115226
## 1.04842e-5 0.00155987
## 1.04693e-5 0.00126476
## 0.000112019 0.00112618
## 6.77756e-5 0.00112264