Functions featured in this section: ConvergenceHistory from the IterativeSolvers package; gmres(), minres(), and cg() and their underlying options; reshape(), clamp01(), vec(), unvec(), and LinearMap(); ilu() and DiagonalPreconditioner().
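For orientation, a minimal usage sketch of how gmres and ConvergenceHistory fit together (the diagonally dominant test matrix below is arbitrary and only for illustration):
# Minimal sketch: solve a small test system and inspect the ConvergenceHistory.
using LinearAlgebra, IterativeSolvers
A = Matrix(4.0I, 50, 50) + randn(50, 50)/10;   # arbitrary diagonally dominant test matrix
b = ones(50);
x, hist = gmres(A, b; reltol=1e-8, log=true);  # log=true returns (x, ConvergenceHistory)
resnorm = hist[:resnorm];                      # residual norm at each iteration
@show length(resnorm) resnorm[end];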
Krylov method performance may improve depending on the type of \(\mathbf{A}\):
The task is still to solve \(\mathbf{Ax}=\mathbf{b}\).
Instead of working directly with \(\mathbf{K}_m\), we work with \(\mathbf{Q}_m\) from the Arnoldi algorithm.
Set \(\mathbf{x}=\mathbf{Q}_m\mathbf{z}\); then by the fundamental identity behind Arnoldi iteration, \(\mathbf{A}\mathbf{Q}_m=\mathbf{Q}_{m+1}\mathbf{H}_m\), \[\arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{A} \mathbf{Q}_m \mathbf{z}-\mathbf{b} \bigr\|=\arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} \mathbf{H}_m\mathbf{z}-\mathbf{b} \bigr\| \]
The Arnoldi iteration uses \(\mathbf{b}\) as a “natural” seed vector, so \(\mathbf{q}_1=\mathbf{b}/\|\mathbf{b}\|\).
As a result, \[\begin{eqnarray*}\arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{A} \mathbf{Q}_m \mathbf{z}-\mathbf{b} \bigr\| &=& \arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} \mathbf{H}_m\mathbf{z}-\mathbf{b} \bigr\| \\ &=& \arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} (\mathbf{H}_m\mathbf{z}-\|\mathbf{b}\|\mathbf{e}_1) \bigr\| \end{eqnarray*}\]
The key to the dimension reduction is that \(\mathbf{Q}_{m+1}\) has orthonormal columns, so for any \(\mathbf{w}\in\mathbb{C}^{m+1}\), \[\|\mathbf{Q}_{m+1}\mathbf{w}\|^2 = \mathbf{w}^*\mathbf{Q}_{m+1}^*\mathbf{Q}_{m+1}\mathbf{w} = \mathbf{w}^*\mathbf{w} = \|\mathbf{w}\|^2.\]
Putting everything together: since multiplication by \(\mathbf{Q}_{m+1}\) preserves the 2-norm, \[\arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{A} \mathbf{Q}_m \mathbf{z}-\mathbf{b} \bigr\| = \arg\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{H}_m\mathbf{z}-\|\mathbf{b}\|\,\mathbf{e}_1 \bigr\|.\]
At every iteration, solve the small least-squares problem for \(\mathbf{z}\) first, then recover \(\mathbf{x}=\mathbf{Q}_m\mathbf{z}\).
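To make the reduction concrete, here is a minimal sketch (not the library implementation, and unoptimized): build the Arnoldi basis starting from \(\mathbf{q}_1=\mathbf{b}/\|\mathbf{b}\|\), then solve the small least-squares problem in \(\mathbf{H}_m\). The name gmres_sketch and the random test matrix are illustrative only.
# GMRES sketch: Arnoldi (modified Gram–Schmidt) plus an (m+1)×m least-squares solve.
using LinearAlgebra
function gmres_sketch(A, b, m)
    n = length(b)
    Q = zeros(n, m+1)
    H = zeros(m+1, m)
    Q[:,1] = b / norm(b)
    for j in 1:m
        v = A * Q[:,j]                     # next Krylov direction
        for i in 1:j                       # orthogonalize against q_1,...,q_j
            H[i,j] = dot(Q[:,i], v)
            v -= H[i,j] * Q[:,i]
        end
        H[j+1,j] = norm(v)
        Q[:,j+1] = v / H[j+1,j]
    end
    z = H \ (norm(b) * [1; zeros(m)])      # least squares: H_m z ≈ ‖b‖ e_1
    return Q[:,1:m] * z                    # recover x = Q_m z
end
Atest = rand(100,100) + 10I;  btest = rand(100);
x20 = gmres_sketch(Atest, btest, 20);
@show norm(Atest*x20 - btest) / norm(btest);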
No convergence result for GMRES was established.
When \(\mathbf{A}\) is Hermitian, Arnoldi becomes Lanczos and GMRES becomes MINRES.
When \(\mathbf{A}\) is Hermitian and positive definite (HPD), use CG.
Convergence results are available for both MINRES and CG.
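For reference (a standard result, stated here without proof): if \(\mathbf{A}\) is HPD with 2-norm condition number \(\kappa\), the CG iterates satisfy \[\|\mathbf{x}-\mathbf{x}_m\|_{\mathbf{A}} \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{m} \|\mathbf{x}-\mathbf{x}_0\|_{\mathbf{A}}.\]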
This exercise illustrates the strong effect that the eigenvalues of the matrix may have on GMRES convergence.
Let \[\mathbf{B}= \begin{bmatrix} 1 & & & \\ & 2 & & \\ & & \ddots & \\ & & & 100 \end{bmatrix}, \] let \(\mathbf{I}\) be a \(100\times 100\) identity, and let \(\mathbf{Z}\) be a \(100\times 100\) matrix of zeros. Also let \(\mathbf{b}\) be a \(200\times 1\) vector of ones.
Let \(\mathbf{A} = \begin{bmatrix}
\mathbf{B} & \mathbf{I} \\ \mathbf{Z} & \mathbf{B}
\end{bmatrix}.\) What are its eigenvalues (no computer required
here)? Apply gmres
with tolerance \(10^{-10}\) for 100 iterations without
restarts, and plot the residual convergence.
Repeat with restarts every 20 iterations.
Now let \(\mathbf{A} = \begin{bmatrix} \mathbf{B} & \mathbf{I} \\ \mathbf{Z} & -\mathbf{B} \end{bmatrix}.\) What are its eigenvalues? Which matrix is more difficult for GMRES?
# Exercise (a)
using LinearAlgebra, IterativeSolvers, Plots, LaTeXStrings
A = [diagm(1:100) diagm(fill(1,100)); diagm(fill(0,100)) diagm(1:100)];
# restart=100 keeps GMRES from restarting within the 100 allowed iterations
x, hist = IterativeSolvers.gmres(A, fill(1,200); restart=100, reltol=1e-10, maxiter=100, log=true);
resnorm = hist[:resnorm];
plot(resnorm,m=:o,
xaxis=(L"m"),yaxis=(:log10,"norm of mth residual"),
title="Residual for GMRES",leg=:none)
# Exercise (b)
x, hist = IterativeSolvers.gmres(A, fill(1,200); restart=20, reltol=1e-10, maxiter=100, log=true);
resnorm = hist[:resnorm];
plot(resnorm,m=:o,
xaxis=(L"m"),yaxis=(:log10,"norm of mth residual"),
title="Residual for GMRES",leg=:none)
# Exercise (c)
A = [diagm(1:100) diagm(fill(1,100)); diagm(fill(0,100)) -diagm(1:100)];
x, hist = IterativeSolvers.gmres(A, fill(1,200); restart=100, reltol=1e-10, maxiter=100, log=true);
resnorm = hist[:resnorm];
plot(resnorm,m=:o,
xaxis=(L"m"),yaxis=(:log10,"norm of mth residual"),
title="Residual for GMRES",leg=:none)
The point of this exercise is to reformulate the minimization problem that underlies CG. Given a real \(n\times n\) symmetric matrix \(\mathbf{A}\) and a vector \(\mathbf{b}=\mathbf{A}\mathbf{x}\), define the scalar-valued function \[\varphi(\mathbf{u}) = \mathbf{u}^T \mathbf{A} \mathbf{u} - 2 \mathbf{u}^T \mathbf{b}, \qquad \mathbf{u}\in\mathbb{R}^n.\]
Expand and simplify the expression \(\varphi(\mathbf{x}+\mathbf{v})-\varphi(\mathbf{x})\), keeping in mind that \(\mathbf{A}\mathbf{x}=\mathbf{b}\).
Prove that if \(\mathbf{A}\) is an SPD matrix, \(\varphi\) has a global minimum at \(\mathbf{x}\).
Show that for any vector \(\mathbf{u}\), \(\|\mathbf{u}-\mathbf{x}\|_{\mathbf{A}}^2-\varphi(\mathbf{u})\) is constant.
Prove that CG, which minimizes the error \(\|\mathbf{u}-\mathbf{x}\|_{\mathbf{A}}\) over the Krylov subspace \(\mathcal{K}_m\), therefore also minimizes \(\varphi(\mathbf{u})\) over \(\mathcal{K}_m\): \[\begin{eqnarray*}\arg\min_{\mathbf{u}\in\mathcal{K}_m} \|\mathbf{u}-\mathbf{x}\|_{\mathbf{A}}^2 &=& \arg\min_{\mathbf{u}\in\mathcal{K}_m} \left(\varphi(\mathbf{u})+ \mathbf{x}^T \mathbf{A}\mathbf{x}\right) \\ &=& \arg\min_{\mathbf{u}\in\mathcal{K}_m} \varphi(\mathbf{u})\end{eqnarray*}\]
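A quick numerical sanity check of the constant-difference claim (a sketch with a random SPD matrix; it is not part of the requested proof):
# Check: ‖u - x‖_A^2 - φ(u) equals xᵀAx for any u, so the difference is constant in u.
using LinearAlgebra
S = randn(40, 40);  A = S'*S + 40I          # a random SPD matrix
x = randn(40);  b = A*x
φ(u) = u'*A*u - 2*u'*b
u = randn(40)                               # an arbitrary test vector
@show (u - x)'*A*(u - x) - φ(u)
@show x'*A*x                                # the two displayed values should agree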
Use A = FNC.poisson(n) - k^2*I and b = -ones(n^2).
# Exercise (a)  (assumes FNC = FundamentalsNumericalComputation, IterativeSolvers, LinearAlgebra, and Plots are loaded)
n = 50; k = 1.3;
A = FNC.poisson(n) - k^2*I;
b = -ones(n^2);
x1, hist1 = minres(A,b,reltol=1e-5, log=true);
x2, hist2 = cg(A,b,reltol=1e-5, log=true);
relres1 = hist1[:resnorm] / norm(b);
relres2 = hist2[:resnorm] / norm(b);
plot(relres1,label="MINRES",leg=:left,
xaxis=L"m",yaxis=(:log10,"relative residual"),
title=("Convergence of MINRES and CG") )
plot!(relres2,label="CG")
# Exercise (b)
k = 8;
A = FNC.poisson(n) - k^2*I;
x1, hist1 = minres(A,b,reltol=1e-5, log=true);
x2, hist2 = cg(A,b,reltol=1e-5, log=true);
relres1 = hist1[:resnorm] / norm(b);
relres2 = hist2[:resnorm] / norm(b);
plot(relres1,label="MINRES",leg=:left,
xaxis=L"m",yaxis=(:log10,"relative residual"),
title=("Convergence of MINRES and CG") )
plot!(relres2,label="CG")
# Exercise (c): examine the extreme eigenvalues (eigs is from Arpack)
evs, _ = eigs(A, which=:SR);   # eigenvalues with smallest real part
evl, _ = eigs(A, which=:LR);   # eigenvalues with largest real part
(minimum(real(evs)) < 0, maximum(real(evl)) > 0)
## (true, true)
With \(k=8\), \(\mathbf{A}\) has eigenvalues of both signs, i.e., it is indefinite, so the convergence guarantees for CG do not apply.
Deal with incomplete knowledge of \(\mathbf{A}\) (e.g., when only matrix-vector products are available)
Deal with ill-conditioned \(\mathbf{A}\)’s
Specific examples: DiagonalPreconditioner() and ilu(), passed directly in the Pl option of IterativeSolvers.gmres. Also used below: clamp01() to clip image values to [0, 1].
# assumes TestImages, Images, SparseArrays, LinearMaps, IterativeSolvers, and Plots are loaded
img = testimage("lighthouse");
m,n = size(img);
X = @. Float64(Gray(img));
B = spdiagm(0=>fill(0.5,m),
1=>fill(0.25,m-1),-1=>fill(0.25,m-1));
C = spdiagm(0=>fill(0.5,n),
1=>fill(0.25,n-1),-1=>fill(0.25,n-1));
blur = X -> B^12 * X * C^12;
Z = blur(X);
unvec = z -> reshape(z,m,n); # convert vector to matrix
T = LinearMap(x -> vec(blur(unvec(x))),m*n);
# change this part for the exercise
y = cg(T,vec(Z),maxiter=50,reltol=1e-5);
# clamp01 clips each value to the interval [0, 1]
Y = unvec(clamp01.(y));
plot(Gray.(X),layout=2,title="Original", frame=:none);
plot!(Gray.(Y),subplot=2,title="Deblurred", frame=:none)
# Exercise (a)
m = 50;
B = spdiagm(0=>fill(0.5,m), 1=>fill(0.25,m-1),-1=>fill(0.25,m-1));
cholesky(B)
## SparseArrays.CHOLMOD.Factor{Float64}
## type: LLt
## method: simplicial
## maxnnz: 99
## nnz: 99
## success: true
cond(Matrix(B))
## 1053.4789912000572
cumsum is to be used as a linear map \(f\). Use gmres to find \(\mathbf{x}=\mathbf{f}^{-1}(\mathbf{b})\).
b = @. ((1:100) / 100)^2
## 100-element Vector{Float64}:
## 0.0001
## 0.0004
## 0.0009
## 0.0016
## 0.0025000000000000005
## 0.0036
## 0.004900000000000001
## 0.0064
## 0.0081
## 0.010000000000000002
## ⋮
## 0.8464
## 0.8649000000000001
## 0.8835999999999999
## 0.9025
## 0.9216
## 0.9409
## 0.9603999999999999
## 0.9801
## 1.0
T = LinearMap(x -> cumsum(x), 100);
x, hist = IterativeSolvers.gmres(T, b; reltol=1e-8, maxiter=100, log=true);
plot(x)
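As a quick check (not part of the exercise statement): the inverse of cumulative summation is a backward difference, so if gmres converged, x should be close to [b[1]; diff(b)].
# Verify the matrix-free solve against the exact inverse of cumsum.
xref = [b[1]; diff(b)];
@show norm(T*x - b) / norm(b)       # relative residual actually achieved
@show norm(x - xref) / norm(xref)   # error relative to the exact inverse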
Theory of how an SPD preconditioner works:
Suppose \(\mathbf{M}=\mathbf{R}^T\mathbf{R}\). Show that the eigenvalues of \(\mathbf{R}^{-T}\mathbf{A}\mathbf{R}^{-1}\) are the same as the eigenvalues of \(\mathbf{M}^{-1}\mathbf{A}\).
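A numerical spot check of this claim (a small random example; not the requested proof, and the names Atest, Rtest, Mtest are illustrative):
# Spot check: R⁻ᵀAR⁻¹ and M⁻¹A have the same eigenvalues when M = RᵀR.
using LinearAlgebra
S = randn(6, 6);  Atest = S'*S + 6I                 # random SPD matrix A
Rtest = UpperTriangular(rand(6, 6) + 6I);  Mtest = Rtest'*Rtest
Rinv = inv(Matrix(Rtest))
λ1 = eigvals(Symmetric(Rinv'*Atest*Rinv))           # eigenvalues of R⁻ᵀAR⁻¹
λ2 = eigvals(Mtest \ Atest)                         # eigenvalues of M⁻¹A
@show norm(sort(λ1) - sort(real(λ2)))               # should be ≈ 0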
# Exercise (a)  (assumes SparseArrays, IncompleteLU, LinearAlgebra, and Plots are loaded)
A = 1.5I + sprand(800,800,0.005);
iLU = ilu(A,τ=0.3);
L, U = I+iLU.L, iLU.U';
M = L*U;
precA = inv(Matrix(M))*A;
eigA = eigvals(Matrix(A));
eigprecA = eigvals(Matrix(precA));
l = @layout [a b];
p1 = plot(eigA,seriestype=:scatter,legend=:none);
p2 = plot(eigprecA, seriestype=:scatter,legend=:none);
plot(p1, p2, layout = l)
# Exercise (b): repeat with a smaller drop tolerance τ
A = 1.5I + sprand(800,800,0.005);
iLU = ilu(A,τ=0.03);
L, U = I+iLU.L, iLU.U';
M = L*U;
precA = inv(Matrix(M))*A;
eigA = eigvals(Matrix(A));
eigprecA = eigvals(Matrix(precA));
l = @layout [a b];
p1 = plot(eigA,seriestype=:scatter,legend=:none);
p2 = plot(eigprecA, seriestype=:scatter,legend=:none);
plot(p1, p2, layout = l)
Apply gmres without restarts using this preconditioner and a tolerance of \(10^{-10}\) for 100 iterations. Plot the convergence curve. Then, taking the diagonal of \(\mathbf{A}\) as the preconditioner, apply gmres again. How many iterations are apparently needed for convergence?
# Exercise (a)
A = [diagm(1:100) diagm(fill(0,100)); diagm(fill(1,100)) -diagm(1:100)];
M = DiagonalPreconditioner(diagm([fill(1,100);fill(-1,100)]));
# does not converge in 100 iterations
x, hist = IterativeSolvers.gmres(A, fill(1,200); Pl=M, reltol=1e-10, maxiter=100, log=true);
resnorm = hist[:resnorm];
@show resnorm[end] / resnorm[1];
err = @. resnorm/resnorm[1];
plot(err, yaxis=:log10)
# Exercise (b)
A = [diagm(1:100) diagm(fill(0,100)); diagm(fill(1,100)) -diagm(1:100)];
M = DiagonalPreconditioner(diagm([1:100; -1:-1:-100]));
# with this preconditioner, 2 iterations suffice
x, hist = IterativeSolvers.gmres(A, fill(1,200); Pl=M, reltol=1e-10, maxiter=2, log=true);
resnorm = hist[:resnorm];
@show resnorm[end] / resnorm[1]
## resnorm[end] / resnorm[1] = 3.708814968504319e-16
## 3.708814968504319e-16
Let A = matrixdepot("Bai/rdb2048") and let b be a vector of 2048 ones. Find a drop tolerance τ for ilu such that the preconditioned method transitions from effective and faster than without preconditioning to ineffective.
# Exercise (a)  (assumes MatrixDepot is loaded)
A = matrixdepot("Bai/rdb2048");
b = fill(1, 2048);
# throwaway to force compilation
IterativeSolvers.gmres(A,b,maxiter=300,reltol=1e-4,log=true);
# actual timing
t = @elapsed x,history = IterativeSolvers.gmres(A,b,maxiter=300,reltol=1e-5,log=true);
resnorm = history[:resnorm];
@show resnorm[end] / resnorm[1]
## resnorm[end] / resnorm[1] = 2.092076472043198e-5
## 2.092076472043198e-5
@show t
## t = 0.010560976
## 0.010560976
# Exercise (b)
M = DiagonalPreconditioner(diag(A));
x,history = IterativeSolvers.gmres(A,b,maxiter=300,Pl=M,reltol=1e-5,log=true);
resnorm = history[:resnorm];
@show resnorm[end] / resnorm[1]
## resnorm[end] / resnorm[1] = 0.9601892916173383
## 0.9601892916173383
# Exercise (c)
# factorize the current A once, and do a throwaway solve to force compilation
iLU = ilu(A; τ=2);
IterativeSolvers.gmres(A,b,Pl=iLU,maxiter=300,reltol=1e-4,log=true);
err=[]; times=[];
for τ in [2,1,0.25,0.11, 0.1, 0.09, 0.08, 0.07]
iLU = ilu(A;τ);
t = @elapsed _,history = IterativeSolvers.gmres(A,b,Pl=iLU,maxiter=300,reltol=1e-5,log=true);
resnorm = history[:resnorm];
push!(err, resnorm[end] / resnorm[1]);
push!(times, t);
end
[err times]  # columns: final relative residual, elapsed time (s)
## 8×2 Matrix{Float64}:
## 0.00623124 0.0523113
## 1.72954e-5 0.0637916
## 4.96225e-5 0.00142432
## 6.32084e-5 0.00115226
## 1.04842e-5 0.00155987
## 1.04693e-5 0.00126476
## 0.000112019 0.00112618
## 6.77756e-5 0.00112264