Matrix Reference Manual
Proofs Section 4: Matrix Identities and Equations


4.1 diag(ab^T) = a • b
  • If X = ab^T, then x_ij = a_i b_j, so the diagonal elements of X are x_ii = a_i b_i. This is precisely the ith element of a • b.
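This identity is easy to confirm numerically; a minimal sketch using NumPy (the array names are illustrative, not from the manual):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)
b = rng.standard_normal(5)

X = np.outer(a, b)                 # X = a b^T, so X[i, j] = a[i] * b[j]
# The diagonal of a b^T equals the elementwise product a • b
assert np.allclose(np.diag(X), a * b)
```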
4.2

(A^-1 + UV^H)^-1 = A - AU(I + V^H A U)^-1 V^H A

We show that A^-1 + UV^H and A - AU(I + V^H A U)^-1 V^H A are inverses by multiplying them together:

  • (A^-1 + UV^H)(A - AU(I + V^H A U)^-1 V^H A)
  • = I + UV^H A - (U + UV^H A U)(I + V^H A U)^-1 V^H A
  • = I + UV^H A - U(I + V^H A U)(I + V^H A U)^-1 V^H A
  • = I + UV^H A - UV^H A = I
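The identity of [4.2] can be checked numerically; a minimal NumPy sketch, using a random diagonally dominant A so that all the inverses involved exist:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # generically invertible
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
VH = V.conj().T

# (A^-1 + U V^H)^-1 = A - A U (I + V^H A U)^-1 V^H A
lhs = np.linalg.inv(np.linalg.inv(A) + U @ VH)
rhs = A - A @ U @ np.linalg.inv(np.eye(k) + VH @ A @ U) @ VH @ A
assert np.allclose(lhs, rhs)
```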
4.3 det(A^-1 + UV^H) = det(I + V^H A U) × det(A^-1)

From [3.1], the determinant of the block matrix [A^-1, -U; V^H, I] may be expanded in two ways:
det([A^-1, -U; V^H, I]) = det(A^-1) × det(I - V^H A(-U)) = det(A^-1) × det(I + V^H A U), and also
det([A^-1, -U; V^H, I]) = det(I) × det(A^-1 - (-U)I^-1 V^H) = det(A^-1 + UV^H).
Equating the two expansions gives det(A^-1 + UV^H) = det(I + V^H A U) × det(A^-1).
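As with [4.2], the determinant identity is easy to verify numerically; a minimal NumPy sketch with a random invertible A:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # generically invertible
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
VH = V.conj().T

# det(A^-1 + U V^H) = det(I + V^H A U) * det(A^-1)
lhs = np.linalg.det(np.linalg.inv(A) + U @ VH)
rhs = np.linalg.det(np.eye(k) + VH @ A @ U) * np.linalg.det(np.linalg.inv(A))
assert np.allclose(lhs, rhs)
```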

4.4 If D[n#n] = DIAG(d) is real with d_i >= d_i+1 for all i, and if Y[n#k] is constrained to satisfy Y^H Y = I, then (a) max_Y tr(Y^H D Y) = sum(d_1:k), attained by Y = [I; 0], and (b) min_Y tr(Y^H D Y) = sum(d_n-k+1:n), attained by Y = [0; I].
  • Define v_i = sum_j(|y_ij|^2), the squared norm of the ith row of Y. We must have 0 <= v_i <= 1 since Y consists of k columns of an n#n unitary matrix (any Y with Y^H Y = I can be extended to such a matrix). Also sum(v) = k, since each of the k columns of Y has unit length.
  • tr(Y^H D Y) = v^T d, which is a weighted sum of the d_i subject to the constraints on v given above. (a) To maximize this sum, we set v_1:k = 1 and v_k+1:n = 0, which gives tr(Y^H D Y) = sum(d_1:k). (b) To minimize this sum, we set v_1:n-k = 0 and v_n-k+1:n = 1, which gives tr(Y^H D Y) = sum(d_n-k+1:n).
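The bounds of [4.4] can be illustrated numerically; a minimal NumPy sketch with real matrices (so ^H reduces to ^T), drawing a random Y with orthonormal columns via a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2
d = np.sort(rng.standard_normal(n))[::-1]     # d_1 >= d_2 >= ... >= d_n
D = np.diag(d)

# Any Y with Y^H Y = I obeys sum(d_n-k+1:n) <= tr(Y^H D Y) <= sum(d_1:k)
Y, _ = np.linalg.qr(rng.standard_normal((n, k)))
t = np.trace(Y.T @ D @ Y)
assert d[n - k:].sum() - 1e-9 <= t <= d[:k].sum() + 1e-9

# The bounds are attained by Y = [I; 0] and Y = [0; I]
Ymax = np.eye(n, k)                                    # [I; 0]
Ymin = np.vstack([np.zeros((n - k, k)), np.eye(k)])    # [0; I]
assert np.isclose(np.trace(Ymax.T @ D @ Ymax), d[:k].sum())
assert np.isclose(np.trace(Ymin.T @ D @ Ymin), d[n - k:].sum())
```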
4.5 If D[n#n] = DIAG(d) with d_i >= d_i+1 for all i, then max_y (y^H D y | y^H y = 1) = d_1 and min_y (y^H D y | y^H y = 1) = d_n, and these bounds are attained by y = e_1 and y = e_n respectively.

This is a special case of [4.4] with k=1, but may also be proved directly.

  • y^H D y = sum(d • y^C • y) <= max(d) × sum(y^C • y) = d_1 × y^H y = d_1
  • y^H D y = sum(d • y^C • y) >= min(d) × sum(y^C • y) = d_n × y^H y = d_n
4.6 If D[n#n] = DIAG(d) with d_i >= d_i+1 for all i, then min_W:[n#k] max_y (y^H D y | y^H y = 1 and W^H y = 0) = d_n-k, and this bound is attained by W = [0; I[k#k]] and y = e_n-k.

First define V = [0; I[k#k]] and P = [I[n-k#n-k]; 0], and note that V^H y = 0 implies y = PP^H y since PP^H + VV^H = I.

For any fixed W[n#k],

max_y (y^H D y | y^H y = 1 and W^H y = 0)
 >= max_y (y^H D y | y^H y = 1 and W^H y = 0 and V^H y = 0) since adding a constraint cannot increase the maximum
 = max_y (y^H PP^H D PP^H y | y^H y = 1 and W^H y = 0 and V^H y = 0) since V^H y = 0 implies y = PP^H y
 >= min(d_1:n-k) from [4.5] since P^H D P = diag(d_1:n-k)
 = d_n-k since the entries of d are in descending order.

From [4.5] this bound is attained when P^H y = e_n-k, i.e. when y = Pe_n-k = e_n-k.

Since this is true for any W[n#k], we must have min_W:[n#k] max_y (y^H D y | y^H y = 1 and W^H y = 0) >= d_n-k.
But if we choose W = V, then we attain this bound with y = e_n-k, where e_i denotes the ith column of the identity matrix.

4.7 If H[n#n] = UDU^H is hermitian, where U is unitary and D = diag(d) = diag(eig(H)), then min_W max_x (x^H H x | x^H x = 1 and W[n#k]^H x = 0) = d_n-k, and this bound is attained by W = U_:,n-k+1:n and x = u_n-k.
  • Set y = U^H x; then x^H H x = y^H D y and x^H x = y^H y.
  • min_W max_x (x^H H x | x^H x = 1 and W[n#k]^H x = 0) = min_W max_y (y^H D y | y^H y = 1 and W[n#k]^H U y = 0)
    From [4.6] we attain the bound with U^H W = [0; I[k#k]] and y = e_n-k.
    Hence W = U[0; I[k#k]] = U_:,n-k+1:n and x = U e_n-k = u_n-k.
4.8 If H[n#n] = UDU^H is hermitian, where U is unitary and D = diag(d) = diag(eig(H)) contains the eigenvalues in decreasing order, then max_x (x^H H x | x^H x = 1) = d_1 and min_x (x^H H x | x^H x = 1) = d_n, and these bounds are attained by x = u_1 and x = u_n respectively.

From [4.5], setting y = U^H x:
d_1 = max_y (y^H D y | y^H y = 1) = max_x ((U^H x)^H D (U^H x) | (U^H x)^H (U^H x) = 1) = max_x (x^H H x | x^H x = 1)
Similarly, d_n = min_y (y^H D y | y^H y = 1) = min_x ((U^H x)^H D (U^H x) | (U^H x)^H (U^H x) = 1) = min_x (x^H H x | x^H x = 1)
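The Rayleigh quotient bounds of [4.8] can be illustrated numerically; a minimal NumPy sketch with a real symmetric H (note that np.linalg.eigh returns eigenvalues in ascending rather than decreasing order):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
M = rng.standard_normal((n, n))
H = (M + M.T) / 2                      # real symmetric (hermitian) example
w, U = np.linalg.eigh(H)               # eigenvalues in ascending order

# Any unit vector gives a Rayleigh quotient between the extreme eigenvalues
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
r = x @ H @ x
assert w[0] - 1e-9 <= r <= w[-1] + 1e-9

# The bounds are attained by the corresponding eigenvectors
assert np.isclose(U[:, -1] @ H @ U[:, -1], w[-1])
assert np.isclose(U[:, 0] @ H @ U[:, 0], w[0])
```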

4.9 If D[n#n] = DIAG(d) is real with d_i >= d_i+1 > 0 for all i, and if Y[n#k] is constrained to have rank k, then (a) max(det(Y^H D Y)/det(Y^H Y)) = prod(d_1:k), attained by Y = [I; 0], and (b) min(det(Y^H D Y)/det(Y^H Y)) = prod(d_n-k+1:n), attained by Y = [0; I].
  • At an extreme value, the complex gradient must be zero. Hence 0 = d(ln(det(Y^H D Y)/det(Y^H Y)))/dY^C = d(ln(det(Y^H D Y)))/dY^C - d(ln(det(Y^H Y)))/dY^C = DY(Y^H D Y)^-1 - Y(Y^H Y)^-1 [2.13]
  • Hence DY(Y^H D Y)^-1 = Y(Y^H Y)^-1 and so DY = Y(Y^H Y)^-1 Y^H D Y.
  • Taking the hermitian transpose, we get Y^H D Y (Y^H Y)^-1 Y^H = Y^H D.
  • Since rank(Y) = k, Y^H must contain k linearly independent columns, and so the columns of Y^H contain a complete set of eigenvectors for the k#k matrix Y^H D Y (Y^H Y)^-1, with eigenvalues given by the corresponding entries of d (equate the jth columns of the two sides of the previous equation).
  • det(Y^H D Y)/det(Y^H Y) = det(Y^H D Y (Y^H Y)^-1) equals the product of the eigenvalues and is therefore the product of k elements of d. Since the elements of d are in decreasing order, the maximum possible product is prod(d_1:k) and the minimum is prod(d_n-k+1:n). These values are attained by Y = [I; 0] and Y = [0; I] respectively.
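The determinant-ratio bounds of [4.9] can be checked numerically; a minimal NumPy sketch with positive entries in d and a random (generically rank-k) real Y:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 6, 2
d = np.sort(rng.uniform(0.5, 2.0, n))[::-1]   # d_1 >= ... >= d_n > 0
D = np.diag(d)

# For any rank-k Y: prod(d_n-k+1:n) <= det(Y^H D Y)/det(Y^H Y) <= prod(d_1:k)
Y = rng.standard_normal((n, k))               # generically rank k
ratio = np.linalg.det(Y.T @ D @ Y) / np.linalg.det(Y.T @ Y)
assert d[n - k:].prod() - 1e-9 <= ratio <= d[:k].prod() + 1e-9

# Equality at Y = [I; 0]
Ymax = np.eye(n, k)
assert np.isclose(np.linalg.det(Ymax.T @ D @ Ymax), d[:k].prod())
```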

 

4.10 If A is hermitian and B is +ve definite hermitian, there exists X such that X^H B X = I and X^H A X = DIAG(d), where d contains the eigenvalues of B^-1 A in non-increasing order and the columns of X the corresponding eigenvectors. If S is the +ve definite hermitian square root of B^-1 (i.e. S^2 B = I), then d contains the eigenvalues of the hermitian matrix SAS.
  • Since B is +ve definite hermitian, we can decompose it as B = ULU^H where U is unitary and L is diagonal with real diagonal entries > 0.
  • Define the real diagonal matrix M = L^-½, so that M^H L M = I.
  • The matrix M^H U^H A U M is hermitian and can therefore be decomposed as M^H U^H A U M = VDV^H where V is unitary and D = DIAG(d) is real. We may order the columns of V so that the elements of d are in non-increasing order.
  • Define X = UMV. Then X^H B X = V^H M^H U^H B U M V = V^H M^H L M V = V^H V = I and X^H A X = V^H M^H U^H A U M V = V^H V D V^H V = D.
  • From X^H A X = D we have A = X^-H D X^-1 = UM^-1 V D V^H M^-1 U^H. Hence B^-1 A = (ULU^H)^-1 (UM^-1 V D V^H M^-1 U^H) = (UM^2 U^H)(UM^-1 V D V^H M^-1 U^H) = UMV D V^H M^-1 U^H = (UMV) D (UMV)^-1 = XDX^-1.
  • The hermitian +ve definite square root of B^-1 is S = UMU^H. Therefore SAS = (UMU^H)(UM^-1 V D V^H M^-1 U^H)(UMU^H) = UVDV^H U^H = (UV) D (UV)^-1.
  • If A and B are real, then all the above matrices can be taken to be real.
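The construction in this proof can be followed step by step in NumPy; a minimal sketch for real matrices (np.linalg.eigh returns eigenvalues in ascending rather than non-increasing order, which does not affect the check):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
A0 = rng.standard_normal((n, n)); A = (A0 + A0.T) / 2        # hermitian
B0 = rng.standard_normal((n, n)); B = B0 @ B0.T + np.eye(n)  # +ve definite

l, U = np.linalg.eigh(B)                  # B = U L U^H with l > 0
M = np.diag(l ** -0.5)                    # M = L^(-1/2), so M^H L M = I
d, V = np.linalg.eigh(M @ U.T @ A @ U @ M)
X = U @ M @ V                             # the congruence of [4.10]

assert np.allclose(X.T @ B @ X, np.eye(n))       # X^H B X = I
assert np.allclose(X.T @ A @ X, np.diag(d))      # X^H A X = DIAG(d)
# d also matches the eigenvalues of B^-1 A
assert np.allclose(np.sort(d), np.sort(np.linalg.eigvals(np.linalg.inv(B) @ A).real))
```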
4.11 If W is +ve definite hermitian and B is hermitian, then if X[n#k] is restricted to have rank(X) = k, max_X tr((X^H W X)^-1 X^H B X) = sum(d_1:k), where d are the eigenvalues of W^-1 B sorted into decreasing order, and this is attained by taking the columns of X to be the corresponding eigenvectors.
  • From [4.10] we can find G[n#n] such that G^H W G = I and G^H B G = D = DIAG(d), where d and the columns of G are the eigenvalues and corresponding eigenvectors of W^-1 B, with the elements of d in non-increasing order. Since G is non-singular, the range of X = GY[n#k] over rank(Y) = k includes all n#k matrices of rank k.
  • Hence max_X tr((X^H W X)^-1 X^H B X) = max_Y tr(((GY)^H W (GY))^-1 (GY)^H B (GY)) = max_Y tr((Y^H G^H W G Y)^-1 Y^H G^H B G Y) = max_Y tr((Y^H Y)^-1 Y^H D Y).
  • From [1.9] any Y[n#k] of rank k may be decomposed as Y = Q[n#k] R[k#k] with Q^H Q = I and R non-singular.
  • Hence max_Y tr((Y^H Y)^-1 Y^H D Y) = max_Q,R tr((R^H Q^H Q R)^-1 R^H Q^H D Q R) = max_Q,R tr(R^-1 R^-H R^H Q^H D Q R) = max_Q,R tr(R^-1 Q^H D Q R) = max_Q,R tr(Q^H D Q R R^-1) = max_Q tr(Q^H D Q)
  • From [4.4], the maximum is sum(d_1:k) and is attained by Y = Q = [I; 0]. From this we find that X = GY consists of the first k columns of G, that is, the eigenvectors corresponding to the k highest eigenvalues.
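The result of [4.11] can be illustrated numerically; a minimal NumPy sketch with real matrices, obtaining the eigenvalues and eigenvectors of W^-1 B via np.linalg.eig (they are real here because W^-1 B is similar to a hermitian matrix, as shown in [4.10]):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 6, 2
W0 = rng.standard_normal((n, n)); W = W0 @ W0.T + np.eye(n)  # +ve definite
B0 = rng.standard_normal((n, n)); B = (B0 + B0.T) / 2        # hermitian

w, G = np.linalg.eig(np.linalg.inv(W) @ B)   # eigenvalues are real here
order = np.argsort(w.real)[::-1]
d, G = w.real[order], G[:, order].real       # sort into decreasing order

# Any rank-k X gives tr((X^H W X)^-1 X^H B X) <= sum(d_1:k)
X = rng.standard_normal((n, k))
val = np.trace(np.linalg.inv(X.T @ W @ X) @ (X.T @ B @ X))
assert val <= d[:k].sum() + 1e-9

# The bound is attained by the top-k eigenvectors of W^-1 B
Xs = G[:, :k]
best = np.trace(np.linalg.inv(Xs.T @ W @ Xs) @ (Xs.T @ B @ Xs))
assert np.isclose(best, d[:k].sum())
```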
4.12 If W is +ve definite hermitian and B is hermitian, then if X[n#k] is restricted to have rank(X) = k, max_X det((X^H W X)^-1 X^H B X) = prod(d_1:k), where d are the eigenvalues of W^-1 B sorted into decreasing order, and this is attained by taking the columns of X to be the corresponding eigenvectors.
  • From [4.10] we can find G[n#n] such that G^H W G = I and G^H B G = D = DIAG(d), where d and the columns of G are the eigenvalues and corresponding eigenvectors of W^-1 B, with the elements of d in non-increasing order. Since G is non-singular, the range of X = GY[n#k] over rank(Y) = k includes all n#k matrices of rank k.
  • Hence max_X det((X^H W X)^-1 X^H B X) = max_Y det(((GY)^H W (GY))^-1 (GY)^H B (GY)) = max_Y det((Y^H G^H W G Y)^-1 Y^H G^H B G Y) = max_Y det((Y^H Y)^-1 Y^H D Y).
  • From [4.9], the maximum is prod(d_1:k) and is attained by Y = [I; 0]. From this we find that X = GY consists of the first k columns of G, that is, the eigenvectors corresponding to the k highest eigenvalues.
4.13 If W is +ve definite hermitian, B is hermitian and A[n#m] is a given matrix, then if X[n#k] is restricted such that rank([A X]) = m+k, max_X tr(([A X]^H W [A X])^-1 [A X]^H B [A X]) = tr((A^H W A)^-1 A^H B A) + sum(d_1:k), where d are the eigenvalues of (W^-1 - A(A^H W A)^-1 A^H)B sorted into decreasing order, and this is attained by taking the columns of X to be the corresponding eigenvectors.
  • From [4.10] we can find G[n#n] such that G^H W G = I and G^H B G = D = DIAG(d), where d and the columns of G are the eigenvalues and corresponding eigenvectors of W^-1 B, with the elements of d in non-increasing order.
  • We can form the QR decomposition G^-1 A = U[n#m] R[m#m] = [U[n#m] V[n#n-m]][R; 0] where [U V]^H [U V] = I.
    Note that (A^H W A)^-1 A^H B A = (R^H U^H G^H W G U R)^-1 R^H U^H G^H B G U R = R^-1 U^H D U R.
  • Now for any X[n#k] satisfying rank([A X]) = m+k,
    • We form the QR decomposition V^H G^-1 X = K[n-m#k] S[k#k] and define the upper triangular matrix T[m+k#m+k] = [R, U^H G^-1 X; 0, S], giving [A X] = G[U VK]T. T must be non-singular since rank([A X]) = m+k.
    • Now we have tr(([A X]^H W [A X])^-1 [A X]^H B [A X])
      = tr((T^H [U VK]^H G^H W G [U VK] T)^-1 T^H [U VK]^H G^H B G [U VK] T) since [A X] = G[U VK]T
      = tr(T^-1 ([U VK]^H [U VK])^-1 T^-H T^H [U VK]^H D [U VK] T) since T is non-singular and G^H W G = I
      = tr([U VK]^H D [U VK]) since [U VK]^H [U VK] = I[m+k#m+k] and from [1.17]
      = tr(U^H D U) + tr(K^H V^H D V K) [1.18]
      = tr((A^H W A)^-1 A^H B A) + tr(K^H V^H D V K)
    • Thus we need to maximize tr(K^H V^H D V K) subject to K^H K = I. From [4.11] (with W = I) the maximum equals the sum of the top k eigenvalues of V^H D V and is attained by setting K to the corresponding eigenvectors. From K we can derive X = GVK. Note that S does not affect our objective function and so can be taken to be the identity.
  • We have V^H D V K = KL where L is a diagonal matrix containing the top k eigenvalues of V^H D V.
    Therefore XL = GVKL = GVV^H D V K = GVV^H G^H B GVK = GVV^H G^H B X, which means that X consists of the eigenvectors of GVV^H G^H B with eigenvalues diag(L).
  • We can use the facts that A = GUR, W = G^-H G^-1 and U^H U = I to give A^H W A = R^H U^H G^H G^-H G^-1 G U R = R^H R. Thus A(A^H W A)^-1 A^H = GUR R^-1 R^-H R^H U^H G^H = GUU^H G^H.
  • Therefore, since UU^H + VV^H = I, we get GVV^H G^H B = G(I - UU^H)G^H B = (GG^H - GUU^H G^H)B = (W^-1 - A(A^H W A)^-1 A^H)B.
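The bound of [4.13] can be checked numerically; a minimal NumPy sketch with real matrices, verifying that a random X stays below the stated maximum (the eigenvalues of (W^-1 - A(A^H W A)^-1 A^H)B are real, as the derivation shows, so taking their real parts only strips rounding noise):

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, k = 7, 2, 2
W0 = rng.standard_normal((n, n)); W = W0 @ W0.T + np.eye(n)  # +ve definite
B0 = rng.standard_normal((n, n)); B = (B0 + B0.T) / 2        # hermitian
A = rng.standard_normal((n, m))

Winv = np.linalg.inv(W)
P = Winv - A @ np.linalg.inv(A.T @ W @ A) @ A.T              # W^-1 - A (A^H W A)^-1 A^H
d = np.sort(np.linalg.eigvals(P @ B).real)[::-1]
bound = np.trace(np.linalg.inv(A.T @ W @ A) @ (A.T @ B @ A)) + d[:k].sum()

# Any X with rank([A X]) = m+k stays below the bound
X = rng.standard_normal((n, k))                              # generically full rank
AX = np.hstack([A, X])
val = np.trace(np.linalg.inv(AX.T @ W @ AX) @ (AX.T @ B @ AX))
assert val <= bound + 1e-9
```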
4.14 If W = F^H F is +ve definite hermitian, B is hermitian, A[n#m] is a given matrix, and the columns of V are an orthonormal basis for the null space of A^H F^H, then if X[n#k] is restricted such that rank([A X]) = m+k, max_X tr(([A X]^H W [A X])^-1 [A X]^H B [A X]) = tr((A^H W A)^-1 A^H B A) + sum(d_1:k), where d are the eigenvalues of V^H F^-H B F^-1 V sorted into decreasing order, and this is attained by taking the columns of X to be the corresponding eigenvectors multiplied by F^-1 V.
  • For any X[n#k] we can form a QR decomposition V^H F X = K[n-m#k] S[k#k], and we define the upper triangular matrix T[m+k#m+k] = [R, U^H F X; 0, S[k#k]] where FA = U[n#m] R[m#m] is also a QR decomposition. This gives [A X] = F^-1 [U VK] T, and T is non-singular iff rank([A X]) = m+k.
  • Now, tr(([A X]^H W [A X])^-1 [A X]^H B [A X]) = tr((T^H [U VK]^H F^-H F^H F F^-1 [U VK] T)^-1 T^H [U VK]^H F^-H B F^-1 [U VK] T) = tr(T^-1 ([U VK]^H [U VK])^-1 T^-H T^H [U VK]^H F^-H B F^-1 [U VK] T) = tr([U VK]^H F^-H B F^-1 [U VK]) = tr(U^H F^-H B F^-1 U) + tr(K^H V^H F^-H B F^-1 V K)
  • The first term is independent of X, while the maximum value of the second subject to K^H K = I equals the sum of the k highest eigenvalues of V^H F^-H B F^-1 V, with the columns of K the corresponding eigenvectors. A suitable X is then given by X = F^-1 V K.
4.15 If W is +ve definite hermitian, B is hermitian and A[n#m] is a given matrix, then max_X det(([A X]^H W [A X])^-1 [A X]^H B [A X] | rank([A X[n#k]]) = m+k) = det((A^H W A)^-1 A^H B A) × prod(l_1:k), where l are the eigenvalues of W^-1 B(I - A(A^H B A)^-1 A^H B) sorted into decreasing order, and this maximum may be attained by taking the columns of X to be the corresponding eigenvectors.
  • From [4.10] we can find G[n#n] such that G^H W G = I and G^H B G = D = DIAG(d), where d and the columns of G are the eigenvalues and corresponding eigenvectors of W^-1 B, with the elements of d in non-increasing order.
  • We can form the QR decomposition G^-1 A = U[n#m] R[m#m] = [U[n#m] V[n#n-m]][R; 0] where [U V]^H [U V] = I.
    Note that (A^H W A)^-1 A^H B A = (R^H U^H G^H W G U R)^-1 R^H U^H G^H B G U R = R^-1 U^H D U R.
  • Now for any X[n#k] satisfying rank([A X]) = m+k,
    • We form the QR decomposition V^H G^-1 X = K[n-m#k] S[k#k] and define the upper triangular matrix T[m+k#m+k] = [R, U^H G^-1 X; 0, S], giving [A X] = G[U VK]T. T must be non-singular since rank([A X]) = m+k.
    • Now we have det(([A X]^H W [A X])^-1 [A X]^H B [A X])
      = det((T^H [U VK]^H G^H W G [U VK] T)^-1 T^H [U VK]^H G^H B G [U VK] T) since [A X] = G[U VK]T
      = det(T^-1 ([U VK]^H [U VK])^-1 T^-H T^H [U VK]^H D [U VK] T) since T is non-singular and G^H W G = I
      = det([U VK]^H D [U VK]) since [U VK]^H [U VK] = I[m+k#m+k]
      = det([U^H D U, U^H D V K; K^H V^H D U, K^H V^H D V K])
      = det(U^H D U) × det(K^H V^H D V K - K^H V^H D U (U^H D U)^-1 U^H D V K) [3.1]
      = det(U^H D U) × det(K^H (V^H D V - V^H D U (U^H D U)^-1 U^H D V) K)
      = det((A^H W A)^-1 A^H B A) × det(K^H (V^H D V - V^H D U (U^H D U)^-1 U^H D V) K)
    • Thus we need to maximize det(K^H (V^H D V - V^H D U (U^H D U)^-1 U^H D V) K) subject to K^H K = I.
      From [4.12] (with W = I) the maximum equals the product of the top k eigenvalues of V^H D V - V^H D U (U^H D U)^-1 U^H D V and is attained by setting K to the corresponding eigenvectors. From K we can derive X = GVK as one possible maximizer. Note that S does not affect our objective function and so can be taken to be the identity.
    • We can manipulate V^H D V - V^H D U (U^H D U)^-1 U^H D V
      = V^H G^H B G V - V^H G^H B G U (U^H G^H B G U)^-1 U^H G^H B G V
      = V^H G^H B G V - V^H G^H B G U R (R^H U^H G^H B G U R)^-1 R^H U^H G^H B G V
      = V^H G^H B G V - V^H G^H B A (A^H B A)^-1 A^H B G V since A = GUR
      = V^H G^H (I - B A (A^H B A)^-1 A^H) B G V
    • So if K contains eigenvectors of V^H G^H (I - B A (A^H B A)^-1 A^H) B G V, we have V^H G^H (I - B A (A^H B A)^-1 A^H) B G V K = K L for some diagonal L. Hence
      X L = G V K L = G V V^H G^H (I - B A (A^H B A)^-1 A^H) B G V K
      = G (I - U U^H) G^H (I - B A (A^H B A)^-1 A^H) B X since V V^H = I - U U^H and G V K = X
      = (G G^H - G U U^H G^H - G G^H B A (A^H B A)^-1 A^H + G U U^H G^H B A (A^H B A)^-1 A^H) B X
      = (G G^H - A R^-1 R^-H A^H - G G^H B A (A^H B A)^-1 A^H + A R^-1 R^-H A^H B A (A^H B A)^-1 A^H) B X since G U = A R^-1
      = (G G^H - A R^-1 R^-H A^H - G G^H B A (A^H B A)^-1 A^H + A R^-1 R^-H A^H) B X
      = (G G^H - G G^H B A (A^H B A)^-1 A^H) B X
      = (W^-1 B - W^-1 B A (A^H B A)^-1 A^H B) X since G G^H = W^-1
      = W^-1 B (I - A (A^H B A)^-1 A^H B) X
    • So X consists of the eigenvectors of W^-1 B (I - A (A^H B A)^-1 A^H B) corresponding to the k highest eigenvalues [there must be an easier proof than this methinks]
4.16 |x^H y|^2 = x^H y y^H x <= x^H x y^H y for any complex vectors x and y
  • If y^H y = 0 then the inequality holds trivially, since y = 0 makes both sides zero. Hence we assume that y^H y > 0.
  • 0 <= |x y^H y - y y^H x|^2
    = (y^H y x^H - x^H y y^H)(x y^H y - y y^H x)
    = y^H y x^H x y^H y - x^H y y^H x y^H y - y^H y x^H y y^H x + x^H y y^H y y^H x
    = x^H x y^H y - x^H y y^H x - x^H y y^H x + x^H y y^H x (after dividing all terms by y^H y)
    = x^H x y^H y - x^H y y^H x
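The Cauchy-Schwarz inequality of [4.16] can be checked on random complex vectors; a minimal NumPy sketch (np.vdot conjugates its first argument, so np.vdot(x, y) computes x^H y):

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

# |x^H y|^2 <= (x^H x)(y^H y)
lhs = abs(np.vdot(x, y)) ** 2
rhs = np.vdot(x, x).real * np.vdot(y, y).real
assert lhs <= rhs + 1e-12
```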
4.17 X^H Y(Y^H Y)^-1 Y^H X <= X^H X, where <= denotes the Loewner partial order.
  • Let a be an arbitrary vector.
  • a^H X^H Y(Y^H Y)^-1 Y^H X a = (a^H X^H) × (Y(Y^H Y)^-1 Y^H X a), which is of the form u^H v.
  • Applying the scalar Cauchy-Schwarz inequality [4.16] gives
    (a^H X^H Y(Y^H Y)^-1 Y^H X a)^2 <= a^H X^H X a × a^H X^H Y(Y^H Y)^-1 Y^H Y(Y^H Y)^-1 Y^H X a
    = a^H X^H X a × a^H X^H Y(Y^H Y)^-1 Y^H X a
  • If a^H X^H Y(Y^H Y)^-1 Y^H X a > 0 we can divide through to obtain
    a^H X^H Y(Y^H Y)^-1 Y^H X a <= a^H X^H X a = |Xa|^2
  • If, however, a^H X^H Y(Y^H Y)^-1 Y^H X a = 0, then the inequality holds in any case since the right-hand side is >= 0.
  • Hence a^H X^H Y(Y^H Y)^-1 Y^H X a <= a^H X^H X a for any vector a.
  • Hence X^H Y(Y^H Y)^-1 Y^H X <= X^H X in the sense of the Loewner partial order.
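The Loewner ordering of [4.17] says that the difference matrix is positive semi-definite, which can be checked on random real matrices; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(10)
n, p, q = 6, 3, 2
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))

# X^H X - X^H Y (Y^H Y)^-1 Y^H X should be positive semi-definite
gap = X.T @ X - X.T @ Y @ np.linalg.inv(Y.T @ Y) @ Y.T @ X
assert np.all(np.linalg.eigvalsh(gap) >= -1e-9)
```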

This page is part of The Matrix Reference Manual. Copyright © 1998-2005 Mike Brookes, Imperial College, London, UK. See the file gfl.html for copying instructions. Please send any comments or suggestions to "mike.brookes" at "imperial.ac.uk".
Updated: $Id: proof004.html,v 1.18 2006/12/27 11:00:08 dmb Exp $