In all the expressions below, x is a vector of real or complex random
variables whose mean vector and covariance matrix are given by:
$E(x) = m$ and
$\operatorname{Cov}(x) = E\left((x-m)(x-m)^H\right) = S$.
Vectors and matrices a, A, b, B, c, C, d and D are constant
(i.e. not dependent on x).
- The covariance matrix S is Hermitian and positive semi-definite.
- S is strictly positive
definite unless there is a deterministic relation between the elements of
x of the form $a^H(x-m) = 0$ (i.e. $a^Hx$ is constant) for some non-zero
a, since $a^HSa = \operatorname{Var}(a^Hx)$.
- If the elements of x are uniformly spaced samples from a wide-sense stationary continuous
signal, then S is Toeplitz.
- The symmetric correlation coefficient matrix (also called the
correlation matrix) is $\operatorname{Corr}(x) =
\operatorname{DIAG}(S)^{-1/2}\, S\,
\operatorname{DIAG}(S)^{-1/2}$.
WARNING: The term correlation matrix is also used for the matrix
$E(xx^T) = S + mm^T$.
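- The Correlation Coefficient between $x_i$ and
$x_j$ equals $\operatorname{Corr}(x)_{i,j} =
\dfrac{E\left((x_i-m_i)(x_j-m_j)\right)}{\sqrt{E\left((x_i-m_i)^2\right)E\left((x_j-m_j)^2\right)}} = \dfrac{S_{ij}}{\sqrt{S_{ii}S_{jj}}}$
and has magnitude $\le 1$. [5.11]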
- The precision matrix is $T = S^{-1}$.
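As a small numerical illustration, the correlation matrix can be formed by scaling S on both sides with $\operatorname{DIAG}(S)^{-1/2}$. The elementwise form $S_{ij}/\sqrt{S_{ii}S_{jj}}$ gives the same result. This is a sketch that assumes NumPy and a real, positive definite S with arbitrary example values. The helper name `corr_from_cov` is hypothetical.

```python
import numpy as np

def corr_from_cov(S):
    """Corr(x) = DIAG(S)^(-1/2) S DIAG(S)^(-1/2) for a real covariance matrix S."""
    d = 1.0 / np.sqrt(np.diag(S))   # diagonal of DIAG(S)^(-1/2), stored as a vector
    return S * np.outer(d, d)       # element (i, j) is S_ij / sqrt(S_ii S_jj)

# Example covariance (arbitrary values, chosen to be positive definite)
S = np.array([[4.0, 1.2, -0.8],
              [1.2, 9.0, 2.1],
              [-0.8, 2.1, 1.0]])
C = corr_from_cov(S)
T = np.linalg.inv(S)                # precision matrix T = S^{-1}
```

Scaling by an outer product avoids building the diagonal matrices explicitly. It is equivalent to `D @ S @ D` with `D = np.diag(d)`.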
Special Distributions
The expressions for cubic and quartic expectations given below are
restricted to the following special distributions:
Independent
- [x:Real Independent] means that the components of x are real and
independent. In particular, we require that
$E(x_i^p x_k^q) = E(x_i^p)\,E(x_k^q)$ for $i \ne k$.
We define $m_r = E\left((x-m)^r\right)$ where
the r’th power of the vector is elementwise. Note that
$S = \operatorname{DIAG}(m_2)$ and
$m_1 = 0$.
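- [x:Real Gaussian] means that the
components of $x_{[n]}$ are real and have a multivariate
Gaussian pdf: $x \sim N(x; m, S) =
|2\pi S|^{-1/2}\exp\left(
-\tfrac{1}{2}(x-m)^T S^{-1} (x-m)\right)$
where S is symmetric and positive semidefinite. The pdf expression requires S to be positive definite; a singular S gives a degenerate distribution with no density.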
- $\ln p(x) = -\tfrac{1}{2}\ln\det(2\pi S) - \tfrac{1}{2}(x-m)^T S^{-1}(x-m)$
- If x is both Gaussian and Independent then
$m_r = \operatorname{diag}\left((\tfrac{1}{2}S)^{r/2}\,\dfrac{r!}{(r/2)!}\right)$ for even r and 0 for odd
r.
- $N(x; m, S) = N(m; x, S) =
|\det A|\; N(Ax+b; Am+b, ASA^T)$ for any
b and non-singular A.
- $N(x; m, S) = |a|^n\,
N(ax+b; am+b, a^2S)$
for any b and non-zero scalar a, where n is the dimension of x.
- $N(x; m, S) = N(-m; -x, S) =
N(m; x, S) = N(x+a; m+a, S)$ for any
a.
- $N(Ax; u, R) = N(x; m, S) \times
N(0; u, R) / N(0; m, S)$ where
$S = (A^TR^{-1}A)^{-1}$ and
$m = SA^TR^{-1}u$, for any
A (not necessarily square) with full column rank.
- If $x \sim N(x; m, S)$ then
- $y = Ax+b \sim N(y; Am+b, ASA^T)$ [5.10]
- $y = ax+b \sim N(y; am+b, a^2S)$
- $y = F^{-T}(x-m) \sim N(y; 0, I)$ where $F^TF = S$. It is
not necessary for F to be symmetric but it can always be chosen to be symmetric
[see Hermitian].
- If $SQ = QD$ with Q an orthonormal set of eigenvectors
and D a diagonal matrix of the corresponding positive eigenvalues, then we can
define $F = D^{1/2}Q^T$, giving
$F^{-T} = D^{-1/2}Q^T$.
- $x \mid A^Tx = b \sim N\left(x;\ (I - HA^T)m + Hb,\ (I - HA^T)S\right)$ where
$H = SA(A^TSA)^+$ and
$(\cdot)^+$ denotes the pseudoinverse or, if $A^TSA$ is non-singular, the
inverse. The symmetry of the covariance may be shown explicitly by writing
$(I - HA^T)S = S - SA(A^TSA)^+A^TS$.
[5.9]
- $x \mid a^Tx = b \sim N\left(x;\ m + (b - a^Tm)(a^TSa)^{-1}Sa,\ S - (a^TSa)^{-1}Saa^TS\right)$
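The conditioning formula [5.9] can be checked numerically. The conditional mean should satisfy the constraint exactly, and the conditional covariance should have no variance along the constrained directions, i.e. $A^T \operatorname{Cov} = 0$. This is a sketch assuming NumPy. The numbers and the helper name `condition_on_linear_constraint` are illustrative only.

```python
import numpy as np

def condition_on_linear_constraint(m, S, A, b):
    """Mean and covariance of x | A^T x = b for x ~ N(m, S)  [5.9].
    Uses H = S A (A^T S A)^+, with ^+ the Moore-Penrose pseudoinverse."""
    G = np.linalg.pinv(A.T @ S @ A)           # (A^T S A)^+
    H = S @ A @ G
    mean = m + H @ (b - A.T @ m)              # equals (I - H A^T) m + H b
    cov = S - S @ A @ G @ A.T @ S             # equals (I - H A^T) S, written symmetrically
    return mean, cov, H

# Arbitrary example values
m = np.array([1.0, -2.0, 0.5])
S = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, -0.2],
              [0.1, -0.2, 1.5]])
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])                    # two constraints, full column rank
b = np.array([3.0, -1.0])
mean, cov, H = condition_on_linear_constraint(m, S, A, b)
```

With a single column `a`, the same function reproduces the scalar-constraint formula listed directly above.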
- Joint Distribution
If $[x; y] \sim N\left([x; y];\ [p; q],\ \begin{bmatrix} P & R^T \\ R & Q \end{bmatrix}\right)$ then, in the sections below,
we define the regression coefficient matrix of x on y as
$F = R^TQ^{-1}$.
- Linear Sum
- $z = Ax + By + c \sim N\left(z;\ Ap + Bq + c,\ APA^T + BRA^T + AR^TB^T + BQB^T\right)$
- Conditional Distributions
- $x \mid y \sim N\left(x;\ p + F(y-q),\ P - FR\right)$. [5.8]
- The mean, $p + F(y-q)$, is the regression
function of x on y.
- The covariance, $C = P - FR$, is the Schur complement of Q in $\begin{bmatrix} P & R^T \\ R & Q \end{bmatrix}$.
- $y \mid x \sim N\left(y;\ q + RP^{-1}(x-p),\ Q - RP^{-1}R^T\right)$. The covariance is the
Schur complement of P in $\begin{bmatrix} P & R^T \\ R & Q \end{bmatrix}$.
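A minimal NumPy sketch of the two conditional distributions [5.8] follows, using an arbitrary positive definite joint covariance. It also checks that the inverse conditional covariances reappear as the diagonal blocks U and V of the precision matrix, matching the block formulas listed under Precision Matrix. The function names are hypothetical helpers.

```python
import numpy as np

def condition_x_on_y(p, q, P, R, Q, y):
    """x | y for [x; y] ~ N([p; q], [[P, R^T], [R, Q]])  [5.8]."""
    F = R.T @ np.linalg.inv(Q)                # regression coefficient matrix of x on y
    return p + F @ (y - q), P - F @ R, F      # mean, covariance (Schur complement of Q), F

def condition_y_on_x(p, q, P, R, Q, x):
    """y | x, whose covariance is the Schur complement of P."""
    G = R @ np.linalg.inv(P)
    return q + G @ (x - p), Q - G @ R.T

# Arbitrary joint covariance of [x; y], with x and y each two-dimensional
rng = np.random.default_rng(0)
L = rng.standard_normal((4, 4))
S = L @ L.T + 4.0 * np.eye(4)                 # positive definite by construction
P, R, Q = S[:2, :2], S[2:, :2], S[2:, 2:]     # S = [[P, R^T], [R, Q]]
p, q = np.array([1.0, 0.0]), np.array([-1.0, 2.0])
mx, Cx, F = condition_x_on_y(p, q, P, R, Q, y=np.array([0.5, 1.5]))
my, Cy = condition_y_on_x(p, q, P, R, Q, x=np.array([0.0, 0.3]))
```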
- Independence: The following are equivalent:
- x and y are independent
- R = 0
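- W = 0 in the precision matrix: $T = S^{-1} = \begin{bmatrix} P & R^T \\ R & Q \end{bmatrix}^{-1} = \begin{bmatrix} U & W^T \\ W & V \end{bmatrix}$
- $N\left([x; y];\ [p; q],\ \begin{bmatrix} P & R^T \\ R & Q \end{bmatrix}\right) = N(x; p, P) \times N(y; q, Q)$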
- Multiple Correlation Coefficients
The vector of multiple correlation coefficients between x and
y has the same dimension as x and is given by
$g_{x|y} = \sqrt{\operatorname{diag}(FR \div P)}$, where the square root and $\div$
are elementwise.
- The minimum (over A) of $\operatorname{tr}\left(\operatorname{Cov}(x - A^Ty)\right)$ is obtained when $A = F^T$ and is equal to $\operatorname{tr}(P - FR)$.
- The maximum (over a) of the correlation between $x_i$
and $a^Ty$ is obtained when $a = (F^T)_i = Q^{-1}r_i$, where $(F^T)_i$ and $r_i$ denote the i-th columns of $F^T$ and R, and is equal to
$(g_{x|y})_i$. [5.13]
- All elements of $g_{x|y}$ lie in the range 0 to 1.
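- $\operatorname{diag}\left(\operatorname{Cov}(x \mid y) \div \operatorname{Cov}(x)\right) =
\operatorname{diag}\left((P - FR) \div P\right) = 1 -
g_{x|y} \bullet g_{x|y}$, where $\bullet$
and $\div$ denote elementwise multiplication and division.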
- $\operatorname{Var}(x_i \mid y) =
\left(1 - (g_{x|y})_i^2\right)
\operatorname{Var}(x_i)$, showing that conditioning cannot increase the
variance.
- $(g_{x|y})_i^2 = 1 -
p_{ii}^{-1}\det\left(\begin{bmatrix} p_{ii} & r_i^T \\ r_i & Q \end{bmatrix}\right)\det(Q)^{-1}$ [5.14]
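The three characterisations of the multiple correlation coefficient can be compared numerically: the definition, the determinant form [5.14], and the maximal correlation [5.13]. This is a NumPy sketch with an arbitrary joint covariance in which x has two elements and y has three. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.standard_normal((5, 5))
S = L @ L.T + np.eye(5)                       # arbitrary joint covariance of [x; y]
nx = 2                                        # x has 2 elements, y has 3
P, R, Q = S[:nx, :nx], S[nx:, :nx], S[nx:, nx:]
F = R.T @ np.linalg.inv(Q)
g = np.sqrt(np.diag(F @ R) / np.diag(P))      # g_{x|y}, elementwise sqrt and division

def g_squared_via_det(P, R, Q, i):
    """(g_{x|y})_i^2 = 1 - det([[p_ii, r_i^T], [r_i, Q]]) / (p_ii det(Q))  [5.14]"""
    r = R[:, i]                               # r_i: i-th column of R
    M = np.block([[np.array([[P[i, i]]]), r[None, :]],
                  [r[:, None], Q]])
    return 1.0 - np.linalg.det(M) / (P[i, i] * np.linalg.det(Q))

def best_correlation(P, R, Q, i):
    """Correlation between x_i and a^T y for the maximising a = Q^{-1} r_i  [5.13]."""
    a = np.linalg.solve(Q, R[:, i])
    return (a @ R[:, i]) / np.sqrt(P[i, i] * (a @ Q @ a))
```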
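- Precision Matrix:
The precision matrix is $T = S^{-1} = \begin{bmatrix} P & R^T \\ R & Q \end{bmatrix}^{-1} = \begin{bmatrix} U & W^T \\ W & V \end{bmatrix}$.
- In terms of the regression coefficient matrix $F = R^TQ^{-1}$ defined above, the blocks of T are given by [3.5]
- $U = \operatorname{Cov}(x \mid y)^{-1} = (P - FR)^{-1}$
- $W = -F^TU$
- $V = Q^{-1} + F^TUF$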
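- The elements of T are given by
- $t_{ii} = \operatorname{Var}(x_i \mid x_{\setminus i})^{-1}$, where $x_{\setminus i}$ denotes the
vector x with $x_i$ deleted.
- $t_{ij} = -\operatorname{Corr}(x_i, x_j \mid x_{\setminus i,j})\,(t_{ii}t_{jj})^{1/2}$
- $t_{ij} = 0$ iff $x_i$ and $x_j$
are conditionally independent given $x_{\setminus i,j}$.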
- If I, J and K are a partitioning of the indices
1:n, then
- The submatrix $T_{I,J} = 0$ iff
$x_I$ and $x_J$ are conditionally
independent given $x_K$: $p(x_I, x_J \mid x_K) = p(x_I \mid x_K) \times p(x_J \mid x_K)$
- Product of Gaussians:
$N(x; a, A)\,N(x; b, B) = N(a; b, A+B) \times N(x; c, C)$
[5.1] where
- $C = (A^{-1} + B^{-1})^{-1} = A(A+B)^{-1}B = B(A+B)^{-1}A$
- $c = C(A^{-1}a + B^{-1}b) = A(A+B)^{-1}b + B(A+B)^{-1}a$
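The product identity [5.1] is easy to spot-check by evaluating both sides at an arbitrary point. This sketch assumes NumPy and uses made-up two-dimensional parameters. `gauss_pdf` and `gaussian_product` are hypothetical helper names. The test also checks the square case of the power rule given under Power of Gaussian.

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """N(x; mean, cov) = |2 pi cov|^(-1/2) exp(-1/2 (x-mean)^T cov^{-1} (x-mean))."""
    d = x - mean
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / np.sqrt(np.linalg.det(2 * np.pi * cov))

def gaussian_product(a, A, b, B):
    """Parameters (c, C) with N(x;a,A) N(x;b,B) = N(a;b,A+B) N(x;c,C)  [5.1]."""
    C = A @ np.linalg.solve(A + B, B)                        # A (A+B)^{-1} B
    c = C @ (np.linalg.solve(A, a) + np.linalg.solve(B, b))  # C (A^{-1} a + B^{-1} b)
    return c, C

a, A = np.array([0.5, -1.0]), np.array([[2.0, 0.5], [0.5, 1.0]])
b, B = np.array([1.5, 0.2]), np.array([[1.0, -0.3], [-0.3, 3.0]])
c, C = gaussian_product(a, A, b, B)
x = np.array([0.7, -0.4])                                   # arbitrary evaluation point
lhs = gauss_pdf(x, a, A) * gauss_pdf(x, b, B)
rhs = gauss_pdf(a, b, A + B) * gauss_pdf(x, c, C)
```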
- Power of Gaussian (here the exponent m is a positive scalar, not the mean):
- $N(x; a, A)^m = N\left(0; 0, m(2\pi)^{m-2}A^{m-1}\right) \times N(x; a, m^{-1}A)$ [5.2]
- $N(x; a, A)^2 = N(0; 0, 2A) \times N(x; a, \tfrac{1}{2}A)$
- Quotient of Gaussians:
$N(x; c, C) / N(x; a, A) = N(x; b, B) / N(a; b, A+B)$
provided $(A - C)$ is non-singular, where
- $B = (C^{-1} - A^{-1})^{-1} = A(A-C)^{-1}C = C(A-C)^{-1}A$
- $b = B(C^{-1}c - A^{-1}a) = A(A-C)^{-1}c - C(A-C)^{-1}a$
- Characteristic Function: The characteristic function of x is
a function of the real vector t and is $\phi(t) =
E\left(\exp(jt^Tx)\right) =
E\left(\cos(t^Tx)\right) + j\,E\left(\sin(t^Tx)\right) =
\exp\left(jt^Tm - \tfrac{1}{2}t^TSt\right)$, where $j = \sqrt{-1}$.
- Differential Entropy
The differential entropy of x is $h(x) =
-E\{\ln p(x)\} = \tfrac{1}{2}\ln\det(2\pi e S)$ nats $=
\tfrac{1}{2}\log_2\det(2\pi e S)$ bits.
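A short NumPy sketch of the entropy formula follows. `numpy.linalg.slogdet` is used so that large or small determinants do not overflow, and dividing by $\ln 2$ converts nats to bits. The function name `gaussian_entropy` is a hypothetical helper.

```python
import math
import numpy as np

def gaussian_entropy(S):
    """Differential entropy of N(m, S), returned as (nats, bits).
    h = 1/2 ln det(2 pi e S); dividing by ln 2 converts nats to bits."""
    sign, logdet = np.linalg.slogdet(2 * math.pi * math.e * S)
    if sign <= 0:
        raise ValueError("S must be positive definite")
    nats = 0.5 * logdet
    return nats, nats / math.log(2)

h_nats, h_bits = gaussian_entropy(np.array([[1.0]]))    # scalar unit-variance Gaussian
```

For a diagonal S, the entropy is the sum of the entropies of the independent scalar components, which gives a quick consistency check.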
- Cramer-Rao bound
Suppose
$m_{[n\times 1]}$ and $S_{[n\times n]}$ are
functions of a parameter vector $q_{[p\times 1]}$ and that we
take k independent samples of x to form the columns of a data
matrix $X_{[n\times k]}$. In the expressions below,
$\otimes$ denotes the Kronecker product, a postfix ":" (as in $S{:}$)
denotes vectorization, so that $(\cdot)^{:T}$ is the transposed vectorization, and
$dS/dq$ is a matrix of dimension
$n^2\times p$ (see derivatives).
- $\ln p(X) = -\tfrac{1}{2}k\ln\det(2\pi S) -
\tfrac{1}{2}\operatorname{tr}\left((X-M)^TS^{-1}(X-M)\right)$ where $M_{[n\times k]} =
m\,\mathbf{1}_{[k\times 1]}^T$.
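- The Fisher Score vector, v, is defined by
$v^T = \dfrac{d}{dq}\ln p(X)$
$= \mathbf{1}_{[k\times 1]}^T(X-M)^TS^{-1}\dfrac{dm}{dq} - \tfrac{1}{2}\left(kS^{-1} - S^{-1}(X-M)(X-M)^TS^{-1}\right)^{:T}\dfrac{dS}{dq}$
$= \mathbf{1}_{[k\times 1]}^T(X-M)^TS^{-1}\dfrac{dm}{dq} - \tfrac{1}{2}\left(k\,S^{-1:T} - \left((X-M)(X-M)^T\right)^{:T}(S^{-1}\otimes S^{-1})\right)\dfrac{dS}{dq}$ [5.15]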
- E(v) = 0
- [k=1] $v^T =
\dfrac{d}{dq}\ln p(x) =
(x-m)^TS^{-1}\dfrac{dm}{dq} - \tfrac{1}{2}\left(S^{-1} - S^{-1}(x-m)(x-m)^TS^{-1}\right)^{:T}\dfrac{dS}{dq}$
- The Fisher Information Matrix is defined by $J =
E(vv^T) = k\left(\left(\dfrac{dm}{dq}\right)^TS^{-1}\dfrac{dm}{dq} + \tfrac{1}{2}\left(\dfrac{dS}{dq}\right)^T(S\otimes S)^{-1}\dfrac{dS}{dq}\right)$
[5.16]
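- The i,j element of J is given by $J_{ij}
= k\left(\left(\dfrac{dm}{dq_i}\right)^TS^{-1}\dfrac{dm}{dq_j} + \tfrac{1}{2}\operatorname{tr}\left(R_iS^{-1}R_jS^{-1}\right)\right)$,
where $R_i$ satisfies $R_i{:} = dS/dq_i$ (i.e. $R_i = \partial S/\partial q_i$).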
- Cramer-Rao bound: If $f_{[r\times 1]}(X)$ is a
function of X with mean value $g(q)$, then $\operatorname{Cov}(f)
\ge \dfrac{dg}{dq}J^{-1}\left(\dfrac{dg}{dq}\right)^T$, where $\ge$ represents
the Loewner partial order.
[5.17]
- If $g(q) = aq$ then $\operatorname{Cov}(f) \ge a^2J^{-1}$.
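To illustrate [5.16], the sketch below takes a hypothetical two-parameter model, $m(q) = q_1u$ and $S(q) = q_2S_0$ (indices 0 and 1 in the code). It computes J from the Kronecker form and compares it with the element-wise trace form. For this model the variance entry has the closed form $kn/(2q_2^2)$, so the Cramer-Rao bound on an unbiased estimate of $q_2$ is $2q_2^2/(kn)$. The sketch assumes NumPy and column-major vectorization (`order="F"`). The model, values and function names are illustrative only.

```python
import numpy as np

def fisher_information(dm_dq, dS_dq, S, k):
    """J = k((dm/dq)^T S^{-1} dm/dq + 1/2 (dS/dq)^T (S kron S)^{-1} dS/dq)  [5.16].
    dm_dq is n x p; column i of the n^2 x p matrix dS_dq is vec(dS/dq_i)."""
    Si = np.linalg.inv(S)
    return k * (dm_dq.T @ Si @ dm_dq + 0.5 * dS_dq.T @ np.kron(Si, Si) @ dS_dq)

def fisher_element(dm_i, dm_j, R_i, R_j, S, k):
    """J_ij via the trace form, with R_i = dS/dq_i."""
    Si = np.linalg.inv(S)
    return k * (dm_i @ Si @ dm_j + 0.5 * np.trace(R_i @ Si @ R_j @ Si))

# Hypothetical model: m(q) = q[0] * u and S(q) = q[1] * S0, so p = 2 parameters
u = np.array([1.0, 2.0, -1.0])
S0 = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
q, k, n = np.array([1.5, 2.0]), 10, 3
S = q[1] * S0
dm_dq = np.column_stack([u, np.zeros(n)])                          # n x p
dS_dq = np.column_stack([np.zeros(n * n), S0.flatten(order="F")])  # n^2 x p
J = fisher_information(dm_dq, dS_dq, S, k)
crb = np.linalg.inv(J)    # bound on Cov of an unbiased estimate of q (dg/dq = I)
```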
Definition: In this section, <=> represents the complex-to-real isomorphism and <->
represents the related vector mapping.
[x[n]:Complex Gaussian]
means that if $x_{[n]}$ <-> $y_{[2n]}$, then $y \sim N(y; a, \tfrac{1}{2}K)$ for some complex
$m_{[n]}$ <-> $a_{[2n]}$ and positive
definite Hermitian $S_{[n\times n]}$ <=>
$K_{[2n\times 2n]}$. In other words, the real and
imaginary components of x are jointly Gaussian with a symmetric
covariance matrix that lies in the range of the complex-to-real isomorphism.
- $E(x) = m_{[n]}$ <->
$a_{[2n]}$
- $\operatorname{Cov}(x) =
E\left((x-m)(x-m)^H\right) =
E(xx^H) - mm^H =
S_{[n\times n]}$ <=>
$K_{[2n\times 2n]}$ [5.3]
- $N(y; a, \tfrac{1}{2}K) =
|\pi K|^{-1/2}\exp\left(-(y-a)^TK^{-1}(y-a)\right) = |\pi S|^{-1}\exp\left(-(x-m)^HS^{-1}(x-m)\right)$
- If S is diagonal (and hence also real) then $N(x; m,
S) = N(y; a, \tfrac{1}{2}K) =
N(|x-m|; 0, \tfrac{1}{2}S) \times
|\pi S|^{-1/2}$, where $|x-m|$ is taken elementwise. Thus we can express a complex pdf as a
truncated real pdf of the same dimension if the components of x are
independent.
- $K_{[2n\times 2n]}$ may be divided into $2\times 2$
Toeplitz blocks of the form $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$
(see Givens Rotation)
- All the $2\times 2$ blocks of K that lie on the main diagonal are positive
multiples of I. That is, for each component of x, the real and
imaginary parts have the same variance and are uncorrelated.
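The complex-to-real mapping can be made concrete in a few lines of NumPy. The sketch assumes an interleaved ordering $y = [\operatorname{Re} x_1, \operatorname{Im} x_1, \operatorname{Re} x_2, \ldots]$ and maps each element $s_{ij}$ to the $2\times 2$ block $\begin{bmatrix} \operatorname{Re} s_{ij} & -\operatorname{Im} s_{ij} \\ \operatorname{Im} s_{ij} & \operatorname{Re} s_{ij} \end{bmatrix}$, consistent with the block description above. The ordering convention and all function names are assumptions for illustration. The checks confirm that K is symmetric, that inversion commutes with the mapping, and that the complex and real forms of the pdf agree.

```python
import numpy as np

def c2r_matrix(S):
    """Complex n x n -> real 2n x 2n; block (i, j) is [[Re s_ij, -Im s_ij], [Im s_ij, Re s_ij]]."""
    n = S.shape[0]
    K = np.empty((2 * n, 2 * n))
    K[0::2, 0::2], K[0::2, 1::2] = S.real, -S.imag
    K[1::2, 0::2], K[1::2, 1::2] = S.imag, S.real
    return K

def c2r_vector(x):
    """Complex n-vector -> real 2n-vector [Re x_1, Im x_1, Re x_2, Im x_2, ...]."""
    y = np.empty(2 * x.size)
    y[0::2], y[1::2] = x.real, x.imag
    return y

def complex_gauss_pdf(x, m, S):
    """|pi S|^{-1} exp(-(x-m)^H S^{-1} (x-m))."""
    d = x - m
    quad = np.real(d.conj() @ np.linalg.solve(S, d))
    return np.exp(-quad) / np.real(np.linalg.det(np.pi * S))

def real_form_pdf(y, a, K):
    """N(y; a, K/2) = |pi K|^{-1/2} exp(-(y-a)^T K^{-1} (y-a))."""
    d = y - a
    return np.exp(-d @ np.linalg.solve(K, d)) / np.sqrt(np.linalg.det(np.pi * K))

B = np.array([[1 + 2j, 0.5 - 1j], [0.3j, 2 - 0.5j]])
S = B @ B.conj().T + np.eye(2)          # arbitrary Hermitian positive definite S
m = np.array([0.1 + 0.1j, -0.5 + 0.0j])
x = np.array([0.3 - 0.2j, -1.0 + 0.5j])
K = c2r_matrix(S)
```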
Linear Expectations
- E(Ax + b) = Am + b
- E(Ax) = Am
- E(x + b) = m + b
- $\operatorname{Cov}(Ax + b) = ASA^H$
- E(tr(Y)) = tr(E(Y)) where Y depends on x.
Quadratic Expectations
- $E\left((Ax + a)(Bx + b)^H\right) =
ASB^H + (Am+a)(Bm+b)^H$
- $E(xx^H) = S + mm^H$
- $E(xa^Hx) = (S + mm^H)a$
- $E(x^Hax^H) = a^H(S + mm^H)$
- $E\left((Ax)(Ax)^H\right) = A(S + mm^H)A^H$
- $E\left((x + a)(x + a)^H\right) = S + (m+a)(m+a)^H$
- $E\left((Ax+a)^H(Bx+b)\right) =
\operatorname{tr}(BSA^H) + (Am+a)^H(Bm+b)$
- $E(x^Hx) = \operatorname{tr}(S) + m^Hm$
- $E(x^HAx) = \operatorname{tr}(AS) + m^HAm$
- $E\left((Ax)^H(Ax)\right) = \operatorname{tr}(ASA^H) + (Am)^H(Am)$
- $E\left((x+a)^H(x+a)\right) = \operatorname{tr}(S) + (m+a)^H(m+a)$
- For real x: $E\left((Ax + a)\otimes(Bx + b)\right) = (A\otimes B)\,S{:} + (Am + a)\otimes(Bm + b)$
For [x:Real Gaussian] :
- $E(x \bullet x) = \operatorname{diag}(S) + m \bullet m = \operatorname{diag}(S + mm^T)$ [5.6]
- $\operatorname{Cov}(x \bullet x) = 2S \bullet (S + 2mm^T)$ [5.7]
For [x:Complex Gaussian] :
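- $E(xx^T) = mm^T$ [5.3]
- $E(x \bullet x^C) = \operatorname{diag}(S) + m \bullet m^C$ [5.4]
- $\operatorname{Cov}(x \bullet x^C) = E\left((x \bullet x^C)(x^T \bullet x^H)\right) - E(x \bullet x^C)\,E(x \bullet x^C)^T = S \bullet S^C + 2\left(mm^H \bullet S^T\right)_R$ [5.5], where $(\cdot)^C$ denotes the complex conjugate and $(\cdot)_R$ the real part.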
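The general quadratic identities in this section use only the mean and covariance of x, so they can be verified exactly with any distribution that has those two moments. The sketch below uses a symmetric set of $2n$ equally weighted points $m \pm \sqrt{n}\,L_{:,i}$ with $S = LL^T$. This is similar in spirit to the sigma points of the unscented transform, and is an assumed construction, not one from this manual. The points reproduce m and S exactly but are not Gaussian. The example is real-valued (so $^H$ becomes $^T$) and assumes NumPy. The helper names are hypothetical.

```python
import numpy as np

def sigma_points(m, S):
    """2n equally weighted points m +/- sqrt(n) L[:, i], where S = L L^T.
    They have mean exactly m and covariance exactly S (but are not Gaussian)."""
    n = len(m)
    offsets = np.sqrt(n) * np.linalg.cholesky(S).T   # row i = sqrt(n) * (column i of L)
    return np.vstack([m + offsets, m - offsets])

def expect(f, points):
    """Expectation of f(x) when x is uniform over the rows of `points`."""
    return sum(f(x) for x in points) / len(points)

m = np.array([1.0, -2.0, 0.5])
S = np.array([[2.0, 0.3, 0.1], [0.3, 1.0, -0.2], [0.1, -0.2, 1.5]])
A, a = np.array([[1.0, 2.0, 0.0], [0.5, -1.0, 3.0]]), np.array([0.2, -0.1])
B, b = np.array([[0.0, 1.0, 1.0], [2.0, 0.0, -1.0]]), np.array([1.0, 0.5])
pts = sigma_points(m, S)
```

The same points cannot be used for the cubic and higher results, which depend on the distribution beyond its first two moments.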
Cubic Expectations
For [x:Real Independent] :
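- $E\left((Ax + a)(Bx + b)^T(Cx + c)\right) = A\,\operatorname{DIAG}(B^TC)\,m_3 +
\operatorname{tr}(BSC^T)(Am+a) +
ASC^T(Bm+b) + \left(ASB^T + (Am+a)(Bm+b)^T\right)(Cm+c)$
- $E(xx^Tx) = m_3 + 2Sm + \left(\operatorname{tr}(S) + m^Tm\right)m$
- $E\left((Ax + a)(Ax + a)^T(Ax + a)\right) = A\,\operatorname{DIAG}(A^TA)\,m_3 +
\left(2ASA^T + (Am+a)(Am+a)^T\right)(Am+a) +
\operatorname{tr}(ASA^T)(Am+a)$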
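- $E\left((Ax + a)b^T(Cx + c)(Dx + d)^T\right) =
(Am+a)b^T\left(CSD^T + (Cm+c)(Dm+d)^T\right) +
\left(ASC^T + (Am+a)(Cm+c)^T\right)b(Dm+d)^T + b^T(Cm+c)\left(ASD^T - (Am+a)(Dm+d)^T\right) + A\,\operatorname{DIAG}(C^Tb \bullet m_3)\,D^T$
- $E(xb^Txx^T) = mb^T(S + mm^T) +
(S + mm^T)bm^T +
b^Tm\,(S - mm^T) + \operatorname{DIAG}(b \bullet m_3)$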
For [x:Real Gaussian] :
- $E\left((Ax + a)(Bx + b)^T(Cx + c)\right) =
ASB^T(Cm+c) + ASC^T(Bm+b) + \operatorname{tr}(BSC^T)(Am+a) + (Am+a)(Bm+b)^T(Cm+c)$
- $E(xx^Tx) = 2Sm + \left(\operatorname{tr}(S) + m^Tm\right)m$
- $E\left((Ax + a)(Ax + a)^T(Ax + a)\right) =
\left(2ASA^T + (Am+a)(Am+a)^T\right)(Am+a) +
\operatorname{tr}(ASA^T)(Am+a)$
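Gaussian polynomial expectations can be computed exactly, up to rounding, with tensor-product Gauss-Hermite quadrature. Writing $x = m + Lz$ with $S = LL^T$ and $z \sim N(0, I)$, a rule with `deg` nodes per dimension is exact for polynomials of degree up to $2\,\text{deg} - 1$ in each variable. The sketch below uses NumPy's `hermegauss` (probabilists' Hermite weight $e^{-z^2/2}$) to check the [x:Real Gaussian] cubic results. The helper names and example matrices are illustrative assumptions.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expect(f, m, S, deg=4):
    """E(f(x)) for x ~ N(m, S) by tensor-product Gauss-Hermite quadrature.
    Exact (up to rounding) when f is a polynomial of degree <= 2*deg - 1 in each variable."""
    z1, w1 = hermegauss(deg)            # nodes/weights for the weight exp(-z^2/2)
    w1 = w1 / w1.sum()                  # normalise so the weights sum to 1 (standard normal)
    L = np.linalg.cholesky(S)
    total = 0.0
    for idx in itertools.product(range(deg), repeat=len(m)):
        idx = list(idx)
        total = total + np.prod(w1[idx]) * f(m + L @ z1[idx])
    return total

def cubic_closed_form(A, a, B, b, C, c, m, S):
    """Real Gaussian: E((Ax+a)(Bx+b)^T(Cx+c)) from the formula above."""
    Am, Bm, Cm = A @ m + a, B @ m + b, C @ m + c
    return (A @ S @ B.T @ Cm + A @ S @ C.T @ Bm
            + np.trace(B @ S @ C.T) * Am + Am * (Bm @ Cm))

m = np.array([1.0, -2.0, 0.5])
S = np.array([[2.0, 0.3, 0.1], [0.3, 1.0, -0.2], [0.1, -0.2, 1.5]])
A, a = np.array([[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]), np.array([0.1, -0.3])
B, b = np.array([[0.0, 1.0, 1.0], [2.0, 0.0, 0.5]]), np.array([1.0, 0.0])
C, c = np.array([[1.0, -1.0, 0.0], [0.0, 0.5, 2.0]]), np.array([-0.5, 0.2])
```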
Quartic Expectations
For [x:Independent] :
For [x:Real Gaussian] :
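- $E\left((Ax + a)(Bx + b)^T(Cx + c)(Dx + d)^T\right) =
\left(ASB^T + (Am+a)(Bm+b)^T\right)\left(CSD^T + (Cm+c)(Dm+d)^T\right) +
\left(ASC^T + (Am+a)(Cm+c)^T\right)\left(BSD^T + (Bm+b)(Dm+d)^T\right) +
(Bm+b)^T(Cm+c)\left(ASD^T - (Am+a)(Dm+d)^T\right) +
\operatorname{tr}(BSC^T)\left(ASD^T + (Am+a)(Dm+d)^T\right)$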
- $E(xx^Txx^T) =
2(S + mm^T)^2 + m^Tm\,(S - mm^T) + \operatorname{tr}(S)\,(S + mm^T)$
- $E(xx^TAxx^T) =
E\left((x^TAx)\,xx^T\right) = (S + mm^T)(A + A^T)(S + mm^T) + m^TAm\,(S - mm^T) + \operatorname{tr}(AS)\,(S + mm^T)$
- [m=0] $E(xx^TAxx^T) = SAS + SA^TS + \operatorname{tr}(AS)\,S$
- $E\left((Ax + a)(Ax + a)^T(Ax + a)(Ax + a)^T\right) =
2\left(ASA^T + (Am+a)(Am+a)^T\right)^2 +
(Am+a)^T(Am+a)\left(ASA^T - (Am+a)(Am+a)^T\right) +
\operatorname{tr}(ASA^T)\left(ASA^T + (Am+a)(Am+a)^T\right)$
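- $E\left((Ax + a)^T(Bx + b)(Cx + c)^T(Dx + d)\right) =
\operatorname{tr}\left(AS(C^TD + D^TC)SB^T\right)
+ \left((Am+a)^TB + (Bm+b)^TA\right)S\left(C^T(Dm+d) + D^T(Cm+c)\right) +
\left(\operatorname{tr}(ASB^T) + (Am+a)^T(Bm+b)\right)\left(\operatorname{tr}(CSD^T) + (Cm+c)^T(Dm+d)\right)$
- $E(x^Txx^Tx) =
2\operatorname{tr}(S^2) + 4m^TSm + \left(\operatorname{tr}(S) + m^Tm\right)^2$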
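- $E(x^TAxx^TBx) =
\operatorname{tr}\left(AS(B + B^T)S\right) +
m^T(A + A^T)S(B + B^T)m +
\left(\operatorname{tr}(AS) + m^TAm\right)\left(\operatorname{tr}(BS) + m^TBm\right)$
- [m=0] $E(x^TAxx^TBx) =
\operatorname{tr}\left(AS(B + B^T)S\right) +
\operatorname{tr}(AS)\operatorname{tr}(BS)$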
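- $E(a^Tx\,b^Tx\,c^Tx\,d^Tx) =
\left(a^T(S + mm^T)b\right)\left(c^T(S + mm^T)d\right) + \left(a^T(S + mm^T)c\right)\left(b^T(S + mm^T)d\right) + \left(a^T(S + mm^T)d\right)\left(b^T(S + mm^T)c\right) - 2\,a^Tm\,b^Tm\,c^Tm\,d^Tm$
- $E\left((Ax + a)^T(Ax + a)(Ax + a)^T(Ax + a)\right) =
2\operatorname{tr}(ASA^TASA^T) +
4(Am+a)^TASA^T(Am+a) +
\left(\operatorname{tr}(ASA^T) + (Am+a)^T(Am+a)\right)^2$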
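Several of the [x:Real Gaussian] quartic results can be spot-checked in the same way as the cubic ones, by exact Gauss-Hermite quadrature over $x = m + Lz$. The quadrature helper is repeated so that this block is self-contained. The sketch assumes NumPy. A and B are deliberately non-symmetric so that the $A + A^T$ terms matter. All names and values are illustrative.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expect(f, m, S, deg=4):
    """E(f(x)) for x ~ N(m, S) via tensor-product Gauss-Hermite quadrature
    (exact for polynomials of degree <= 2*deg - 1 in each variable)."""
    z1, w1 = hermegauss(deg)
    w1 = w1 / w1.sum()
    L = np.linalg.cholesky(S)
    total = 0.0
    for idx in itertools.product(range(deg), repeat=len(m)):
        idx = list(idx)
        total = total + np.prod(w1[idx]) * f(m + L @ z1[idx])
    return total

m = np.array([1.0, -2.0, 0.5])
S = np.array([[2.0, 0.3, 0.1], [0.3, 1.0, -0.2], [0.1, -0.2, 1.5]])
A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [3.0, 0.0, 1.0]])
B = np.array([[0.5, 0.0, 1.0], [1.0, -2.0, 0.0], [0.0, 1.0, 1.0]])
M = S + np.outer(m, m)                                # E(x x^T)

xtAx_xtBx = (np.trace(A @ S @ (B + B.T) @ S) + m @ (A + A.T) @ S @ (B + B.T) @ m
             + (np.trace(A @ S) + m @ A @ m) * (np.trace(B @ S) + m @ B @ m))
xxtxxt = 2 * M @ M + (m @ m) * (S - np.outer(m, m)) + np.trace(S) * M

va, vb = np.array([1.0, 0.0, 2.0]), np.array([0.5, 1.0, 0.0])
vc, vd = np.array([0.0, -1.0, 1.0]), np.array([2.0, 1.0, 1.0])
abcd_closed_form = ((va @ M @ vb) * (vc @ M @ vd) + (va @ M @ vc) * (vb @ M @ vd)
                    + (va @ M @ vd) * (vb @ M @ vc)
                    - 2 * (va @ m) * (vb @ m) * (vc @ m) * (vd @ m))
```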
High Powers
For [x:Real Gaussian] :
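- [n: odd]
$E\left(\operatorname{prod}(x_{[n]} - m)\right) = 0$. [5.18]
- [n: even] $E\left(\operatorname{prod}(x_{[n]} - m)\right) =
\left((\tfrac{1}{2}n)!\right)^{-1}2^{-n/2}
\sum_v s_{v(1),v(2)}\,s_{v(3),v(4)}\cdots s_{v(n-1),v(n)}$,
where $s_{ij}$ is the (i,j) element of S and the sum is over all n! permutations v of the numbers
1:n. [5.18]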
Note that each term in the summation arises
$(\tfrac{1}{2}n)!\,2^{n/2}$ times since the
$\tfrac{1}{2}n$ factors $s_{ij}$ can be rearranged in
$(\tfrac{1}{2}n)!$ orders and for each factor $s_{ij} =
s_{ji}$ since S is symmetric. Thus an equivalent formula
is to omit the normalizing factor,
$\left((\tfrac{1}{2}n)!\right)^{-1}2^{-n/2}$, and restrict the
summation to all distinct pairings of the numbers 1:n. This is Wick's
theorem.
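The equivalence described in this note can be checked directly. The sketch below computes the central moment once by summing over the distinct pairings (Wick's form) and once by the normalised sum over all n! permutations [5.18]. The two results should agree. Indices may repeat, so for example $E((x_1 - m_1)^4) = 3s_{11}^2$. This is a plain-Python sketch with NumPy used only to hold S. The helper names `pairings`, `wick_moment` and `permutation_formula` are hypothetical.

```python
import itertools
import math
import numpy as np

def pairings(idx):
    """Yield every way of splitting the list idx into unordered pairs."""
    if not idx:
        yield []
        return
    first, rest = idx[0], idx[1:]
    for k in range(len(rest)):
        pair = (first, rest[k])
        for tail in pairings(rest[:k] + rest[k + 1:]):
            yield [pair] + tail

def wick_moment(S, idx):
    """E(prod_t (x_{idx[t]} - m_{idx[t]})) for x ~ N(m, S), summing over distinct pairings."""
    if len(idx) % 2:
        return 0.0
    return sum(math.prod(S[i, j] for i, j in p) for p in pairings(list(idx)))

def permutation_formula(S, idx):
    """(n/2)!^{-1} 2^{-n/2} times the sum over all n! orderings of idx  [5.18]."""
    n = len(idx)
    total = 0.0
    for v in itertools.permutations(idx):
        total += math.prod(S[v[2 * t], v[2 * t + 1]] for t in range(n // 2))
    return total / (math.factorial(n // 2) * 2 ** (n // 2))

S = np.array([[2.0, 0.3, 0.1, 0.4],
              [0.3, 1.0, -0.2, 0.0],
              [0.1, -0.2, 1.5, 0.5],
              [0.4, 0.0, 0.5, 1.2]])
```

The number of distinct pairings of n items is $(n-1)(n-3)\cdots 1$, which is far smaller than n!, so the pairing form is the practical one.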
This page is part of The Matrix Reference
Manual. Copyright © 1998-2005 Mike Brookes, Imperial
College, London, UK. See the file gfl.html for copying
instructions. Please send any comments or suggestions to "mike.brookes" at
"imperial.ac.uk".
Updated: $Id: expect.html,v 1.41 2007/01/05 15:23:22 dmb Exp $