Statistical Signal Processing – Summary

Statistical Signal Processing

I. Parameter Estimation

1 Statistical Modeling and Quality Criteria

  • Statistical Estimation

The goal is to determine the probability distribution of the random variables based on available samples. Equivalently: the stochastic model is a set of candidate probability spaces, and the task of statistical estimation is to select the most appropriate candidate based on the observed outcomes of a random experiment.

  • Statistical Model

In a statistical model, data is assumed to be generated by some underlying probability distribution, and the goal is to estimate the parameters of this distribution.

$\mathcal{X}$: the set of all possible observations.

$\mathcal{F}$: a collection of subsets of $\mathcal{X}$ that satisfies certain properties ($\sigma$-algebra). Here, $\mathcal{F}$ is the collection of events that can be assigned probabilities.

$\mathrm{d}F(\,\cdot\,;\theta)$: a set of candidate probability measures that assign probabilities to the events in $\mathcal{F}$; they depend on the parameter $\theta$.

$\theta$: selects the probability measure $\mathrm{d}F(\,\cdot\,;\theta)$. It is unknown and has to be estimated from the available data.

  • Statistics

$X_i:\Omega\to\mathcal{X}$: a random variable, also called a statistic, maps outcomes in the sample space $\Omega$ to values in the set $\mathcal{X}$, representing the quantities of interest in our statistical analysis.

$x_i\in\mathcal{X}$: the actual values that we observe or measure in our study.

$T:\mathcal{X}^N\to\Theta$: a special type of statistic, called an estimator, maps observations $x_i\in\mathcal{X}$ to an estimate $\hat\theta\in\Theta$.

  • $P_\theta(\cdot)$ refers to the probability of an event under the probability distribution selected by $\theta$. The same holds for $E_\theta[\cdot]$ and $\mathrm{Var}_\theta[\cdot]$.

1.1 Introductory Example: Estimating upper bound θ

Given $N$ i.i.d. statistics $X_1,\dots,X_N$ of a uniformly distributed random variable $X\sim U(0,\theta)$, where $\theta$ is deterministic but unknown. We have $E[X]=\theta/2$ and $\mathrm{Var}[X]=\theta^2/12$.

Solution 1: Use $E[X]=\theta/2$

$$T_1: x_1,\dots,x_N \mapsto \hat\theta_1 = 2\cdot\frac{1}{N}\sum_{i=1}^N x_i$$

Solution 2: Intuition $\hat\theta_2 = \max_{i=1,\dots,N}\{x_i\}$

1.2 Consistency and Unbiasedness

Consistent estimator $T$: $T(X_1,\dots,X_N) \xrightarrow{N\to\infty} \theta$ (in probability).

T1 and T2 are both consistent.

T1:

Using the Chebyshev inequality, we derive the (weak) law of large numbers:

$$\Pr\left(|X - E[X]| \ge \sqrt{\mathrm{Var}[X]}\,\epsilon\right) \le \frac{1}{\epsilon^2} \quad \text{(Chebyshev's inequality)}$$

$$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i, \qquad E[\bar X_n] = E[X], \qquad \mathrm{Var}[\bar X_n] = \frac{\mathrm{Var}[X]}{n}$$

$$\Pr\left(|\bar X_n - E[\bar X_n]| \ge \sqrt{\mathrm{Var}[\bar X_n]}\,\epsilon\right) \le \frac{1}{\epsilon^2} \;\Rightarrow\; \Pr\left(|\bar X_n - E[X]| \ge \sqrt{\tfrac{\mathrm{Var}[X]}{n}}\,\epsilon\right) \le \frac{1}{\epsilon^2} \;\Rightarrow\; \Pr\left(|\bar X_n - E[X]| \ge \epsilon\right) \le \frac{\mathrm{Var}[X]}{n\,\epsilon^2} \quad \text{(law of large numbers)}$$

Using the law of large numbers, we get

$$\Pr\left(\left|\frac{T_1}{2} - \frac{\theta}{2}\right| \ge \epsilon\right) \le \frac{\mathrm{Var}[X]}{N\epsilon^2} \xrightarrow{N\to\infty} 0$$

T2:

$$\Pr(|T_2 - \theta| \ge \epsilon) = \Pr\left(\max_{i=1,\dots,N}\{X_i\} \le \theta-\epsilon\right) = \Pr(X_1\le\theta-\epsilon,\dots,X_N\le\theta-\epsilon) \overset{\text{iid}}{=} \prod_{i=1}^N \Pr(X_i\le\theta-\epsilon) = \left(\frac{\theta-\epsilon}{\theta}\right)^N \xrightarrow{N\to\infty} 0$$

Unbiased estimator $T$: $E[T(X_1,\dots,X_N)] = \theta$

  • $T_1$ is unbiased:

$$E[T_1] = E\left[\frac{2}{N}\sum_{i=1}^N X_i\right] = \frac{2}{N}\sum_{i=1}^N E[X_i] = \theta$$
  • $T_2$ is asymptotically unbiased:

$$E[T_2] = \int_0^\theta \xi\, f_{T_2}(\xi;\theta)\, d\xi = \frac{N}{N+1}\theta$$

Proof. For $0\le\xi\le\theta$,

$$F_{T_2}(\xi;\theta) = \Pr(T_2\le\xi) = \left(\frac{\xi}{\theta}\right)^N, \qquad f_{T_2}(\xi;\theta) = \frac{d}{d\xi}F_{T_2}(\xi;\theta) = \frac{d}{d\xi}\left(\frac{\xi}{\theta}\right)^N = \frac{N\xi^{N-1}}{\theta^N}$$
  • The scaled estimator $T_2'$ is unbiased:

$$T_2' = \frac{N+1}{N}T_2$$


1.3 Variance: $\mathrm{Var}[T] = E[(T - E[T])^2]$

A further quality measure for an estimator is its variance.

$$\mathrm{Var}[T_1] = \mathrm{Var}\left[\frac{2}{N}\sum_{i=1}^N X_i\right] = \frac{4}{N^2}\sum_{i=1}^N \mathrm{Var}[X_i] = \frac{\theta^2}{3N}, \qquad \mathrm{Var}[T_2] = \frac{N\theta^2}{(N+1)^2(N+2)}, \qquad \mathrm{Var}[T_2'] = \frac{\theta^2}{N(N+2)}$$

1.4 Mean Squared Error (MSE): $E[\varepsilon^2(T)] = E[(T-\theta)^2]$

An extension of the variance is the MSE(mean squared error), where θ is the true parameter.

$T_1$ and $T_2'$ are unbiased, $E[T_1] = E[T_2'] = \theta$; therefore,

$$E[\varepsilon^2(T_1)] = \mathrm{Var}[T_1] = \frac{\theta^2}{3N}, \qquad E[\varepsilon^2(T_2')] = \mathrm{Var}[T_2'] = \frac{\theta^2}{N(N+2)}$$

$T_2$ is biased; its MSE is obtained by

$$E[\varepsilon^2(T_2)] = E[T_2^2] - 2E[T_2]\theta + \theta^2 = \frac{2\theta^2}{(N+1)(N+2)}$$

 


1.5 Bias/Variance Trade-Off

The MSE of an estimator $T$ is defined as

$$E[\varepsilon^2(T)] = E[(T-\theta)^2]$$

and can be decomposed into its bias and variance

$$E[\varepsilon^2(T)] = E[(T-E[T])^2] + (E[T]-\theta)^2 = \mathrm{Var}[T] + \mathrm{Bias}[T]^2$$
  • Choose $\alpha$ to obtain the optimal $\alpha T_2$:

$$\mathrm{Var}[\alpha T_2] = \alpha^2\,\mathrm{Var}[T_2] = \frac{N\alpha^2\theta^2}{(N+1)^2(N+2)}, \qquad \mathrm{Bias}[\alpha T_2]^2 = (E[\alpha T_2]-\theta)^2 = (\alpha E[T_2]-\theta)^2 = \left(\alpha\frac{N}{N+1}-1\right)^2\theta^2$$

$$\arg\min_\alpha \mathrm{Bias} = \frac{N+1}{N}, \qquad \arg\min_\alpha \mathrm{MSE} = \frac{N+2}{N+1}$$

Therefore, an unbiased estimator is not necessarily the MSE-optimal estimator; for large $N$, however, the optimal scaling approaches the unbiased choice.
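The following short numpy sketch (the values of $\theta$, $N$, and the number of Monte Carlo trials are illustrative assumptions) empirically compares the MSE of $T_1$, $T_2$, $T_2'$, and the MSE-optimal scaling $\alpha T_2$ against the analytical expressions above.

```python
import numpy as np

# Monte Carlo sketch: empirical MSE of T1 = 2*mean, T2 = max,
# T2' = (N+1)/N * max, and alpha*T2 with alpha = (N+2)/(N+1), for X ~ U(0, theta).
rng = np.random.default_rng(0)
theta, N, trials = 1.0, 10, 200_000          # illustrative choices

x = rng.uniform(0, theta, size=(trials, N))
t1 = 2 * x.mean(axis=1)                      # moment-based estimator
t2 = x.max(axis=1)                           # maximum (biased)
t2u = (N + 1) / N * t2                       # bias-corrected version
t2a = (N + 2) / (N + 1) * t2                 # MSE-optimal scaling of T2

for name, est in [("T1", t1), ("T2", t2), ("T2'", t2u), ("alpha*T2", t2a)]:
    print(f"{name:9s} empirical MSE = {np.mean((est - theta) ** 2):.6f}")

# Analytical values from Sections 1.3-1.4 for comparison:
print("theory T1 :", theta**2 / (3 * N))
print("theory T2 :", 2 * theta**2 / ((N + 1) * (N + 2)))
print("theory T2':", theta**2 / (N * (N + 2)))
```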

2 Maximum Likelihood Estimation

2.1 Maximum Likelihood Principle

The maximum likelihood principle suggests selecting the candidate probability measure under which the observed outcomes of the experiment become most probable. A maximum likelihood estimator $T_{\mathrm{ML}}$ picks the $\theta$ that maximizes the likelihood function $L(x_1,\dots,x_N|\theta)$.


The likelihood function depends on the statistical model. Assuming all observations are i.i.d., we obtain

$$\text{discrete R.V.: } L(x_1,\dots,x_N|\theta) = p_{X_1,\dots,X_N}(x_1,\dots,x_N|\theta) = \prod_{i=1}^N p_{X_i}(x_i|\theta)$$

$$\text{continuous R.V.: } L(x_1,\dots,x_N|\theta) = f_{X_1,\dots,X_N}(x_1,\dots,x_N|\theta) = \prod_{i=1}^N f_{X_i}(x_i|\theta)$$

Normally, we use the log-likelihood function,

$$T_{\mathrm{ML}}: x_1,\dots,x_N \mapsto \hat\theta_{\mathrm{ML}} = \arg\max_\theta \log\prod_{i=1}^N L(x_i|\theta) = \arg\max_\theta \sum_{i=1}^N \log L(x_i|\theta)$$

  • In the slides, the notation $L(x;\theta)$ is used rather than $L(x|\theta)$.

2.2 Parameter Estimation

Channel Estimation

Consider an AWGN channel $Y = hs + N$ with $N\sim\mathcal{N}(0,\sigma^2)$, where $s$ is a known training symbol. Then $Y\sim\mathcal{N}(hs,\sigma^2)$, and with $\theta = h$ we have

$$f_Y(y;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y-\theta s)^2}{2\sigma^2}\right)$$

Given $N$ observations $y_i$ and training signals $s_i$, the maximum likelihood estimate is

$$h_{\mathrm{ML}} = \arg\max_\theta \sum_{i=1}^N \log\left(\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y_i-\theta s_i)^2}{2\sigma^2}\right)\right) = \arg\max_\theta \sum_{i=1}^N -\frac{(y_i-\theta s_i)^2}{2\sigma^2} = \arg\min_\theta \sum_{i=1}^N (y_i-\theta s_i)^2 = \arg\min_\theta \|y-\theta s\|^2 = (s^Ts)^{-1}s^Ty$$
  • The ML estimator here is identical to the least-squares estimator; this changes drastically when the statistics $Y_i$ are correlated or when the noise $N$ is non-Gaussian.
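As a quick illustration, the following sketch (the channel coefficient, noise level, and BPSK training sequence are assumptions) computes the closed-form ML/LS channel estimate $(s^Ts)^{-1}s^Ty$ for this scalar model.

```python
import numpy as np

# Minimal sketch of the ML (= least-squares) channel estimate for y_i = h*s_i + n_i.
rng = np.random.default_rng(1)
h_true, sigma, N = 0.7, 0.5, 100             # illustrative assumptions

s = rng.choice([-1.0, 1.0], size=N)          # known training symbols
y = h_true * s + sigma * rng.standard_normal(N)

h_ml = (s @ y) / (s @ s)                     # closed-form LS / ML solution
print("h_ML =", h_ml)                        # close to 0.7 for large N
```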

Introductory Example: Estimating upper bound θ

Since the distribution of the observations is uniform, the likelihood function of $N$ observations is

$$L(x_1,\dots,x_N;\theta) = \begin{cases} \dfrac{1}{\theta^N}, & \text{if } x_i\le\theta\ \ \forall i\\[4pt] 0, & \text{otherwise} \end{cases}$$

$1/\theta^N$ is positive and monotonically decreasing in $\theta$; therefore, the likelihood is maximized by the smallest admissible $\theta$:

$$\theta_{\mathrm{ML}} = \arg\max_\theta L(x_1,\dots,x_N;\theta) = \max\{x_1,\dots,x_N\}$$

N Bernoulli Experiments

Given $N$ Bernoulli trials with $x$ observed successes, the likelihood function is

$$L(x;\theta) = \binom{N}{x}\theta^x(1-\theta)^{N-x}, \qquad \log L(x;\theta) = \log\binom{N}{x} + x\log\theta + (N-x)\log(1-\theta)$$

and $E[X] = N\theta$, $\mathrm{Var}[X] = N\theta(1-\theta)$.

The ML-estimator is obtained by

$$\hat\theta_{\mathrm{ML}} = \arg\max_\theta \log L(x;\theta): \qquad \frac{d}{d\theta}\log L(x;\theta) = \frac{x}{\theta} - \frac{N-x}{1-\theta} = 0 \;\Rightarrow\; \hat\theta_{\mathrm{ML}} = \frac{x}{N}$$

In the following, we analyze the quality of $T_{\mathrm{ML}}$:

$$E[T_{\mathrm{ML}}] = E\left[\frac{X}{N}\right] = \frac{E[X]}{N} = \theta \quad \text{(unbiased)}, \qquad \mathrm{Var}[T_{\mathrm{ML}}] = \frac{\mathrm{Var}[X]}{N^2} = \frac{\theta(1-\theta)}{N}$$

Since the estimator is unbiased, the MSE is equal to the variance of the estimator,

$$E[\varepsilon^2(T_{\mathrm{ML}})] = \mathrm{Var}[T_{\mathrm{ML}}] = \frac{\theta(1-\theta)}{N}$$

However, a biased estimator can have a smaller MSE and thus provide better estimates than an unbiased one.

Alternative Solution:

$$T(x) = \frac{x+1}{N+2}, \qquad E[T] = E\left[\frac{X+1}{N+2}\right] = \frac{N\theta+1}{N+2}, \qquad \mathrm{Bias}[T] = \left|\frac{1-2\theta}{N+2}\right|$$

$$E[\varepsilon^2(T)] = \mathrm{Var}[T] + \mathrm{Bias}[T]^2 = \mathrm{Var}\left[\frac{X+1}{N+2}\right] + \left(\frac{1-2\theta}{N+2}\right)^2 = \frac{\mathrm{Var}[X]}{(N+2)^2} + \left(\frac{1-2\theta}{N+2}\right)^2 = \frac{N\theta(1-\theta) + (1-2\theta)^2}{(N+2)^2}$$
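A small sketch (with $N=10$ assumed) that evaluates the two analytical MSE expressions over a grid of true $\theta$ values, showing where the biased estimator $(x+1)/(N+2)$ outperforms the ML estimator $x/N$:

```python
import numpy as np

# Compare MSE of the ML estimator x/N and the biased alternative (x+1)/(N+2).
N = 10                                        # illustrative assumption
theta = np.linspace(0, 1, 11)

mse_ml = theta * (1 - theta) / N
mse_alt = (N * theta * (1 - theta) + (1 - 2 * theta) ** 2) / (N + 2) ** 2

for th, m1, m2 in zip(theta, mse_ml, mse_alt):
    better = "alternative" if m2 < m1 else "ML"
    print(f"theta={th:.1f}  MSE_ML={m1:.4f}  MSE_alt={m2:.4f}  smaller: {better}")
```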


2.3 Best Unbiased Estimator

ML estimators are not necessarily the best estimators. However, a wide class of estimators is defined by minimizing the MSE under an unbiasedness constraint.

We call an estimator $T$ a Best Unbiased Estimator if $E[T(X_1,\dots,X_N)] = \theta$ and

$$\mathrm{Var}[T(X_1,\dots,X_N)] \le \mathrm{Var}[S(X_1,\dots,X_N)]$$

for any alternative unbiased estimator $S$.

Best unbiased estimators are also referred to as UMVU (Uniformly Minimum Variance Unbiased) estimators.

3. Fisher's Information Inequality

A universal lower bound for the variance of an estimator can be introduced if the following conditions are fulfilled:

$$L(x;\theta) > 0 \quad \forall x\in\mathcal{X},\ \theta\in\Theta$$

$$L(x;\theta) \text{ is differentiable with respect to } \theta$$

$$\int_{\mathcal{X}} \frac{\partial}{\partial\theta}L(x;\theta)\,dx = \frac{\partial}{\partial\theta}\int_{\mathcal{X}} L(x;\theta)\,dx$$

We define the score function as the slope of $\log L(x;\theta)$ with respect to $\theta$:

$$g(x;\theta) = \frac{\partial}{\partial\theta}\log L(x;\theta) = \frac{\frac{\partial}{\partial\theta}L(x;\theta)}{L(x;\theta)}$$

3.1 Cramer-Rao Lower Bound

With $E[g(X;\theta_{\mathrm{true}})] = E\left[\frac{\partial}{\partial\theta}\log L(X;\theta)\right] = 0$, the Fisher information is defined as

$$I_F(\theta_{\mathrm{true}}) = \mathrm{Var}[g(X;\theta_{\mathrm{true}})] = E\left[\big(g(X;\theta_{\mathrm{true}}) - E[g(X;\theta_{\mathrm{true}})]\big)^2\right] = E\left[g(X;\theta_{\mathrm{true}})^2\right] \overset{\text{appendix}}{=} -E_X\left[\frac{\partial^2}{\partial\theta^2}\log L(X;\theta_{\mathrm{true}})\right]$$

which can be interpreted as the negative mean curvature of the log-likelihood function at $\theta_{\mathrm{true}}$.

The variance of an estimator can be lower bounded by the Cramer-Rao lower bound

$$\mathrm{Var}[T(X)] \overset{\text{appendix}}{\ge} \left(\frac{\partial E[T(X)]}{\partial\theta}\right)^2 \frac{1}{I_F(\theta)}\bigg|_{\theta=\theta_{\mathrm{true}}}$$

If $T$ is unbiased, $E[T(X)] = \theta$, then

$$\mathrm{Var}[T(X)] \ge \frac{1}{I_F(\theta_{\mathrm{true}})}$$

Properties of the Fisher Information:

  • $I_F(\theta)$ depends on the statistical model of the observations $X_1,\dots,X_N$ and on the unknown parameter $\theta$.

  • A large value of $I_F(\theta)$ corresponds to a strong curvature and more information in $x_1,\dots,x_N$; a small value of $I_F(\theta)$ corresponds to a weak curvature and little information in $x_1,\dots,x_N$.

  • $I_F(\theta)$ increases monotonically with the number $N$ of independent observations:

$$I_F^{(N)}(\theta) = N\, I_F^{(1)}(\theta)$$
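A minimal numeric sanity check of these properties for a Bernoulli experiment (the parameter values are illustrative assumptions): the empirical variance of the score reproduces $I_F^{(1)}(\theta) = 1/(\theta(1-\theta))$ and the additivity $I_F^{(N)} = N\,I_F^{(1)}$, and the resulting CRLB coincides with $\mathrm{Var}[T_{\mathrm{ML}}]$ from the Bernoulli example in Section 2.2.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, N, trials = 0.3, 25, 100_000          # illustrative assumptions

x = rng.binomial(1, theta, size=(trials, N))
# score of one Bernoulli trial: d/dtheta log(theta^x (1-theta)^(1-x)) = x/theta - (1-x)/(1-theta)
score_1 = x[:, 0] / theta - (1 - x[:, 0]) / (1 - theta)
score_N = x.sum(axis=1) / theta - (N - x.sum(axis=1)) / (1 - theta)

print("I_F^(1) empirical:", score_1.var(), " analytic:", 1 / (theta * (1 - theta)))
print("I_F^(N) empirical:", score_N.var(), " analytic:", N / (theta * (1 - theta)))
print("CRLB 1/I_F^(N)   :", theta * (1 - theta) / N)   # equals Var[x/N]: ML is efficient here
```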


3.2 Exponential Models

An exponential model is a statistical model with

$$f_X(x;\theta) = h(x)\exp\big(a(\theta)t(x)\big)\exp\big(-b(\theta)\big), \qquad \exp\big(b(\theta)\big) = \int_{\mathcal{X}} h(x)\exp\big(a(\theta)t(x)\big)\,dx$$

where $b(\theta)$ provides the normalization. The respective Fisher information can then be obtained directly as

$$I_F(\theta) = \frac{\partial a(\theta)}{\partial\theta}\cdot\frac{\partial E[t(X)]}{\partial\theta}$$

If $f_X(x;\theta)$ can further be arranged such that $E[t(X)] = \theta$, an unbiased estimator is directly obtained as $T = t(X)$.

Mean Estimation Example

Consider the estimation of the unknown mean value $\theta = \mu$ of $X\sim\mathcal{N}(\mu,\sigma^2)$ based on a single observation.

$$f_X(x;\theta) \overset{!}{=} L(x;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}(x-\theta)^2\right) = \exp\left[\frac{\theta x}{\sigma^2} - \left(\frac{\theta^2}{2\sigma^2} + \frac{1}{2}\log(2\pi\sigma^2)\right)\right]\exp\left[-\frac{x^2}{2\sigma^2}\right] = \exp\big[a(\theta)t(x) - b(\theta)\big]\,h(x), \quad \text{with } t(x) = x,\ a(\theta) = \theta/\sigma^2$$

Since $E[X] = \mu = \theta$, the UMVU estimator based on a single observation is $T(x) = x$, which minimizes the variance (i.e., the MSE), and we get

$$I_F = \frac{\partial a(\theta)}{\partial\theta}, \qquad \mathrm{Var}[T] = \frac{1}{I_F} = \left(\frac{\partial a(\theta)}{\partial\theta}\right)^{-1} = \sigma^2$$

3.3 Asymptotically Efficient Estimators

An estimator is asymptotically efficient if (convergence in distribution)

$$\sqrt{N}\big(T(X_1,\dots,X_N) - \theta\big) \xrightarrow{N\to\infty} \mathcal{N}\big(0,\ I_F^{(1)}(\theta)^{-1}\big)$$
  • An ML estimator is asymptotically efficient if …

II. Examples

4. ML Principle for Direction of Arrival Estimation

4.1 Signal Model and ML Estimation

We consider the estimation of the Direction of Arrival (DoA) $\theta$ of an impinging planar wavefront by means of an antenna array with $M$ elements and element spacing $d$. The signal vector at time instant $t$ is

$$X(t) = \xi\, a\, s(t) + N(t) \in \mathbb{C}^M$$

Then the signal at the m-th antenna is

$$X_m(t) = \xi\exp\left(j2\pi\frac{md}{\lambda}\sin\theta\right)s(t) + N_m(t), \quad m = 0,\dots,M-1$$

where $\lambda$ is the wavelength of the assumed narrowband signal. The signal is assumed to be corrupted by the Gaussian noise vector $N(t)\sim\mathcal{N}(0,C_N)$. $\xi$ represents the attenuation which the transmitted signal $s(t)$ experiences over the transmission path.

Furthermore, we assume a Uniform Linear Array (ULA) with $M$ antenna elements, i.e.,

$$a = \begin{bmatrix}\alpha^0 & \alpha^1 & \cdots & \alpha^{M-1}\end{bmatrix}^T, \quad \text{with } \alpha = \exp\left(j2\pi\frac{d}{\lambda}\sin\theta\right)$$

In the case of a single observation and AWGN $N(t)\sim\mathcal{N}(0,\sigma_N^2 I)$, we have

$$L(x;\theta) = \frac{1}{(\pi\sigma_N^2)^M}\exp\left(-\frac{(x-\xi a s)^H(x-\xi a s)}{\sigma_N^2}\right)$$

$$\max_\theta L(x;\theta) \;\Leftrightarrow\; \min_\theta (x-\xi a s)^H(x-\xi a s) = \min_\theta\left( x^Hx - 2\,\mathrm{Re}\{\xi\, x^H a\, s\} + |\xi|^2 a^H a\, |s|^2 \right) \overset{!}{=} \max_\theta \mathrm{Re}\{x^H a(\theta)\}$$

! since $\alpha^*\alpha = 1$, we have $a^Ha = M$ for all $\theta$, so only the middle term depends on $\theta$.
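A minimal sketch of this ML rule as a grid search over $\theta$ (array size, spacing $d/\lambda$, true angle, and noise level are illustrative assumptions):

```python
import numpy as np

# Grid-search DoA estimation: maximize Re{x^H a(theta)} over a ULA steering vector.
rng = np.random.default_rng(3)
M, d_over_lambda = 8, 0.5
theta_true = np.deg2rad(20.0)
xi, s = 1.0, 1.0                                     # attenuation and pilot symbol

def steering(theta):
    m = np.arange(M)
    return np.exp(1j * 2 * np.pi * m * d_over_lambda * np.sin(theta))

noise = 0.3 * (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
x = xi * steering(theta_true) * s + noise            # single snapshot

grid = np.deg2rad(np.linspace(-90, 90, 3601))
metric = np.array([np.real(np.vdot(steering(th), x)) for th in grid])  # Re{a^H x}
theta_hat = grid[np.argmax(metric)]
print("estimated DoA [deg]:", np.rad2deg(theta_hat))
```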

4.2 Cramer-Rao Bound for DOA Estimation

The likelihood function of the given estimation problem obviously belongs to the family of exponential distributions, where θ is the parameter to be estimated.

$$L(x;\theta) = \frac{1}{(\pi\sigma_N^2)^M}\exp\left(-\frac{(x-\xi a s)^H(x-\xi a s)}{\sigma_N^2}\right) = h(x)\exp\big(c^H(\theta)t(x)\big)\exp\big(-b(\theta)\big)$$

$$h(x) = \frac{1}{(\pi\sigma_N^2)^M}\exp\left(-\frac{x^Hx}{\sigma_N^2}\right), \qquad c(\theta) = \frac{1}{\sigma_N^2}\begin{bmatrix}\xi a s\\ (\xi a s)^*\end{bmatrix}, \qquad t(x) = \begin{bmatrix}x\\ x^*\end{bmatrix}, \qquad b(\theta) = \frac{|\xi|^2|s|^2 a^Ha}{\sigma_N^2} = \frac{M|\xi|^2|s|^2}{\sigma_N^2}$$

$$I_F(\theta) = \left(\frac{\partial c(\theta)}{\partial\theta}\right)^H \frac{\partial E[t(X)]}{\partial\theta} = \frac{2|\xi|^2|s|^2}{\sigma_N^2}\left\|\frac{\partial a}{\partial\theta}\right\|^2 \;\propto\; \left(\frac{d}{\lambda}\right)^2 \frac{|\xi|^2|s|^2}{\sigma_N^2}(\cos\theta)^2\, M^3$$

III. Estimation of Random Variables

5. Bayesian Estimation

With the previous ML principle, we do not have any statistical information about the parameter $\theta$ and only use the likelihood function to decide on $\hat\theta$.

The Bayesian estimation method is based on a specific statistical model for the unknown parameter, here $\theta$. We assume statistical (prior) information about $\theta$; consequently, $\theta$ is now considered a random variable, i.e.,

PDF: $f_\Theta(\theta;\sigma)$

where $\sigma$ is the parameter of the statistical model of the random variable $\Theta$, for example of a specific PDF.

Furthermore, instead of the notation $f_X(x;\theta)$, we have to use the conditional PDF, because $f_X$ is only determined after $\Theta$ is realized and a certain value $\theta$ is fixed:

conditional PDF: $f_{X|\Theta}(x|\theta)$

Now, the MSE of an estimator $T(x)$ is calculated with respect to both random variables $X$ and $\Theta$,

$$E[\varepsilon^2(T(X),\Theta)] = E[(T(X)-\Theta)^2] = \int_\Theta\int_{\mathcal{X}} (T(x)-\theta)^2\, f_{X|\Theta}(x|\theta)\, f_\Theta(\theta)\, dx\, d\theta$$

5.1 Conditional Mean Estimator/Bayes Estimator – Minimizing the MSE

Conditional mean estimator $T_{\mathrm{CM}}$:

$$T_{\mathrm{CM}}: x \mapsto E[\Theta|X=x] = \int_\Theta \theta\, f_{\Theta|X}(\theta|x)\, d\theta$$
  • Theorem

The conditional mean estimator (Bayes estimator) $T_{\mathrm{CM}}$ is MSE optimal, i.e., it minimizes the MSE cost criterion:

$$E[\varepsilon^2(T(X),\Theta)] = E_\Theta\big[E_X[(T(X)-\Theta)^2\,|\,\Theta]\big]$$
  • Alternative cost criterion

Although the mean MSE is the most popular cost criterion, other criteria have been proposed and applied. The mean modulus is defined as

$$E[|\varepsilon(T(X),\Theta)|] = \int_\Theta\int_{\mathcal{X}} |T(x)-\theta|\, f_{X|\Theta}(x|\theta)\, f_\Theta(\theta)\, dx\, d\theta$$

The conditional median estimator minimizes it,

$$T(x) = \mathrm{Median}[\Theta|X=x]$$

 

5.2 Binomial Experiment

The ML estimator is obtained as

$$T_{\mathrm{ML}}: x_1,\dots,x_N \mapsto \hat\theta_{\mathrm{ML}} = \arg\max_\theta \log L(x;\theta) = \frac{x}{N}$$

Now we assume the uniform prior $f_\Theta(\theta) = 1$ for $\theta\in[0,1]$; then the MMSE estimator is

$$T_{\mathrm{CM}}(x) = E[\Theta|X=x] = \int_0^1 \theta\, f_{\Theta|X}(\theta|x)\, d\theta = \int_0^1 \theta\, \frac{p_{X|\Theta}(x|\theta)\, f_\Theta(\theta)}{p_X(x)}\, d\theta$$

Note that

$$p_{X|\Theta}(x|\theta) = \binom{N}{x}\theta^x(1-\theta)^{N-x}, \quad \theta\in[0,1]$$

$$p_X(x) = \int_\Theta p_{X|\Theta}(x|\theta)\, f_\Theta(\theta)\, d\theta = \int_0^1 \binom{N}{x}\theta^x(1-\theta)^{N-x}\, d\theta = \frac{1}{N+1}$$

We get

$$T_{\mathrm{CM}}(x) = \frac{1}{p_X(x)}\int_0^1 \theta\, p_{X|\Theta}(x|\theta)\, d\theta = \frac{1}{p_X(x)}\cdot\frac{x+1}{N+2}\cdot\frac{1}{N+1} = \frac{x+1}{N+2} = \frac{N}{N+2}T_{\mathrm{ML}}(x) + \frac{1}{N+2}$$
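A quick numerical check (with assumed values $N=10$, $x=7$): integrating the posterior under the uniform prior reproduces the closed-form posterior mean $(x+1)/(N+2)$.

```python
import numpy as np
from math import comb

# Posterior mean of Theta under a uniform prior, computed by numerical integration.
N, x = 10, 7                                                   # illustrative assumptions
theta = np.linspace(0, 1, 100_001)
d = theta[1] - theta[0]

likelihood = comb(N, x) * theta**x * (1 - theta) ** (N - x)    # p(x|theta), prior = 1
posterior = likelihood / (likelihood.sum() * d)                # normalize
t_cm = (theta * posterior).sum() * d
print("numerical E[Theta|x]:", t_cm, "  closed form:", (x + 1) / (N + 2))
```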


5.3 Mean Estimation Example

We consider the estimation of the unknown mean value $\theta$ of a Gaussian distributed random variable

$$X \sim \mathcal{N}(\theta,\ \sigma^2_{X|\Theta=\theta})$$

assuming $\Theta\sim\mathcal{N}(m,\sigma_\Theta^2)$, given $N$ observations $X_1,\dots,X_N$ drawn from the joint conditional likelihood of i.i.d. observations.

$$\text{Conditional distribution: } X = \theta + N, \quad \mu_{X|\theta} = \theta, \quad \sigma^2_{X|\theta} = \sigma_N^2$$

$$\text{Unconditional distribution: } X = \Theta + N, \quad \mu_X = E[\Theta+N] = E[\Theta] = m, \quad \sigma_X^2 \overset{\Theta,N\text{ uncorr.}}{=} \mathrm{Var}[\Theta] + \mathrm{Var}[N] = \sigma_\Theta^2 + \sigma_N^2$$

$$\mathrm{Cov}[\Theta,X] = \mathrm{Cov}[\Theta,\Theta+N] = \mathrm{Cov}[\Theta,\Theta] + \mathrm{Cov}[\Theta,N] = \mathrm{Var}[\Theta] = \sigma_\Theta^2$$

Conditional mean estimator (two steps needed):

  • Computing conditional PDF

$$f_{\Theta|X_1,\dots,X_N}(\theta|x_1,\dots,x_N) = \frac{f_{X_1,\dots,X_N|\Theta}(x_1,\dots,x_N|\theta)\, f_\Theta(\theta)}{f_{X_1,\dots,X_N}(x_1,\dots,x_N)} = \gamma\exp\left(-\frac{1}{2\sigma^2_{X|\Theta=\theta}}\sum_{i=1}^N (x_i-\theta)^2\right)\exp\left(-\frac{1}{2\sigma_\Theta^2}(\theta-m)^2\right)$$

$$\propto \exp\left(-\frac{1}{2\sigma^2_{X|\Theta=\theta}\sigma_\Theta^2}\left(\sigma_\Theta^2\sum_{i=1}^N (x_i-\theta)^2 + \sigma^2_{X|\Theta=\theta}(\theta-m)^2\right)\right)$$

$$\propto \exp\left(-\frac{1}{2\sigma^2_{X|\Theta=\theta}\sigma_\Theta^2}\left(\theta^2\big(N\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}\big) - 2\theta\Big(\sigma_\Theta^2\sum_{i=1}^N x_i + m\,\sigma^2_{X|\Theta=\theta}\Big)\right)\right)$$

$$\propto \exp\left(-\frac{N\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}}{2\sigma^2_{X|\Theta=\theta}\sigma_\Theta^2}\left(\theta - \frac{\sigma_\Theta^2\sum_{i=1}^N x_i + m\,\sigma^2_{X|\Theta=\theta}}{N\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}}\right)^2\right) \quad \text{(completing the square in } \theta\text{)}$$

From this we directly read out

$$\mu_{\Theta|X_1,\dots,X_N} = \frac{\sigma_\Theta^2\sum_{i=1}^N x_i + m\,\sigma^2_{X|\Theta=\theta}}{N\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}} \quad (5.34), \qquad \sigma^2_{\Theta|X_1,\dots,X_N} = \frac{\sigma^2_{X|\Theta=\theta}\,\sigma_\Theta^2}{N\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}}$$
  • Computing conditional mean

$$\hat\theta_{\mathrm{CM}} = E[\Theta|x_1,\dots,x_N] = \frac{\sigma_\Theta^2}{\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}/N}\,\hat\theta_{\mathrm{ML}} + \frac{\sigma^2_{X|\Theta=\theta}/N}{\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}/N}\,m, \qquad \text{with } \hat\theta_{\mathrm{ML}} = \frac{1}{N}\sum_{i=1}^N x_i$$

Discussion

  • Given a large $N$, a small conditional variance $\sigma^2_{X|\Theta=\theta}$, or a large prior variance $\sigma_\Theta^2$, it is recommended to rely on the ML estimator.

  • Given a small prior variance $\sigma_\Theta^2$ or a large conditional variance $\sigma^2_{X|\Theta=\theta}$, it is recommended to rely on the mean value $m$ of $\Theta$.

Minimum Mean Square Error

The MSE is minimized by $T_{\mathrm{CM}}$ and is given by

$$E\big[(T_{\mathrm{CM}}-\Theta)^2\big] = E_X\Big[E_\Theta\big[(T_{\mathrm{CM}}-\Theta)^2\,\big|\,X\big]\Big] = E_X\Big[E_\Theta\big[(E[\Theta|X]-\Theta)^2\,\big|\,X\big]\Big] = E_X\Big[E_\Theta\big[E[\Theta|X]^2 - 2E[\Theta|X]\Theta + \Theta^2\,\big|\,X\big]\Big] = E_X\Big[E[\Theta^2|X] - E[\Theta|X]^2\Big] = E_X\big[\mathrm{Var}[\Theta|X]\big] = \frac{\sigma^2_{X|\Theta=\theta}\,\sigma_\Theta^2}{N\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}}$$

which can be obtained once we have the information about the joint distribution of $(X,\Theta)^T$:

$$E\left[\begin{bmatrix}X\\ \Theta\end{bmatrix}\right] = \begin{bmatrix}\mu_X\\ \mu_\Theta\end{bmatrix} = \begin{bmatrix}m\\ m\end{bmatrix}, \qquad \mathrm{Var}\left[\begin{bmatrix}X\\ \Theta\end{bmatrix}\right] = \begin{bmatrix}c_{X,X} & c_{X,\Theta}\\ c_{\Theta,X} & c_{\Theta,\Theta}\end{bmatrix}$$

$$c_{X,X} = \mathrm{Var}[X] = \mathrm{Var}[\Theta+N] = \mathrm{Var}[\Theta] + \mathrm{Var}[N] = \sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}$$

$$c_{\Theta,X} = E_{\Theta,N}\big[(\Theta-\mu_\Theta)(\Theta+N-\mu_\Theta)\big] = \mathrm{Var}[\Theta] = \sigma_\Theta^2$$

5.4 Jointly Gaussian Random Variables – Multivariate Case

Given random vectors and the covariance matrix from the joint distribution:

$$X = \Theta + N, \qquad X\sim\mathcal{N}(\mu_X, C_X), \qquad \Theta\sim\mathcal{N}(\mu_\Theta, C_\Theta), \qquad C_Z = \begin{bmatrix} C_X & C_{X,\Theta}\\ C_{\Theta,X} & C_\Theta \end{bmatrix}$$

the multivariate conditional mean estimator is obtained as:

$$T_{\mathrm{CM}}: x \mapsto E[\Theta|X=x] = \mu_\Theta + C_{\Theta,X}C_X^{-1}(x - \mu_X) \quad (5.48)$$

The respective MMSE is equal to the trace of the conditional covariance matrix $C_{\Theta|X}$:

$$E\big[\|T_{\mathrm{CM}} - \Theta\|^2\big] = \mathrm{tr}\big[C_{\Theta|X=x}\big] = \mathrm{tr}\big[C_\Theta - C_{\Theta,X}C_X^{-1}C_{X,\Theta}\big]$$
  • Yes, the estimator in (5.48) is the solution of the integral in (5.34), where $\theta$ and $X$ are substituted by the respective vectors. The derivation of the estimator is, however, not as straightforward as in the scalar case due to the different rules for operations on vectors and matrices instead of scalars.

    If you refer to the last paragraph on slide 75, you find the case for $N=1$. There, if you consider $1/\sigma_X^2$ to be equal to $C_X^{-1}$, you clearly see the strong similarity to Eq. (5.48).

    Slide 75: For $N=1$, we obtain

$$T_{\mathrm{CM}} = \frac{\sigma_\Theta^2\sum_{i=1}^N x_i + m\,\sigma^2_{X|\Theta=\theta}}{N\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}} = \frac{\sigma_\Theta^2\, x + m\,\sigma^2_{X|\Theta=\theta}}{\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}} = m + \frac{\sigma_\Theta^2}{\sigma_\Theta^2 + \sigma^2_{X|\Theta=\theta}}(x - m)$$
  • Given jointly Gaussian random variables $X$ and $Y$, the conditional mean estimator $E[Y|X]$ is a linear function of $X$ ($=\rho X$), and the marginal distribution of $X$ is Gaussian independent of the correlation coefficient. This does not hold for arbitrarily jointly distributed random variables!

Example case:

The $N_i$ are i.i.d. and follow $\mathcal{N}(0,\sigma_N^2)$; the unknown $\Theta$ is the same for each signal $X_i = \Theta + N_i$. We have

$$X = \begin{bmatrix}X_1\\ \vdots\\ X_N\end{bmatrix}\sim\mathcal{N}(\mu_X, C_X), \qquad \Theta\sim\mathcal{N}(m,\sigma_\Theta^2)$$

where

$$E[X] = [m,\dots,m]^T = m\mathbf{1}$$

$$\mathrm{Cov}[\Theta,X] = E\big[(\Theta-m)(X-m\mathbf{1})^T\big] = E[\Theta X^T] - m E[X^T] = E\big[\Theta(\Theta\mathbf{1}+N)^T\big] - E[\Theta]^2\mathbf{1}^T = E[\Theta^2]\mathbf{1}^T - E[\Theta]^2\mathbf{1}^T = \sigma_\Theta^2\mathbf{1}^T$$

$$\mathrm{Cov}[X] = E\big[(\Theta\mathbf{1}+N-E[\Theta]\mathbf{1})(\Theta\mathbf{1}+N-E[\Theta]\mathbf{1})^T\big] = \sigma_\Theta^2\mathbf{1}\mathbf{1}^T + \sigma_N^2 I$$

Applying (5.48), we obtain the CM estimator

$$E[\Theta|X=x] = \mu_\Theta + C_{\Theta,X}C_X^{-1}(x-\mu_X) = m + \sigma_\Theta^2\mathbf{1}^T\big(\sigma_\Theta^2\mathbf{1}\mathbf{1}^T + \sigma_N^2 I\big)^{-1}(x - m\mathbf{1}) = \frac{\sum_{i=1}^N x_i/\sigma_N^2 + m/\sigma_\Theta^2}{N/\sigma_N^2 + 1/\sigma_\Theta^2}$$
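A short sketch (the values of $N$, $m$, $\sigma_\Theta$, $\sigma_N$ are illustrative assumptions) confirming that the matrix form (5.48) and the weighted-average expression give the same estimate:

```python
import numpy as np

# CM estimator for X = Theta*1 + N: matrix rule (5.48) vs. closed-form weighted average.
rng = np.random.default_rng(4)
N, m, sigma_theta, sigma_n = 5, 1.0, 2.0, 0.7        # illustrative assumptions

theta = rng.normal(m, sigma_theta)
x = theta + sigma_n * rng.standard_normal(N)

ones = np.ones(N)
C_x = sigma_theta**2 * np.outer(ones, ones) + sigma_n**2 * np.eye(N)
c_theta_x = sigma_theta**2 * ones

est_matrix = m + c_theta_x @ np.linalg.solve(C_x, x - m * ones)          # Eq. (5.48)
est_closed = (x.sum() / sigma_n**2 + m / sigma_theta**2) / (N / sigma_n**2 + 1 / sigma_theta**2)
print(est_matrix, est_closed)   # identical up to numerical precision
```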

5.5 Orthogonality Principle

The stochastic orthogonality is an inherent property of the conditional mean estimator: the CM estimation error is stochastically orthogonal to any statistic of the observations,

$$E\big[(T_{\mathrm{CM}}(X_1,\dots,X_N) - \Theta)\,h(X_1,\dots,X_N)\big] = 0 \quad\Leftrightarrow\quad T_{\mathrm{CM}}(X_1,\dots,X_N) - \Theta \;\perp\; h(X_1,\dots,X_N)$$

Here $\mathcal{H}$ denotes the subspace spanned by random variables $h(X)$ that are functions of the observation $X$, and $T_{\mathrm{CM}}(X)$ is the CM estimator. For the MSE to be minimal, the error $\varepsilon = T_{\mathrm{CM}} - \Theta$ must be orthogonal to $\mathcal{H}$.

Mean Estimation Example

Given

$$\begin{bmatrix}X\\ \Theta\end{bmatrix}\sim\mathcal{N}\left(\begin{bmatrix}m\\ m\end{bmatrix},\ \begin{bmatrix}\sigma_\Theta^2+\sigma_N^2 & \sigma_\Theta^2\\ \sigma_\Theta^2 & \sigma_\Theta^2\end{bmatrix}\right)$$

and functionals $h_1,\dots,h_N$

$$h_i = X_i - \mu_X$$

Recall that the MSE-optimal estimator is linear in $X$ due to the jointly Gaussian distribution, so we substitute $T_{\mathrm{CM}}$ by a linear model

$$T_{\mathrm{CM}}(X) = a^TX + b$$

Applying the orthogonality principle,

$$E\big[(T_{\mathrm{CM}} - \Theta)\,h(X_1,\dots,X_N)\big] = 0$$

we obtain

$$E\big[(a^TX + b - \Theta)(X-\mu_X\mathbf{1})^T\big] = E\big[\big(a^T(X-\mu_X\mathbf{1}) - (\Theta-\mu_\Theta) + \mu_Xa^T\mathbf{1} + b - \mu_\Theta\big)(X-\mu_X\mathbf{1})^T\big] = a^TC_x - C_{\Theta,X} \overset{!}{=} 0 \;\Rightarrow\; a^T = C_{\Theta,X}C_x^{-1}$$

The missing parameter $b$ is eventually found by introducing another functional $h_{N+1} = 1$:

$$b = -\mu_X a^T\mathbf{1} + \mu_\Theta$$

Finally we get

$$T_{\mathrm{CM}}(X) = C_{\Theta,X}C_x^{-1}(X - \mu_X\mathbf{1}) + \mu_\Theta$$

 

  • Alternative solution: $h = \hat\theta_{\mathrm{ML}}$

We choose $h = S = \hat\theta_{\mathrm{ML}} = \frac{1}{N}\sum_{i=1}^N X_i$. Since the MSE-optimal estimator is linear in $X$ due to the jointly Gaussian distribution, we substitute $T_{\mathrm{CM}}$ by the linear model

$$T_{\mathrm{CM}}(X) = aS + b$$

and apply the orthogonality principle:

$$E\big[(aS + b - \Theta)S\big] = 0 \;\Rightarrow\; a = \frac{c_{\Theta,S}}{\sigma_S^2}, \quad b = \mu_\Theta - a\mu_S \;\Rightarrow\; \hat\theta_{\mathrm{CM}} = \frac{\sum_{i=1}^N x_i/\sigma_N^2 + m/\sigma_\Theta^2}{N/\sigma_N^2 + 1/\sigma_\Theta^2}$$

 

 

IV. Linear Estimation

6. Linear Estimation

We focus on linear models (linear regression)

$$T_{\mathrm{Lin}}: x \mapsto \hat y = x^Tt + m, \qquad y = \begin{bmatrix}y_1\\ \vdots\\ y_N\end{bmatrix}, \quad X = \begin{bmatrix}x_1^T\\ \vdots\\ x_N^T\end{bmatrix}$$

6.1 Least Squares Estimation

$$\min_t\left\{\sum_{i=1}^N (y_i - x_i^Tt)^2\right\} \;\Leftrightarrow\; \min_t \|y - Xt\|_2^2$$

The standard approach for this is based on convexity of the objective function,

$$\frac{\partial}{\partial t}\|y - Xt\|_2^2 = 2X^TXt - 2X^Ty = 0 \;\Rightarrow\; t_{\mathrm{LS}} = (X^TX)^{-1}X^Ty$$

Geometrical Perspective


$$\min_t \|y - Xt\|_2^2 \;\Rightarrow\; y - Xt \perp \mathrm{range}[X] \;\Leftrightarrow\; y - Xt \in \mathrm{null}[X^T] \;\Leftrightarrow\; X^T(y - Xt) = 0 \;\Rightarrow\; t_{\mathrm{LS}} = (X^TX)^{-1}X^Ty$$

2nd Geometrical Perspective

Using the orthogonal projector $X(X^TX)^{-1}X^T$, we project $y$ onto $\mathrm{range}[X]$:

$$\hat y = X(X^TX)^{-1}X^Ty$$

Mean Estimation Example

In order to estimate the mean $\mu_X$ of an unknown distribution $f_X(x)$ based on $N$ observations by means of least squares, we introduce the linear model

$$T_{\mathrm{Lin}}: x_i \mapsto t = \hat\mu_X$$

and the LS optimization problem

$$\min_t\left\{\sum_{i=1}^N (x_i - t)^2\right\} \;\Leftrightarrow\; \min_t \|x - \mathbf{1}t\|_2^2 \;\Rightarrow\; \hat\mu_X = t_{\mathrm{LS}} = (\mathbf{1}^T\mathbf{1})^{-1}\mathbf{1}^Tx = \frac{1}{N}\sum_{i=1}^N x_i$$

Affine Linear Regression

Given a training set, a linear estimator is defined as

$$T_{\mathrm{Lin}}: x \mapsto \hat y = xt, \qquad t_{\mathrm{LS}} = \frac{x^Ty}{x^Tx}$$

Another linear estimator is

$$T_{\mathrm{Lin}}: x \mapsto \hat y = xt_1 + t_2 = [x\ \ 1]\begin{bmatrix}t_1\\ t_2\end{bmatrix} = x^Tt, \qquad t_{\mathrm{LS}} = (X^TX)^{-1}X^Ty$$

SVD Perspective

$$X = [U,\ U_\perp]\begin{bmatrix}\Sigma\\ 0\end{bmatrix}V^T, \qquad \|y - Xt\|_2^2 = \left\|\begin{bmatrix}U^T\\ U_\perp^T\end{bmatrix}y - \begin{bmatrix}\Sigma\\ 0\end{bmatrix}V^Tt\right\|_2^2 \;\Rightarrow\; U^Ty = \Sigma V^T t_{\mathrm{LS}} \;\Rightarrow\; t_{\mathrm{LS}} = V\Sigma^{-1}U^Ty = \sum_{i=1}^d \frac{1}{\sigma_{X,i}}(u_i^Ty)\,v_i$$
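A small sketch on synthetic data (dimensions and noise level assumed) computing $t_{\mathrm{LS}}$ via the normal equations, via the SVD expression above, and via numpy's built-in least-squares solver; all three agree.

```python
import numpy as np

rng = np.random.default_rng(5)
N, d = 50, 3                                  # illustrative assumptions
X = rng.standard_normal((N, d))
t_true = np.array([1.0, -2.0, 0.5])
y = X @ t_true + 0.1 * rng.standard_normal(N)

t_normal = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations

U, s, Vt = np.linalg.svd(X, full_matrices=False)
t_svd = Vt.T @ ((U.T @ y) / s)                # t_LS = V Sigma^{-1} U^T y

t_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(t_normal, t_svd, t_lstsq, sep="\n")     # all three agree
```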

6.2 Least Squares Estimation with Regularization (Ridge Regression)

In many real-world applications, LS problems may be ill-conditioned. In such cases, regularization techniques can provide an alternative handling of the problem.

Tikhonov/Ridge regression:

$$\min_t\left\{\|Xt - y\|^2 + \gamma\|\Phi t\|^2\right\}$$

where $\Phi$ is an appropriately chosen regularization operator. It can be shown that the solution satisfies

$$(X^TX + \gamma\Phi^T\Phi)\,\hat t_\gamma = X^Ty$$

Furthermore,

$$X = U\Sigma V^T = \sum_{i=1}^d \sigma_{X,i}u_iv_i^T, \qquad X^TX = V\Sigma U^TU\Sigma V^T = V\Sigma^2V^T = \sum_{i=1}^d \sigma_{X,i}^2 v_iv_i^T, \qquad \Phi^T\Phi = \sum_{i=1}^d \sigma_{\Phi,i}^2 v_iv_i^T$$

$$\left(\sum_{i=1}^d \sigma_{X,i}^2 v_iv_i^T + \gamma\sum_{i=1}^d \sigma_{\Phi,i}^2 v_iv_i^T\right)\hat t_\gamma = \sum_{i=1}^d \sigma_{X,i}v_iu_i^Ty \;\Rightarrow\; \hat t_\gamma = \sum_{i=1}^d F(\gamma,\sigma_{X,i},\sigma_{\Phi,i})\,\frac{1}{\sigma_{X,i}}(u_i^Ty)\,v_i$$
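A brief sketch of ridge regression with $\Phi = I$ (synthetic data and $\gamma$ are assumptions): the direct solution of the regularized normal equations matches the SVD form with filter factors $\sigma_{X,i}^2/(\sigma_{X,i}^2+\gamma)$.

```python
import numpy as np

rng = np.random.default_rng(6)
N, d, gamma = 50, 3, 2.0                      # illustrative assumptions
X = rng.standard_normal((N, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(N)

t_ridge = np.linalg.solve(X.T @ X + gamma * np.eye(d), X.T @ y)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
filt = s**2 / (s**2 + gamma)                  # shrinks weak singular directions
t_svd = Vt.T @ (filt * (U.T @ y) / s)
print(t_ridge, t_svd)                         # identical solutions
```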

6.3 Linear Minimum Mean Square Error Estimation

Based on a linear model for the estimator, the LMMSE estimator solves the optimization problem

$$T_{\mathrm{Lin}}: x \mapsto \hat y = x^Tt + m, \qquad \min_{t,m}\ E\big[\|y - t^Tx - m\|^2\big]$$

Given the joint mean values and covariance of the random variables z=(x,y)T

$$\mu_z = \begin{bmatrix}\mu_x\\ \mu_y\end{bmatrix}, \qquad C_z = \begin{bmatrix}C_x & c_{x,y}\\ c_{y,x} & c_y\end{bmatrix}$$

we can get the LMMSE estimator by

$$C_x = E\big[(x-\mu_x)(x-\mu_x)^T\big] = E[xx^T] - E[x]\mu_x^T - \mu_xE[x^T] + \mu_x\mu_x^T = E[xx^T] - \mu_x\mu_x^T$$

$$c_{y,x} = E\big[(x-\mu_x)^T(y-\mu_y)\big] = E[x^Ty] - \mu_yE[x^T] - \mu_x^TE[y] + \mu_x^T\mu_y = E[x^Ty] - \mu_x^T\mu_y$$

$$E\big[(y - t^Tx - m)\,x^T\big] = E[y\,x^T] - t^TE[xx^T] - mE[x^T] = c_{y,x} + \mu_x^T\mu_y - t^T(C_x + \mu_x\mu_x^T) - m\mu_x^T \overset{!}{=} 0 \;\Rightarrow\; t^T = c_{y,x}C_x^{-1}$$

$$\hat y = c_{y,x}C_x^{-1}x + \mu_y - c_{y,x}C_x^{-1}\mu_x = c_{y,x}C_x^{-1}(x - \mu_x) + \mu_y$$

! apply orthogonality principle

  • In the case of zero-mean random variables, the LMMSE estimator is $\hat y = c_{y,x}C_x^{-1}x$. The minimum MSE is

$$E\big[|y-\hat y|^2\big] = c_y - t^Tc_{x,y} = c_y - c_{y,x}C_x^{-1}c_{x,y}$$
  • If the random variables $x$ and $y$ are jointly Gaussian distributed, the LMMSE estimator is identical to the conditional mean estimator (CME).
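A minimal sketch of the LMMSE estimator on synthetic data; here the required means and covariances are estimated from samples (the model coefficients and noise level are assumptions for illustration), whereas the text assumes them to be known.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 100_000, 3                              # illustrative assumptions
x = rng.standard_normal((n, d)) + np.array([1.0, 0.0, -1.0])
y = x @ np.array([0.5, -1.0, 2.0]) + 0.3 * rng.standard_normal(n)

mu_x, mu_y = x.mean(axis=0), y.mean()
C_x = np.cov(x, rowvar=False)
c_yx = ((y - mu_y)[:, None] * (x - mu_x)).mean(axis=0)   # cross-covariance

t = np.linalg.solve(C_x, c_yx)                 # t^T = c_{y,x} C_x^{-1}
y_hat = (x - mu_x) @ t + mu_y                  # LMMSE estimate
print("empirical MSE:", np.mean((y - y_hat) ** 2))       # close to 0.3**2
```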

V. Examples

7. Estimator of a Matrix Channel

We consider the estimation of a time-invariant, non-dispersive MIMO channel $y = Hs + n$ with $K$ antenna elements at the transmitter and $M$ at the receiver, which means $KM$ transmission channels are estimated.

We compare three linear estimators

$$\hat h = Ty \in \mathbb{C}^{KM}$$
  • MMSE, minimum mean square error estimator

  • ML, maximum likelihood estimator

  • MF, matched filter estimator

7.1 Channel Model

The task

is to find good estimators of the channel coefficients

$h_{m,k}$ with $m = 1,\dots,M$ and $k = 1,\dots,K$

where $h_{m,k}$ denotes the channel coefficient from the $k$-th transmitter to the $m$-th receiver.

The training signals

consist of $N$ vectors $s_n\in\mathbb{C}^K$, $n = 1,\dots,N$.

The estimation

of the channel coefficients hm,k is based on the received signal vectors

$$y_n = Hs_n + n_n, \quad n = 1,\dots,N$$

where $H$ denotes the matrix of channel coefficients, $y_n$ the received signal vector, and $n_n$ the noise corruption at the receiver for the $n$-th training vector.

The model

for the training channel is

$$[y_1,\dots,y_N] = H[s_1,\dots,s_N] + [n_1,\dots,n_N]$$

i.e.,

$$Y = HS + N$$

By stacking the column vectors of the matrices, we obtain

$$y \overset{!}{=} (S^T\otimes I_M)\,h + n$$

! here we use

$$AXB = C \;\Leftrightarrow\; (B^T\otimes A)\,\mathrm{vec}(X) = \mathrm{vec}(C)$$
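A one-line numerical check of this vec/Kronecker identity, which underlies the stacking $y = (S^T\otimes I_M)h + n$ (matrix dimensions are illustrative):

```python
import numpy as np

# Verify vec(H S) = (S^T kron I_M) vec(H), i.e. AXB = C with A = I_M, X = H, B = S.
rng = np.random.default_rng(8)
M, K, N = 3, 2, 4                              # illustrative dimensions
H = rng.standard_normal((M, K))
S = rng.standard_normal((K, N))

vec = lambda A: A.reshape(-1, order="F")       # column-wise stacking
lhs = vec(H @ S)
rhs = np.kron(S.T, np.eye(M)) @ vec(H)
print(np.allclose(lhs, rhs))                   # True
```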

Further Assumptions

We further assume that the stacked vectors are Gaussian distributed

$$h\sim\mathcal{N}(0,C_h), \qquad n\sim\mathcal{N}(0,C_n)$$

where

$$C_n = \sigma_n^2 I_{NM}$$

The channel vector $h$ and the noise vector $n$ are assumed to be stochastically independent and thus uncorrelated:

$$\mathrm{Cov}[h, n^H] = 0$$

The matrix $\mathbf{S}$

is used as an abbreviation for $S^T\otimes I_M$.

Consequently, due to the linear channel model $y = \mathbf{S}h + n$, we conclude that $z = [y^T, h^T]^T$ is jointly Gaussian distributed. The covariance is equal to

$$C_z = \begin{bmatrix}C_y & C_{y,h}\\ C_{h,y} & C_h\end{bmatrix}$$

with

$$C_y = \mathbf{S}C_h\mathbf{S}^H + C_n, \qquad C_{y,h} = \mathbf{S}C_h, \qquad C_{h,y} = C_h\mathbf{S}^H$$
  • In the following, we assume full knowledge of the covariance matrices Cy and Cy,h.

7.2 Three Linear Estimators

1. MMSE (minimum mean square error estimation)

 
