Statistical Signal Processing – Summary

Statistical Signal Processing

I. Parameter Estimation

1 Statistical Modeling and Quality Criteria

  • Statistical Estimation

The goal is to determine the probability distribution of the random variables based on available samples. Equivalently: the stochastic model is a set of probability spaces, and the task of statistical estimation is to select the most appropriate candidate based on the observed outcomes of a random experiment.

  • Statistical Model

In a statistical model, data is assumed to be generated by some underlying probability distribution, and the goal is to estimate the parameters of this distribution.

$\mathcal{X}$: the set of all possible observations

$\mathcal{F}$: a collection of subsets of $\mathcal{X}$ that satisfies certain properties (a $\sigma$-algebra). Here, $\mathcal{F}$ is the collection of events that can be assigned probabilities.

$\mathrm{d}F(\,\cdot\,;\theta)$: a family of candidate probability measures that assign probabilities to the events in $\mathcal{F}$; they depend on the parameter $\theta$.

$\theta$: selects the probability measure $\mathrm{d}F(\,\cdot\,;\theta)$. It is unknown and needs to be estimated based on the available data.

  • Statistics

$X_i:\Omega\to\mathcal{X}$, a random variable, also called a statistic, maps outcomes in the sample space $\Omega$ to values in the set $\mathcal{X}$, representing the values of interest that we want to study in our statistical analysis.

$x_i\in\mathcal{X}$: actual values that we observe or measure in our study

$T:\mathcal{X}\to\Theta$, a special type of statistic, called an estimator, maps observations $x_i\in\mathcal{X}$ to $\hat{\theta}\in\Theta$.

  • $\mathrm{P}_\theta(\cdot)$ refers to the probability of an event according to the probability distribution selected by $\theta$. The same holds for $\mathrm{E}_\theta[\cdot]$ and $\mathrm{Var}_\theta[\cdot]$.

1.1 Introductory Example: Estimating the Upper Bound $\theta$

Given $N$ i.i.d. statistics $X_1,\dots,X_N$ of a uniformly distributed random variable $X\sim\mathcal{U}(0,\theta)$, where $\theta$ is deterministic but unknown. We have $\mathrm{E}[X]=\theta/2$ and $\mathrm{Var}[X]=\theta^2/12$.

Solution 1: Use $\mathrm{E}[X]=\theta/2$

$$T_1: X_1,\dots,X_N \mapsto \hat{\theta}_1 = 2\cdot\frac{1}{N}\sum_{i=1}^{N}X_i$$

Solution 2: Intuition $\hat{\theta}_2=\max_{i=1,\dots,N}\{X_i\}$
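
As a quick numerical sanity check (not from the original notes), the following Monte Carlo sketch compares the two estimators; the values of theta, N, and the number of trials are arbitrary assumptions.

```python
import numpy as np

# Monte Carlo sketch: compare T1 = 2 * sample mean and T2 = max for X ~ U(0, theta).
# theta, N, and the number of trials are arbitrary illustration values.
rng = np.random.default_rng(0)
theta, N, trials = 2.0, 50, 100_000

x = rng.uniform(0.0, theta, size=(trials, N))
t1 = 2.0 * x.mean(axis=1)   # moment-based estimator
t2 = x.max(axis=1)          # maximum-based estimator

for name, t in [("T1", t1), ("T2", t2)]:
    print(f"{name}: mean = {t.mean():.4f}, MSE = {np.mean((t - theta) ** 2):.6f}")
# Expected: T2 is biased low (E[T2] = N/(N+1) * theta) but has the smaller MSE.
```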

1.2 Consistency and Unbiasedness

Consistent estimator $T$: $\lim_{N\to\infty}T(X_1,\dots,X_N)=\theta$ (in probability)

$T_1$ and $T_2$ are both consistent.

$T_1$:

Using Chebyshev's inequality, we derive the (weak) law of large numbers:

$$
\begin{aligned}
&\Pr\bigl(|X-\mathrm{E}[X]|\ge\sqrt{\mathrm{Var}[X]}\,\epsilon\bigr)\le\frac{1}{\epsilon^2}\quad\text{(Chebyshev's inequality)}\\
&\bar{X}_n=\frac{1}{n}\sum_{i=1}^{n}X_i,\quad\mathrm{E}[\bar{X}_n]=\mathrm{E}[X],\quad\mathrm{Var}[\bar{X}_n]=\frac{\mathrm{Var}[X]}{n}\\
&\Pr\bigl(|\bar{X}_n-\mathrm{E}[\bar{X}_n]|\ge\sqrt{\mathrm{Var}[\bar{X}_n]}\,\epsilon\bigr)\le\frac{1}{\epsilon^2}\\
&\Pr\Bigl(|\bar{X}_n-\mathrm{E}[X]|\ge\sqrt{\tfrac{\mathrm{Var}[X]}{n}}\,\epsilon\Bigr)\le\frac{1}{\epsilon^2}\\
&\Pr\bigl(|\bar{X}_n-\mathrm{E}[X]|\ge\epsilon\bigr)\le\frac{\mathrm{Var}[X]}{n\epsilon^2}\quad\text{(Law of large numbers)}
\end{aligned}
$$

Using the law of large numbers, we get

$$\Pr\Bigl(\Bigl|\frac{T_1}{2}-\frac{\theta}{2}\Bigr|\ge\epsilon\Bigr)\le\frac{\mathrm{Var}[X]}{N\epsilon^2}\xrightarrow{N\to\infty}0$$

$T_2$:

$$
\begin{aligned}
\Pr\bigl(|T_2-\theta|\ge\epsilon\bigr)&=\Pr\bigl(\max_{i=1,\dots,N}\{X_i\}\le\theta-\epsilon\bigr)=\Pr\bigl(X_1\le\theta-\epsilon,\dots,X_N\le\theta-\epsilon\bigr)\\
&\overset{\text{i.i.d.}}{=}\prod_{i=1}^{N}\Pr\bigl(X_i\le\theta-\epsilon\bigr)=\prod_{i=1}^{N}\frac{\theta-\epsilon}{\theta}=\Bigl(\frac{\theta-\epsilon}{\theta}\Bigr)^N\xrightarrow{N\to\infty}0
\end{aligned}
$$

Unbiased estimator $T$: $\mathrm{E}[T(X_1,\dots,X_N)]=\theta$

  • $T_1$ is unbiased:

$$\mathrm{E}[T_1]=\mathrm{E}\Bigl[\frac{2}{N}\sum_{i=1}^{N}X_i\Bigr]=\frac{2}{N}\sum_{i=1}^{N}\mathrm{E}[X_i]=\theta$$
  • $T_2$ is asymptotically unbiased:

$$\mathrm{E}[T_2]=\int_{0}^{\theta}\xi\,f_{T_2}(\xi;\theta)\,\mathrm{d}\xi=\frac{N}{N+1}\theta$$

Proof. For $0\le\xi\le\theta$,

$$
\begin{aligned}
F_{T_2}(\xi;\theta)&=\Pr(T_2\le\xi)=\Bigl(\frac{\xi}{\theta}\Bigr)^N\\
f_{T_2}(\xi;\theta)&=\frac{\mathrm{d}}{\mathrm{d}\xi}F_{T_2}(\xi;\theta)=\frac{\mathrm{d}}{\mathrm{d}\xi}\Bigl(\frac{\xi}{\theta}\Bigr)^N=\frac{N\xi^{N-1}}{\theta^N}
\end{aligned}
$$
  • $T_2'$ is unbiased:

$$T_2'=\frac{N+1}{N}T_2$$


1.3 Variance: $\mathrm{Var}[T]=\mathrm{E}[(T-\mathrm{E}[T])^2]$

A further quality measure for an estimator is its variance.

$$
\begin{aligned}
\mathrm{Var}[T_1]&=\mathrm{Var}\Bigl[\frac{2}{N}\sum_{i=1}^{N}X_i\Bigr]=\frac{4}{N^2}\sum_{i=1}^{N}\mathrm{Var}[X_i]=\frac{\theta^2}{3N}\\
\mathrm{Var}[T_2']&=\frac{\theta^2}{N(N+2)}\\
\mathrm{Var}[T_2]&=\frac{N\theta^2}{(N+1)^2(N+2)}
\end{aligned}
$$

1.4 Mean Squared Error (MSE): $\mathrm{E}[\varepsilon^2(T)]=\mathrm{E}[(T-\theta)^2]$

An extension of the variance is the MSE (mean squared error), where $\theta$ is the true parameter.

$T_1$ and $T_2'$ are unbiased, $\mathrm{E}[T_1]=\mathrm{E}[T_2']=\theta$, therefore

$$\mathrm{E}[\varepsilon^2(T_1)]=\mathrm{Var}[T_1]=\frac{\theta^2}{3N},\qquad \mathrm{E}[\varepsilon^2(T_2')]=\mathrm{Var}[T_2']=\frac{\theta^2}{N(N+2)}$$

$T_2$ is biased; its MSE is obtained (using $\mathrm{E}[T_2^2]=\frac{N}{N+2}\theta^2$) by

$$\mathrm{E}[\varepsilon^2(T_2)]=\mathrm{E}[T_2^2]-2\,\mathrm{E}[T_2]\,\theta+\theta^2=\frac{2\theta^2}{(N+1)(N+2)}$$

 


1.5 Bias/Variance Trade-Off

MSE of an estimator T is defined as

$$\mathrm{E}[\varepsilon^2(T)]=\mathrm{E}[(T-\theta)^2]$$

and can be decomposed into its bias and variance

$$\mathrm{E}[\varepsilon^2(T)]=\mathrm{E}[(T-\mathrm{E}[T])^2]+(\mathrm{E}[T]-\theta)^2=\mathrm{Var}[T]+\mathrm{Bias}[T]^2$$
  • Choose $\alpha$ to obtain the optimal $\alpha T_2$:

$$
\begin{aligned}
\mathrm{Var}[\alpha T_2]&=\alpha^2\,\mathrm{Var}[T_2]=\frac{N\alpha^2\theta^2}{(N+1)^2(N+2)}\\
\mathrm{Bias}[\alpha T_2]^2&=(\mathrm{E}[\alpha T_2]-\theta)^2=(\alpha\,\mathrm{E}[T_2]-\theta)^2=\Bigl(\alpha\frac{N}{N+1}-1\Bigr)^2\theta^2\\
\underset{\alpha}{\arg\min}\ \mathrm{Bias}&=\frac{N+1}{N}\qquad\qquad\underset{\alpha}{\arg\min}\ \mathrm{MSE}=\frac{N+1}{N+\frac{1}{N+2}}=\frac{N+2}{N+1}
\end{aligned}
$$

Therefore, an unbiased estimator is not necessarily the MSE-optimal estimator; for large $N$, however, the MSE-optimal scaling approaches the unbiased one.
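
A short numerical check of this trade-off (a sketch with assumed values, not part of the notes): it evaluates the analytic MSE of $\alpha T_2$ on a grid of $\alpha$ and compares the minimizer with $(N+2)/(N+1)$.

```python
import numpy as np

# Sketch: MSE(alpha * T2) = alpha^2 Var[T2] + (alpha * N/(N+1) - 1)^2 theta^2,
# evaluated on a grid; theta and N are arbitrary example values.
theta, N = 1.0, 10
var_t2 = N * theta**2 / ((N + 1) ** 2 * (N + 2))

alpha = np.linspace(0.9, 1.3, 400_001)
mse = alpha**2 * var_t2 + (alpha * N / (N + 1) - 1.0) ** 2 * theta**2

print("grid minimizer:", alpha[np.argmin(mse)])
print("(N+2)/(N+1)   :", (N + 2) / (N + 1))   # MSE-optimal scaling
print("(N+1)/N       :", (N + 1) / N)         # scaling that removes the bias
```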

2 Maximum Likelihood Estimation

2.1 Maximum Likelihood Principle

The maximum likelihood principle suggests selecting a candidate probability measure such that the observed outcomes of the experiment become most probable. A maximum likelihood estimator $T_{\mathrm{ML}}$ picks the $\theta$ that maximizes the likelihood function $L(x_1,\dots,x_N|\theta)$.


The likelihood function depends on the statistical model. Assuming all observations are i.i.d., we obtain

$$
\begin{aligned}
\text{discrete R.V.:}\quad & L(x_1,\dots,x_N|\theta)=p_{X_1,\dots,X_N}(x_1,\dots,x_N|\theta)=\prod_{i=1}^{N}p_{X_i}(x_i|\theta)\\
\text{continuous R.V.:}\quad & L(x_1,\dots,x_N|\theta)=f_{X_1,\dots,X_N}(x_1,\dots,x_N|\theta)=\prod_{i=1}^{N}f_{X_i}(x_i|\theta)
\end{aligned}
$$

Normally, we use the log-likelihood function:

$$T_{\mathrm{ML}}:x_1,\dots,x_N\mapsto\hat{\theta}_{\mathrm{ML}}=\underset{\theta}{\arg\max}\,\log\prod_{i=1}^{N}L(x_i|\theta)=\underset{\theta}{\arg\max}\,\sum_{i=1}^{N}\log L(x_i|\theta)$$

  • In the slides, $L(x;\theta)$ is used rather than $L(x|\theta)$.
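
As a minimal illustration of the (log-)likelihood principle, the sketch below maximizes a Gaussian log-likelihood numerically; the synthetic data, the known unit variance, and the helper neg_log_likelihood are assumptions for this example only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: numerical ML estimation of a Gaussian mean (variance assumed known and equal to 1).
rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.0, size=200)   # synthetic observations, true mean 3.0

def neg_log_likelihood(theta):
    # -sum_i log f(x_i; theta), additive constants dropped
    return 0.5 * np.sum((x - theta) ** 2)

theta_ml = minimize_scalar(neg_log_likelihood).x
print(theta_ml, x.mean())   # the numerical maximizer coincides with the sample mean
```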

2.2 Parameter Estimation

Channel Estimation

Consider an AWGN channel $y=hs+n$ with $n\sim\mathcal{N}(0,\sigma^2)$. We know that $Y\sim\mathcal{N}(hs,\sigma^2)$, and with $\theta=h$, we have

$$f_Y(y;\theta)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl(-\frac{(y-\theta s)^2}{2\sigma^2}\Bigr)$$

Given $N$ observations $\boldsymbol{y}$ and training signals $\boldsymbol{s}$, the maximum-likelihood estimate is

$$
\begin{aligned}
\hat{h}_{\mathrm{ML}}&=\underset{\theta}{\arg\max}\,\sum_{i=1}^{N}\log\Bigl(\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl(-\frac{(y_i-\theta s_i)^2}{2\sigma^2}\Bigr)\Bigr)=\underset{\theta}{\arg\max}\,\sum_{i=1}^{N}-\frac{(y_i-\theta s_i)^2}{2\sigma^2}\\
&=\underset{\theta}{\arg\min}\,\sum_{i=1}^{N}\frac{(y_i-\theta s_i)^2}{2\sigma^2}=\underset{\theta}{\arg\min}\,\|\boldsymbol{y}-\theta\boldsymbol{s}\|^2=(\boldsymbol{s}^{\mathsf T}\boldsymbol{s})^{-1}\boldsymbol{s}^{\mathsf T}\boldsymbol{y}
\end{aligned}
$$

  • The ML estimator coincides with the least-squares estimator here; this changes drastically when the statistics $Y_i$ are correlated or when the noise $n$ is non-Gaussian.
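
A minimal sketch of the estimate above, assuming a BPSK training sequence and arbitrary channel gain and noise level:

```python
import numpy as np

# Sketch: ML channel estimation in AWGN reduces to least squares, h_hat = (s^T s)^{-1} s^T y.
rng = np.random.default_rng(2)
h_true, sigma, N = 0.7, 0.3, 100                 # assumed channel gain and noise level
s = rng.choice([-1.0, 1.0], size=N)              # known training symbols (BPSK, assumed)
y = h_true * s + rng.normal(0.0, sigma, size=N)  # noisy observations

h_ml = (s @ y) / (s @ s)                         # closed-form least-squares solution
print(h_ml)                                      # close to h_true for large N
```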

Introductory Example: Estimating the Upper Bound $\theta$

Suppose the distribution of the observations is uniform; the likelihood function of $N$ observations is

$$L(x_1,\dots,x_N;\theta)=\begin{cases}\dfrac{1}{\theta^N} & \text{if }\forall i,\ x_i\le\theta\\[4pt] 0 & \text{otherwise}\end{cases}$$

Since $1/\theta^N>0$ is strictly decreasing in $\theta$, the likelihood is maximized by the smallest admissible $\theta$:

$$\hat{\theta}_{\mathrm{ML}}=\underset{\theta}{\arg\max}\,L(x_1,\dots,x_N;\theta)=\max\{x_1,\dots,x_N\}$$

N Bernoulli Experiments

Given $N$ Bernoulli trials, the likelihood function of the number of successes $x$ is

$$
\begin{aligned}
L(x;\theta)&=\binom{N}{x}\theta^x(1-\theta)^{N-x}\\
\log L(x;\theta)&=\log\binom{N}{x}+x\log\theta+(N-x)\log(1-\theta)
\end{aligned}
$$

and $\mathrm{E}[X]=N\theta$, $\mathrm{Var}[X]=N\theta(1-\theta)$.

The ML-estimator is obtained by

$$
\begin{aligned}
\hat{\theta}_{\mathrm{ML}}&=\underset{\theta}{\arg\max}\,\log L(x;\theta)\\
\frac{\mathrm{d}}{\mathrm{d}\theta}\log L(x;\theta)&=\frac{x}{\theta}-\frac{N-x}{1-\theta}=0\;\Rightarrow\;\hat{\theta}_{\mathrm{ML}}=\frac{x}{N}
\end{aligned}
$$

In the following, we analyze the quality of $T_{\mathrm{ML}}$:

$$
\begin{aligned}
\mathrm{E}[T_{\mathrm{ML}}]&=\mathrm{E}\Bigl[\frac{X}{N}\Bigr]=\frac{\mathrm{E}[X]}{N}=\theta\quad\text{(unbiased)}\\
\mathrm{Var}[T_{\mathrm{ML}}]&=\frac{\mathrm{Var}[X]}{N^2}=\frac{\theta(1-\theta)}{N}
\end{aligned}
$$

Since the estimator is unbiased, the MSE is equal to the variance of the estimator,

$$\mathrm{E}[\varepsilon^2(T_{\mathrm{ML}})]=\mathrm{Var}[T_{\mathrm{ML}}]=\frac{\theta(1-\theta)}{N}$$

However, a biased estimator can have a smaller MSE and thus provide better estimates than an unbiased estimator.

Alternative Solution:

$$
\begin{aligned}
T(x)&=\frac{x+1}{N+2}\\
\mathrm{E}[T]&=\mathrm{E}\Bigl[\frac{X+1}{N+2}\Bigr]=\frac{N\theta+1}{N+2}\\
\mathrm{Bias}[T]&=\Bigl|\frac{1-2\theta}{N+2}\Bigr|\\
\mathrm{E}[\varepsilon^2(T)]&=\mathrm{Var}[T]+\mathrm{Bias}[T]^2=\mathrm{Var}\Bigl[\frac{X+1}{N+2}\Bigr]+\Bigl(\frac{1-2\theta}{N+2}\Bigr)^2\\
&=\frac{\mathrm{Var}[X]}{(N+2)^2}+\Bigl(\frac{1-2\theta}{N+2}\Bigr)^2=\frac{N\theta(1-\theta)+(1-2\theta)^2}{(N+2)^2}
\end{aligned}
$$
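
The two Bernoulli estimators can be compared empirically; the following Monte Carlo sketch uses assumed values for theta and N.

```python
import numpy as np

# Monte Carlo sketch: MSE of the ML estimator x/N vs. the biased estimator (x+1)/(N+2)
# for N Bernoulli trials; theta, N, and the number of trials are arbitrary.
rng = np.random.default_rng(3)
theta, N, trials = 0.3, 20, 200_000

x = rng.binomial(N, theta, size=trials)   # number of successes per experiment
t_ml = x / N
t_alt = (x + 1) / (N + 2)

for name, t in [("x/N", t_ml), ("(x+1)/(N+2)", t_alt)]:
    print(f"{name:>12s}: MSE = {np.mean((t - theta) ** 2):.6f}")
# Here the biased estimator achieves the smaller MSE, matching the analytic comparison above.
```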


2.3 Best Unbiased Estimator

ML estimators are not necessarily the best estimators. However, a wide class of estimators is defined by minimizing the MSE under an unbiasedness constraint.

We call an estimator $T$ a Best Unbiased Estimator if $\mathrm{E}[T(X_1,\dots,X_N)]=\theta$ and

$$\mathrm{Var}[T(X_1,\dots,X_N)]\le\mathrm{Var}[S(X_1,\dots,X_N)]$$

for any alternative unbiased estimator $S$.

Best unbiased estimators are also referred to as UMVU (Uniformly Minimum Variance Unbiased) estimators.

3 Fisher's Information Inequality

A universal lower bound for the variance of an estimator can be introduced if the following conditions are fulfilled:

$$
\begin{aligned}
&L(x;\theta)>0,\quad\forall x\in\mathcal{X},\ \theta\in\Theta\\
&L(x;\theta)\ \text{is differentiable with respect to}\ \theta\\
&\int_{\mathcal{X}}\frac{\partial}{\partial\theta}L(x;\theta)\,\mathrm{d}x=\frac{\partial}{\partial\theta}\int_{\mathcal{X}}L(x;\theta)\,\mathrm{d}x
\end{aligned}
$$

We define the score function as the slope of $\log L(x;\theta)$ with respect to $\theta$:

$$g(x;\theta)=\frac{\partial}{\partial\theta}\log L(x;\theta)=\frac{\frac{\partial}{\partial\theta}L(x;\theta)}{L(x;\theta)}$$

3.1 Cramer-Rao Lower Bound

With $\mathrm{E}[g(X;\theta_{\text{true}})]=\mathrm{E}\bigl[\tfrac{\partial}{\partial\theta}\log L(X;\theta)\bigr]=0$, the Fisher information is defined as

$$
\begin{aligned}
I_F(\theta_{\text{true}})&=\mathrm{Var}[g(X;\theta_{\text{true}})]=\mathrm{E}\bigl[(g(X;\theta_{\text{true}})-\mathrm{E}[g(X;\theta_{\text{true}})])^2\bigr]=\mathrm{E}[g(X;\theta_{\text{true}})^2]\\
&\overset{\text{appendix}}{=}-\mathrm{E}_X\Bigl[\frac{\partial^2}{\partial\theta^2}\log L(X;\theta_{\text{true}})\Bigr]
\end{aligned}
$$

which can be interpreted as the negative mean curvature of the log-likelihood function at $\theta_{\text{true}}$.

The variance of an estimator can be lower bounded by the Cramer-Rao lower bound

$$\mathrm{Var}[T(X)]\overset{\text{appendix}}{\ge}\Bigl(\frac{\partial}{\partial\theta}\mathrm{E}[T(X)]\Bigr)^2\frac{1}{I_F(\theta)}\bigg|_{\theta=\theta_{\text{true}}}$$

If $T$ is unbiased, $\mathrm{E}[T(X)]=\theta$, then

$$\mathrm{Var}[T(X)]\ge\frac{1}{I_F(\theta_{\text{true}})}$$

Properties of the Fisher Information:

  • $I_F(\theta)$ depends on the given observations $x_1,\dots,x_N$ and the unknown parameter $\theta$.

  • A large value of $I_F(\theta)$ corresponds to a strong curvature and more information in $x_1,\dots,x_N$. A small value of $I_F(\theta)$ corresponds to a weak curvature and little information in $x_1,\dots,x_N$.

  • $I_F(\theta)$ increases monotonically with the number $N$ of independent observations:

$$I_F^{(N)}(\theta)=N\,I_F^{(1)}(\theta)$$
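
Both the scaling with $N$ and the CRLB can be checked numerically for the Bernoulli model; a minimal sketch with assumed values follows.

```python
import numpy as np

# Sketch: Fisher information and CRLB for N i.i.d. Bernoulli(theta) samples.
# Score of one sample: g(x; theta) = x/theta - (1 - x)/(1 - theta).
rng = np.random.default_rng(4)
theta, N, trials = 0.4, 25, 200_000

x = rng.binomial(1, theta, size=(trials, N))
score = (x / theta - (1 - x) / (1 - theta)).sum(axis=1)  # score of the joint likelihood

i_fisher_emp = score.var()                 # Var of the score = Fisher information
i_fisher_the = N / (theta * (1 - theta))   # I_F^(N) = N * I_F^(1)
var_ml = x.mean(axis=1).var()              # variance of the ML estimator x/N

print(i_fisher_emp, i_fisher_the)          # empirical vs. analytic Fisher information
print(var_ml, 1.0 / i_fisher_the)          # ML variance attains the CRLB in this model
```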


3.2 Exponential Models

An exponential model is a statistical model with