Statistical Signal Processing – Summary

Statistical Signal Processing

I. Parameter Estimation

1 Statistical Modeling and Quality Criteria

  • Statistical Estimation

The goal is to determine the probability distribution of the random variables based on available samples. Equivalently: the stochastic model is a set of probability spaces, and the task of statistical estimation is to select the most appropriate candidate based on the observed outcomes of a random experiment.

  • Statistical Model

In a statistical model, data is assumed to be generated by some underlying probability distribution, and the goal is to estimate the parameters of this distribution.

$\mathcal{X}$: the set of all possible observations

$\mathcal{F}$: a collection of subsets of $\mathcal{X}$ that satisfies certain properties (a $\sigma$-algebra). Here, $\mathcal{F}$ is the collection of events that can be assigned probabilities.

$\mathrm{d}F(\,\cdot\,;\theta)$: a family of candidate probability measures that assign probabilities to the events in $\mathcal{F}$; they depend on the parameter $\theta$.

$\theta$: selects the probability measure $\mathrm{d}F(\,\cdot\,;\theta)$. It is unknown and needs to be estimated based on the available data.

  • Statistics

$X_i:\Omega\to\mathcal{X}$, a random variable, also called a statistic, maps outcomes in the sample space $\Omega$ to values in the set $\mathcal{X}$, representing the values of interest that we want to study in our statistical analysis.

$x_i\in\mathcal{X}$: actual values that we observe or measure in our study

$T:\mathcal{X}\to\Theta$, a special type of statistic, called an estimator, maps observations $x_i\in\mathcal{X}$ to $\hat{\theta}\in\Theta$.

  • $\mathrm{P}_\theta(\cdot)$ refers to the probability of an event according to the probability distribution selected by $\theta$. The same holds for $\mathrm{E}_\theta[\cdot]$ and $\mathrm{Var}_\theta[\cdot]$.

1.1 Introductory Example: Estimating the Upper Bound $\theta$

Given $N$ i.i.d. statistics $X_1,\dots,X_N$ of a uniformly distributed random variable $X\sim\mathcal{U}(0,\theta)$, where $\theta$ is deterministic but unknown. We have $\mathrm{E}[X]=\theta/2$ and $\mathrm{Var}[X]=\theta^2/12$.

Solution 1: Use $\mathrm{E}[X]=\theta/2$

$$T_1: X_1,\dots,X_N \mapsto \hat{\theta}_1 = 2\cdot\frac{1}{N}\sum_{i=1}^{N}X_i$$

Solution 2: Intuition $\hat{\theta}_2=\max_{i=1,\dots,N}\{X_i\}$
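
As a quick numerical sanity check (not from the original notes), the following Monte Carlo sketch compares the two estimators; the values of theta, N, and the number of trials are arbitrary assumptions.

```python
import numpy as np

# Monte Carlo sketch: compare T1 = 2 * sample mean and T2 = max for X ~ U(0, theta).
# theta, N, and the number of trials are arbitrary illustration values.
rng = np.random.default_rng(0)
theta, N, trials = 2.0, 50, 100_000

x = rng.uniform(0.0, theta, size=(trials, N))
t1 = 2.0 * x.mean(axis=1)   # moment-based estimator
t2 = x.max(axis=1)          # maximum-based estimator

for name, t in [("T1", t1), ("T2", t2)]:
    print(f"{name}: mean = {t.mean():.4f}, MSE = {np.mean((t - theta) ** 2):.6f}")
# Expected: T2 is biased low (E[T2] = N/(N+1) * theta) but has the smaller MSE.
```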

1.2 Consistency and Unbiasedness

Consistent estimator $T$: $\lim_{N\to\infty}T(X_1,\dots,X_N)=\theta$ (in probability)

$T_1$ and $T_2$ are both consistent.

$T_1$:

Using Chebyshev's inequality, we derive the (weak) law of large numbers:

$$
\begin{aligned}
&\Pr\bigl(|X-\mathrm{E}[X]|\ge\sqrt{\mathrm{Var}[X]}\,\epsilon\bigr)\le\frac{1}{\epsilon^2}\quad\text{(Chebyshev's inequality)}\\
&\bar{X}_n=\frac{1}{n}\sum_{i=1}^{n}X_i,\quad\mathrm{E}[\bar{X}_n]=\mathrm{E}[X],\quad\mathrm{Var}[\bar{X}_n]=\frac{\mathrm{Var}[X]}{n}\\
&\Pr\bigl(|\bar{X}_n-\mathrm{E}[\bar{X}_n]|\ge\sqrt{\mathrm{Var}[\bar{X}_n]}\,\epsilon\bigr)\le\frac{1}{\epsilon^2}\\
&\Pr\Bigl(|\bar{X}_n-\mathrm{E}[X]|\ge\sqrt{\tfrac{\mathrm{Var}[X]}{n}}\,\epsilon\Bigr)\le\frac{1}{\epsilon^2}\\
&\Pr\bigl(|\bar{X}_n-\mathrm{E}[X]|\ge\epsilon\bigr)\le\frac{\mathrm{Var}[X]}{n\epsilon^2}\quad\text{(Law of large numbers)}
\end{aligned}
$$

Using the law of large numbers, we get

$$\Pr\Bigl(\Bigl|\frac{T_1}{2}-\frac{\theta}{2}\Bigr|\ge\epsilon\Bigr)\le\frac{\mathrm{Var}[X]}{N\epsilon^2}\xrightarrow{N\to\infty}0$$

$T_2$:

$$
\begin{aligned}
\Pr\bigl(|T_2-\theta|\ge\epsilon\bigr)&=\Pr\bigl(\max_{i=1,\dots,N}\{X_i\}\le\theta-\epsilon\bigr)=\Pr\bigl(X_1\le\theta-\epsilon,\dots,X_N\le\theta-\epsilon\bigr)\\
&\overset{\text{i.i.d.}}{=}\prod_{i=1}^{N}\Pr\bigl(X_i\le\theta-\epsilon\bigr)=\prod_{i=1}^{N}\frac{\theta-\epsilon}{\theta}=\Bigl(\frac{\theta-\epsilon}{\theta}\Bigr)^N\xrightarrow{N\to\infty}0
\end{aligned}
$$

Unbiased estimator $T$: $\mathrm{E}[T(X_1,\dots,X_N)]=\theta$

  • $T_1$ is unbiased:

$$\mathrm{E}[T_1]=\mathrm{E}\Bigl[\frac{2}{N}\sum_{i=1}^{N}X_i\Bigr]=\frac{2}{N}\sum_{i=1}^{N}\mathrm{E}[X_i]=\theta$$
  • $T_2$ is asymptotically unbiased:

$$\mathrm{E}[T_2]=\int_{0}^{\theta}\xi\,f_{T_2}(\xi;\theta)\,\mathrm{d}\xi=\frac{N}{N+1}\theta$$

Proof. For $0\le\xi\le\theta$,

$$
\begin{aligned}
F_{T_2}(\xi;\theta)&=\Pr(T_2\le\xi)=\Bigl(\frac{\xi}{\theta}\Bigr)^N\\
f_{T_2}(\xi;\theta)&=\frac{\mathrm{d}}{\mathrm{d}\xi}F_{T_2}(\xi;\theta)=\frac{\mathrm{d}}{\mathrm{d}\xi}\Bigl(\frac{\xi}{\theta}\Bigr)^N=\frac{N\xi^{N-1}}{\theta^N}
\end{aligned}
$$
  • $T_2'$ is unbiased:

$$T_2'=\frac{N+1}{N}T_2$$


1.3 Variance: $\mathrm{Var}[T]=\mathrm{E}[(T-\mathrm{E}[T])^2]$

A further quality measure for an estimator is its variance.

$$
\begin{aligned}
\mathrm{Var}[T_1]&=\mathrm{Var}\Bigl[\frac{2}{N}\sum_{i=1}^{N}X_i\Bigr]=\frac{4}{N^2}\sum_{i=1}^{N}\mathrm{Var}[X_i]=\frac{\theta^2}{3N}\\
\mathrm{Var}[T_2']&=\frac{\theta^2}{N(N+2)}\\
\mathrm{Var}[T_2]&=\frac{N\theta^2}{(N+1)^2(N+2)}
\end{aligned}
$$

1.4 Mean Squared Error (MSE): $\mathrm{E}[\varepsilon^2(T)]=\mathrm{E}[(T-\theta)^2]$

An extension of the variance is the MSE (mean squared error), where $\theta$ is the true parameter.

$T_1$ and $T_2'$ are unbiased, $\mathrm{E}[T_1]=\mathrm{E}[T_2']=\theta$, therefore

$$\mathrm{E}[\varepsilon^2(T_1)]=\mathrm{Var}[T_1]=\frac{\theta^2}{3N},\qquad \mathrm{E}[\varepsilon^2(T_2')]=\mathrm{Var}[T_2']=\frac{\theta^2}{N(N+2)}$$

$T_2$ is biased; its MSE is obtained (using $\mathrm{E}[T_2^2]=\frac{N}{N+2}\theta^2$) by

$$\mathrm{E}[\varepsilon^2(T_2)]=\mathrm{E}[T_2^2]-2\,\mathrm{E}[T_2]\,\theta+\theta^2=\frac{2\theta^2}{(N+1)(N+2)}$$

 


1.5 Bias/Variance Trade-Off

MSE of an estimator T is defined as

$$\mathrm{E}[\varepsilon^2(T)]=\mathrm{E}[(T-\theta)^2]$$

and can be decomposed into its bias and variance

$$\mathrm{E}[\varepsilon^2(T)]=\mathrm{E}[(T-\mathrm{E}[T])^2]+(\mathrm{E}[T]-\theta)^2=\mathrm{Var}[T]+\mathrm{Bias}[T]^2$$
  • Choose $\alpha$ to obtain the optimal $\alpha T_2$:

$$
\begin{aligned}
\mathrm{Var}[\alpha T_2]&=\alpha^2\,\mathrm{Var}[T_2]=\frac{N\alpha^2\theta^2}{(N+1)^2(N+2)}\\
\mathrm{Bias}[\alpha T_2]^2&=(\mathrm{E}[\alpha T_2]-\theta)^2=(\alpha\,\mathrm{E}[T_2]-\theta)^2=\Bigl(\alpha\frac{N}{N+1}-1\Bigr)^2\theta^2\\
\underset{\alpha}{\arg\min}\ \mathrm{Bias}&=\frac{N+1}{N}\qquad\qquad\underset{\alpha}{\arg\min}\ \mathrm{MSE}=\frac{N+1}{N+\frac{1}{N+2}}=\frac{N+2}{N+1}
\end{aligned}
$$

Therefore, an unbiased estimator is not necessarily the MSE-optimal estimator; for large $N$, however, the MSE-optimal scaling approaches the unbiased one.
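
A short numerical check of this trade-off (a sketch with assumed values, not part of the notes): it evaluates the analytic MSE of $\alpha T_2$ on a grid of $\alpha$ and compares the minimizer with $(N+2)/(N+1)$.

```python
import numpy as np

# Sketch: MSE(alpha * T2) = alpha^2 Var[T2] + (alpha * N/(N+1) - 1)^2 theta^2,
# evaluated on a grid; theta and N are arbitrary example values.
theta, N = 1.0, 10
var_t2 = N * theta**2 / ((N + 1) ** 2 * (N + 2))

alpha = np.linspace(0.9, 1.3, 400_001)
mse = alpha**2 * var_t2 + (alpha * N / (N + 1) - 1.0) ** 2 * theta**2

print("grid minimizer:", alpha[np.argmin(mse)])
print("(N+2)/(N+1)   :", (N + 2) / (N + 1))   # MSE-optimal scaling
print("(N+1)/N       :", (N + 1) / N)         # scaling that removes the bias
```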

2 Maximum Likelihood Estimation

2.1 Maximum Likelihood Principle

The maximum likelihood principle suggests selecting a candidate probability measure such that the observed outcomes of the experiment become most probable. A maximum likelihood estimator $T_{\mathrm{ML}}$ picks the $\theta$ that maximizes the likelihood function $L(x_1,\dots,x_N|\theta)$.


The likelihood function depends on the statistical model. Assuming all observations are i.i.d., we obtain

$$
\begin{aligned}
\text{discrete R.V.:}\quad & L(x_1,\dots,x_N|\theta)=p_{X_1,\dots,X_N}(x_1,\dots,x_N|\theta)=\prod_{i=1}^{N}p_{X_i}(x_i|\theta)\\
\text{continuous R.V.:}\quad & L(x_1,\dots,x_N|\theta)=f_{X_1,\dots,X_N}(x_1,\dots,x_N|\theta)=\prod_{i=1}^{N}f_{X_i}(x_i|\theta)
\end{aligned}
$$

Normally, we use the log-likelihood function:

$$T_{\mathrm{ML}}:x_1,\dots,x_N\mapsto\hat{\theta}_{\mathrm{ML}}=\underset{\theta}{\arg\max}\,\log\prod_{i=1}^{N}L(x_i|\theta)=\underset{\theta}{\arg\max}\,\sum_{i=1}^{N}\log L(x_i|\theta)$$

  • In the slides, $L(x;\theta)$ is used rather than $L(x|\theta)$.
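
As a minimal illustration of the (log-)likelihood principle, the sketch below maximizes a Gaussian log-likelihood numerically; the synthetic data, the known unit variance, and the helper neg_log_likelihood are assumptions for this example only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: numerical ML estimation of a Gaussian mean (variance assumed known and equal to 1).
rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.0, size=200)   # synthetic observations, true mean 3.0

def neg_log_likelihood(theta):
    # -sum_i log f(x_i; theta), additive constants dropped
    return 0.5 * np.sum((x - theta) ** 2)

theta_ml = minimize_scalar(neg_log_likelihood).x
print(theta_ml, x.mean())   # the numerical maximizer coincides with the sample mean
```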

2.2 Parameter Estimation

Channel Estimation

Consider an AWGN channel $y=hs+n$ with $n\sim\mathcal{N}(0,\sigma^2)$. We know that $Y\sim\mathcal{N}(hs,\sigma^2)$, and with $\theta=h$, we have

$$f_Y(y;\theta)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl(-\frac{(y-\theta s)^2}{2\sigma^2}\Bigr)$$

Given $N$ observations $\boldsymbol{y}$ and training signals $\boldsymbol{s}$, the maximum-likelihood estimate is

$$
\begin{aligned}
\hat{h}_{\mathrm{ML}}&=\underset{\theta}{\arg\max}\,\sum_{i=1}^{N}\log\Bigl(\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl(-\frac{(y_i-\theta s_i)^2}{2\sigma^2}\Bigr)\Bigr)=\underset{\theta}{\arg\max}\,\sum_{i=1}^{N}-\frac{(y_i-\theta s_i)^2}{2\sigma^2}\\
&=\underset{\theta}{\arg\min}\,\sum_{i=1}^{N}\frac{(y_i-\theta s_i)^2}{2\sigma^2}=\underset{\theta}{\arg\min}\,\|\boldsymbol{y}-\theta\boldsymbol{s}\|^2=(\boldsymbol{s}^{\mathsf T}\boldsymbol{s})^{-1}\boldsymbol{s}^{\mathsf T}\boldsymbol{y}
\end{aligned}
$$

  • The ML estimator coincides with the least-squares estimator here; this changes drastically when the statistics $Y_i$ are correlated or when the noise $n$ is non-Gaussian.
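
A minimal sketch of the estimate above, assuming a BPSK training sequence and arbitrary channel gain and noise level:

```python
import numpy as np

# Sketch: ML channel estimation in AWGN reduces to least squares, h_hat = (s^T s)^{-1} s^T y.
rng = np.random.default_rng(2)
h_true, sigma, N = 0.7, 0.3, 100                 # assumed channel gain and noise level
s = rng.choice([-1.0, 1.0], size=N)              # known training symbols (BPSK, assumed)
y = h_true * s + rng.normal(0.0, sigma, size=N)  # noisy observations

h_ml = (s @ y) / (s @ s)                         # closed-form least-squares solution
print(h_ml)                                      # close to h_true for large N
```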

Introductory Example: Estimating the Upper Bound $\theta$

Suppose the distribution of the observations is uniform; the likelihood function of $N$ observations is

$$L(x_1,\dots,x_N;\theta)=\begin{cases}\dfrac{1}{\theta^N} & \text{if }\forall i,\ x_i\le\theta\\[4pt] 0 & \text{otherwise}\end{cases}$$

Since $1/\theta^N>0$ is strictly decreasing in $\theta$, the likelihood is maximized by the smallest admissible $\theta$:

$$\hat{\theta}_{\mathrm{ML}}=\underset{\theta}{\arg\max}\,L(x_1,\dots,x_N;\theta)=\max\{x_1,\dots,x_N\}$$

N Bernoulli Experiments

Given $N$ Bernoulli trials, the likelihood function of the number of successes $x$ is

$$
\begin{aligned}
L(x;\theta)&=\binom{N}{x}\theta^x(1-\theta)^{N-x}\\
\log L(x;\theta)&=\log\binom{N}{x}+x\log\theta+(N-x)\log(1-\theta)
\end{aligned}
$$

and $\mathrm{E}[X]=N\theta$, $\mathrm{Var}[X]=N\theta(1-\theta)$.

The ML-estimator is obtained by

$$
\begin{aligned}
\hat{\theta}_{\mathrm{ML}}&=\underset{\theta}{\arg\max}\,\log L(x;\theta)\\
\frac{\mathrm{d}}{\mathrm{d}\theta}\log L(x;\theta)&=\frac{x}{\theta}-\frac{N-x}{1-\theta}=0\;\Rightarrow\;\hat{\theta}_{\mathrm{ML}}=\frac{x}{N}
\end{aligned}
$$

In the following, we analyze the quality of $T_{\mathrm{ML}}$:

$$
\begin{aligned}
\mathrm{E}[T_{\mathrm{ML}}]&=\mathrm{E}\Bigl[\frac{X}{N}\Bigr]=\frac{\mathrm{E}[X]}{N}=\theta\quad\text{(unbiased)}\\
\mathrm{Var}[T_{\mathrm{ML}}]&=\frac{\mathrm{Var}[X]}{N^2}=\frac{\theta(1-\theta)}{N}
\end{aligned}
$$

Since the estimator is unbiased, the MSE is equal to the variance of the estimator,

$$\mathrm{E}[\varepsilon^2(T_{\mathrm{ML}})]=\mathrm{Var}[T_{\mathrm{ML}}]=\frac{\theta(1-\theta)}{N}$$

However, a biased estimator can have a smaller MSE and thus provide better estimates than an unbiased estimator.

Alternative Solution:

$$
\begin{aligned}
T(x)&=\frac{x+1}{N+2}\\
\mathrm{E}[T]&=\mathrm{E}\Bigl[\frac{X+1}{N+2}\Bigr]=\frac{N\theta+1}{N+2}\\
\mathrm{Bias}[T]&=\Bigl|\frac{1-2\theta}{N+2}\Bigr|\\
\mathrm{E}[\varepsilon^2(T)]&=\mathrm{Var}[T]+\mathrm{Bias}[T]^2=\mathrm{Var}\Bigl[\frac{X+1}{N+2}\Bigr]+\Bigl(\frac{1-2\theta}{N+2}\Bigr)^2\\
&=\frac{\mathrm{Var}[X]}{(N+2)^2}+\Bigl(\frac{1-2\theta}{N+2}\Bigr)^2=\frac{N\theta(1-\theta)+(1-2\theta)^2}{(N+2)^2}
\end{aligned}
$$
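
The two Bernoulli estimators can be compared empirically; the following Monte Carlo sketch uses assumed values for theta and N.

```python
import numpy as np

# Monte Carlo sketch: MSE of the ML estimator x/N vs. the biased estimator (x+1)/(N+2)
# for N Bernoulli trials; theta, N, and the number of trials are arbitrary.
rng = np.random.default_rng(3)
theta, N, trials = 0.3, 20, 200_000

x = rng.binomial(N, theta, size=trials)   # number of successes per experiment
t_ml = x / N
t_alt = (x + 1) / (N + 2)

for name, t in [("x/N", t_ml), ("(x+1)/(N+2)", t_alt)]:
    print(f"{name:>12s}: MSE = {np.mean((t - theta) ** 2):.6f}")
# Here the biased estimator achieves the smaller MSE, matching the analytic comparison above.
```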


2.3 Best Unbiased Estimator

ML estimators are not necessarily the best estimators. However, a wide class of estimators is defined by minimizing the MSE under an unbiasedness constraint.

We call an estimator $T$ a Best Unbiased Estimator if $\mathrm{E}[T(X_1,\dots,X_N)]=\theta$ and

$$\mathrm{Var}[T(X_1,\dots,X_N)]\le\mathrm{Var}[S(X_1,\dots,X_N)]$$

for any alternative unbiased estimator $S$.

Best unbiased estimators are also referred to as UMVU (Uniformly Minimum Variance Unbiased) estimators.

3 Fisher's Information Inequality

A universal lower bound for the variance of an estimator can be introduced if the following conditions are fulfilled:

$$
\begin{aligned}
&L(x;\theta)>0,\quad\forall x\in\mathcal{X},\ \theta\in\Theta\\
&L(x;\theta)\ \text{is differentiable with respect to}\ \theta\\
&\int_{\mathcal{X}}\frac{\partial}{\partial\theta}L(x;\theta)\,\mathrm{d}x=\frac{\partial}{\partial\theta}\int_{\mathcal{X}}L(x;\theta)\,\mathrm{d}x
\end{aligned}
$$

We define the score function as the slope of $\log L(x;\theta)$ with respect to $\theta$:

$$g(x;\theta)=\frac{\partial}{\partial\theta}\log L(x;\theta)=\frac{\frac{\partial}{\partial\theta}L(x;\theta)}{L(x;\theta)}$$

3.1 Cramer-Rao Lower Bound

With $\mathrm{E}[g(X;\theta_{\text{true}})]=\mathrm{E}\bigl[\tfrac{\partial}{\partial\theta}\log L(X;\theta)\bigr]=0$, the Fisher information is defined as

$$
\begin{aligned}
I_F(\theta_{\text{true}})&=\mathrm{Var}[g(X;\theta_{\text{true}})]=\mathrm{E}\bigl[(g(X;\theta_{\text{true}})-\mathrm{E}[g(X;\theta_{\text{true}})])^2\bigr]=\mathrm{E}[g(X;\theta_{\text{true}})^2]\\
&\overset{\text{appendix}}{=}-\mathrm{E}_X\Bigl[\frac{\partial^2}{\partial\theta^2}\log L(X;\theta_{\text{true}})\Bigr]
\end{aligned}
$$

which can be interpreted as the negative mean curvature of the log-likelihood function at $\theta_{\text{true}}$.

The variance of an estimator can be lower bounded by the Cramer-Rao lower bound

$$\mathrm{Var}[T(X)]\overset{\text{appendix}}{\ge}\Bigl(\frac{\partial}{\partial\theta}\mathrm{E}[T(X)]\Bigr)^2\frac{1}{I_F(\theta)}\bigg|_{\theta=\theta_{\text{true}}}$$

If $T$ is unbiased, $\mathrm{E}[T(X)]=\theta$, then

$$\mathrm{Var}[T(X)]\ge\frac{1}{I_F(\theta_{\text{true}})}$$

Properties of the Fisher Information:

  • $I_F(\theta)$ depends on the given observations $x_1,\dots,x_N$ and the unknown parameter $\theta$.

  • A large value of $I_F(\theta)$ corresponds to a strong curvature and more information in $x_1,\dots,x_N$. A small value of $I_F(\theta)$ corresponds to a weak curvature and little information in $x_1,\dots,x_N$.

  • $I_F(\theta)$ increases monotonically with the number $N$ of independent observations:

$$I_F^{(N)}(\theta)=N\,I_F^{(1)}(\theta)$$
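
Both the scaling with $N$ and the CRLB can be checked numerically for the Bernoulli model; a minimal sketch with assumed values follows.

```python
import numpy as np

# Sketch: Fisher information and CRLB for N i.i.d. Bernoulli(theta) samples.
# Score of one sample: g(x; theta) = x/theta - (1 - x)/(1 - theta).
rng = np.random.default_rng(4)
theta, N, trials = 0.4, 25, 200_000

x = rng.binomial(1, theta, size=(trials, N))
score = (x / theta - (1 - x) / (1 - theta)).sum(axis=1)  # score of the joint likelihood

i_fisher_emp = score.var()                 # Var of the score = Fisher information
i_fisher_the = N / (theta * (1 - theta))   # I_F^(N) = N * I_F^(1)
var_ml = x.mean(axis=1).var()              # variance of the ML estimator x/N

print(i_fisher_emp, i_fisher_the)          # empirical vs. analytic Fisher information
print(var_ml, 1.0 / i_fisher_the)          # ML variance attains the CRLB in this model
```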


3.2 Exponential Models

An exponential model is a statistical model with