Maximum Likelihood Estimator

We have a background estimate \(\vec{x}_b\) of the \(N\) true fluxes \(\vec{x}_t\), whose error is drawn from a multivariate normal distribution such that

\[\epsilon_b := \vec{x}_b - \vec{x}_t \sim \mathcal{N}(0, B)\]

We also have \(M\) measurements \(\vec{y}\), related to the true fluxes by the observation operator \(H\), whose errors are drawn from an independent multivariate normal distribution such that

\[\epsilon_o := \vec{y} - H \vec{x}_t \sim \mathcal{N}(0, R)\]
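To make the setup concrete, the following NumPy sketch builds a small synthetic problem consistent with these two error models. The dimensions, matrices, and variable names (`x_t`, `H`, `B`, `R`, `x_b`, `y`) are illustrative assumptions only; the later snippets in this section reuse them.

```python
# Minimal synthetic setup (illustrative sizes and matrices only).
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 3                        # number of fluxes and measurements

x_t = rng.normal(size=N)           # "true" fluxes (unknown in practice)
H = rng.normal(size=(M, N))        # observation operator

# Simple positive-definite background and observation error covariances.
B = 0.5 * np.eye(N)
R = 0.1 * np.eye(M)

# Draws consistent with eps_b ~ N(0, B) and eps_o ~ N(0, R).
x_b = x_t + rng.multivariate_normal(np.zeros(N), B)    # background estimate
y = H @ x_t + rng.multivariate_normal(np.zeros(M), R)  # measurements
```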

We seek the estimator \(\vec{x}_a\) of \(\vec{x}_t\) that maximizes the likelihood of both events occurring. The likelihood of \(\vec{x}_a\) given \(\vec{x}_b\) and \(\vec{y}\) is the joint probability of \(\vec{x}_b\) and \(\vec{y}\) given \(\vec{x}_t = \vec{x}_a\):

\[\begin{split}L(\vec{x}_a; \vec{x}_b, \vec{y}) &= P(\vec{x}_b, \vec{y} | \vec{x}_a) \\ &= (2 \pi)^{-\frac{N}{2}} \det(B)^{-\frac{1}{2}} \exp[-\frac{1}{2} (\vec{x}_b - \vec{x}_a)^T B^{-1} (\vec{x}_b - \vec{x}_a)] \\ &\quad\times (2 \pi)^{-\frac{M}{2}} \det(R)^{-\frac{1}{2}} \exp[-\frac{1}{2} (\vec{y} - H \vec{x}_a)^T R^{-1} (\vec{y} - H \vec{x}_a)] \\ &= (2 \pi)^{-\frac{N + M}{2}} \det(B)^{-\frac{1}{2}} \det(R)^{-\frac{1}{2}} \\ &\quad\times \exp\{-\frac{1}{2} [(\vec{x}_b - \vec{x}_a)^T B^{-1} (\vec{x}_b - \vec{x}_a) + (\vec{y} - H \vec{x}_a)^T R^{-1} (\vec{y} - H \vec{x}_a)]\}\end{split}\]

Since the logarithm is an increasing function, maximizing the logarithm of a function is equivalent to maximizing the function itself. It is therefore convenient to work with the log-likelihood \(\ell(\vec{x}_a; \vec{x}_b, \vec{y}) := \ln L(\vec{x}_a; \vec{x}_b, \vec{y})\):

\[\begin{split}\ell(\vec{x}_a; \vec{x}_b, \vec{y}) &= -\frac{N + M}{2} \ln (2 \pi) - \frac{1}{2} \ln[\det(B)] - \frac{1}{2} \ln[\det(R)] \\ &\quad - \frac{1}{2} [(\vec{x}_b - \vec{x}_a)^T B^{-1} (\vec{x}_b - \vec{x}_a) + (\vec{y} - H \vec{x}_a)^T R^{-1} (\vec{y} - H \vec{x}_a)]\end{split}\]
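As a sketch, this log-likelihood can be evaluated directly with NumPy, continuing the synthetic setup above; the function name and signature are my own.

```python
# Log-likelihood of a candidate x_a under the two Gaussian error models.
def log_likelihood(x_a, x_b, y, H, B, R):
    N, M = B.shape[0], R.shape[0]
    r_b = x_b - x_a                 # background residual
    r_o = y - H @ x_a               # observation residual
    quad = r_b @ np.linalg.solve(B, r_b) + r_o @ np.linalg.solve(R, r_o)
    norm = ((N + M) * np.log(2 * np.pi)
            + np.linalg.slogdet(B)[1]
            + np.linalg.slogdet(R)[1])
    return -0.5 * (norm + quad)
```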

To maximize the log-likelihood, and therefore the likelihood, we take its derivative with respect to \(\vec{x}_a\) and set it equal to zero. The normalization terms do not depend on \(\vec{x}_a\), so only the quadratic terms contribute.

\[\begin{split}0 &= \frac{d}{d \vec{x}_a} \ell(\vec{x}_a; \vec{x}_b, \vec{y}) \\ &= -\frac{1}{2} \frac{d}{d \vec{x}_a} [(\vec{x}_b - \vec{x}_a)^T B^{-1} (\vec{x}_b - \vec{x}_a) + (\vec{y} - H \vec{x}_a)^T R^{-1} (\vec{y} - H \vec{x}_a)] \\ &= -\frac{1}{2} [-2 B^{-1} (\vec{x}_b - \vec{x}_a) - 2 H^T R^{-1} (\vec{y} - H \vec{x}_a)] \\ &= B^{-1} (\vec{x}_b - \vec{x}_a) + H^T R^{-1} (\vec{y} - H \vec{x}_a) \\ &= B^{-1} \vec{x}_b + H^T R^{-1} \vec{y} - (B^{-1} + H^T R^{-1} H) \vec{x}_a \\ (B^{-1} + H^T R^{-1} H) \vec{x}_a &= B^{-1} \vec{x}_b + H^T R^{-1} \vec{y} \\ \vec{x}_a &= (B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} \vec{x}_b + H^T R^{-1} \vec{y})\end{split}\]
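Continuing the sketch, the estimator can be computed by solving the linear system \((B^{-1} + H^T R^{-1} H) \vec{x}_a = B^{-1} \vec{x}_b + H^T R^{-1} \vec{y}\) rather than forming the inverse explicitly; the variable names are again illustrative.

```python
# Maximum-likelihood estimate via the normal equations.
B_inv = np.linalg.inv(B)
R_inv = np.linalg.inv(R)
A = B_inv + H.T @ R_inv @ H               # B^{-1} + H^T R^{-1} H
rhs = B_inv @ x_b + H.T @ R_inv @ y       # B^{-1} x_b + H^T R^{-1} y
x_a = np.linalg.solve(A, rhs)
```

One can check that `log_likelihood(x_a, x_b, y, H, B, R)` is at least as large as the value at any nearby perturbation of `x_a`.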

This is the form of the estimator I use for the uncertainty estimate, but it can also be rearranged into a form resembling the best linear unbiased estimator by adding and subtracting \(H^T R^{-1} H \vec{x}_b\).

\[\begin{split}\vec{x}_a &= (B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} \vec{x}_b + H^T R^{-1} H \vec{x}_b - H^T R^{-1} H \vec{x}_b + H^T R^{-1} \vec{y}) \\ &= (B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} + H^T R^{-1} H) \vec{x}_b + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (\vec{y} - H \vec{x}_b) \\ &= \vec{x}_b + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (\vec{y} - H \vec{x}_b)\end{split}\]
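The equivalence of the two forms can be checked numerically against the previous snippet.

```python
# Incremental form: x_a = x_b + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (y - H x_b)
innovation = y - H @ x_b
x_a_incr = x_b + np.linalg.solve(A, H.T @ R_inv @ innovation)
assert np.allclose(x_a, x_a_incr)
```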

The covariance matrix of this estimator is then:

\[\begin{split}\DeclareMathOperator{\Cov}{Cov} \Cov[\vec{x}_a, \vec{x}_a] &= \Cov[(B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} \vec{x}_b + H^T R^{-1} \vec{y}), \\ &\qquad\qquad (B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} \vec{x}_b + H^T R^{-1} \vec{y})]\end{split}\]

We then exploit the bilinearity of the covariance operator to get

\[\begin{split}\Cov[\vec{x}_a, \vec{x}_a] &= \Cov[(B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \vec{x}_b, (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \vec{x}_b] \\ &\quad + \Cov[(B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \vec{x}_b, (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \vec{y}] \\ &\quad + \Cov[(B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \vec{y}, (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \vec{x}_b] \\ &\quad + \Cov[(B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \vec{y}, (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \vec{y}]\end{split}\]

At this point, the property \(\Cov[A X, B Y] = A \Cov[X, Y] B^T\), together with the independence of \(\vec{x}_b\) and \(\vec{y}\) (which makes the cross-covariance terms vanish), allows us to simplify this to

\[\begin{split}\Cov[\vec{x}_a, \vec{x}_a] &= (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \Cov[\vec{x}_b, \vec{x}_b] B^{-T} (B^{-1} + H^T R^{-1} H)^{-T} \\ &\quad + (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \Cov[\vec{x}_b, \vec{y}] R^{-T} H (B^{-1} + H^T R^{-1} H)^{-T} \\ &\quad + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \Cov[\vec{y}, \vec{x}_b] B^{-T} (B^{-1} + H^T R^{-1} H)^{-T} \\ &\quad + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \Cov[\vec{y}, \vec{y}] R^{-T} H (B^{-1} + H^T R^{-1} H)^{-T} \\ &= (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} B B^{-T} (B^{-1} + H^T R^{-1} H)^{-T} + 0 + 0 \\ &\quad + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} R R^{-T} H (B^{-1} + H^T R^{-1} H)^{-T}\end{split}\]

Finally, we can use the symmetry of the covariance matrices \(B\) and \(R\), and hence of \(B^{-1} + H^T R^{-1} H\), to simplify further, obtaining

\[\begin{split}\Cov[\vec{x}_a, \vec{x}_a] &= (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} (B^{-1} + H^T R^{-1} H)^{-T} \\ &\quad + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} H (B^{-1} + H^T R^{-1} H)^{-T} \\ &= (B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} + H^T R^{-1} H) (B^{-1} + H^T R^{-1} H)^{-1} \\ &= (B^{-1} + H^T R^{-1} H)^{-1}\end{split}\]
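Continuing the illustrative sketch, this analytic covariance can be compared with a crude Monte Carlo estimate over repeated draws of \(\vec{x}_b\) and \(\vec{y}\); the sample count is arbitrary.

```python
# Analytic covariance of the estimator and a Monte Carlo check.
cov_analytic = np.linalg.inv(A)           # (B^{-1} + H^T R^{-1} H)^{-1}

samples = []
for _ in range(20000):
    x_b_s = x_t + rng.multivariate_normal(np.zeros(N), B)
    y_s = H @ x_t + rng.multivariate_normal(np.zeros(M), R)
    samples.append(np.linalg.solve(A, B_inv @ x_b_s + H.T @ R_inv @ y_s))

cov_mc = np.cov(np.array(samples), rowvar=False)
print(np.max(np.abs(cov_mc - cov_analytic)))  # small; shrinks as samples grow
```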