Maximum Likelihood Estimator
We have a background estimate \(\vec{x}_b\) of the \(N\)
true fluxes \(\vec{x}_t\), whose error is drawn from a multivariate
normal distribution such that
\[\epsilon_b := \vec{x}_b - \vec{x}_t \sim \mathcal{N}(0, B)\]
We also have \(M\) measurements \(\vec{y}\), related to the true
fluxes by the observation operator \(H\), whose errors are drawn from an
independent multivariate normal distribution such that
\[\epsilon_o := \vec{y} - H \vec{x}_t \sim \mathcal{N}(0, R)\]
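To make these quantities concrete, here is a minimal synthetic sketch in Python. It assumes NumPy, and the sizes, covariances, and observation operator are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 5, 3  # illustrative numbers of fluxes and measurements (assumed)

# Assumed illustrative inputs: true fluxes x_t, background covariance B,
# observation operator H, and observation-error covariance R.
x_t = rng.normal(size=N)
B = 0.5 * np.eye(N)
H = rng.normal(size=(M, N))
R = 0.1 * np.eye(M)

# Background estimate with error eps_b ~ N(0, B).
x_b = x_t + rng.multivariate_normal(np.zeros(N), B)

# Measurements with independent error eps_o ~ N(0, R).
y = H @ x_t + rng.multivariate_normal(np.zeros(M), R)
```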
We seek the estimator \(\vec{x}_a\) of \(\vec{x}_t\) that
maximizes the likelihood of both events occurring. The likelihood of
\(\vec{x}_a\) given \(\vec{x}_b\) and \(\vec{y}\) is the
joint probability of \(\vec{x}_b\) and \(\vec{y}\) given
\(\vec{x}_t = \vec{x}_a\):
\[\begin{split}L(\vec{x}_a; \vec{x}_b, \vec{y}) &= P(\vec{x}_b, \vec{y} | \vec{x}_a) \\
&= (2 \pi)^{-\frac{N}{2}} \det(B)^{-\frac{1}{2}} \exp[-\frac{1}{2} (\vec{x}_b - \vec{x}_a)^T B^{-1} (\vec{x}_b - \vec{x}_a)] \\
&\quad\times (2 \pi)^{-\frac{M}{2}} \det(R)^{-\frac{1}{2}} \exp[-\frac{1}{2} (\vec{y} - H \vec{x}_a)^T R^{-1} (\vec{y} - H \vec{x}_a)] \\
&= (2 \pi)^{-\frac{N + M}{2}} \det(B)^{-\frac{1}{2}} \det(R)^{-\frac{1}{2}} \\
&\quad\times \exp\{-\frac{1}{2} [(\vec{x}_b - \vec{x}_a)^T B^{-1} (\vec{x}_b - \vec{x}_a) + (\vec{y} - H \vec{x}_a)^T R^{-1} (\vec{y} - H \vec{x}_a)]\}\end{split}\]
Since the logarithm is an increasing function, maximizing the
logarithm of a function is equivalent to maximizing the original
function. It is therefore convenient to work with the log-likelihood
\(\ell(\vec{x}_a; \vec{x}_b, \vec{y})\)
\[\begin{split}\ell(\vec{x}_a; \vec{x}_b, \vec{y})
&= -\frac{N + M}{2} \ln (2 \pi) - \frac{1}{2} \ln[\det(B)] - \frac{1}{2} \ln[\det(R)] \\
&\quad - \frac{1}{2} [(\vec{x}_b - \vec{x}_a)^T B^{-1} (\vec{x}_b - \vec{x}_a) + (\vec{y} - H \vec{x}_a)^T R^{-1} (\vec{y} - H \vec{x}_a)]\end{split}\]
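As a numerical sanity check on this expression, the log-likelihood is just the sum of the two Gaussian log-densities. This is a minimal sketch assuming SciPy and the synthetic \(\vec{x}_b\), \(\vec{y}\), \(H\), \(B\), and \(R\) from the example above.

```python
from scipy import stats

def log_likelihood(x_a, x_b, y, H, B, R):
    """Log-likelihood of x_a given the background x_b and measurements y."""
    # Sum of the two independent Gaussian log-densities in the derivation.
    return (stats.multivariate_normal.logpdf(x_b, mean=x_a, cov=B)
            + stats.multivariate_normal.logpdf(y, mean=H @ x_a, cov=R))

# For example, evaluate the log-likelihood at the background estimate.
print(log_likelihood(x_b, x_b, y, H, B, R))
```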
To maximize the log-likelihood, and therefore the likelihood, we set
its derivative with respect to \(\vec{x}_a\) equal to zero, using the
symmetry of \(B^{-1}\) and \(R^{-1}\) when differentiating the
quadratic forms.
\[\begin{split}0 &= \frac{d}{d \vec{x}_a} \ell(\vec{x}_a; \vec{x}_b, \vec{y}) \\
&= -\frac{1}{2} \frac{d}{d \vec{x}_a} [(\vec{x}_b - \vec{x}_a)^T B^{-1} (\vec{x}_b - \vec{x}_a) + (\vec{y} - H \vec{x}_a)^T R^{-1} (\vec{y} - H \vec{x}_a)] \\
&= -\frac{1}{2} [-2 B^{-1} (\vec{x}_b - \vec{x}_a) - 2 H^T R^{-1} (\vec{y} - H \vec{x}_a)] \\
&= B^{-1} (\vec{x}_b - \vec{x}_a) + H^T R^{-1} (\vec{y} - H \vec{x}_a) \\
&= B^{-1} \vec{x}_b + H^T R^{-1} \vec{y} - (B^{-1} + H^T R^{-1} H) \vec{x}_a \\
(B^{-1} + H^T R^{-1} H) \vec{x}_a &= B^{-1} \vec{x}_b + H^T R^{-1} \vec{y} \\
\vec{x}_a &= (B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} \vec{x}_b + H^T R^{-1} \vec{y})\end{split}\]
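A direct translation of this result into code, using the assumed NumPy setup above, solves the linear system rather than forming the inverse explicitly:

```python
def mle_estimate(x_b, y, H, B, R):
    """Solve (B^-1 + H^T R^-1 H) x_a = B^-1 x_b + H^T R^-1 y for x_a."""
    B_inv = np.linalg.inv(B)
    Ht_R_inv = H.T @ np.linalg.inv(R)
    precision = B_inv + Ht_R_inv @ H          # B^-1 + H^T R^-1 H
    rhs = B_inv @ x_b + Ht_R_inv @ y
    # Solving the system avoids explicitly inverting the precision matrix.
    return np.linalg.solve(precision, rhs)

x_a = mle_estimate(x_b, y, H, B, R)
```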
This expression for \(\vec{x}_a\) is the form of the estimator I use
for the uncertainty estimate, but it can also be rearranged to
resemble the form derived for the best linear unbiased estimator.
\[\begin{split}\vec{x}_a &= (B^{-1} + H^T R^{-1} H)^{-1}
(B^{-1} \vec{x}_b + H^T R^{-1} H \vec{x}_b
- H^T R^{-1} H \vec{x}_b + H^T R^{-1} \vec{y}) \\
&= (B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} + H^T R^{-1} H) \vec{x}_b +
(B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (\vec{y} - H \vec{x}_b) \\
&= \vec{x}_b +
(B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (\vec{y} - H \vec{x}_b)\end{split}\]
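The rearranged form can be coded as a correction to the background driven by the innovation \(\vec{y} - H \vec{x}_b\). This sketch continues the assumed NumPy setup above, and comparing the two forms with `np.allclose` checks numerically that the rearrangement leaves the estimate unchanged.

```python
def mle_estimate_incremental(x_b, y, H, B, R):
    """x_a = x_b + (B^-1 + H^T R^-1 H)^-1 H^T R^-1 (y - H x_b)."""
    Ht_R_inv = H.T @ np.linalg.inv(R)
    precision = np.linalg.inv(B) + Ht_R_inv @ H
    innovation = y - H @ x_b                  # observation-minus-background departure
    return x_b + np.linalg.solve(precision, Ht_R_inv @ innovation)

# Both forms should agree to numerical precision.
assert np.allclose(mle_estimate(x_b, y, H, B, R),
                   mle_estimate_incremental(x_b, y, H, B, R))
```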
The covariance matrix of this estimator is then:
\[\begin{split}\DeclareMathOperator{\Cov}{Cov}
\Cov[\vec{x}_a, \vec{x}_a]
&= \Cov[(B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} \vec{x}_b + H^T R^{-1} \vec{y}), \\
&\qquad\qquad (B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} \vec{x}_b + H^T R^{-1} \vec{y})]\end{split}\]
We then exploit the bilinearity of the covariance operator to get
\[\begin{split}\Cov[\vec{x}_a, \vec{x}_a]
&= \Cov[(B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \vec{x}_b, (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \vec{x}_b] \\
&\quad + \Cov[(B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \vec{x}_b, (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \vec{y}] \\
&\quad + \Cov[(B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \vec{y}, (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \vec{x}_b] \\
&\quad + \Cov[(B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \vec{y}, (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \vec{y}]\end{split}\]
At this point, the property that \(\Cov[A X, B Y] = A \Cov[X,
Y] B^T\), together with the independence of \(\vec{x}_b\) and
\(\vec{y}\) (which makes the cross-covariance terms vanish), allows us
to simplify this to
\[\begin{split}\Cov[\vec{x}_a, \vec{x}_a]
&= (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \Cov[\vec{x}_b, \vec{x}_b] B^{-T} (B^{-1} + H^T R^{-1} H)^{-T} \\
&\quad + (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} \Cov[\vec{x}_b, \vec{y}] R^{-T} H (B^{-1} + H^T R^{-1} H)^{-T} \\
&\quad + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \Cov[\vec{y}, \vec{x}_b] B^{-T} (B^{-1} + H^T R^{-1} H)^{-T} \\
&\quad + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} \Cov[\vec{y}, \vec{y}] R^{-T} H (B^{-1} + H^T R^{-1} H)^{-T} \\
&= (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} B B^{-T} (B^{-1} + H^T R^{-1} H)^{-T} + 0 + 0 \\
&\quad + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} R R^{-T} H (B^{-1} + H^T R^{-1} H)^{-T}\end{split}\]
At this point, we can use the symmetry of the covariance matrices
\(B\) and \(R\) to further simplify this, obtaining
\[\begin{split}\Cov[\vec{x}_a, \vec{x}_a]
&= (B^{-1} + H^T R^{-1} H)^{-1} B^{-1} (B^{-1} + H^T R^{-1} H)^{-T} \\
&\quad + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} H (B^{-1} + H^T R^{-1} H)^{-T} \\
&= (B^{-1} + H^T R^{-1} H)^{-1} (B^{-1} + H^T R^{-1} H) (B^{-1} + H^T R^{-1} H)^{-1} \\
&= (B^{-1} + H^T R^{-1} H)^{-1}\end{split}\]
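In code, the posterior covariance is simply the inverse of the same precision matrix already assembled for the estimate. The sketch below uses the same assumed NumPy setup; one way to check it is to compare against the sample covariance of \(\vec{x}_a\) over many repeated draws of \(\vec{x}_b\) and \(\vec{y}\).

```python
def analysis_covariance(H, B, R):
    """Posterior covariance (B^-1 + H^T R^-1 H)^-1 from the derivation above."""
    precision = np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H
    return np.linalg.inv(precision)

A = analysis_covariance(H, B, R)
# A could be verified by Monte Carlo: repeatedly draw x_b and y around a fixed
# x_t, recompute x_a each time, and compare np.cov of those draws against A.
```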