Bayesian Estimator
Our prior information about the surface fluxes \(\vec{x}\) can be
represented by a multivariate normal prior distribution:
\[\vec{x} \sim \mathcal{N}(\vec{x}_b, B)\]
The likelihood of observing a set of atmospheric measurements
\(\vec{y}\) given surface fluxes \(\vec{x}\) is another
multivariate normal:
\[\vec{y} | \vec{x} \sim \mathcal{N}(H \vec{x}, R)\]
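As a concrete illustration, both distributions can be written down directly with `scipy.stats`. This is a minimal sketch with an entirely hypothetical toy setup; the sizes, covariances, and observation operator below are made up for demonstration only:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
N, M = 4, 6                           # hypothetical: 4 flux elements, 6 observations

x_b = rng.normal(size=N)              # prior mean of the fluxes
B = 0.5 * np.eye(N)                   # prior covariance (toy choice)
H = rng.normal(size=(M, N))           # linear observation operator (toy choice)
R = 0.1 * np.eye(M)                   # observation-error covariance (toy choice)

# Draw fluxes from the prior and observations from the likelihood:
x_true = multivariate_normal(x_b, B).rvs(random_state=rng)
y = multivariate_normal(H @ x_true, R).rvs(random_state=rng)

# The two log-densities that enter Bayes's rule below:
log_prior = multivariate_normal(x_b, B).logpdf(x_true)
log_likelihood = multivariate_normal(H @ x_true, R).logpdf(y)
```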
We can combine these to obtain the posterior distribution for the
surface fluxes \(\vec{x}\).
Set up Bayes’s rule:
\[\begin{split}P(\vec{x} | \vec{y})
&= \frac{P(\vec{y} | \vec{x}) P(\vec{x})}{\int P(\vec{y} | \vec{x}') P(\vec{x}') \, d\vec{x}'} \\
&\propto P(\vec{y} | \vec{x}) P(\vec{x})\end{split}\]
Substitute in the prior and likelihood from above:
\[\begin{split}P(\vec{x} | \vec{y})
&\propto (2\pi)^{-\frac{M}{2}} \det(R)^{-\frac{1}{2}}
\exp[-\frac{1}{2} (\vec{y} - H \vec{x})^T R^{-1} (\vec{y} - H \vec{x})] \\
&\quad \times (2\pi)^{-\frac{N}{2}} \det(B)^{-\frac{1}{2}}
\exp[-\frac{1}{2} (\vec{x} - \vec{x}_b)^T B^{-1} (\vec{x} - \vec{x}_b)]\end{split}\]
Rearrange:
\[\begin{split}P(\vec{x} | \vec{y})
&\propto (2\pi)^{-\frac{M + N}{2}} \det(R)^{-\frac{1}{2}} \det(B)^{-\frac{1}{2}} \\
&\quad \times \exp[-\frac{1}{2} (\vec{y} - H \vec{x})^T R^{-1} (\vec{y} - H \vec{x})
- \frac{1}{2} (\vec{x} - \vec{x}_b)^T B^{-1} (\vec{x} - \vec{x}_b)]\end{split}\]
Drop the constant factors out front and expand the quadratic forms:
\[\begin{split}P(\vec{x} | \vec{y})
&\propto \exp[-\frac{1}{2} (
\vec{y}^T R^{-1} \vec{y} - \vec{x}^T H^T R^{-1} \vec{y}
- \vec{y}^T R^{-1} H \vec{x} + \vec{x}^T H^T R^{-1} H \vec{x} \\
&\qquad\qquad\qquad + \vec{x}^T B^{-1} \vec{x} - \vec{x}^T B^{-1} \vec{x}_b
- \vec{x}_b^T B^{-1} \vec{x} + \vec{x}_b^T B^{-1} \vec{x}_b)]\end{split}\]
Then rearrange the terms in the exponential into a recognizable quadratic form:
\[\begin{split}P(\vec{x} | \vec{y})
&\propto \exp\{-\frac{1}{2} [
\vec{x}^T (B^{-1} + H^T R^{-1} H) \vec{x} + \vec{y}^T R^{-1} \vec{y}
- \vec{x}^T H^T R^{-1} \vec{y} - \vec{y}^T R^{-1} H \vec{x} \\
&\qquad\qquad\qquad - \vec{x}^T B^{-1} \vec{x}_b
- \vec{x}_b^T B^{-1} \vec{x} + \vec{x}_b^T B^{-1} \vec{x}_b]\}\end{split}\]
At this point, it is apparent that the posterior is a multivariate Gaussian
with covariance \(A = (B^{-1} + H^T R^{-1} H)^{-1}\), but the mean
is not yet evident. Since the mean of a multivariate Gaussian is
also its mode, the posterior mean is located at the maximum of the
posterior probability density. The exponential function is
monotonically increasing and negation is monotonically decreasing, so
finding that maximum is equivalent to finding the minimum of
\[\begin{split}J(\vec{x}) &= \frac{1}{2} [
\vec{x}^T (B^{-1} + H^T R^{-1} H) \vec{x} + \vec{y}^T R^{-1} \vec{y}
- \vec{x}^T H^T R^{-1} \vec{y} - \vec{y}^T R^{-1} H \vec{x} \\
&\qquad\qquad - \vec{x}^T B^{-1} \vec{x}_b
- \vec{x}_b^T B^{-1} \vec{x} + \vec{x}_b^T B^{-1} \vec{x}_b]\end{split}\]
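Up to constants, \(J\) is half the prior misfit plus half the observation misfit, the familiar cost function of variational data assimilation. A minimal sketch (reusing the hypothetical toy setup from above) that minimizes \(J\) numerically; its minimizer should match the closed-form \(\vec{x}_a\) derived below:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, M = 4, 6                                      # hypothetical toy sizes, as above
x_b = rng.normal(size=N); B = 0.5 * np.eye(N)
H = rng.normal(size=(M, N)); R = 0.1 * np.eye(M)
y = rng.normal(size=M)                           # stand-in observations

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

def J(x):
    # Compact form of the cost: half prior misfit plus half observation misfit;
    # expanding these quadratic forms reproduces the bracketed expression above.
    dx, dy = x - x_b, y - H @ x
    return 0.5 * (dx @ Binv @ dx + dy @ Rinv @ dy)

x_a_numeric = minimize(J, x_b).x                 # numerical minimizer of J
```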
The derivative of this function with respect to \(\vec{x}\), using the
symmetry of \(B^{-1}\) and \(R^{-1}\), is
\[\frac{dJ}{d \vec{x} } = (B^{-1} + H^T R^{-1} H) \vec{x} - H^T R^{-1} \vec{y} - B^{-1} \vec{x}_b\]
Setting this derivative equal to zero locates the maximum of the
posterior probability density, \(\vec{x}_a\); solving for
\(\vec{x}_a\) gives
\[\begin{split}0 &= (B^{-1} + H^T R^{-1} H) \vec{x}_a - H^T R^{-1} \vec{y} - B^{-1} \vec{x}_b \\
(B^{-1} + H^T R^{-1} H) \vec{x}_a &= H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b \\
\vec{x}_a &= (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)\end{split}\]
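Numerically, it is better to solve this linear system than to invert \(B^{-1} + H^T R^{-1} H\) explicitly. A sketch with the same hypothetical toy setup:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 6                                      # hypothetical toy sizes, as above
x_b = rng.normal(size=N); B = 0.5 * np.eye(N)
H = rng.normal(size=(M, N)); R = 0.1 * np.eye(M)
y = rng.normal(size=M)
Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

# Solve (B^{-1} + H^T R^{-1} H) x_a = H^T R^{-1} y + B^{-1} x_b for x_a:
lhs = Binv + H.T @ Rinv @ H
rhs = H.T @ Rinv @ y + Binv @ x_b
x_a = np.linalg.solve(lhs, rhs)
A = np.linalg.inv(lhs)                           # posterior covariance, if needed
```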
This can be rearranged to look more like the answers from other methods:
\[\begin{split}\vec{x}_a &= (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} \vec{y} - H^T R^{-1} H \vec{x}_b
+ H^T R^{-1} H \vec{x}_b + B^{-1} \vec{x}_b) \\
&= (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} \vec{y} - H^T R^{-1} H \vec{x}_b)
+ (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} H \vec{x}_b + B^{-1} \vec{x}_b) \\
&= (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (\vec{y} - H \vec{x}_b)
+ (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} H + B^{-1}) \vec{x}_b \\
&= (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (\vec{y} - H \vec{x}_b) + \vec{x}_b \\
&= \vec{x}_b + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (\vec{y} - H \vec{x}_b).\end{split}\]
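The last line has the familiar "analysis equals background plus weighted innovation" structure. A quick numerical check, with the same hypothetical toy setup, that the two forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 6                                      # hypothetical toy sizes, as above
x_b = rng.normal(size=N); B = 0.5 * np.eye(N)
H = rng.normal(size=(M, N)); R = 0.1 * np.eye(M)
y = rng.normal(size=M)
Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
lhs = Binv + H.T @ Rinv @ H

# Direct form of the posterior mean:
x_a_direct = np.linalg.solve(lhs, H.T @ Rinv @ y + Binv @ x_b)
# Increment form: background plus weighted innovation (y - H x_b):
x_a_increment = x_b + np.linalg.solve(lhs, H.T @ Rinv @ (y - H @ x_b))

assert np.allclose(x_a_direct, x_a_increment)
```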
Given this, the full posterior probability density is
\[\begin{split}P(\vec{x} | \vec{y}) &= (2 \pi)^{-\frac{N}{2}} \det(A)^{-\frac{1}{2}}
\exp[-\frac{1}{2} (\vec{x} - \vec{x}_a)^T A^{-1} (\vec{x} - \vec{x}_a)] \\
&\propto \exp\{-\frac{1}{2} [\vec{x} - (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)]^T \\
&\qquad\qquad (B^{-1} + H^T R^{-1} H) [\vec{x} - (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)]\} \\
&= \exp\{-\frac{1}{2} [
\vec{x}^T (B^{-1} + H^T R^{-1} H) \vec{x} + \\
&\qquad\qquad (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)^T (B^{-1} + H^T R^{-1} H)^{-T} (B^{-1} + H^T R^{-1} H) \\
&\qquad\qquad (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b) - \\
&\qquad\qquad (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)^T (B^{-1} + H^T R^{-1} H)^{-T} (B^{-1} + H^T R^{-1} H) \vec{x} - \\
&\qquad\qquad \vec{x}^T (B^{-1} + H^T R^{-1} H) (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)
]\}\end{split}\]
Simplifying this and using the symmetry of \(B\) and \(R\) gives
\[\begin{split} P(\vec{x} | \vec{y}) &\propto \exp\{-\frac{1}{2} [
\vec{x}^T (B^{-1} + H^T R^{-1} H) \vec{x} + \\
&\qquad\qquad (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)^T (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b) - \\
&\qquad\qquad (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)^T \vec{x} - \\
&\qquad\qquad \vec{x}^T (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)
]\} \\
&= \exp[-\frac{1}{2}
(H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)^T (B^{-1} + H^T R^{-1} H)^{-1} (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)
] \\
&\quad \times \exp\{-\frac{1}{2} [
\vec{x}^T (B^{-1} + H^T R^{-1} H) \vec{x} -
(H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)^T \vec{x} -
\vec{x}^T (H^T R^{-1} \vec{y} + B^{-1} \vec{x}_b)
]\} \\
&\propto \exp\{-\frac{1}{2} [
\vec{x}^T (B^{-1} + H^T R^{-1} H) \vec{x}
- \vec{y}^T R^{-1} H \vec{x} - \vec{x}_b^T B^{-1} \vec{x}
- \vec{x}^T H^T R^{-1} \vec{y} - \vec{x}^T B^{-1} \vec{x}_b
]\},\end{split}\]
which matches the expression identified above as a multivariate
Gaussian, confirming the posterior mean \(\vec{x}_a\) and covariance \(A\).
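As a final consistency check, the normalized Gaussian \(\mathcal{N}(\vec{x}_a, A)\) should differ from the product of prior and likelihood only by an \(\vec{x}\)-independent constant (the evidence). A sketch with the same hypothetical toy setup:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
N, M = 4, 6                                      # hypothetical toy sizes, as above
x_b = rng.normal(size=N); B = 0.5 * np.eye(N)
H = rng.normal(size=(M, N)); R = 0.1 * np.eye(M)
y = rng.normal(size=M)
Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

A = np.linalg.inv(Binv + H.T @ Rinv @ H)         # posterior covariance
x_a = A @ (H.T @ Rinv @ y + Binv @ x_b)          # posterior mean
posterior = multivariate_normal(x_a, A)

def log_prior_times_likelihood(x):
    # log P(x) + log P(y | x): the unnormalized log posterior
    return (multivariate_normal(x_b, B).logpdf(x)
            + multivariate_normal(H @ x, R).logpdf(y))

# The difference should be the same constant (the log evidence) everywhere:
xs = rng.normal(size=(5, N))
diffs = [log_prior_times_likelihood(x) - posterior.logpdf(x) for x in xs]
assert np.allclose(diffs, diffs[0])
```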