Best Linear Unbiased Estimator
This derivation follows the Grubbs and Weaver (1947) definition of the
Best Linear Unbiased Estimator.
We have two estimates related to the true flux \(\vec{x}_t\): some
flux estimate \(\vec{x}_b\) and an independent set of atmospheric
measurements \(\vec{y}\). We think the difference
\(\epsilon_b = \vec{x}_b - \vec{x}_a\) has mean zero and
variance-covariance matrix \(B\). Similarly, we think the
measurement errors \(\epsilon_o = \vec{y} - H \vec{x}_t\) have
mean zero and variance-covariance \(R\). We would like to find
the linear combination of these estimates that is an unbiased
estimator for \(\vec{x}_t\) and has the lowest variance of such
estimators.
That is, we seek \(\vec{x}_a\) such that:
\[ \begin{align}\begin{aligned}\begin{split}\vec{x}_a = A_1 \vec{x}_b + A_2 \vec{y} + A_3 \\\end{split}\\\begin{split}E[\vec{x}_a - \vec{x}_t] = 0 \\\end{split}\\\vec{x}_a = \operatorname{argmin}( E[\vec{x}_a \cdot \vec{x}_a] )\end{aligned}\end{align} \]
for some matrices \(A_1\) and \(A_2\) and some vector \(A_3\).
Let us start with the first requirement. To obtain an unbiased
estimator, we must have:
\[\begin{split}\vec{x}_t &= E[\vec{x}_a] \\
&= E[A_1 \vec{x}_b + A_2 \vec{y} + A_3] \\
&= E[A_1 (\vec{x}_t + \epsilon_b)] + E[A_2 (H \vec{x}_t + \epsilon_o)] + E[A_3] \\
&= A_1 E[\vec{x}_t] + A_1 E[\epsilon_b] + A_2 H E[\vec{x}_t] + A_2 H E[\epsilon_o] + A_3 \\
&= A_1 \vec{x}_t + 0 + A_2 H \vec{x}_t + 0 + A_3 \\
&= (A_1 + A_2 H) \vec{x}_t + A_3\end{split}\]
Since we want this equality to hold for all values of
\(\vec{x}_t\), we must have \(A_3 = 0\) and \(A_1 + A_2 H = I\).
With this relationship between \(A_1\) and \(A_2\), we can
start on the second requirement. We want to minimize
\(E[\vec{x}_a \cdot \vec{x}_a]\), where \(\vec{x}_a\) is now
given by \((I - A_2 H) \vec{x}_b + A_2 \vec{y}\). The quantity we
are trying to minimize is the trace of the covariance matrix of
\(\vec{x}_a\), which is given by the covariance of
\(\vec{x}_a\) with itself:
\[\begin{split}\DeclareMathOperator{\Cov}{Cov}
\Cov[\vec{x}_a, \vec{x}_a] &= \Cov[(I - A_2 H) \vec{x}_b + A_2 \vec{y}, (I - A_2 H) \vec{x}_b + A_2 \vec{y}] \\
&= \Cov[(I - A_2 H) \vec{x}_b, (I - A_2 H) \vec{x}_b] + \Cov[(I - A_2 H) \vec{x}_b, A_2 \vec{y}] + \\
&\qquad\qquad \Cov[A_2 \vec{y}, (I - A_2 H) \vec{x}_b] + \Cov[A_2 \vec{y}, A_2 \vec{y}] \\
&= (I - A_2 H) \Cov[\vec{x}_b, \vec{x}_b] (I - A_2 H)^T + (I - A_2 H) \Cov[\vec{x}_b, \vec{y}] A_2^T + \\
&\qquad\qquad A_2 \Cov[\vec{y}, \vec{x}_b] (I - A_2 H)^T + A_2 \Cov[\vec{y}, \vec{y}] A_2^T \\
&= (I - A_2 H) B (I - H^T A_2^T) + 0 + 0 + A_2 T A_2^T \\
&= B - B H^T A_2^T - A_2 H B + A_2 H B H^T A_2^T + A_2 R A_2^T\end{split}\]
To find the minimum, we will need to find the derivative of the trace
of this covariance matrix with respect to A_2.
\[\begin{split}\DeclareMathOperator{\trace}{tr}
\frac{d}{dA_2} \trace(B - B H^T A_2^T - A_2 H B + A_2 H B H^T A_2^T + A_2 R A_2^T) = \\
\frac{d}{dA_2} [\trace(B) - \trace(B H^T A_2^T) - \trace(A_2 H B) + \trace(A_2 H B H^T A_2^T) + \trace(A_2 R A_2^T) = \\
\frac{d}{dA_2} \trace(B) - \frac{d}{dA_2} 2 \trace(B H^T A_2^T) + \frac{d}{dA_2} \trace(A_2 H B H^T A_2^T) + \frac{d}{d A_2} \trace(A_2 R A_2^T) = \\
0 - 2 \trace(B H^T) + 2 \trace(A_2 H B H^T) + 2 \trace(A_2 R) = \\
2 \trace[-B H^T + A_2 (H B H^T + R)]\end{split}\]
We find the minimum where the derivative is equal to zero. The
simplest way to find that is to set everything inside the trace to be
equal to zero and solve for \(A_2\).
\[\begin{split}0 &= -B H^T + A_2 (HBH^T + R) \\
B H^T &= A_2 (H B H^T + R) \\
B H^T (H B H^T + R)^{-1} &= A_2 (H B H^T + R) (H B H^T + R)^{-1} \\
A_2 &= B H^T (H B H^T + R)^{-1}\end{split}\]
That is, the best linear unbiased estimate \(\vec{x}_a\) of the
true fluxes \(\vec{x}_t\) given a previous estimate
\(\vec{x}_b\) and observations \(\vec{y}\) with uncertainties
\(B\) and \(R\), respectively, is
\[\begin{split}\vec{x}_a &= (I - A_2 H) \vec{x}_b + A_2 \vec{y} \\
&= \vec{x}_b + A_2 (\vec{y} - H \vec{x}_b) \\
&= \vec{x}_b + B H^T (H B H^T + R)^{-1} (\vec{y} - H \vec{x}_b)\end{split}\]
with covariance matrix given by
\[\begin{split}A &= (I - A_2 H) B (I - H^T A_2^T) + A_2 R A_2^T \\
&= (I - A_2 H) B - (I - A_2 H) B H^T A_2^T + A_2 R A_2^T \\
&= (I - A_2 H) B - [(I - A_2 H) B H^T - A_2 R] A_2^T \\
&= (I - A_2 H) B - [B H^T - A_2 (H B H^T + R)] A_2^T \\
&= (I - A_2 H) B - [B H^T - B H^T (H B H^T + R)^{-1} (H B H^T + R)] A_2^T \\
&= (I - A_2 H) B - [B H^T - B H^T I] A_2^T \\
&= (I - A_2 H) B - 0 \\
&= B - B H^T (H B H^T + R)^{-1} H B\end{split}\]
References
Grubbs, F., and Weaver, C. (1947). The Best Unbiased Estimate of
Population Standard Deviation Based on Group Ranges. Journal of the
American Statistical Association, 42(238), 224–241.
doi:10.2307/2280652, jstor:2280652