Projection uncertainty: mean-frames

A key part of the projection uncertainty propagation method is to compute a function \(\vec q^+\left(\vec b\right)\) to represent the change in projected pixel \(\vec q\) as the optimization vector \(\vec b\) moves around.

mrcal has several methods of doing this; the legacy mean-frames method is described here. It is accessible by calling mrcal.projection_uncertainty(method = "mean-frames"). This method is simple, but it has some issues that are resolved by newer formulations: starting with mrcal 3.0 the improved cross-reprojection uncertainty method is recommended.

Mean-frames uncertainty

The state vector \(\vec b\) is a random variable, and we know its distribution. To evaluate the projection uncertainty we want to project a fixed point, to see how this projection \(\vec q\) moves around as the chessboards and cameras and intrinsics shift due to the uncertainty in \(\vec b\). In other words, we want to project a point defined in the coordinate system of the camera housing, as the origin of the mathematical camera moves around inside this housing:

uncertainty.svg

How do we operate on points in a fixed coordinate system when all the coordinate systems we have are floating random variables? We use the most fixed thing we have: chessboards. As with the camera housing, the chessboards themselves are fixed in space. We have noisy camera observations of the chessboards that implicitly produce estimates of the fixed transformation \(T_{\mathrm{cf}_i}\) for each chessboard \(i\). The explicit transformations that we actually have in \(\vec b\) all relate to a floating reference coordinate system: \(T_\mathrm{cr}\) and \(T_\mathrm{rf}\). That coordinate system doesn't have any physical meaning, and it's useless in producing our fixed point.

Thus if we project points from a chessboard frame, we are unaffected by the untethered reference coordinate system. So points in a chessboard frame are somewhat "fixed" for our purposes.

To begin, let's focus on just one chessboard frame: frame 0. We want to know the uncertainty at a pixel coordinate \(\vec q\), so let's unproject and transform \(\vec q\) out to frame 0:

\[ \vec p_{\mathrm{frame}_0} = T_{\mathrm{f}_0\mathrm{r}} T_\mathrm{rc} \mathrm{unproject}\left( \vec q \right) \]
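This unproject-and-transform step can be sketched in numpy. Everything below is hypothetical: a simple pinhole model stands in for the real lens model, the transforms are placeholder values rather than a real calibration, and the unprojection is evaluated at an assumed range along the ray (unprojection alone does not determine the range):

```python
import numpy as np

def unproject_pinhole(q, fxy, cxy):
    # Hypothetical pinhole unprojection: pixel -> observation vector with z = 1
    v = (q - cxy) / fxy
    return np.array([v[0], v[1], 1.])

def invert_Rt(R, t):
    # Invert a rigid transform p_out = R p_in + t
    return R.T, -R.T @ t

# Placeholder geometry: T_rc (reference <- camera), T_rf0 (reference <- frame 0)
R_rc,  t_rc  = np.eye(3), np.zeros(3)
R_rf0, t_rf0 = np.eye(3), np.array([0., 0., 2.])

# p_frame0 = T_f0r T_rc unproject(q), evaluated at an assumed range of 2
q     = np.array([320., 240.])
p_cam = 2. * unproject_pinhole(q, fxy=np.array([500., 500.]),
                                  cxy=np.array([320., 240.]))
p_ref = R_rc @ p_cam + t_rc
R_f0r, t_f0r = invert_Rt(R_rf0, t_rf0)
p_frame0 = R_f0r @ p_ref + t_f0r   # the point, in chessboard-frame-0 coordinates
```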

We then transform and project \(\vec p_{\mathrm{frame}_0}\) back to the imager to get \(\vec q^+\). But here we take into account the uncertainties of each transformation to get the desired projection uncertainty \(\mathrm{Var}\left(\vec q^+ - \vec q\right)\). The full data flow looks like this, with all the perturbed quantities marked with a \(+\) superscript.

\[ \xymatrix{ \vec q^+ & & \vec p^+_\mathrm{camera} \ar[ll]_-{\vec b_\mathrm{intrinsics}^+} & \vec p^+_{\mathrm{reference}_0} \ar[l]_-{T^+_\mathrm{cr}} & \vec p_{\mathrm{frame}_0} \ar[l]_-{T^+_{\mathrm{rf}_0}} & \vec p_\mathrm{reference} \ar[l]_-{T_{\mathrm{f}_0\mathrm{r}}} & \vec p_\mathrm{camera} \ar[l]_-{T_\mathrm{rc}} & & \vec q \ar[ll]_-{\vec b_\mathrm{intrinsics}} } \]

This works, but it depends on \(\vec p_{\mathrm{frame}_0}\) being "fixed". We can do better. We're observing more than one chessboard, and in aggregate all the chessboard frames can represent an even-more "fixed" frame. Currently we take a very simple approach towards combining the frames: we compute the mean of all the \(\vec p^+_\mathrm{reference}\) estimates from each frame. The full data flow then looks like this:

\[ \xymatrix{ & & & & \vec p^+_{\mathrm{reference}_0} \ar[dl]_{\mathrm{mean}} & \vec p^+_{\mathrm{frame}_0} \ar[l]_-{T^+_{\mathrm{rf}_0}} \\ \vec q^+ & & \vec p^+_\mathrm{camera} \ar[ll]_-{\vec b_\mathrm{intrinsics}^+} & \vec p^+_{\mathrm{reference}} \ar[l]_-{T^+_\mathrm{cr}} & \vec p^+_{\mathrm{reference}_1} \ar[l]_{\mathrm{mean}} & \vec p^+_{\mathrm{frame}_1} \ar[l]_-{T^+_{\mathrm{rf}_1}} & \vec p_\mathrm{reference} \ar[l]_-{T_\mathrm{f_1 r}} \ar[lu]_-{T_\mathrm{f_0 r}} \ar[ld]^-{T_\mathrm{f_2 r}} & \vec p_\mathrm{camera} \ar[l]_-{T_\mathrm{rc}} & & \vec q \ar[ll]_-{\vec b_\mathrm{intrinsics}} \\ & & & & \vec p^+_{\mathrm{reference}_2} \ar[ul]^{\mathrm{mean}} & \vec p^+_{\mathrm{frame}_2} \ar[l]_-{T^+_{\mathrm{rf}_2}} } \]

So to summarize, to compute the projection uncertainty at a pixel \(\vec q\) we

  1. Unproject \(\vec q\) and transform to each chessboard coordinate system to obtain \(\vec p_{\mathrm{frame}_i}\)
  2. Transform and project back to \(\vec q^+\), using the mean of all the \(\vec p_{\mathrm{reference}_i}\) and taking into account uncertainties

We have \(\vec q^+\left(\vec b\right) = \mathrm{project}\left( T_\mathrm{cr} \, \mathrm{mean}_i \left( T_{\mathrm{rf}_i} \vec p_{\mathrm{frame}_i} \right) \right)\) where the transformations \(T\) and the intrinsics used in \(\mathrm{project}()\) come directly from the optimization state vector \(\vec b\). This function can be used in the higher-level projection uncertainty propagation method to compute \(\mathrm{Var}\left( \vec q \right)\).
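This function can be sketched directly from the formula. Again, a hypothetical pinhole model and placeholder transforms stand in for the real optimization state \(\vec b\):

```python
import numpy as np

def project_pinhole(p, fxy, cxy):
    # Hypothetical pinhole projection: camera-frame point -> pixel
    return fxy * p[:2] / p[2] + cxy

# Placeholder per-frame transforms T_rf_i (reference <- frame i) as (R, t),
# and the point expressed in each chessboard frame
T_rf    = [(np.eye(3), np.array([0., 0., float(i)])) for i in range(3)]
p_frame = [np.array([0., 0., 2. - i])                for i in range(3)]

# mean_i( T_rf_i p_frame_i )
p_ref = np.mean([R @ p + t for (R, t), p in zip(T_rf, p_frame)], axis=0)

# T_cr: camera <- reference (identity here for simplicity), then project
R_cr, t_cr = np.eye(3), np.zeros(3)
p_cam  = R_cr @ p_ref + t_cr
q_plus = project_pinhole(p_cam, fxy=np.array([500., 500.]),
                                cxy=np.array([320., 240.]))
```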

Problems with "mean-frames" uncertainty

This "mean-frames" uncertainty method works well, but has several issues:

Aphysical \(T_{\mathrm{r}^+\mathrm{r}}\) transform

The computation above indirectly computes the transform that relates the unperturbed and perturbed reference coordinate systems:

\[ T_{\mathrm{r}^+\mathrm{r}} = \mathrm{mean}_i \left( T_{\mathrm{r}^+\mathrm{f}_i} T_{\mathrm{f}_i\mathrm{r}} \right) \]

Each transformation \(T\) includes a rotation matrix \(R\), so the above constructs a new rotation as the element-wise mean of multiple rotation matrices. This is aphysical: the resulting matrix is not a valid rotation. In practice the perturbations are tiny, and the mean is sufficiently close to a rotation. Usually, but not always.
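A tiny numpy demonstration, with made-up rotations: the element-wise mean of two valid rotation matrices is not itself a rotation (its determinant is not 1), but for small perturbations the deviation is negligible:

```python
import numpy as np

def Rz(th):
    # Rotation by th radians about the z axis
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

# Mean of two very different rotations: clearly not orthonormal
R_mean = (Rz(0.) + Rz(np.pi/2)) / 2.
print(np.linalg.det(R_mean))        # -> 0.5, not a valid rotation

# Mean of two nearly identical rotations: very nearly a rotation
R_mean_small = (Rz(0.) + Rz(1e-3)) / 2.
print(np.linalg.det(R_mean_small))  # -> almost exactly 1
```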

Pessimistic response to disparate observed chessboard ranges

Because of this aphysical transform, the mean-frames method produces fictitiously high uncertainties when given a mix of low-range and high-range observations. Far-away chessboard observations don't contain much information, so adding some far-away chessboards to a dataset shouldn't improve the uncertainty at that distance very much, but it shouldn't make it any worse either. However, with the mean-frames method, far-away observations do make the uncertainty worse. We can clearly see this in the dance study:

analyses/dancing/dance-study.py           \
    --scan num_far_constant_Nframes_near  \
    --range 2,10                          \
    --Ncameras 1                          \
    --Nframes-near 100                    \
    --observed-pixel-uncertainty 2        \
    --ymax 4                              \
    --uncertainty-at-range-sampled-max 35 \
    --Nscan-samples 4                     \
    --method mean-frames                  \
    opencv8.cameramodel

dance-study-scan-num-far-constant-num-near--mean-frames.svg

This is a one-camera calibration computed off 100 chessboard observations at 2m out, with a few observations added at a longer range of 10m. Each curve represents the projection uncertainty at the center of the image, at different distances. The purple curve is the uncertainty with no 10m chessboards at all. As we add observations at 10m, we see the uncertainty get worse.

The issue is the averaging in 3D point space. Observation noise causes the far-off geometry to move much more than the nearby chessboards, and that far-off motion then dominates the average. If we use the newer cross-reprojection--rrp-Jfp method, this issue goes away:

analyses/dancing/dance-study.py           \
    --scan num_far_constant_Nframes_near  \
    --range 2,10                          \
    --Ncameras 1                          \
    --Nframes-near 100                    \
    --observed-pixel-uncertainty 2        \
    --ymax 4                              \
    --uncertainty-at-range-sampled-max 35 \
    --Nscan-samples 4                     \
    --method cross-reprojection--rrp-Jfp  \
    opencv8.cameramodel

dance-study-scan-num-far-constant-num-near--cross-reprojection--rrp-Jfp.svg

As expected, the low-range uncertainty is unaffected by the 10m observations, but the far-range uncertainty is improved.
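The range scaling underlying this problem is easy to demonstrate with numpy: under a small rotational perturbation \(\delta\theta\), a point at range \(r\) moves by roughly \(r\,\delta\theta\), so in this hypothetical sketch the 10m geometry moves 5 times as far as the 2m geometry under the same noise, and would dominate any average taken in 3D point space:

```python
import numpy as np

def Rx(th):
    # Rotation by th radians about the x axis
    c, s = np.cos(th), np.sin(th)
    return np.array([[1., 0., 0.], [0., c, -s], [0., s, c]])

dth = 1e-3          # a small rotational perturbation, in radians
dps = {}
for r in (2., 10.): # near and far ranges, as in the dance study above
    p = np.array([0., 0., r])
    dps[r] = np.linalg.norm(Rx(dth) @ p - p)   # displacement ~ r * dth
    print(r, dps[r])
```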

Chessboards are a hard requirement

The "mean-frames" method has a hard requirement on chessboards being used in the solve. In fact, the assumption of stationary cameras observing a moving chessboard is baked into the formulation, so any other case (moving cameras, or calibrating off discrete points, for instance) is not supported. The newer cross-reprojection--rrp-Jfp method lifts this restriction.