Projection uncertainty of intrinsics: mean-pcam

A key part of the projection uncertainty propagation method is to compute a function \(\vec q^+\left(\vec b\right)\) to represent the change in projected pixel \(\vec q\) as the optimization vector \(\vec b\) moves around.

mrcal has several methods of doing this; the mean-pcam method is described here. This was the first method implemented in mrcal, and it remains the recommended one until cross-reprojection uncertainty replaces it in mrcal 3.0. It is selected by calling mrcal.projection_uncertainty(method = "mean-pcam") (this is the default). The method is simple, but it has some issues that will be resolved by the newer formulations.
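
For example, to evaluate the mean-pcam uncertainty at the center of the imager (a minimal sketch; the model filename is hypothetical):

import numpy as np
import mrcal

model = mrcal.cameramodel('camera.cameramodel')  # hypothetical model file

# Query the center of the imager. The default what = 'covariance' returns
# Var(q) as a (2,2) array; method = "mean-pcam" is also the default
q     = (model.imagersize() - 1.) / 2.
Var_q = mrcal.projection_uncertainty(q, model, method = "mean-pcam")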

Mean-pcam uncertainty

The input to a calibration is a set of chessboard observations. These observations are noisy, and we treat them as samples of a random distribution. Thus the optimized state vector \(\vec b\) is a random variable, and we know its distribution. To evaluate the projection uncertainty we want to project a point fixed in space, to see how its projection \(\vec q\) moves around as everything shifts due to the uncertainty in \(\vec b\). In other words, we want to project a point defined in the coordinate system of the camera housing, as the origin of the mathematical camera moves around inside this housing:

uncertainty.svg

How do we operate on points in a fixed coordinate system when all the coordinate systems we have are floating random variables? We use the most fixed thing we have: the chessboards.

To begin, let's focus on just one chessboard observation: frame 0. We want to know the uncertainty at a pixel coordinate \(\vec q\), so let's unproject and transform \(\vec q\) out to frame 0:

\[ \vec p_{\mathrm{frame}_0} = T_{\mathrm{f}_0\mathrm{r}} T_\mathrm{rc} \mathrm{unproject}\left( \vec q \right) \]

We then transform and project \(\vec p_{\mathrm{frame}_0}\) back to the imager to get \(\vec q^+\). But here we take into account the uncertainties of each transformation to get the desired projection uncertainty \(\mathrm{Var}\left(\vec q^+ - \vec q\right)\). The full data flow looks like this, with all the noisy quantities marked with a \(+\) superscript.

\[ \xymatrix{ \vec q^+ & & \vec p^+_\mathrm{camera} \ar[ll]_-{\vec b_\mathrm{intrinsics}^+} & \vec p^+_{\mathrm{reference}_0} \ar[l]_-{T^+_\mathrm{cr}} & \vec p_{\mathrm{frame}_0} \ar[l]_-{T^+_{\mathrm{rf}_0}} & \vec p_\mathrm{reference} \ar[l]_-{T_\mathrm{fr}} & \vec p_\mathrm{camera} \ar[l]_-{T_\mathrm{rc}} & & \vec q \ar[ll]_-{\vec b_\mathrm{intrinsics}} } \]

This works, but we may have multiple chessboard observations, each with its own transform \(T_{\mathrm{rf}}\). And our camera might be moving, so we might have multiple \(T_{\mathrm{cr}}\). The "mean-pcam" method combines all these simply by computing the mean of all the camera-coordinate points \(\vec p^+_\mathrm{camera}\). The full data flow then looks like this:

\[ \xymatrix{ & & & \vec p^+_{\mathrm{camera}_0} \ar[dl]_{\mathrm{mean}} & \vec p^+_{\mathrm{reference}_0} \ar[l]_-{T^+_{\mathrm{c_0 r}}} & \vec p_{\mathrm{frame}_0} \ar[l]_-{T^+_{\mathrm{rf_0}}} & \vec p_{\mathrm{reference}_0} \ar[l]_-{T_\mathrm{f_0 r}} \\ \vec q^+ & & \vec p^+_\mathrm{camera} \ar[ll]_-{\vec b_\mathrm{intrinsics}^+} & \vec p^+_{\mathrm{camera}_1} \ar[l] _{\mathrm{mean}} & \vec p^+_{\mathrm{reference}_1} \ar[l]_-{T^+_{\mathrm{c_1 r}}} & \vec p_{\mathrm{frame}_1} \ar[l]_-{T^+_{\mathrm{rf_1}}} & \vec p_{\mathrm{reference}_1} \ar[l]_-{T_\mathrm{f_1 r}} & \vec p_\mathrm{camera} \ar[l]_-{T_\mathrm{r_1 c}} \ar[lu]_-{T_\mathrm{r_0 c}} \ar[ld]^-{T_\mathrm{r_2 c}} & & \vec q \ar[ll]_-{\vec b_\mathrm{intrinsics}} \\ & & & \vec p^+_{\mathrm{camera}_2} \ar[ul]^{\mathrm{mean}} & \vec p^+_{\mathrm{reference}_2} \ar[l]_-{T^+_{\mathrm{c_2 r}}} & \vec p_{\mathrm{frame}_2} \ar[l]_-{T^+_{\mathrm{rf_2}}} & \vec p_{\mathrm{reference}_2} \ar[l]_-{T_\mathrm{f_2 r}} } \]

This diagram is a simplification: in reality we evaluate all the observed combinations of cameras and frames. So to summarize, to compute the projection uncertainty at a pixel \(\vec q\) we

  1. Unproject \(\vec q\) and transform to each chessboard coordinate system to obtain \(\vec p_{\mathrm{frame}_i}\)
  2. Transform and project back to \(\vec q^+\), using the mean of all the \(\vec p_{\mathrm{camera}_i}\) and taking into account uncertainties

We have \(\vec q^+\left(\vec b\right) = \mathrm{project}\left( \mathrm{mean}_{i,j} \left( T_\mathrm{\mathrm{c}_j \mathrm{r}} T_{\mathrm{rf}_i} \vec p_{\mathrm{frame}_i} \right) \right)\) where the transformations \(T\) and the intrinsics used in \(\mathrm{project}()\) come directly from the optimization state vector \(\vec b\). This function can be used in the higher-level projection uncertainty propagation method to compute \(\mathrm{Var}\left( \vec q \right)\).
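
To make this concrete, here's a rough Python sketch of \(\vec q^+\left(\vec b\right)\) using mrcal's public transform and projection routines. This is an illustration only: the function and variable names are made up, it assumes a single stationary camera with extrinsics \(T_{\mathrm{cr}}\) stored as an rt transform, and the real computation inside mrcal.projection_uncertainty() propagates gradients analytically instead of evaluating samples.

import numpy as np
import mrcal

def p_frame_fixed(q, lensmodel, intrinsics_data,
                  rt_cam_ref, rt_ref_frame, distance = 1.0):
    # Step 1: unproject the query pixel q at the given distance, and
    # transform out to each chessboard frame. This uses the UNPERTURBED
    # solution; the resulting points are then held fixed
    p_cam = mrcal.unproject(q, lensmodel, intrinsics_data,
                            normalize = True) * distance
    p_ref = mrcal.transform_point_rt( mrcal.invert_rt(rt_cam_ref), p_cam )
    # rt_ref_frame has shape (Nframes,6), so this broadcasts to (Nframes,3)
    return mrcal.transform_point_rt( mrcal.invert_rt(rt_ref_frame), p_ref )

def q_plus(p_frame, lensmodel, intrinsics_data, rt_cam_ref, rt_ref_frame):
    # Step 2: everything here comes from the (possibly perturbed) state
    # vector b. Transform back to the camera, average the camera-coordinate
    # points ("mean-pcam"), and project
    p_ref = mrcal.transform_point_rt(rt_ref_frame, p_frame)   # (Nframes,3)
    p_cam = mrcal.transform_point_rt(rt_cam_ref,   p_ref)     # (Nframes,3)
    return mrcal.project(np.mean(p_cam, axis = -2),
                         lensmodel, intrinsics_data)

By construction, calling q_plus() with the same unperturbed quantities that produced p_frame returns exactly the original \(\vec q\); the uncertainty comes from evaluating it with perturbed quantities.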

Note that the implementation prior to mrcal 3.0 computes the mean of \(\vec p_\mathrm{reference}\), not \(\vec p_\mathrm{camera}\). Since those earlier releases only computed uncertainty for stationary-camera solves, \(T_{\mathrm{cr}}\) was constant for each physical camera, and the two approaches were equivalent. Taking the mean of \(\vec p_\mathrm{camera}\) instead allows us to handle moving cameras.

Problems with "mean-pcam" uncertainty

This "mean-pcam" uncertainty method works well, but has several issues. All of these will be fixed in the newer cross-reprojection-rrp-Jfp method.

Chessboards are a hard requirement

The first step in the formulation presented above is to unproject the query pixel \(\vec q\) out to the coordinate system of the chessboards. But this is impossible if we have no chessboards (we're looking at discrete points only, for instance). The "cross-reprojection" uncertainty method was developed specifically to address this shortcoming.

Aphysical \(T_{\mathrm{r}^+\mathrm{r}}\) transform

The computation above indirectly computes the transform that relates the unperturbed and perturbed reference coordinate systems:

\[ T_{\mathrm{r}^+\mathrm{r}} = \mathrm{mean}_i \left( T_{\mathrm{r}^+\mathrm{f}_i} T_{\mathrm{f}_i\mathrm{r}} \right) \]

Each transformation \(T\) includes a rotation matrix \(R\), so the expression above constructs a new rotation as the mean of multiple rotation matrices. This is aphysical: the mean of rotation matrices is not, in general, a valid rotation itself. This is often OK in practice (mrcal < 3.0 does produce usable results), but it can break.
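
To see this concretely, here's a tiny numpy demonstration (not mrcal code): the mean of two perfectly valid rotation matrices can fail to be a rotation at all.

import numpy as np

# Rotations by +90deg and -90deg about the z axis: both orthonormal, with
# determinant = 1
R0 = np.array(((0.,-1.,0.),
               (1., 0.,0.),
               (0., 0.,1.)))
R1 = np.array((( 0.,1.,0.),
               (-1.,0.,0.),
               ( 0.,0.,1.)))

Rmean = (R0 + R1) / 2.
print(Rmean)                 # rank-1 matrix: the first two rows are all 0
print(np.linalg.det(Rmean))  # 0.0, but a valid rotation has determinant 1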

Poorly-defined \(T_{\mathrm{r}^+\mathrm{r}}\) transform

In addition to being aphysical, this transform isn't even uniquely defined: each query point \(\vec q\) produces a different \(T_{\mathrm{r}^+\mathrm{r}}\). This makes no sense: the transform should be a function of the calibration (original and perturbed) only.

Pessimistic response to disparate observed chessboard ranges

Because of this aphysical transform, the mean-pcam method reports fictitiously high uncertainties when given a mix of near and far observations. Far-away chessboard observations don't contain much information, so adding some far-away chessboards to a dataset shouldn't improve the uncertainty much at any distance; but it shouldn't make it any worse either. However, with the mean-pcam method, far-away observations do make the reported uncertainty worse. We can see this clearly in the dance study:

analyses/dancing/dance-study.py           \
    --scan num_far_constant_Nframes_near  \
    --range 2,10                          \
    --Ncameras 1                          \
    --Nframes-near 100                    \
    --observed-pixel-uncertainty 2        \
    --ymax 4                              \
    --uncertainty-at-range-sampled-max 35 \
    --Nscan-samples 4                     \
    --method mean-pcam                    \
    opencv8.cameramodel

dance-study-scan-num-far-constant-num-near--mean-pcam.svg

This is a one-camera calibration computed from 100 chessboard observations at 2m out, with a few more observations added at a longer range of 10m. Each curve represents the projection uncertainty at the center of the image, evaluated at a range of distances. The purple curve is the uncertainty with no 10m chessboards at all. As we add observations at 10m, the uncertainty gets worse.

The issue is the averaging in 3D point space: observation noise moves the far-off geometry around much more than the nearby chessboards, and that far-off motion then dominates the average.
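
A toy numpy demonstration (illustrative only; the noise magnitudes are made up) shows the mechanism: averaging a stable near point with a jittery far point yields a mean that inherits much of the far point's jitter.

import numpy as np
rng = np.random.default_rng(0)

# Toy model: under observation noise, the estimate of a near point barely
# moves, while the estimate of a far point moves a lot
p_near = np.array((0., 0.,  2.)) + rng.normal(scale = 0.01, size = (1000,3))
p_far  = np.array((0., 0., 10.)) + rng.normal(scale = 0.5,  size = (1000,3))

p_mean = (p_near + p_far) / 2.
print(p_near.std(axis = 0))  # ~0.01: the near point alone is stable
print(p_mean.std(axis = 0))  # ~0.25: the far-point noise dominates the mean

If we use the newer cross-reprojection-rrp-Jfp method, this issue goes away: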

analyses/dancing/dance-study.py           \
    --scan num_far_constant_Nframes_near  \
    --range 2,10                          \
    --Ncameras 1                          \
    --Nframes-near 100                    \
    --observed-pixel-uncertainty 2        \
    --ymax 4                              \
    --uncertainty-at-range-sampled-max 35 \
    --Nscan-samples 4                     \
    --method cross-reprojection-rrp-Jfp   \
    opencv8.cameramodel

dance-study-scan-num-far-constant-num-near--cross-reprojection-rrp-Jfp.svg

As expected, the low-range uncertainty is unaffected by the 10m observations, but the far-range uncertainty is improved.