Projection uncertainty

After a calibration has been computed, it is essential to get a sense of how good the calibration is. Traditional calibration routines rely on one metric of calibration quality: the residual fit error. This is clearly inadequate because we can always improve this metric by throwing away input data, and it doesn't make sense that using less data would make a calibration better.

mrcal addresses this with some tools to gauge how well the data fits the model and with a method to estimate the projection uncertainty, accessed via the mrcal.projection_uncertainty() function. This tells us how good a calibration is (we aim for low projection uncertainties), and it tells us how good the downstream results are (by allowing the user to propagate projection uncertainties through their data pipeline).
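To make this concrete, here is a minimal sketch of querying the uncertainty of a single point from Python. The model filename is hypothetical, and the exact arguments accepted by mrcal.projection_uncertainty() may vary between mrcal releases; this is a sketch, not a canonical recipe:

  import numpy as np
  import mrcal

  model = mrcal.cameramodel('camera.cameramodel') # hypothetical filename

  # Query the center of the imager, looking 5m out
  q = (np.array(model.imagersize(), dtype=float) - 1.) / 2.
  v = mrcal.unproject(q, *model.intrinsics())
  v /= np.linalg.norm(v)                          # normalize to unit length
  Var_q = mrcal.projection_uncertainty(v * 5.0, model)

  # The standard deviations along the extreme directions of the 2x2 covariance
  print(np.sqrt(np.linalg.eigvalsh(Var_q)))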

How do we estimate the projection uncertainty? A summary of the process:

  1. Estimate the noise in the chessboard observations input to the optimization routine
  2. Propagate that uncertainty to the optimal parameters \(\vec p\) reported by the calibration routine
  3. Propagate the uncertainty in calibration parameters \(\vec p\) through the projection function to get uncertainty in the resulting pixel coordinate \(\vec q\)

This overall approach is sound, but it implies some limitations, discussed below.

Estimating the input noise

This is described in the optimization problem formulation.

Propagating input noise to the state vector

We solved the least squares problem, so we have the optimal state vector \(\vec p^*\). Let's find the uncertainty.

We apply a perturbation to the observations \(\vec q_\mathrm{ref}\), reoptimize this slightly-perturbed least-squares problem (assuming everything is linear) and look at what happens to the optimal state vector \(\vec p^*\).

We have

\[ E \equiv \left \Vert \vec x \right \Vert ^2 \] \[ J \equiv \frac{\partial \vec x}{\partial \vec p} \] \[ \frac{\partial E}{\partial \vec p} \left(\vec p = \vec p^* \right) = 2 J^T \vec x^* = 0 \]

We perturb the problem:

\[ E( \vec p + \Delta \vec p, \vec q_\mathrm{ref} + \Delta \vec q_\mathrm{ref}) \approx \left \Vert \vec x + J \Delta \vec p + \frac{\partial \vec x}{\partial \vec q_\mathrm{ref}} {\Delta \vec q_\mathrm{ref}} \right \Vert ^2 \]

And we reoptimize:

\[ \frac{\mathrm{d}E}{\mathrm{d}\Delta \vec p} \approx 2 \left( \vec x + J \Delta \vec p + \frac{\partial \vec x}{\partial \vec q_\mathrm{ref}} {\Delta \vec q_\mathrm{ref}} \right)^T J = 0\]

We started at an optimum, so \(J^T \vec x^* = 0\), and thus

\[ J^T J \Delta \vec p = -J^T \frac{\partial \vec x}{\partial \vec q_\mathrm{ref}} {\Delta \vec q_\mathrm{ref}} \]

As defined on the input noise page, we have

\[ \vec x_\mathrm{observations} = W (\vec q - \vec q_\mathrm{ref}) \]

where \(W\) is a diagonal matrix of weights. Let's assume the non-observation elements of \(\vec x\) are at the end, so

\[ \frac{\partial \vec x}{\partial \vec q_\mathrm{ref}} = \left[ \begin{array}{c} -W \\ 0 \end{array} \right] \]

and thus

\[ J^T J \Delta \vec p = J_\mathrm{observations}^T W \Delta \vec q_\mathrm{ref} \]

So if we perturb the input observation vector \(\vec q_\mathrm{ref}\) by \(\Delta \vec q_\mathrm{ref}\), the resulting effect on the optimal parameters is \(\Delta \vec p = M \Delta \vec q_\mathrm{ref}\), where

\[ M = \left( J^T J \right)^{-1} J_\mathrm{observations}^T W \]
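In code, \(M\) would be applied via a linear solve rather than by forming \(\left(J^T J\right)^{-1}\) explicitly. A self-contained numpy sketch with stand-in matrices:

  import numpy as np

  rng = np.random.default_rng(0)
  Nstate, Nobs, Nreg = 4, 50, 2

  J_obs = rng.standard_normal((Nobs, Nstate))  # observation rows of the jacobian
  J_reg = rng.standard_normal((Nreg, Nstate))  # regularization rows
  J     = np.vstack((J_obs, J_reg))            # full jacobian
  W     = rng.uniform(0.5, 1.0, Nobs)          # diagonal of the weight matrix

  # M = (J^T J)^-1 J_obs^T W. Multiplying J_obs.T by W scales column i by W[i],
  # i.e. it right-multiplies by diag(W)
  M = np.linalg.solve(J.T @ J, J_obs.T * W)    # shape (Nstate, Nobs)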

As usual,

\[ \mathrm{Var}(\vec p) = M \mathrm{Var}\left(\vec q_\mathrm{ref}\right) M^T \]

As stated on the input noise page, we're assuming independent noise on all observed pixels, with a standard deviation inversely proportional to the weight:

\[ \mathrm{Var}\left( \vec q_\mathrm{ref} \right) = \sigma^2 W^{-2} \]

so

\[ \begin{aligned} \mathrm{Var}\left(\vec p\right) &= \sigma^2 M W^{-2} M^T \\ &= \sigma^2 \left( J^T J \right)^{-1} J_\mathrm{observations}^T W W^{-2} W J_\mathrm{observations} \left( J^T J \right)^{-1} \\ &= \sigma^2 \left( J^T J \right)^{-1} J_\mathrm{observations}^T J_\mathrm{observations} \left( J^T J \right)^{-1} \end{aligned} \]

If we have no regularization, we can simplify this even further. All the measurements are then pixel errors, and \(J_\mathrm{observations} = J\), so

\[\mathrm{Var}\left(\vec p\right) = \sigma^2 \left( J^T J \right)^{-1} \]

Note that this does not explicitly depend on \(W\). However, the weights are a part of \(J\). So if an observation \(i\) were to become less precise, \(w_i\) and \(x_i\) and \(J_i\) would all decrease. And as a result, \(\mathrm{Var}\left(\vec p\right)\) would increase, as expected.
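This result is easy to sanity-check with a Monte Carlo simulation. A toy linear least-squares problem with unit weights and no regularization, comparing the empirical covariance of the re-optimized state against \(\sigma^2 \left(J^T J\right)^{-1}\):

  import numpy as np

  rng = np.random.default_rng(0)
  Nstate, Nobs = 3, 200
  sigma = 0.1

  A      = rng.standard_normal((Nobs, Nstate)) # linear model: q(p) = A p, so J = A
  p_true = rng.standard_normal(Nstate)
  q_ref  = A @ p_true                          # noiseless observations

  # Re-solve the least-squares problem for many independent noise draws
  samples = [np.linalg.lstsq(A, q_ref + sigma*rng.standard_normal(Nobs), rcond=None)[0]
             for _ in range(10000)]

  Var_empirical = np.cov(np.array(samples).T)
  Var_predicted = sigma*sigma * np.linalg.inv(A.T @ A)
  print(np.max(np.abs(Var_empirical - Var_predicted)))  # should be tiny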

Propagating the state vector noise through projection

We now have the variance of the full optimization state \(\vec p\), and we want to propagate this through projection to end up with an estimate of uncertainty at any given pixel \(\vec q\).

The state vector \(\vec p\) is a random variable, and we know its distribution. To evaluate the projection uncertainty we want to project a fixed point, to see how this projection \(\vec q\) moves around as the chessboards and cameras and intrinsics shift due to the uncertainty in \(\vec p\). In other words, we want to project a point defined in the coordinate system of the camera housing, as the origin of the mathematical camera moves around inside this housing:

[Figure: a point fixed to the camera housing is projected as the camera coordinate system moves within the housing]

So how do we operate on points in a fixed coordinate system when all the coordinate systems we have are floating random variables? We can use the most fixed thing we have: chessboards. As with the camera housing, the chessboards themselves are fixed in space. We have noisy camera observations of the chessboards that implicitly produce estimates of the fixed transformation \(T_{\mathrm{cf}_i}\) for each chessboard \(i\). The explicit transformations that we actually have in \(\vec p\) all relate to a floating reference coordinate system: \(T_\mathrm{cr}\) and \(T_\mathrm{rf}\). That coordinate system doesn't have any physical meaning, and it's useless in producing our fixed point.

Thus if we project points from a chessboard frame, we would be unaffected by the untethered reference coordinate system. So points in a chessboard frame are somewhat "fixed" for our purposes.

To begin, let's focus on just one chessboard frame: frame 0. We want to know the uncertainty at a pixel coordinate \(\vec q\), so let's unproject and transform \(\vec q\) out to frame 0:

\[ \vec p_{\mathrm{frame}_0} = T_{\mathrm{f}_0\mathrm{r}} T_\mathrm{rc} \mathrm{unproject}\left( \vec q \right) \]
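A sketch of this step using mrcal's transform utilities. The rt vectors standing in for \(T_\mathrm{cr}\) and \(T_{\mathrm{rf}_0}\) are hypothetical placeholders for values pulled out of the optimized state, and the range is chosen by the caller, since unprojection is ambiguous (more on this below):

  import numpy as np
  import mrcal

  model    = mrcal.cameramodel('camera.cameramodel') # hypothetical filename
  q        = np.array((1000., 500.))                 # the pixel we care about
  distance = 3.0                                     # unprojection needs a range

  # Hypothetical rt transforms from the optimized state:
  rt_cam_ref    = np.array((0., 0., 0., 0.1, 0., 0.)) # T_cr: reference -> camera
  rt_ref_frame0 = np.array((0., 0., 0., 0., 0., 2.))  # T_rf0: frame 0 -> reference

  v  = mrcal.unproject(q, *model.intrinsics())
  v /= np.linalg.norm(v)
  p_cam    = v * distance
  p_ref    = mrcal.transform_point_rt(mrcal.invert_rt(rt_cam_ref),    p_cam) # T_rc
  p_frame0 = mrcal.transform_point_rt(mrcal.invert_rt(rt_ref_frame0), p_ref) # T_f0_r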

We then transform and project \(\vec p_{\mathrm{frame}_0}\) back to the imager to get \(\vec q^+\). But here we take into account the uncertainties of each transformation to get the desired projection uncertainty \(\mathrm{Var}\left(\vec q^+ - \vec q\right)\). The full data flow looks like this, with all the perturbed quantities marked with a \(+\):

\[ \vec q^+ \xleftarrow{\mathrm{intrinsics}^+} \vec p^+_\mathrm{camera} \xleftarrow{T^+_\mathrm{cr}} \vec p^+_{\mathrm{reference}_0} \xleftarrow{T^+_{\mathrm{rf}_0}} \vec p_{\mathrm{frame}_0} \xleftarrow{T_\mathrm{fr}} \vec p_\mathrm{reference} \xleftarrow{T_\mathrm{rc}} \vec p_\mathrm{camera} \xleftarrow{\mathrm{intrinsics}} \vec q \]

This works, but it depends on \(\vec p_{\mathrm{frame}_0}\) being "fixed". Can we do better? Yes. We're observing more than one chessboard, and in aggregate all the chessboard frames can represent an even-more "fixed" frame. Currently we take a very simple approach to combining the frames: we compute the mean of all the \(\vec p^+_\mathrm{reference}\) estimates from each frame. The full data flow then looks like this:

\[ \begin{aligned} & \swarrow & \vec p^+_{\mathrm{reference}_0} & \xleftarrow{T^+_{\mathrm{rf}_0}} & \vec p_{\mathrm{frame}_0} & \nwarrow & \\ \vec q^+ \xleftarrow{\mathrm{intrinsics}^+} \vec p^+_\mathrm{camera} \xleftarrow{T^+_\mathrm{cr}} \vec p^+_\mathrm{reference} & \xleftarrow{\mathrm{mean}} & \vec p^+_{\mathrm{reference}_1} & \xleftarrow{T^+_{\mathrm{rf}_1}} & \vec p_{\mathrm{frame}_1} & \xleftarrow{T_\mathrm{fr}} & \vec p_\mathrm{reference} \xleftarrow{T_\mathrm{rc}} \vec p_\mathrm{camera} \xleftarrow{\mathrm{intrinsics}} \vec q \\ & \nwarrow & \vec p^+_{\mathrm{reference}_2} & \xleftarrow{T^+_{\mathrm{rf}_2}} & \vec p_{\mathrm{frame}_2} & \swarrow \end{aligned} \]

This is better, but there's another issue. What is the transformation relating the original and perturbed reference coordinate systems?

\[ T_{\mathrm{r}^+\mathrm{r}} = \mathrm{mean}_i \left( T_{\mathrm{r}^+\mathrm{f}_i} T_{\mathrm{f}_i\mathrm{r}} \right) \]

Each transformation \(T\) includes a rotation matrix \(R\), so the above constructs a new rotation as a mean of multiple rotation matrices, which is aphysical: the resulting matrix is not a valid rotation. In practice, the perturbations are tiny, and this is sufficiently close. Extreme geometries do break it, and this will be fixed in the future.
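To illustrate the issue, and one standard fix that mrcal does not currently use: the elementwise mean of rotation matrices is generally not a rotation, but it can be projected back onto SO(3) with an SVD (the orthogonal Procrustes solution):

  import numpy as np

  def nearest_rotation(M):
      '''Project a 3x3 matrix onto SO(3): the closest rotation in the Frobenius norm'''
      U, _, Vt = np.linalg.svd(M)
      R = U @ Vt
      if np.linalg.det(R) < 0:       # guard against producing a reflection
          R = U @ np.diag((1., 1., -1.)) @ Vt
      return R

  # The mean of the identity and a 90-degree rotation about z is NOT a rotation...
  Rs = [np.eye(3),
        np.array(((0., -1., 0.), (1., 0., 0.), (0., 0., 1.)))]
  M = np.mean(Rs, axis=0)
  print(np.linalg.det(M))                    # 0.5: not a valid rotation
  print(np.linalg.det(nearest_rotation(M)))  # 1.0: the projection is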

So to summarize, to compute the projection uncertainty at a pixel \(\vec q\) we

  1. Unproject \(\vec q\) and transform to each chessboard coordinate system to obtain \(\vec p_{\mathrm{frame}_i}\)
  2. Transform and project back to \(\vec q^+\), averaging the \(\vec p^+_{\mathrm{reference}_i}\) estimates and taking the uncertainty of each transformation into account

We have \(\vec q^+\left(\vec p\right) = \mathrm{project}\left( T_\mathrm{cr} \, \mathrm{mean}_i \left( T_{\mathrm{rf}_i} \vec p_{\mathrm{frame}_i} \right) \right)\) where the transformations \(T\) and the intrinsics used in \(\mathrm{project}()\) come directly from the optimization state vector \(\vec p\). So

\[ \mathrm{Var}\left( \vec q \right) = \frac{\partial \vec q^+}{\partial \vec p} \mathrm{Var}\left( \vec p \right ) \frac{\partial \vec q^+}{\partial \vec p}^T \]

We computed \(\mathrm{Var}\left( \vec p \right )\) earlier, and \(\frac{\partial \vec q^+}{\partial \vec p}\) comes from the projection expression above.
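mrcal computes \(\frac{\partial \vec q^+}{\partial \vec p}\) analytically; as a generic illustration only, the same propagation can be written with a finite-difference gradient for any function mapping the state to a pixel:

  import numpy as np

  def propagate_variance(q_func, p_star, Var_p, step=1e-7):
      '''Var(q) = dq+/dp Var(p) (dq+/dp)^T with a finite-difference gradient

      q_func maps a state vector of shape (Nstate,) to a pixel of shape (2,)'''
      q0    = q_func(p_star)
      dq_dp = np.empty((q0.size, p_star.size))
      for i in range(p_star.size):
          dp          = np.zeros_like(p_star)
          dp[i]       = step
          dq_dp[:, i] = (q_func(p_star + dp) - q0) / step
      return dq_dp @ Var_p @ dq_dp.T

  # Toy check with a linear "projection" q = B p: the result is B Var(p) B^T
  B = np.arange(6, dtype=float).reshape(2, 3)
  print(propagate_variance(lambda p: B @ p, np.ones(3), np.eye(3)*0.01))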

The mrcal.projection_uncertainty() function implements this logic. For the special case of visualizing the uncertainties, call any of the uncertainty visualization functions, or use the mrcal-show-projection-uncertainty tool.

The effect of range

We glossed over an important detail in the above derivation. Unlike a projection operation, an unprojection is ambiguous: given some camera-coordinate-system point \(\vec v\) that projects to a pixel \(\vec q\), we have \(\vec q = \mathrm{project}\left(k \vec v\right)\) for any positive scale factor \(k\). So an unprojection gives you a direction, but no range. The direct implication is that we can't ask for the "uncertainty at pixel coordinate \(\vec q\)". Rather, we must ask about the "uncertainty at pixel coordinate \(\vec q\) looking \(x\) meters out".

And a surprising consequence of that is that while projection is invariant to scaling (\(k \vec v\) projects to the same \(\vec q\) for any \(k\)), the uncertainty of projection is not invariant to this scaling:

[Figure: the projection uncertainty varies with the range to the queried point, even though the projection itself does not]

Let's look at the projection uncertainty at the center of the imager at different ranges for an arbitrary model:

mrcal-show-projection-uncertainty --vs-distance-at center data/board/opencv8.cameramodel --set 'yrange [0:0.4]'

[Plot: projection uncertainty at the center of the imager vs. distance]

So the uncertainty grows without bound as we approach the camera. As we move away, there's a sweet spot where we have maximum confidence. And as we move further out still, we approach some uncertainty asymptote at infinity. Qualitatively this is the figure I see 100% of the time, with the position of the minimum and of the asymptote varying.

Why is the uncertainty unbounded as we approach the camera? Because we're looking at the projection of a fixed point into a camera whose position is uncertain. As we get closer to the origin of the camera, the noise in the camera position dominates the projection, and the uncertainty shoots to infinity.

What controls the range where we see the uncertainty optimum? The range where we observed the chessboards. The uncertainty we asymptotically approach at infinity is set by the specifics of the chessboard dance.
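This behavior can be probed programmatically by scaling the query point before calling mrcal.projection_uncertainty(). A sketch, using the same model as the plot above; the worst-direction standard deviation is computed here by hand from the covariance:

  import numpy as np
  import mrcal

  model    = mrcal.cameramodel('data/board/opencv8.cameramodel')
  q_center = (np.array(model.imagersize(), dtype=float) - 1.) / 2.
  v        = mrcal.unproject(q_center, *model.intrinsics())
  v       /= np.linalg.norm(v)

  for r in (0.5, 1., 2., 5., 20., 100.):
      Var_q = mrcal.projection_uncertainty(v * r, model)
      stdev = np.sqrt(np.max(np.linalg.eigvalsh(Var_q)))  # worst-direction stdev
      print(f"at {r:6.1f}m: {stdev:.3f} pixels")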

See the tour of mrcal for a simulation validating the approach and for some empirical results.

Planned improvements

The current implementation is very usable, but a few things should be extended or fixed:

  • As described in the noise model writeup, the expected noise level in the observed chessboard corners \(\sigma\) is currently loosely estimated instead of measured. Measuring it would be very good, but it's not clear how to do that. There's an attempt in mrgingham that could be explored; a crude residual-based estimate is sketched after this list.
  • As noted above, the method used in computing the rotation between the input and perturbed reference frames is aphysical. This produces unexpected results when given chessboard observations at multiple discrete ranges. For instance:

    analyses/dancing/dance-study.py                                                     \
      --scan num_far_constant_Nframes_near --range 2,10 --Ncameras 1 --Nframes-near 100 \
      --observed-pixel-uncertainty 2                                                    \
      --ymax 2.5 --uncertainty-at-range-sampled-max 35                                  \
      test data/cam0.opencv8.cameramodel
    

    says that adding any observations at 10m to the bulk set at 2m makes the projection uncertainty worse. One could expect no improvement from the far-off observations, but they shouldn't break anything. The issue is the averaging in 3D point space. Observation noise causes the far-off geometry to move much more than the nearby chessboards, and that far-off motion then dominates the average. Some experimental fixes are implemented in test/test-projection-uncertainty.py. For instance:

    test/test-projection-uncertainty.py \
      --fixed cam0 --model opencv4      \
      --show-distribution --explore     \
      --reproject-perturbed mean-frames-using-meanq-penalize-big-shifts
    
  • Currently mrcal.projection_uncertainty() computes each point's uncertainty independently, but for many applications we are interested in the correlations between the projections of multiple points. This could span multiple cameras; for instance, when doing stereo ranging, we want to know the correlated projections due to the intrinsics and extrinsics of the two cameras. The API needs to be expanded to report these joint covariances.
  • We want the uncertainty in no-data areas to be high. We're defining uncertainty as a function of the stability of projection in response to noise. However in no-data areas, projection is driven 100% by the regularization terms, which are not directly affected by the observation noise. Most of the time, we still see the high uncertainties we want to see because the noise causes \(\vec p_\mathrm{reference}\) to move, but it's not obvious we can rely on that. Might we see a case when the reported uncertainty in the no-data areas will be low? What if the chessboard poses are locked down?
  • As noted above, the current method used for uncertainty quantification only supports the vanilla calibration problem: stationary cameras observing a moving chessboard. It would be good to support other scenarios; for instance, structure-from-motion coupled with intrinsics optimization.
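On the first point: a crude estimate of \(\sigma\) can be backed out of an existing solve by assuming the model fits perfectly, so the optimal residuals are pure noise. This is a naive sketch, not mrcal's behavior; it ignores any model bias:

  import numpy as np

  def estimate_sigma(x_obs, Nstate):
      '''Crude noise estimate from the optimal residuals x = W (q - q_ref)

      Each residual is pre-scaled by its weight, so Var(x_i) = sigma^2 if the
      model fits perfectly. Discount the degrees of freedom consumed by the
      fit, as in the usual sample-variance estimator'''
      return np.sqrt(np.sum(x_obs*x_obs) / (x_obs.size - Nstate))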