Projection uncertainty
After a calibration has been computed, it is essential to get a sense of how good the calibration is (how closely it represents reality). Traditional (non-mrcal) calibration routines rely on one metric of calibration quality: the residual fit error. This is clearly inadequate because we can always improve this metric by throwing away some input data, and it doesn't make sense that using less data would make a calibration better.
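To make the point about residual fit error concrete, here is a minimal sketch using a toy line fit in place of a real calibration (the data, noise level and number of discarded points are all made up for illustration). Discarding the worst-fitting observations and refitting lowers the RMS residual, even though the fit now rests on less data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a calibration: fit y = a*x + b to noisy observations
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

def fit_and_rms(x, y):
    """Least-squares line fit; returns (residuals, RMS residual)."""
    A = np.column_stack((x, np.ones_like(x)))
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = A @ coeffs - y
    return residuals, np.sqrt(np.mean(residuals**2))

residuals_all, rms_all = fit_and_rms(x, y)

# Throw away the 10 observations that fit worst, then refit
keep = np.argsort(np.abs(residuals_all))[:-10]
_, rms_trimmed = fit_and_rms(x[keep], y[keep])

print(f"RMS residual with all data:     {rms_all:.4f}")
print(f"RMS residual with trimmed data: {rms_trimmed:.4f}")
```

The trimmed fit always reports the smaller residual, which is why the residual alone can't be used to rank calibrations.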
There are two main sources of error in the calibration solve. Without these errors, the calibration data would fit perfectly, producing a solve residual vector that's exactly \(\vec 0\):
- Sampling error. Our computations are based on fitting a model to observations of chessboard corners. These observations aren't perfect: each one contains a sample of some noise distribution. We can characterize this distribution and we can analytically predict the effects of that noise.
- Model error. This results when the solver's model of the world is insufficient to describe what is actually happening. When model errors are present, even the best set of parameters isn't able to completely fit the data. Some sources of model errors: motion blur, unsynchronized cameras, chessboard detector errors, too-simple (or unstable) lens models or chessboard deformation models, and so on. Since these errors are unmodeled (by definition), we can't analytically predict their effects. Instead we try hard to force these errors to zero, so that we can ignore them. We do this by using rich-enough models and by gathering clean data. To detect model errors we look at the solve diagnostics and we compute cross-validation diffs.
Let's do as much as we can analytically: let's gauge the effects of sampling error by computing a projection uncertainty for a model. Since only the sampling error is evaluated:
Any promises of a high-quality low-uncertainty calibration are valid only if the model errors are small.
The method to estimate the projection uncertainty is accessed via the mrcal.projection_uncertainty() function. Here the "uncertainty" is the sensitivity to sampling error: the calibration-time pixel noise. This tells us how good a calibration is (we aim for low projection uncertainties), and it can tell us how good the downstream results are as well (by propagating projection uncertainties through the downstream computation).
To estimate the projection uncertainty we:
- Estimate the noise in the chessboard observations
- Propagate that noise to the optimal parameters reported by the calibration routine
- Propagate the uncertainty in calibration parameters through the projection function to get uncertainty in the resulting pixel coordinate
This overall approach is sound, but it implies some limitations:
- Once again, model errors are not included in this uncertainty estimate
- The choice of lens model affects the reported uncertainties. Lean models (those with few parameters) are less flexible than rich models, and don't fit general lenses as well as rich models do. This stiffness also limits a lean model's response to noise in its parameters. Thus the above method will report less uncertainty for leaner models than for rich models. So, unless we're sure that a given lens follows some particular lens model perfectly, a splined lens model (i.e. a very rich model) is recommended for truthful uncertainty reporting. Otherwise the reported confidence comes from the model itself, rather than the calibration data.
- Currently the uncertainty estimates can be computed only from a vanilla calibration problem: a set of stationary cameras observing a moving calibration object. Other formulations can be used to compute the lens parameters as well (structure-from-motion while also computing the lens models for instance), but at this time the uncertainty computations cannot handle those cases. It can be done, but the current method needs to be extended to do so.
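The effect of model richness on the reported uncertainty can be illustrated with a toy problem. This is a sketch only: polynomial fits stand in for lens models, and the noise level and sample positions are assumptions. With identical data and identical input noise, the lean model (degree 1) reports a smaller prediction uncertainty than the rich model (degree 7) at every query point, purely because it has fewer degrees of freedom.

```python
import numpy as np

sigma = 0.1                          # assumed observation noise stdev
x_obs = np.linspace(-1.0, 1.0, 30)   # where the toy "calibration" data was gathered

def prediction_stdev(degree, x_query):
    """Stdev of the fitted polynomial's value at x_query, due to input noise only."""
    J = np.vander(x_obs, degree + 1)            # Jacobian of a linear fit
    var_b = sigma**2 * np.linalg.inv(J.T @ J)   # parameter covariance
    j_query = np.vander(np.atleast_1d(x_query), degree + 1)
    # Diagonal of j_query @ var_b @ j_query.T, one entry per query point
    return np.sqrt(np.einsum('ij,jk,ik->i', j_query, var_b, j_query))

x_query = np.array([0.0, 0.9, 1.0])
stdev_lean = prediction_stdev(1, x_query)
stdev_rich = prediction_stdev(7, x_query)

for xq, lean, rich in zip(x_query, stdev_lean, stdev_rich):
    print(f"x = {xq:4.1f}: lean model {lean:.4f}, rich model {rich:.4f}")
```

Neither number says anything about whether the lean model actually fits the underlying function. That is a model error, which this kind of uncertainty estimate doesn't see.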
Estimating the input noise
We're measuring the sensitivity to the noise in the calibration-time observations. In order to propagate this noise, we need to know what that input noise is. The current approach is described in the optimization problem formulation.
Propagating input noise to the state vector
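We solved the least squares problem, so we have the optimal state vector \(\vec b^*\).
We apply a perturbation to the observations of board corners, reoptimize this slightly perturbed least-squares problem (assuming everything is linear), and look at what happens to the optimal state vector.
We have

\[ E \equiv \left\Vert \vec x \right\Vert^2, \qquad J \equiv \frac{\partial \vec x}{\partial \vec b} \]

At the optimum \(E\) is minimized, so

\[ \frac{\partial E}{\partial \vec b} \left( \vec b = \vec b^* \right) = 2 J^T \vec x^* = 0 \]

We perturb the problem:

\[ E\left( \vec b + \Delta \vec b, \vec q_\mathrm{ref} + \Delta \vec q_\mathrm{ref} \right) \approx \left\Vert \vec x + J \Delta \vec b + \frac{\partial \vec x}{\partial \vec q_\mathrm{ref}} \Delta \vec q_\mathrm{ref} \right\Vert^2 \]

And we reoptimize:

\[ \frac{\partial E}{\partial \Delta \vec b} \approx 2 \left( \vec x + J \Delta \vec b + \frac{\partial \vec x}{\partial \vec q_\mathrm{ref}} \Delta \vec q_\mathrm{ref} \right)^T J = 0 \]

We started at an optimum, so \(\vec x = \vec x^*\) and \(J^T \vec x^* = 0\), and thus

\[ J^T J \Delta \vec b = -J^T \frac{\partial \vec x}{\partial \vec q_\mathrm{ref}} \Delta \vec q_\mathrm{ref} \]

As defined on the input noise page, we have

\[ \vec x_\mathrm{observations} = W \left( \vec q - \vec q_\mathrm{ref} \right) \]

where \(\vec q\) are the pixel projections computed from the state vector, \(\vec q_\mathrm{ref}\) are the observed pixels and \(W\) is the diagonal matrix of observation weights. Neither the weights nor any regularization terms in \(\vec x\) depend on \(\vec q_\mathrm{ref}\), so

\[ \frac{\partial \vec x}{\partial \vec q_\mathrm{ref}} = \begin{bmatrix} -W \\ 0 \end{bmatrix} \]

and thus

\[ J^T J \Delta \vec b = J_\mathrm{observations}^T W \Delta \vec q_\mathrm{ref} \]

where \(J_\mathrm{observations}\) contains the rows of \(J\) that correspond to \(\vec x_\mathrm{observations}\).
So if we perturb the input observation vector \(\vec q_\mathrm{ref}\) by \(\Delta \vec q_\mathrm{ref}\), the resulting effect on the optimal parameters is \(\Delta \vec b = M \Delta \vec q_\mathrm{ref}\), where

\[ M = \left( J^T J \right)^{-1} J_\mathrm{observations}^T W \]

As usual,

\[ \mathrm{Var}\left( \vec b \right) = M \, \mathrm{Var}\left( \vec q_\mathrm{ref} \right) M^T \]

As stated on the input noise page, we're assuming independent noise on all observed pixels, with a standard deviation inversely proportional to the weight:

\[ \mathrm{Var}\left( \vec q_\mathrm{ref} \right) = \sigma^2 W^{-2} \]

so

\[ \mathrm{Var}\left( \vec b \right) = \sigma^2 M W^{-2} M^T = \sigma^2 \left( J^T J \right)^{-1} J_\mathrm{observations}^T J_\mathrm{observations} \left( J^T J \right)^{-1} \]

If we have no regularization, then \(J_\mathrm{observations} = J\) and this simplifies further:

\[ \mathrm{Var}\left( \vec b \right) = \sigma^2 \left( J^T J \right)^{-1} \]

Note that these expressions do not explicitly depend on \(W\), but the weights still have an effect because they are a part of \(J\). If an observation becomes less precise, its weight drops, the corresponding rows of \(J\) shrink, and \(\mathrm{Var}\left( \vec b \right)\) grows, as expected.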
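The linearized propagation can be checked numerically. The sketch below is not mrcal code. It assumes a toy linear model in which the "projection" is `q = A b`, the weights `W` are random, and a simple regularization term `lam * b` is appended to the residual vector (all of these names and values are hypothetical). Because the toy problem is linear, the predicted state shift `M @ dq_ref` should match the shift obtained by actually re-solving with perturbed observations, and the two covariance expressions should agree.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_state = 40, 3
sigma = 0.3                                    # assumed pixel noise level

A = rng.normal(size=(n_obs, n_state))          # toy linear "projection": q = A b
W = np.diag(rng.uniform(0.5, 1.0, n_obs))      # observation weights
lam = 0.1                                      # regularization scale (hypothetical)

J_obs = W @ A                                  # rows of J for x_observations
J = np.vstack((J_obs, lam * np.eye(n_state)))  # full Jacobian, incl. regularization

def solve(q_ref):
    """Minimize ||W (A b - q_ref)||^2 + ||lam b||^2 over b."""
    x_at_zero = np.concatenate((-W @ q_ref, np.zeros(n_state)))  # residual at b = 0
    b, *_ = np.linalg.lstsq(J, -x_at_zero, rcond=None)
    return b

# Noisy observations: stdev inversely proportional to the weight
b_true = np.array([1.0, -2.0, 0.5])
q_ref = A @ b_true + rng.normal(size=n_obs) * sigma / np.diag(W)

# Perturb the observations, and compare predicted and actual state shifts
dq_ref = rng.normal(scale=1e-3, size=n_obs)
M = np.linalg.solve(J.T @ J, J_obs.T @ W)
db_predicted = M @ dq_ref
db_actual = solve(q_ref + dq_ref) - solve(q_ref)

# Var(b) computed two ways
JtJ_inv = np.linalg.inv(J.T @ J)
var_b_from_M = sigma**2 * M @ np.linalg.inv(W @ W) @ M.T
var_b_closed = sigma**2 * JtJ_inv @ J_obs.T @ J_obs @ JtJ_inv

print("max |db_predicted - db_actual| =", np.abs(db_predicted - db_actual).max())
print("max |Var(b) difference|        =", np.abs(var_b_from_M - var_b_closed).max())
```

In a real calibration the problem is nonlinear, so the agreement holds only for small perturbations around the optimum.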
Propagating the state vector noise through projection
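We now have the variance of the full optimization state \(\vec b\), and we want to propagate it through projection to get an estimate of uncertainty at any given pixel.
The state vector \(\vec b\) is a random variable, and we know its distribution. To evaluate the projection uncertainty we want to project a fixed point, and see how its projection \(\vec q\) moves around as the chessboards, cameras and intrinsics shift due to the uncertainty in \(\vec b\). In other words, we want to project a point defined in the coordinate system of the camera housing, as the origin of the mathematical camera moves around inside this housing.
How do we operate on points in a fixed coordinate system when all the coordinate systems we have are floating random variables? We use the most fixed thing we have: chessboards. As with the camera housing, the chessboards themselves are fixed in space. We have noisy camera observations of the chessboards that implicitly produce estimates of the fixed transformation \(T_{\mathrm{cf}_i}\) for each chessboard \(i\). The explicit transformations that we actually have in \(\vec b\) all relate to a floating reference coordinate system: \(T_\mathrm{cr}\) and \(T_{\mathrm{rf}_i}\). That coordinate system has no physical meaning, so it can't give us a fixed point.
Thus if we project points from a chessboard frame, we would be unaffected by the untethered reference coordinate system. So points in a chessboard frame are somewhat "fixed" for our purposes.
To begin, let's focus on just one chessboard frame: frame 0. We want to know the uncertainty at a pixel coordinate \(\vec q\), so let's unproject and transform \(\vec q\) out to frame 0:

\[ \vec p_{\mathrm{frame}_0} = T_{\mathrm{f}_0\mathrm{r}} T_\mathrm{rc} \, \mathrm{unproject}\left( \vec q \right) \]

We then transform and project \(\vec p_{\mathrm{frame}_0}\) back to the imager to get \(\vec q^+\), marking perturbed quantities with a \(+\). Here we take into account the uncertainties of each transformation to get the desired projection uncertainty \(\mathrm{Var}\left( \vec q^+ - \vec q \right)\).
This works, but it depends on \(\vec p_{\mathrm{frame}_0}\) being "fixed". We can do better. We're observing more than one chessboard, and in aggregate all the chessboard frames can represent an even more "fixed" frame. Currently we take a very simple approach to combining the frames: we compute the mean of the perturbed reference-frame points obtained from each chessboard frame.
This is better, but there's another issue. What is the transformation relating the original and perturbed reference coordinate systems?

\[ T_{\mathrm{r}^+\mathrm{r}} = \mathrm{mean}_i \left( T_{\mathrm{r}^+\mathrm{f}_i} T_{\mathrm{f}_i\mathrm{r}} \right) \]

Each transformation \(T\) includes a rotation matrix \(R\), so the above constructs a new rotation as a mean of multiple rotation matrices, which is aphysical: the resulting matrix is not a valid rotation. In practice the perturbations are tiny, so the result is sufficiently close to a valid rotation. Extreme geometries can break this approximation.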
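The concern about averaging rotations is easy to see numerically. The following sketch (plain NumPy, with made-up angles) averages pairs of rotation matrices about the z axis and measures how far the result is from orthonormal. For tiny perturbations the error is negligible. For large ones the mean is clearly not a rotation.

```python
import numpy as np

def rotation_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

def orthonormality_error(R):
    """Frobenius norm of R^T R - I; zero for a valid rotation."""
    return np.linalg.norm(R.T @ R - np.eye(3))

def mean_rotation_error(angles_deg):
    """Average the rotation matrices element-wise and measure the damage."""
    rotations = [rotation_z(t) for t in np.deg2rad(angles_deg)]
    return orthonormality_error(np.mean(rotations, axis=0))

err_small = mean_rotation_error([-0.1, 0.1])    # tiny perturbations
err_large = mean_rotation_error([-45.0, 45.0])  # extreme geometry

print(f"orthonormality error, +-0.1 deg: {err_small:.2e}")
print(f"orthonormality error, +-45 deg:  {err_large:.2e}")
```

For two rotations of ±θ about the same axis the error works out to √2·sin²θ, so it falls off quadratically as the perturbations shrink.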
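So to summarize, to compute the projection uncertainty at a pixel \(\vec q\) we:
- Unproject \(\vec q\) and transform to each chessboard coordinate system to obtain \(\vec p_{\mathrm{frame}_i}\)
- Transform and project back to \(\vec q^+\), using the mean of all the \(\vec p_{\mathrm{frame}_i}\) (each transformed into the reference frame) and taking into account uncertainties
We have

\[ \vec q^+\left( \vec b \right) = \mathrm{project}\left( T_\mathrm{cr} \, \mathrm{mean}_i \left( T_{\mathrm{rf}_i} \vec p_{\mathrm{frame}_i} \right) \right) \]

where the transformations \(T\) and the intrinsics used in \(\mathrm{project}()\) come directly from the optimization state \(\vec b\). So

\[ \mathrm{Var}\left( \vec q \right) = \frac{\partial \vec q^+}{\partial \vec b} \mathrm{Var}\left( \vec b \right) \frac{\partial \vec q^+}{\partial \vec b}^T \]

We computed \(\mathrm{Var}\left( \vec b \right)\) earlier, and \(\frac{\partial \vec q^+}{\partial \vec b}\) comes from the projection expression above.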
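As an illustration of the final propagation step, here is a minimal sketch that pushes a parameter covariance through a projection Jacobian to get a 2x2 pixel covariance. It is a simplification, not the mrcal implementation: the state holds only pinhole intrinsics and a camera translation, the fixed point is given directly in the reference frame (no chessboard-frame averaging), the Jacobian is computed by finite differences, and the values in `var_b` are invented.

```python
import numpy as np

def project(b, p_ref):
    """Toy projection. State b = (fx, fy, cx, cy, tx, ty, tz)."""
    fx, fy, cx, cy = b[:4]
    p_cam = p_ref + b[4:7]          # translation-only transform to camera coords
    return np.array([fx * p_cam[0] / p_cam[2] + cx,
                     fy * p_cam[1] / p_cam[2] + cy])

def numerical_jacobian(f, b, eps=1e-6):
    """Central-difference Jacobian of f with respect to b."""
    columns = []
    for i in range(b.size):
        db = np.zeros_like(b)
        db[i] = eps
        columns.append((f(b + db) - f(b - db)) / (2.0 * eps))
    return np.column_stack(columns)

b = np.array([1000.0, 1000.0, 640.0, 480.0, 0.0, 0.0, 0.0])
var_b = np.diag([1.0, 1.0, 0.25, 0.25, 1e-6, 1e-6, 1e-6])  # hypothetical Var(b)
p_ref = np.array([0.5, -0.2, 5.0])                          # fixed point, meters

dq_db = numerical_jacobian(lambda state: project(state, p_ref), b)
var_q = dq_db @ var_b @ dq_db.T

# Scalar summary: stdev along the worst direction of the 2x2 covariance
worst_direction_stdev = np.sqrt(np.linalg.eigvalsh(var_q)[-1])

print("Var(q) =\n", var_q)
print("worst-direction stdev (pixels):", worst_direction_stdev)
```

Reducing the 2x2 covariance to a single number, as done on the last lines, is one common choice for plotting an uncertainty map. Other summaries of the covariance would work as well.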
The mrcal.projection_uncertainty() function implements this logic. For the special case of visualizing the uncertainties, call any of the uncertainty visualization functions:
- mrcal.show_projection_uncertainty(): Visualize the uncertainty in camera projection
- mrcal.show_projection_uncertainty_vs_distance(): Visualize the uncertainty in camera projection along one observation ray
or use the mrcal-show-projection-uncertainty tool.
A sample uncertainty map of the splined model calibration from the tour of mrcal looking out to infinity:
mrcal-show-projection-uncertainty splined.cameramodel --cbmax 1 --unset key
The effect of range
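We glossed over an important detail in the above derivation. Unlike a projection operation, an unprojection is ambiguous: given some camera-coordinate-system point \(\vec p\) that projects to a pixel \(\vec q\), we have \(\vec q = \mathrm{project}\left( k \vec p \right)\) for any positive scale \(k\). So an unprojection gives you a direction, but no range. The direct implication is that we can't ask for an "uncertainty at pixel coordinate \(\vec q\)". Rather, we must ask about the "uncertainty at pixel coordinate \(\vec q\) looking \(x\) meters out".
And a surprising consequence of that is that while projection is invariant to scaling (\(k \vec p\) projects to the same \(\vec q\) for any positive \(k\)), the uncertainty of projection is not invariant to this scaling.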
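The scale ambiguity of unprojection can be seen with a simple pinhole model. This is only a sketch, and the intrinsics below are arbitrary: scaling a camera-frame point by any positive factor leaves the projected pixel unchanged, so the pixel alone can't tell us the range.

```python
import numpy as np

def project_pinhole(p, fx=1000.0, fy=1000.0, cx=640.0, cy=480.0):
    """Project a camera-frame point p = (x, y, z) with a pinhole model."""
    return np.array([fx * p[0] / p[2] + cx,
                     fy * p[1] / p[2] + cy])

p = np.array([0.3, -0.1, 2.0])
scales = (0.5, 1.0, 10.0, 1000.0)
pixels = [project_pinhole(k * p) for k in scales]

for k, q in zip(scales, pixels):
    print(f"k = {k:7.1f}: q = {q}")
```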
Let's look at the projection uncertainty at the center of the imager at different ranges for an arbitrary model:
mrcal-show-projection-uncertainty --vs-distance-at center --set 'yrange [0:0.1]' opencv8.cameramodel
So the uncertainty grows without bound as we approach the camera. As we move away, there's a sweet spot where we have maximum confidence. And as we move further out still, we approach some uncertainty asymptote at infinity. Qualitatively this is the figure I see 100% of the time, with the position of the minimum and of the asymptote varying.
As we approach the camera, the uncertainty is unbounded because we're looking at the projection of a fixed point into a camera whose position is uncertain. As we get closer to the origin, the noise in the camera position dominates the projection, and the uncertainty shoots to infinity.
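The near-range behavior can be sketched with a one-dimensional pinhole model in which only the camera position is uncertain. The focal length and position noise are hypothetical. A point at lateral offset `x` and range `z` projects to `q = f * (x - t) / z`, where `t` is the camera position, so the pixel stdev due to position noise is `f * sigma_t / z`, which grows without bound as `z` approaches 0.

```python
import numpy as np

f = 1000.0        # focal length in pixels (hypothetical)
sigma_t = 1e-3    # stdev of the camera position along x, in meters (hypothetical)

def pixel_stdev_from_camera_position(range_m):
    """For q = f * (x - t) / z we have dq/dt = -f / z, so stdev(q) = f * sigma_t / z."""
    return f * sigma_t / range_m

ranges = np.array([0.01, 0.1, 1.0, 10.0, 100.0])
stdevs = pixel_stdev_from_camera_position(ranges)

for r, s in zip(ranges, stdevs):
    print(f"range {r:7.2f} m: {s:8.3f} pixels")
```

This captures only the translational contribution. It does not reproduce the sweet spot or the asymptote at infinity, which arise from how the position noise combines with the uncertainty in the intrinsics and in the camera orientation.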
The "sweet spot" where the uncertainty is lowest sits at the range where we observed the chessboards.
The uncertainty we asymptotically approach at infinity is set by the specifics of the chessboard dance.
See the tour of mrcal for a simulation validating this approach of quantifying uncertainty and for some empirical results.