A tour of mrcal: optimal choreography

Previous
Optimal choreography
Next

We just discussed the effect of range in differencing and uncertainty computations.

Optimal choreography

Now that we know how to measure calibration quality and what to look for, we can run some studies to figure out what makes a good chessboard dance. These are all computed by the analyses/dancing/dance-study.py tool. It generates synthetic data, scans a parameter, and produces the uncertainty-vs-range curves at the imager center to visualize the effect of that parameter.

I run all of these studies using the LENSMODEL_OPENCV8 model. It computes faster than the splined model, and qualitatively produces similar results.

How many chessboard observations should we get?

dance-study.py                   \
  --scan Nframes                 \
  --Ncameras 1                   \
  --Nframes 20,200               \
  --range 0.5                    \
  --observed-pixel-uncertainty 2 \
  --ymax 1                       \
  opencv8.cameramodel

Here I'm running a monocular solve that looks at chessboards ~ 0.5m away, scanning the frame count from 20 to 200.

The horizontal dashed lines in these plots is the uncertainty of the input noise observations: 2 pixels. We can usually do much better than that. The vertical dashed line is the mean distance where we observed the chessboards. Looks like the sweet spot is a bit past that.

And it looks like more observations is always better, but we reach the point of diminishing returns at ~ 100 frames.

How close should the chessboards be?

dance-study.py                   \
  --scan range                   \
  --Ncameras 1                   \
  --Nframes 100                  \
  --range 0.4,10                 \
  --observed-pixel-uncertainty 2 \
  opencv8.cameramodel

This effect is dramatic: we want closeups. Anything else is a waste of time. Here we have two vertical dashed lines, indicating the minimum and maximum ranges I'm scanning. And we can see that the sweet spot moves further back as we move the chessboards back.

Should the chessboards be shown head-on, or should they be tilted?

dance-study.py                         \
  --scan tilt_radius                   \
  --tilt-radius 0,80                   \
  --Ncameras 1                         \
  --Nframes 100                        \
  --range 0.5                          \
  --observed-pixel-uncertainty 2       \
  --ymax 2                             \
  --uncertainty-at-range-sampled-max 5 \
  opencv8.cameramodel

The head-on views (tilt = 0) produce quite poor results. And we get more and more confidence with more board tilt, with diminishing returns at about 45 degrees.

We now know that we want closeups and we want tilted views. This makes intuitive sense: a tilted close-up view is the best-possible view for the solver to disambiguate focal length effects from the effects of chessboard range. The worst-possible observations for this are head-on far-away views. Given such observations, moving the board forward/backward and changing the focal length have a very similar effect on the observed pixels.

Also this clearly tells us that chessboards are the way to go, and a calibration object that contains a grid of circles will work badly. Circle grids work either by finding the centroid of each circle "blob" or by fitting a curve to the circle edge to infer the location of the center. A circle viewed from a tilted closeup will appear lopsided, so we have a choice of suffering a bias from imprecise circle detections or getting poor uncertainties from insufficient tilt. Extra effort can be expended to improve this situation to make circle grids usable, or chessboards can be used.

How many cameras should observe the chessboard?

Moving on. Often we want to calibrate multiple cameras, and if we only care about the intrinsics we are free to do one N-way calibration or N separate monocular calibrations or anything in-between. The former has more constraints, so presumably that would produce less uncertainty. How much?

I'm processing the same calibration geometry, varying the number of cameras from 1 to 8. The cameras are all in the same physical location, so they're all seeing the same thing (modulo the noise), but the solves have different numbers of parameters and constraints.

dance-study.py                          \
  --scan Ncameras                       \
  --Ncameras 1,8                        \
  --camera-spacing 0                    \
  --Nframes 100                         \
  --range 0.5                           \
  --ymax 0.4                            \
  --uncertainty-at-range-sampled-max 10 \
  --observed-pixel-uncertainty 2        \
  opencv8.cameramodel

Clearly there's a benefit to more cameras. After about 4, we hit diminishing returns.

That's great. We now know how to dance given a particular chessboard. But what kind of chessboard do we want? mrcal assumes a chessboard being an evenly-spaced planar grid with any number of points and any spacing.

How dense should the chessboard pattern be?

Let's examine the point counts. We expect that adding more points to a chessboard of the same size would produce better results, since we would have strictly more data to work with. This expectation is correct:

dance-study.py                          \
  --scan object_width_n                 \
  --range 2                             \
  --Ncameras 1                          \
  --Nframes 100                         \
  --object-width-n 5,30                 \
  --uncertainty-at-range-sampled-max 30 \
  --observed-pixel-uncertainty 2        \
  opencv8.cameramodel

Here we varied object-width-n, but also adjusted object-spacing to keep the chessboard size the same.

How big should the chessboard be?

What if we leave the point counts alone, but vary the spacing? As we increase the point spacing, the board grows in size, spanning more and more of the imager. We expect this would improve things:

dance-study.py                   \
  --scan object_spacing          \
  --range 2                      \
  --Ncameras 1                   \
  --Nframes 100                  \
  --object-spacing 0.04,0.20     \
  --observed-pixel-uncertainty 2 \
  opencv8.cameramodel

And it does. At the same range, a bigger chessboard is better.

Finally, what if we increase the spacing (and thus the board size), but also move the board back to compensate, so the apparent size of the chessboard stays the same? I.e. do we want a giant board far away or a tiny board really close in?

dance-study.py                                     \
  --scan object_spacing                            \
  --scan-object-spacing-compensate-range-from 0.04 \
  --range 2                                        \
  --Ncameras 1                                     \
  --Nframes 100                                    \
  --object-spacing 0.04,0.20                       \
  --ymax 20                                        \
  --uncertainty-at-range-sampled-max 200           \
  --observed-pixel-uncertainty 2                   \
  opencv8.cameramodel

Looks like the optimal uncertainty is similar in all cases, but tracks the moving range. The uncertainty at infinity is roughly similar in all cases. This is expected: we care about the size of the chessboard relative to its distance from the camera, and that is constant here.

Conclusions:

More frames are good
Closeups are extremely important (up to some practical limits)
Tilted views are good
A smaller number of bigger calibration problems is good
More chessboard corners is good, as long as the detector can find them reliably

None of these are surprising, but it's good to see the effects directly from the data. And we now know exactly how much value we get out of each additional observation or an extra little bit of board tilt or some extra chessboard corners.

Before moving on, I should stress that the results presented here represent a particular scenario using a LENSMODEL_OPENCV8 lens, and produce clear rules of thumb. For a specific lens and geometry, rerun these studies for your use cases.

We can now use the models for stereo processing