How to run a calibration
Calibrating cameras is one of the main applications of mrcal, so I describe this process in some detail here. The main front-end tool is mrcal-calibrate-cameras, and a demo is shown in the tour of mrcal.
Capturing images
Calibration object
We need to get images of a calibration object, a board containing an observable grid of points. mrcal doesn't care where these observations come from: any tool that can produce a table of corner observations can be utilized. See below for the format details. Currently the recommended method is to use a classical chessboard target, and to employ the mrgingham detector. The mrgingham documentation has a .pdf of a chessboard pattern that works well.
The pattern should be printed and mounted onto a rigid and flat surface to produce the calibration object. The optimization problem assumes the board shape to be constant over all the calibration images, and any non-rigidity in the board breaks that assumption. Depending on the resolution of the camera and the amount of calibration accuracy required, even a slightly floppy chessboard can break things. Don't take this for granted.
Similarly, the optimization problem assumes that the calibration object is mostly flat. Some amount of deformation is inevitable, so mrcal does support a simple non-planar board model, with the board shape being computed as part of the calibration solve. In my experience, if you try to make sure the board is flat, this model is enough to fix most of the unavoidable small deflections. If care isn't taken to ensure the board is flat, the current simple deformation model is often insufficient to fully model the shape. Don't take this for granted either.
Calibrating wide lenses generally requires large chessboards, to fill the field of view as much as possible. I use a square chessboard roughly 1m per side to calibrate my fisheye lenses. A big chessboard such as this is never completely rigid or completely flat, so I have a chessboard backed by about 1 inch of aluminum honeycomb. This is expensive, but works well. The previous board I was using was backed with 2.5cm of foam instead. This is significantly cheaper, and still pretty good, but noticeably less flat and rigid. The board does warp with variations in temperature and humidity, but this happens slowly, and the shape is usually stable for long enough to complete a chessboard dance.
According to the dance study, we want close-ups that fill the imager, so we prefer a smaller board held close to the camera over a bigger board further away. The limiting factor here is the camera's depth of field: we want both the calibration-time close-up views and the working-range images to be in focus at the same time. If a too-small board is used, the imager-filling close-ups will be out of focus. Also, mrcal currently assumes that all projections are central (\(k \vec v\) projects to the same \(\vec q\) for all \(k\)). This is true for far-away observations with all lenses, but for some lenses this assumption breaks down as we look closer in. Depending on the lens, close-up observations may tickle this effect. Non-central projections will be supported in a future release of mrcal, and this effect still needs to be studied.
Image-capturing details
Now that we have a calibration object, this object needs to be shown to the camera(s). It is important that the images contain clear features: motion blur and exposure issues will cause bias in the resulting calibration.
If calibrating multiple cameras, mrcal will solve for all the intrinsics and extrinsics. There is one strong requirement on the captured images in this case: the images must be synchronized across all the cameras. This allows mrcal to assume that if camera A and camera B observed a chessboard at time T, then this chessboard was at exactly the same location when the two cameras saw it. Generally this means that either
- The cameras are wired to respond to a physical trigger signal
- The chessboard and cameras were physically fixed (on a tripod, say) at each time of capture
Some capture systems have a "software" trigger mode, but this is usually too loose to produce usable results. A similar consideration exists when using cameras with a rolling shutter. With such cameras it is imperative that everything remain stationary during image capture, even when only one camera is involved.
If synchronization or rolling shutter effects are in doubt, look at the residual plot made by the mrcal-show-residuals-board-observation tool (a sample and a description appear below). These errors clearly show up as distinct patterns in those plots.
Dancing
As shown in the dance study, the most useful observations to gather are
- close-ups: the chessboard should fill the whole frame as much as possible. Small chessboards next to the camera are preferred to larger chessboards further out; the limit is set by the depth of field.
- oblique views: tilt the board forward/back and left/right. I generally tilt by ~ 45 degrees. At a certain point the corners become indistinct and mrgingham starts having trouble, but depending on the lens, that point could come with quite a bit of tilt. A less dense chessboard eases this also, at the cost of requiring more board observations to get the same number of points.
- If calibrating multiple cameras, it is impossible to place a calibration object at a location where it's seen by all the cameras and where it's a close-up for all the cameras. So you should get close-ups for each camera individually, and also get observations common to multiple cameras, which aren't necessarily close-ups. The former will serve to define your camera intrinsics, and the latter will serve to define your extrinsics (geometry). Get just far enough out to create the joint views. If usable joint views are missing, the extrinsics will be undefined, and the solver will complain about a "not positive definite" (singular in this case) Hessian.
A dataset composed primarily of tilted closeups produces good results.
If the model will be used to look at far-away objects, care must be taken to produce a reliable calibration at long ranges, not just at the short ranges where the chessboards were. Close-up chessboard views are the primary data needed to get good uncertainties at infinity, but quite often these will produce images that aren't in focus if the focus ring is set to the working range (infinity). See the dance study for detail. Cameras meant for outdoor stereo and/or wide lenses are susceptible to poor uncertainties at range even if the uncertainties are good closer in. In such cases, it is strongly recommended to re-run the dance study for your particular use case to get a sense of what kind of observations are required, and what kind of uncertainties can be expected. The current thought is that the best thing to do is to get close-up images even if they're out of focus. The blurry images will have a high uncertainty in the corner observations (hopefully without bias), but the uncertainty improvement that comes from the close-ups more than makes up for it. In these cases you usually need more observations than normal to bring the uncertainties down to an acceptable level.
It is better to have more data rather than less. mrgingham will throw away frames where no chessboard can be found, so it is perfectly reasonable to grab too many images with the expectation that they won't all end up being used in the computation.
I usually aim for about 100 usable frames, but you may get away with fewer, depending on your specific scenario. The mrcal uncertainty feedback (see below) will tell you if you need more data.
Naturally, intrinsics are accurate only in areas where chessboards were observed: chessboard observations on the left tell us little about lens behavior on the right. Thus it is imperative to cover the whole field of view during the chessboard dance. It is often tricky to get good data at the edges and corners of the imager, so care must be taken. Some chessboard detectors (mrgingham in particular) report only complete chessboards. This makes it extra-challenging to obtain good data at the edges: a small motion that pushes one chessboard corner barely out of bounds causes the whole observation to be discarded. It is thus very helpful to be able to see a live feed of the camera as the images are being captured. Either way, visualizing the obtained chessboard detections afterwards is very useful for seeing whether enough coverage was obtained.
Image file-naming convention
With monocular calibrations, there are no requirements on image filenames: use whatever you like. If calibrating multiple synchronized cameras, however, the image filenames must indicate which camera captured each image at which time. I generally use frameFFF-cameraCCC.jpg. Images with the same FFF are assumed to have been captured at the same instant in time, and CCC identifies the camera. Naming images in this way is sufficient to communicate these mappings to mrcal.
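For instance, a hypothetical synchronized 2-camera capture following this convention would produce files like these (the frame and camera numbers are made up for illustration):

frame000-camera0.jpg  frame000-camera1.jpg
frame001-camera0.jpg  frame001-camera1.jpg
frame002-camera0.jpg  frame002-camera1.jpg

Here frame001-camera0.jpg and frame001-camera1.jpg are assumed to show the chessboard at the same instant in time, from the two different cameras.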
Detecting corners
Any chessboard detector may be utilized. Most of my testing was done using mrgingham, so I go into more detail describing that approach.
Using mrgingham
Once mrgingham is installed or built from source, it can be run by calling the mrgingham executable. The sample in the tour of mrcal processes these images to produce these chessboard corners like this:
mrgingham -j3 '*.JPG' > corners.vnl
mrgingham tries to handle a variety of lighting conditions, including varying illumination across the image, but the corners must exist in the image in some form.
At this time mrgingham returns only complete chessboard views: if even one corner of the chessboard couldn't be found, mrgingham will discard the entire image. Thus care is needed to get data at the edges and in the corners of the imager. A live preview of the captured images is essential.
Another requirement due to the design of mrgingham is that the board should be held with a flat edge parallel to the camera xz plane (parallel to the ground, usually). mrgingham looks for vertical and horizontal sequences of corners, but if the board is rotated diagonally, then none of these sequences are clearly "horizontal" or "vertical".
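As an aside: the commands in this document assume the default 10x10 grid of corners. If your chessboard has a different corner count, mrgingham can be told about it; if I remember correctly the option is --gridn, but treat that flag name as an assumption and confirm with mrgingham --help. A hypothetical invocation for a 14x14-corner board:

mrgingham -j3 --gridn 14 '*.JPG' > corners.vnl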
Using a non-mrgingham detector
If we use a grid detector other than mrgingham, we need to produce a compatible corners.vnl file. This is a vnlog (text table) where each row describes a single corner detection. This file should contain 3 or 4 columns. The first 3 columns are:

- filename: the path to a chessboard image
- x, y: pixel coordinates of a detected corner
If a 4th column is present, it describes the detector's confidence in the detection of that particular corner. It may be either
- level: the decimation level of the detected corner. If the detector needed to cut down the image resolution to find this corner, we report that resolution here. Level-0 means "full-resolution", level-1 means "half-resolution", level-2 means "quarter-resolution" and so on. A level of "-" or <0 means "skip this point"; this is how incomplete board observations are specified. This "decimation level" interpretation is the mrcal-calibrate-cameras default. This column is reported by mrgingham
- weight: how strongly to weight that corner. More confident detections take stronger weights. This should be inversely proportional to the standard deviation of the detected pixel coordinates. With decimation levels we have \(\mathrm{weight} = 2^{-\mathrm{level}}\). As before, a weight of "-" or <0 means "skip this point"; this is how incomplete board observations are specified. Select this "weight" interpretation with mrcal-calibrate-cameras --corners-cache-has-weights
If no 4th column is present, we assume an even weight of 1.0 for all the points.
The whole chessboard is described by a sequence of these corner detections, listed in a consistent grid order. Each chessboard image is represented by either \(N_\mathrm{width} N_\mathrm{height}\) observation records or a single record

FILENAME - - -

to represent images with no detected corners. An image with incomplete detections should still contain \(N_\mathrm{width} N_\mathrm{height}\) records in the same consistent order. The missing corners should be given with any x, y, but with \(\mathrm{weight} \leq 0\) or a "-". The record could also be completely null:

FILENAME - - -
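To make the format concrete, here is a hypothetical corners.vnl fragment for a tiny 2x2-corner board (the filenames and coordinates are made up for illustration). image1.jpg has all 4 corners detected, image2.jpg is missing one corner, and image3.jpg has no detections at all:

# filename x y level
image1.jpg 100.0 100.0 0
image1.jpg 200.0 100.0 0
image1.jpg 100.0 200.0 1
image1.jpg 200.0 200.0 0
image2.jpg 150.0 150.0 0
image2.jpg 250.0 150.0 0
image2.jpg - - -
image2.jpg 250.0 250.0 0
image3.jpg - - -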
Visualization
A sample run from the tour of mrcal follows. Once we have a corners.vnl from some chessboard detector, we can visualize it. This is a simple vnlog table:
$ < corners.vnl head -n5

## generated with mrgingham -j3 *.JPG
# filename x y level
DSC_7305.JPG 3752.349000 168.802000 2
DSC_7305.JPG 3844.411234 150.264910 0
DSC_7305.JPG 3950.404000 132.480000 2
How well did we cover the imager? Did we get the edges and corners?
$ < corners.vnl \
    vnl-filter -p x,y | \
  feedgnuplot --domain --square
Looks like we did OK. It's a bit thin along the bottom edge, but not terrible. It is very easy to miss getting usable data at the edges, so checking this is highly recommended. If you have multiple cameras, check the coverage separately for each one. This can be done by filtering the corners.vnl to keep only the data for the camera in question. For instance, if we're looking at the left camera with images in files left-XXXXX.jpg, you can replace the above vnl-filter command with vnl-filter 'filename ~ "left"' -p x,y, as in the sketch below.
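A minimal version of that per-camera coverage check, assuming the left-camera images are indeed named left-XXXXX.jpg:

$ < corners.vnl \
    vnl-filter 'filename ~ "left"' -p x,y | \
  feedgnuplot --domain --square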
Insufficient coverage will be clearly picked up by the uncertainty reporting, so it's not strictly necessary to explicitly look at it, but doing so is good hygiene.
We can visualize individual detections like this:
$ f=DSC_7305.JPG

$ < corners.vnl \
    vnl-filter \
      --perl \
      "filename eq \"$f\"" \
      -p x,y,size='2**(1-level)' | \
  feedgnuplot \
    --image $f \
    --domain \
    --square \
    --tuplesizeall 3 \
    --with 'points pt 7 ps variable'
The size of the circle indicates the detection weight. In this image many of the corners were detected at full-resolution (level-0), but some required downsampling for the detector to find them. This is indicated by smaller circles. The downsampled points have less precision, so they are weighted less in the optimization.

How many images produced successful corner detections?
$ < corners.vnl vnl-filter --has x -p filename | uniq | grep -v '#' | wc -l

161
So here we have 161 images with detected corners. Most of the misses are probably images where the chessboard wasn't entirely in view, but some could be failures of mrgingham. In any case, 161 observations is usually plenty.
Computing a calibration
Here I follow the same calibration sequence as in the tour of mrcal, but with a bit more detail.
Let's compute the calibration using the mrcal-calibrate-cameras tool:
mrcal-calibrate-cameras \
  --corners-cache corners.vnl \
  --lensmodel LENSMODEL_OPENCV8 \
  --focal 1700 \
  --object-spacing 0.077 \
  --object-width-n 10 \
  '*.JPG'
- --corners-cache corners.vnl says that the chessboard corner coordinates live in a file called corners.vnl. This is the output of the corner detector. If this argument is omitted, or a non-existent file is given, mrcal-calibrate-cameras will run mrgingham, and write the results into the given path. Thus the same command would be used to both compute the corners initially, and to reuse the pre-computed corners in subsequent runs. As described above, the corners.vnl file can come from any chessboard detector. If it's a detector that produces a 4th column of weights instead of a decimation level, pass in --corners-cache-has-weights
- --lensmodel specifies which lens model we're using for all the cameras. In this example we're using the LENSMODEL_OPENCV8 model. This works reasonably well for wide lenses. See the lens-model page for a description of the available models. The current recommendation is to use an OpenCV model (LENSMODEL_OPENCV5 for long lenses, LENSMODEL_OPENCV8 for wide lenses) initially, as a sanity check, and once that looks OK, to move to LENSMODEL_SPLINED_STEREOGRAPHIC to get better accuracy and reliable uncertainty reporting. For very wide fisheye lenses, LENSMODEL_SPLINED_STEREOGRAPHIC is the only model that will work at all.
is the only model that will work at all.--focal 1700
provides the initial estimate for the camera focal lengths, in pixels. This doesn't need to be precise, but do try to get this roughly correct if possible. The focal length value to pass to--focal
(\(f_\mathrm{pixels}\)) can be derived using the stereographic model definition:
\[ f_\mathrm{pixels} = \frac{\mathrm{imager\_width\_pixels}}{4 \tan \frac{\mathrm{field\_of\_view\_horizontal}}{4}} \]
With longer lenses, the stereographic model is identical to the pinhole model. With very wide lenses, the stereographic model is the basis for the splined-stereographic model, so this expression should be a good initial estimate in all cases. Note that the manufacturer-specified "field of view" is usually poorly-defined: it's different in all directions, so use your best judgement. If only the focal length is available, keep in mind that the "focal length" of a wide lens is somewhat poorly-defined also. With a longer lens, we can assume pinhole behavior to get
\[ f_\mathrm{pixels} = f_\mathrm{mm} \frac{\mathrm{imager\_width\_pixels}}{\mathrm{imager\_width\_mm}} \]
Again, use your best judgement. This doesn't need to be exact, but getting a value in the ballpark makes life easier for the solver.
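As a quick worked example (the numbers are made up for illustration): for an imager 4000 pixels wide behind a lens with a nominal 90-degree horizontal field of view, the stereographic expression above gives

\[ f_\mathrm{pixels} = \frac{4000}{4 \tan \frac{90^\circ}{4}} = \frac{4000}{4 \tan 22.5^\circ} \approx 2400 \]

so --focal 2400 would be a reasonable starting point.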
- --object-spacing is the distance between neighboring corners in the chessboard. Even spacing, identical in both directions, is assumed
- --object-width-n is the horizontal corner count of the calibration object. In the example invocation above there is no --object-height-n, so mrcal-calibrate-cameras assumes a square chessboard
After the options, mrcal-calibrate-cameras takes globs describing the images. One glob per camera is expected, and in the above example one glob was given: '*.JPG'. Thus this is a monocular solve. More cameras would imply more globs. For instance a 2-camera calibration might take arguments

'frame*-camera0.png' 'frame*-camera1.png'

Note that these are globs, not filenames. So they need to be quoted or escaped to prevent the shell from expanding them: hence '*.JPG' and not *.JPG.
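Putting it together, a hypothetical 2-camera run of the same solve (using the file-naming convention from above; all other options as in the monocular example) would look like this:

mrcal-calibrate-cameras \
  --corners-cache corners.vnl \
  --lensmodel LENSMODEL_OPENCV8 \
  --focal 1700 \
  --object-spacing 0.077 \
  --object-width-n 10 \
  'frame*-camera0.png' 'frame*-camera1.png'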
We could pass --explore to drop into a REPL after the computation is done, so that we can look around. The most common diagnostic images can be made by running the mrcal-show-... commandline tools on the generated xxx.cameramodel files, but --explore can be useful to get more sophisticated feedback.
The mrcal-calibrate-cameras tool reports some high-level diagnostics, writes the output model(s) to disk, and exits:
## initial solve: geometry only
## RMS error: 31.606057232034026
## initial solve: geometry and LENSMODEL_STEREOGRAPHIC core only
=================== optimizing everything except board warp from seeded intrinsics
mrcal.c(5355): Threw out some outliers (have a total of 53 now); going again
mrcal.c(5355): Threw out some outliers (have a total of 78 now); going again
## final, full optimization
mrcal.c(5355): Threw out some outliers (have a total of 155 now); going again
## RMS error: 0.7086476918204073
RMS reprojection error: 0.7 pixels
Worst residual (by measurement): 6.0 pixels
Noutliers: 155 out of 16100 total points: 1.0% of the data
calobject_warp = [-0.00104306  0.00051718]
Wrote ./camera-0.cameramodel
The resulting model is renamed to opencv8.cameramodel, and is available here. This is a mrcal-native .cameramodel file containing at least the lens parameters and the geometry.
Let's sanity-check the results. We want to flag down any issues with the data that would violate the assumptions made by the solver.
The tool reports some diagnostics. As we can see, the final RMS reprojection error was 0.7 pixels. Of the 16100 corner observations (161 observations of the board with 10*10 = 100 points each), 155 didn't fit the model well enough, and were thrown out as outliers.
High outlier counts or high reprojection errors would indicate high model errors: the model mrcal is using does not fit the data well. That would suggest some/all of these:
- Issues in the input data, such as incorrectly-detected chessboard corners, unsynchronized cameras, rolling shutter, motion blur, focus issues, unstable lens, etc.
- A badly-fitting lens model. For instance
LENSMODEL_OPENCV4
will not fit wide lenses. And only splined lens models will fit fisheye lenses all the way to the corners - A badly-fitting chessboard model: is the chessboard flat? If not, does it follow the deformation model well?
Outlier rejection will throw out the worst-fitting data, but these model errors will still cause an uncorrectable bias in the results. If at all possible, it is strongly recommended to fix whatever is causing the problem, and then to re-run the solve.
In this example the board deformation was computed as 1.0mm horizontally, and 0.5mm vertically in the opposite direction. That is a small deflection, and sounds reasonable. A way to validate this would be to get another set of chessboard images, rerun the solve, and compare the new flex values to the old ones. From experience, if things are working properly, these results are usually consistent.
What does the solve think about our geometry? Does it match reality?
mrcal-show-geometry \
  opencv8.cameramodel \
  --show-calobjects \
  --unset key \
  --set 'xyplane 0' \
  --set 'view 80,30,1.5'
Here we see the axes of our camera (purple) situated at the reference coordinate system. In this solve, the camera coordinate system is the reference coordinate system; this would look more interesting with more cameras. In front of the camera (along the \(z\) axis) we can see the solved chessboard poses. There are a whole lot of them, and they're all sitting right in front of the camera with some heavy tilt. This matches with how this chessboard dance was performed (by following the guidelines set by the dance study).
Next, let's examine the residuals more closely. We have an overall RMS reprojection error value from above, but let's look at the full distribution of errors for all the cameras:
mrcal-show-residuals \
  --histogram \
  --set 'xrange [-4:4]' \
  --unset key \
  --binwidth 0.1 \
  opencv8.cameramodel
We would like to see a normal distribution, since that's what the noise model assumes. We do see this somewhat, but the central cluster is a bit over-populated. This is a violation of the noise model, but at this time I don't have a good sense of what it means. It's normal-ish, and there isn't a lot to do about it, so I will claim this is close enough.
Let's look deeper. If there's anything really wrong with our data, then we should see it in the worst-fitting images. Let's ask the tool to show the worst one:
mrcal-show-residuals-board-observation \
  --from-worst \
  --vectorscale 100 \
  --circlescale 0.5 \
  --set 'cbrange [0:3]' \
  opencv8.cameramodel \
  0
The residual vector for each chessboard corner in this observation is shown, scaled by a factor of 100 for legibility (the actual errors are tiny!). The circle color also indicates the magnitude of the errors. The size of each circle represents the weight given to that point. The weight is reduced for points that were detected at a lower resolution by the chessboard detector. Points thrown out as outliers are not shown at all. Note that we're showing the measurements, which are weighted pixel errors: high pixel errors may be reported as a small residual if they had a low weight.
The few worst-fitting images are great at identifying common data-gathering issues. Zooming in at the worst point (easily identifiable by the color) will clearly show any motion blur or focus issues. Incorrectly-detected corners will be visible: they will be outliers or they will have a high error. Especially with lean models, the errors will be higher towards the edge of the imager: the lens models fit the worst there.
There should be no discernible pattern to the errors. In a perfect world, the model fits the observations, and the residuals display purely random noise. Any patterns in the errors indicate that the noise isn't random, and thus the model does not fit. This would violate the noise model, and would result in a bias when we ultimately use this calibration for projection. This bias is an unmodeled source of error, so we really want to push it down as far as we can. Getting rid of all such errors completely is impossible, but we should do our best. Out-of-sync camera observations show up as systematic error vectors pointing in one direction, with the corresponding out-of-sync image displaying equal and opposite errors. Rolling shutter effects show a more complex, but clearly non-random pattern. An insufficiently-rich model of the world (lens behavior, chessboard shape, etc.) results in clear patterns too.
Back to this image. In absolute terms, even this worst-fitting image fits really well. The RMS of the errors in this image is 1.20 pixels. The residuals in this image look mostly reasonable. There is a pattern, however: the errors are mostly radial (pointing to/from the center).
As noted in the tour of mrcal, this particular issue is caused by an assumption of a central projection (assuming that all rays intersect at a single point). Experimental, not-entirely-complete support for noncentral projections exists in mrcal, and works much better here. The same frame, fitted with a noncentral projection:
This will be included in a future release of mrcal.
One issue with lean models such as LENSMODEL_OPENCV8, which is used here, is that the radial distortion is never quite right, especially as we move further and further away from the optical axis. We can clearly see this here in the 3rd-worst image:
mrcal-show-residuals-board-observation \
  --from-worst \
  --vectorscale 100 \
  --circlescale 0.5 \
  --set 'cbrange [0:3]' \
  opencv8.cameramodel \
  2
This is clearly a problem that should be addressed. Using a splined lens model instead of LENSMODEL_OPENCV8 makes this work, as seen in the tour of mrcal:
Let's look at the systematic errors in another way: let's look at all the residuals over all the observations, color-coded by their direction, ignoring the magnitudes:
mrcal-show-residuals \
  --directions \
  --unset key \
  opencv8.cameramodel
As before, if the model fit the observations, the errors would represent random noise, and no color pattern would be discernible in these dots. Here we can clearly see lots of green in the top-right, top, and left, lots of blue and magenta in the center, yellow at the bottom, and so on. This is not random noise, but a very clear indication that this lens model is not able to fit this data.
Once again, a splined lens model resolves these biases, as seen in the tour of mrcal:
It would be good to have a quantitative measure of these systematic patterns. At this time mrcal doesn't provide an automated way to do that. This will be added in the future.
Finally it's useful to look at the projection uncertainty of the model. As noted in the documentation, a rich model is necessary to get realistic uncertainty estimates, so here we look at the LENSMODEL_SPLINED_STEREOGRAPHIC result from the tour of mrcal:
mrcal-show-projection-uncertainty splined.cameramodel --cbmax 1 --unset key
This is the projection uncertainty at infinity. If we care about some other working distance, this can be requested with mrcal-show-projection-uncertainty --distance ..., as sketched below.
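For instance, querying the uncertainty at a working distance of 5m (the distance value here is made up for illustration; the other options match the command above):

mrcal-show-projection-uncertainty splined.cameramodel --distance 5 --cbmax 1 --unset key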
The uncertainties are shown as a color-map along with contours. These are the expected values of the projection error, based on the noise in the input corner observations. The lowest uncertainties are at roughly the range and imager locations of the chessboard observations. Gaps in chessboard coverage will manifest as areas of high uncertainty.
These uncertainty metrics are complementary to the residual metrics described above. If we have too little data, the residuals will be low, but the uncertainties will be very high. The more data we gather, the lower the uncertainties.
If the residual plots don't show any unexplained errors, then the uncertainty plots are the authoritative gauge of calibration quality. If the residuals do suggest problems, then the uncertainty predictions will be overly-optimistic: the reported uncertainties would not include the extra sources of error.
A good way to validate the reported projection uncertainty is a cross-validation. Multiple sets of chessboard-dance images should be gathered for the same cameras, and multiple independent calibration solves should be made. Then a difference can be computed using the mrcal-show-projection-diff tool. This difference should be in line with the uncertainty predictions. If any unmodeled sources of error are present, the diffs would show high errors despite low uncertainties.
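A minimal sketch of such a cross-validation diff, assuming two independent solves produced hypothetical models splined-dance0.cameramodel and splined-dance1.cameramodel:

mrcal-show-projection-diff \
  splined-dance0.cameramodel \
  splined-dance1.cameramodel

If the resulting differences are in line with the reported uncertainties, the calibration is likely trustworthy.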
In the end, if the residuals look reasonable, and the uncertainties look reasonable, then we can use the resulting models, and expect to see the accuracy predicted by the reported projection uncertainty.