A tour of mrcal: stereo processing



This is an overview of a more detailed discussion about dense stereo.

We computed intrinsics from chessboards observations earlier, so let's use these for stereo processing.

I took several images off a catwalk over Figueroa Street in downtown Los Angeles. This is the view South along Figueroa Street. There're tall buildings ahead and on either side, which makes an interesting stereo scene.

The two images out of the camera look like this:

0.downsampled.jpg 1.downsampled.jpg

All the full-size images are available by clicking on an image.

The cameras are ~ 2m apart. In order to compute stereo images we need an accurate estimate of the relative geometry of the cameras, which we usually get as an output of the calibration process. Here I only had one physical camera, so I did something different:

  • I calibrated the one camera monocularly, to get its intrinsics. This is what we computed thus far
  • I used this camera to take a pair of images from slightly different locations. This is the "stereo pair"
  • I used a separate tool to estimate the extrinsics from corresponding feature detections in the two images. For the purposes of this document the details of this don't matter

Generating a stereo pair in this way works well to demo the stereo processing tools. It does break the triangulation uncertainty reporting, since we lose the uncertainty information in the extrinsics, however. Preserving this uncertainty, and propagating it through the recomputed extrinsics to triangulation is on the mrcal roadmap. The resulting models we use for stereo processing:

I then use the mrcal-stereo tool to compute rectification maps, rectify the images, compute disparities and convert them to ranges:

mrcal-stereo                     \
  --az-fov-deg 170               \
  --az0-deg    10                \
  --el-fov-deg 95                \
  --el0-deg    -15               \
  --disparity-range 0 300        \
  --sgbm-p1 600                  \
  --sgbm-p2 2400                 \
  --sgbm-uniqueness-ratio 5      \
  --sgbm-disp12-max-diff  1      \
  --sgbm-speckle-window-size 100 \
  --sgbm-speckle-range 2         \
  --force-grayscale              \
  --clahe                        \
  [01].cameramodel               \
  [01].jpg                       \

The --sgbm-... options configure the OpenCV SGBM stereo matcher. Not specifying them uses the OpenCV defaults, which usually produces poor results.

The rectified images look like this:

0-rectified.downsampled.png 1-rectified.downsampled.png

And the disparity and range images look like this:

0-disparity.downsampled.png 0-range.downsampled.png

From a glance, this is clearly working well.

If you've used other stereo libraries previously, these rectified images may look odd: these are fisheye images, so the usual pinhole rectification would pull the corners out towards infinity. We don't see that here because mrcal-stereo uses the transverse equirectangular projection for rectification by default. This scheme samples the azimuth and elevation angles evenly, which minimizes the visual distortion inside each image row. Rectified models for this scene are generated by mrcal-stereo here and here. See the stereo-processing documentation for details.


The rectified models define a system with axes

  • \(x\): the baseline; from the origin of the left camera to the origin of the right camera
  • \(y\): down
  • \(z\): forward

mrcal-stereo selects the \(y\) and \(z\) axes to line up as much as possible with the geometry of the input models. Since the geometry of the actual cameras may not match the idealized rectified geometry (the "forward" view direction may not be orthogonal to the baseline), the usual expectations that \(c_x \approx \frac{\mathrm{imagerwidth}}{2}\) and \(c_y \approx \frac{\mathrm{imagerheight}}{2}\) are not necessarily true in the rectified system. Thus it's valuable to visualize the models and the rectified field of view, for verification. mrcal-stereo can do that:

mrcal-stereo        \
  --az-fov-deg 170  \
  --az0-deg    10   \
  --el-fov-deg 95   \
  --el0-deg    -15  \
  --set 'view 70,5' \
  --viz geometry    \


Here, the geometry is mostly nominal and the rectified view (indicated by the purple lines) does mostly lie along the \(z\) axis. This is a 3D plot that can be rotated by the user when mrcal-stereo --viz geometry is executed, making the geometry clearer.


The mrcal-stereo tool contains a visualizer that allows the user to quickly examine the stereo scene, evaluating the epipolar line alignment, disparities, ranges, etc. It can be invoked by runnung mrcal-stereo --viz stereo .... After panning/zooming, pressing r to display ranges (not disparities), and clicking on the Wilshire Grand building we get this:


The computed range at that pixel is 965.52m. My estimated ground truth range is 873m. According to the linearized sensitivity reported by the tool, this corresponds to an error of \(\frac{966\mathrm{m}-873\mathrm{m}}{241.3\frac{\mathrm{m}}{\mathrm{pixel}}} = 0.39 \mathrm{pixels}\). Usually, stereo matching errors are in the 1/2 - 1/3 pixel range, so this is in-line with expectations.

Here we used dense stereo processing to compute a range map over the whole image. This is slow, and a lot of the time you can get away with computing ranges at a sparse set of points instead. So let's talk about triangulation routines.


We're ready to talk about triangulation routines