A tour of mrcal: stereo processing

Table of Contents

This is an overview of a more detailed discussion about dense stereo.

Previous

Stereo

We computed intrinsics from chessboards observations earlier, so let's use these for stereo processing.

I took several images off a catwalk over Figueroa Street in downtown Los Angeles. This is the view South along Figueroa Street. There're tall buildings ahead and on either side, which makes an interesting stereo scene.

The two images out of the camera look like this:

0.downsampled.jpg 1.downsampled.jpg

All the full-size images are available by clicking on an image.

The cameras are 7ft (2.1m) apart. In order to compute stereo images we need an accurate estimate of the relative geometry of the cameras, which we usually get as an output of the calibration process. Here I only had one physical camera, so I did something different:

  • I calibrated the one camera monocularly, to get its intrinsics. This is what we computed so far
  • I used this camera to take a pair of images from slightly different locations. This is the "stereo pair"
  • I used a separate tool to estimate the extrinsics from corresponding feature detections in the two images. This tool isn't a part of mrcal today, but will be made available in the future

Generating a stereo pair in this way works well to demo the stereo processing tools. It does break the triangulation uncertainty reporting, since we lose the uncertainty information in the extrinsics, however. Preserving this uncertainty, and propagating it through the recomputed extrinsics to triangulation is on the mrcal roadmap. The resulting models we use for stereo processing:

I then use the mrcal APIs to compute rectification maps, rectify the images, compute disparities and convert them to ranges. This is all done with the mrcal-stereo tool:

mrcal-stereo                     \
  --az-fov-deg 145               \
  --el-fov-deg 135               \
  --el0-deg    5                 \
  --disparity-range 0 400        \
  --sgbm-p1 600                  \
  --sgbm-p2 2400                 \
  --sgbm-uniqueness-ratio 5      \
  --sgbm-disp12-max-diff 1       \
  --sgbm-speckle-window-size 200 \
  --sgbm-speckle-range 2         \
  --valid-intrinsics-region      \
  splined-[01].cameramodel       \
  [01].jpg

The --sgbm-... options configure the OpenCV SGBM stereo matcher. Not specifying them uses the OpenCV defaults, which usually produces poor results.

The rectified images look like this:

rectified0-splined.downsampled.png rectified1-splined.downsampled.png

And the disparity and range images look like this:

disparity-splined.downsampled.png range-splined.downsampled.png

On a basic level, this is clearly working well.

If you've used other stereo libraries previously, these rectified images may look odd. This is due to using a transverse equirectangular projection for rectification instead of the more common pinhole projection (see the detailed stereo-processing documentation). This samples the azimuth and elevation angles evenly, which minimizes the visual distortion inside each image row. A side-effect is the the vertical expansion in the rectified image at the azimuth extremesm but since stereo matching works by correlating the rows independently, this is a good trade-off.

The generated rectified models are here and here. These define a rectified system with axes

  • \(x\): the baseline; from the origin of the left camera to the origin of the right camera
  • \(y\): down
  • \(z\): forward

mrcal-stereo selects the \(y\) and \(z\) axes to line up as much as possible with the geometry of the input models. Since the geometry of the actual cameras may not match the idealized rectified geometry (the "forward" view direction may not be orthogonal to the baseline), the usual expectations that \(c_x \approx \frac{\mathrm{imagerwidth}}{2}\) and \(c_y \approx \frac{\mathrm{imagerheight}}{2}\) are not necessarily true in the rectified system. Thus mrcal-stereo can visualize the models and the rectified field of view, for verification:

mrcal-stereo        \
  --az-fov-deg 145  \
  --el-fov-deg 135  \
  --el0-deg    5    \
  --set 'view 70,5' \
  --viz geometry    \
  splined-[01].cameramodel

stereo-rectified-system.svg

Here, the geometry is mostly nominal and the rectified view (indicated by the purple lines) does mostly lie along the \(z\) axis.

Stereo rectification outside of mrcal

As a toolkit, mrcal is fairly flexible, so I want to show how one could perform stereo processing using other tools a part of the pipeline, rather than letting mrcal-stereo do all the work.

What if we want to do our stereo processing with some other tool, and what if that tool doesn't support the splined model we want to use? We can use mrcal to reproject the image to whatever model we like, and then hand off the processed image and new models to that tool. Let's demonstrate with a pinhole model.

We can use the mrcal-reproject-image tool to reproject the images. Mapping fisheye images to a pinhole model introduces an unwinnable trade-off: the angular resolution changes dramatically as you move towards the edges of the image. At the edges the angular resolution becomes extreme, and you need far more pixels to represent the same arc in space as you do in the center. So you usually need to throw out high-information pixels in the center, and gain low-information pixels at the edges. The original image doesn't have more resolution at the edges, so we interpolate. Cutting off the edges (i.e. using a narrower lens) helps bring this back into balance.

So let's do this using two different focal lengths to illustrate the trade-off:

  • --scale-focal 0.35: fairly wide. Looks extreme in a pinhole projection
  • --scale-focal 0.6: not as wide. Looks more reasonable in a pinhole projection, but cuts off big chunks of the image at the edges
for scale in 0.35 0.6; do
  for c in 0 1; do
    mrcal-reproject-image       \
      --valid-intrinsics-region \
      --to-pinhole              \
      --scale-focal $scale      \
      splined-$c.cameramodel    \
      $c.jpg                    \
    | mrcal-to-cahvor           \
    > splined-$c.scale$scale.cahvor;

    mv $c-reprojected{,-scale$scale}.jpg;
  done
done

The wider pinhole resampling of the two images:

0-reprojected-scale0.35.downsampled.jpg 1-reprojected-scale0.35.downsampled.jpg

The narrower resampling of the two images:

0-reprojected-scale0.6.downsampled.jpg 1-reprojected-scale0.6.downsampled.jpg

We will use jplv (a stereo library used at NASA/JPL) to process these pinhole images into a disparity map, so I converted the models to the .cahvor file format, as that tool expects. The models:

Both clearly show the uneven resolution described above, with the wider image being far more extreme. I can now use these images to compute stereo with jplv:

for scale in 0.35 0.6; do           \
  stereo                            \
    --no-ran                        \
    --no-disp                       \
    --no-pre                        \
    --corr-width 17                 \
    --corr-height 5                 \
    --blob-area 10                  \
    --disp-min 0                    \
    --disp-max 400                  \
    splined-[01].scale$scale.cahvor \
    [01]-reprojected-scale$scale.jpg;

  for f in rect-left rect-right diag-left; do \
    mv 00-$f.png jplv-stereo-$f-scale$scale.png;
  done
done

The wide rectified images:

jplv-stereo-rect-left-scale0.35.downsampled.png jplv-stereo-rect-right-scale0.35.downsampled.png

The narrow rectified images:

jplv-stereo-rect-left-scale0.6.downsampled.png jplv-stereo-rect-right-scale0.6.downsampled.png

As most non-mrcal tools, jplv uses a pinhole model for rectification. So even if we gave it wide-angle images and a wide-angle-friendly camera model, we would still experience the issues raised above: the rectified images would have these problems.

The disparities computed by jplv look like this for wide images:

jplv-stereo-diag-left-scale0.35.downsampled.png

and for narrow images:

jplv-stereo-diag-left-scale0.6.downsampled.png

On a basic level, this is clearly working well also.

If jplv output its rectified pinhole models, we could do an apples-to-apples comparison using the SGBM correlator, as before. We would ask mrcal-stereo to accept jplv's rectification by passing mrcal-stereo --already-rectified.

Splitting a wide view into multiple narrow views

We just showed one way to use jplv to handle mrcal lenses, but we had to pay a price of degraded feature-matching accuracy due to unevenly-scaled rectified images. A way to do pinhole-rectified stereo while handling the geometric challenges of wide-angle lenses is to subdivide the wide field of view into multiple narrower virtual views. Then we'd have several narrow-angle stereo pairs instead of a single wide stereo pair, and each narrow pair can be processed with pinhole rectification. mrcal-stereo can do all the work. Let's look 45 degrees to the left:

mrcal-stereo                        \
  --rectification LENSMODEL_PINHOLE \
  --az-fov-deg 80                   \
  --el-fov-deg 80                   \
  --az0-deg    -45                  \
  --disparity-range 0 200           \
  --sgbm-p1 600                     \
  --sgbm-p2 2400                    \
  --sgbm-uniqueness-ratio 5         \
  --sgbm-disp12-max-diff 1          \
  --sgbm-speckle-window-size 200    \
  --sgbm-speckle-range 2            \
  --valid-intrinsics-region         \
  splined-[01].cameramodel          \
  [01].jpg

The pinhole rectified images:

rectified0-narrow-splined.downsampled.png rectified1-narrow-splined.downsampled.png

And the disparity:

disparity-narrow-splined.downsampled.png

This looks much better than the pinhole-rectified stereo from the full image. The rectified pinhole models and images could be passed to a different tool to complete the processing, if desired.

And we can see the rotated field of view when we visualize the rectified system:

mrcal-stereo                        \
  --rectification LENSMODEL_PINHOLE \
  --az-fov-deg 80                   \
  --el-fov-deg 80                   \
  --az0-deg    -45                  \
  --set 'view 70,5'                 \
  --viz geometry                    \
  splined-[01].cameramodel

stereo-rectified-system-narrow.svg

Range accuracy

So far we have looked at the results qualitatively by having a human eyeball the disparity maps to verify that they looked reasonable. But the goal here is to get ranges. So let's talk about triangulation routines.

Next

We're ready to talk about triangulation routines