mrcal lens models

Representation
Intrinsics core
Models

mrcal supports a wide range of projection models. Some are intended to represent physical lenses, while others are idealized, useful for processing. All are referred to as lens models in the code and in the documentation. The representation details and projection behaviors are described here.

Representation

A mrcal lens model represents a lens independent of its pose in space. A lens model is fully specified by

A model family (or type). This is something like LENSMODEL_PINHOLE or LENSMODEL_SPLINED_STEREOGRAPHIC
Configuration parameters. This is a set of key/value pairs, which is required only by some model families. These values are not subject to optimization, and may affect how many optimization parameters are needed.
Optimization parameters. These are the parameters that the optimization routine controls as it runs

Each model family also has some metadata key/value pairs associated with it. These are inherent properties of a model family, and are not settable. At the time of this writing there are 3 metadata keys:

has_core: True if the first 4 optimization values are the "core": \(f_x\), \(f_y\), \(c_x\), \(c_y\)
can_project_behind_camera: True if this model is able to project vectors from behind the camera. If it cannot, then mrcal.unproject() will never report z < 0
has_gradients: True if this model has gradients implemented

In Python, the models are identified with a string LENSMODEL_XXX where the XXX selects the specific model family and the configuration, if needed. A sample model string with a configuration: LENSMODEL_SPLINED_STEREOGRAPHIC_order=3_Nx=30_Ny=20_fov_x_deg=170. The configuration is the pairs order=3, Nx=30 and so on. At this time, model families that accept a configuration require it to be specified fully, although optional configuration keys will be added soon. Today, calling Python functions with LENSMODEL_SPLINED_STEREOGRAPHIC or LENSMODEL_SPLINED_STEREOGRAPHIC_order=3 will fail due to an incomplete configuration. The mrcal.lensmodel_metadata_and_config() function returns a dict containing the metadata and configuration for a particular model string.

In C, the model family is selected with the mrcal_lensmodel_type_t enum. The elements are the same as the Python model names, but with MRCAL_ prepended. So sample model from above has type MRCAL_LENSMODEL_SPLINED_STEREOGRAPHIC. In C the mrcal_lensmodel_t structure contains the type and configuration. This structure is thus an analogue the the model strings, as Python sees them. So a number of C functions accepting mrcal_lensmodel_t arguments are analogous to Python functions taking model strings. For instance, the number of parameters needed to fully describe a given model can be obtained by calling mrcal.lensmodel_num_params() in Python or mrcal_lensmodel_num_params() in C. Given a mrcal_lensmodel_t lensmodel structure of type XXX (i.e. if lensmodel.type = MRCAL_LENSMODEL_XXX=) then the configuration is available in lensmodel.LENSMODEL_XXX__config, which has type mrcal_LENSMODEL_XXX__config_t. The metadata is requestable by calling this function:

mrcal_lensmodel_metadata_t mrcal_lensmodel_metadata( const mrcal_lensmodel_t* lensmodel );

Intrinsics core

Most models contain an "intrinsics core". These are 4 values that appear at the start of the parameter vector:

\(f_x\): the focal length in the horizontal direction, in pixels
\(f_y\): the focal length in the vertical direction, in pixels
\(c_x\): the horizontal projection center, in pixels
\(c_y\): the vertical projection center, in pixels

At this time all models contain a core.

Models

Currently all models represent a central projection: all observation rays intersect at a single point (the camera origin). So \(k \vec v\) projects to the same \(\vec q\) for all \(k\). This isn't strictly true for real-world lenses, so non-central projections will be supported in a future release of mrcal.

`LENSMODEL_PINHOLE`

This is the basic "pinhole" model with 4 parameters: the core. Projection of a point \(\vec p\) is defined as

\[\vec q = \left[ \begin{aligned} f_x \frac{p_x}{p_z} + c_x \\ f_y \frac{p_y}{p_z} + c_y \end{aligned} \right] \]

This model is defined only in front of the camera, and projects to infinity as we approach 90 degrees off the optical axis (\(p_z \rightarrow 0\)). Straight lines in space remain straight under this projection, and observations of the same plane by two pinhole cameras define a homography. This model can be used for stereo rectification, although it only works well with long lenses. Longer lenses tend to have roughly pinhole behavior, but no real-world lens follows this projection, so this exists for data processing only.

`LENSMODEL_STEREOGRAPHIC`

This is another trivial model that exists for data processing, and not to represent real lenses. Like the pinhole model, this has just the 4 core parameters.

To define the projection of a point \(\vec p\), let's define the angle off the optical axis:

\[ \theta \equiv \tan^{-1} \frac{\left| \vec p_{xy} \right|}{p_z} \]

then

\[ \vec u \equiv \frac{\vec p_{xy}}{\left| \vec p_{xy} \right|} 2 \tan\frac{\theta}{2} \]

and

\[\vec q = \left[ \begin{aligned} f_x u_x + c_x \\ f_y u_y + c_y \end{aligned} \right] \]

This model is able to project behind the camera, and has a single singularity: directly opposite the optical axis. mrcal refers to \(\vec u\) as the normalized stereographic projection; we get the projection \(\vec q = \vec u\) when \(f_x = f_y = 1\) and \(c_x = c_y = 0\)

Note that the pinhole model can be defined in the same way, except the pinhole model has \(\vec u \equiv \frac{\vec p_{xy}} {\left| \vec p_{xy} \right|} \tan \theta\). And we can thus see that for long lenses the pinhole model and the stereographic model function similarly: \(\tan \theta \approx 2 \tan \frac{\theta}{2}\) as \(\theta \rightarrow 0\)

`LENSMODEL_LONLAT`

This is a standard equirectangular projection. It's a trivial model useful not for representing lenses, but for describing the projection function of wide panoramic images. This works just like latitude an longitude on a globe, with a linear angular map on latitude and longitude. The 4 intrinsics core parameters are used to linearly map latitude, longitude to pixel coordinates. The full projection expression to map a camera-coordinate point \(\vec p\) to an image pixel \(\vec q\):

\[ \vec q = \left[ \begin{aligned} f_x \, \mathrm{lon} + c_x \\ f_y \, \mathrm{lat} + c_y \end{aligned} \right] = \left[ \begin{aligned} f_x \tan^{-1}\left(\frac{p_x}{p_z}\right) + c_x \\ f_y \sin^{-1}\left(\frac{p_y}{\left|\vec p\right|}\right) + c_y \end{aligned} \right] \]

So \(f_x\) and \(f_y\) specify the angular resolution, in pixels/radian.

For normal lens models the optical axis is at \(\vec p = \left[ \begin{aligned} 0 \\ 0 \\ 1 \end{aligned} \right]\), and projects to roughly the center of the image, roughly at \(\vec q = \left[ \begin{aligned} c_x \\ c_y \end{aligned} \right]\). This model has \(\mathrm{lon} = \mathrm{lat} = 0\) at the optical axis, which produces the same, usual \(\vec q\). However, this projection doesn't represent a lens and there is no "camera" or an "optical axis". The view may be centered anywhere, so \(c_x\) and \(c_y\) could be anything, even negative.

The special case of \(f_x = f_y = 1\) and \(c_x = c_y = 0\) (the default values in mrcal.project_lonlat()) produces a normalized equirectangular projection:

\[ \vec q_\mathrm{normalized} = \left[ \begin{aligned} \mathrm{lon} \\\mathrm{lat} \end{aligned} \right] \]

This projection has a singularity at the poles, approached as \(x \rightarrow 0\) and \(z \rightarrow 0\).

`LENSMODEL_LATLON`

This is a "transverse equirectangular projection". It works just like LENSMODEL_LONLAT, but rotated 90 degrees. So instead of a globe oriented as usual with a vertical North-South axis, this projection has a horizontal North-South axis. The projected \(x\) coordinate corresponds to the latitude, and the projected \(y\) coordinate corresponds to the longitude.

As with LENSMODEL_LONLAT, lenses do not follow this model. It is useful as the core of a rectified view used in stereo processing. The full projection expression to map a camera-coordinate point \(\vec p\) to an image pixel \(\vec q\):

\[ \vec q = \left[ \begin{aligned} f_x \, \mathrm{lat} + c_x \\ f_y \, \mathrm{lon} + c_y \end{aligned} \right] = \left[ \begin{aligned} f_x \sin^{-1}\left(\frac{p_x}{\left|\vec p\right|}\right) + c_x \\ f_y \tan^{-1}\left(\frac{p_y}{p_z}\right) + c_y \end{aligned} \right] \]

As with LENSMODEL_LONLAT, \(f_x\) and \(f_y\) specify the angular resolution, in pixels/radian. And \(c_x\) and \(c_y\) specify the projection at the optical axis \(\vec p = \left[ \begin{aligned} 0 \\ 0 \\ 1 \end{aligned} \right]\).

The special case of \(f_x = f_y = 1\) and \(c_x = c_y = 0\) (the default values in mrcal.project_latlon()) produces a normalized transverse equirectangular projection:

\[ \vec q_\mathrm{normalized} = \left[ \begin{aligned} \mathrm{lat} \\\mathrm{lon} \end{aligned} \right] \]

This projection has a singularity at the poles, approached as \(y \rightarrow 0\) and \(z \rightarrow 0\).

`LENSMODEL_OPENCV4`, `LENSMODEL_OPENCV5`, `LENSMODEL_OPENCV8`, `LENSMODEL_OPENCV12`

These are simple parametric models that have the given number of "distortion" parameters in addition to the 4 core parameters. The projection behavior is described in the OpenCV documentation. These do a reasonable job in representing real-world lenses, and they're compatible with many other tools. The projection function is

\begin{align*} \vec P &\equiv \frac{\vec p_{xy}}{p_z} \\ r &\equiv \left|\vec P\right| \\ \vec P_\mathrm{radial} &\equiv \frac{ 1 + k_0 r^2 + k_1 r^4 + k_4 r^6}{ 1 + k_5 r^2 + k_6 r^4 + k_7 r^6} \vec P \\ \vec P_\mathrm{tangential} &\equiv \left[ \begin{aligned} 2 k_2 P_0 P_1 &+ k_3 \left(r^2 + 2 P_0^2 \right) \\ 2 k_3 P_0 P_1 &+ k_2 \left(r^2 + 2 P_1^2 \right) \end{aligned}\right] \\ \vec P_\mathrm{thinprism} &\equiv \left[ \begin{aligned} k_8 r^2 + k_9 r^4 \\ k_{10} r^2 + k_{11} r^4 \end{aligned}\right] \\ \vec q &= \vec f_{xy} \left( \vec P_\mathrm{radial} + \vec P_\mathrm{tangential} + \vec P_\mathrm{thinprism} \right) + \vec c_{xy} \end{align*}

The parameters are \(k_i\). For any N-parameter OpenCV model the higher-order terms \(k_i\) for \(i \geq N\) are all 0. So the tangential distortion terms exist for all the models, but the thin-prism terms exist only for LENSMODEL_OPENCV12. The radial distortion is a polynomial in LENSMODEL_OPENCV4 and LENSMODEL_OPENCV5, but a rational for the higher-order models. Practically-speaking LENSMODEL_OPENCV8 works decently well for wide lenses. For non-fisheye lenses, LENSMODEL_OPENCV4 and LENSMODEL_OPENCV5 work ok. I'm sure scenarios where LENSMODEL_OPENCV12 is beneficial exist, but I haven't come across them.

`LENSMODEL_CAHVOR`

mrcal supports LENSMODEL_CAHVOR, a lens model used in a number of tools at JPL. The LENSMODEL_CAHVOR model has 5 "distortion" parameters in addition to the 4 core parameters. This support exists only for compatibility, and there's no reason to use it otherwise. If you don't know what this is, you don't need it.

`LENSMODEL_CAHVORE`

This is an extended flavor of LENSMODEL_CAHVOR to support wider lenses. The LENSMODEL_CAHVORE model has 8 "distortion" parameters in addition to the 4 core parameters. CAHVORE is only partially supported:

the parameter gradients aren't implemented, so it isn't currently possible to solve for a CAHVORE model
there're questions about whether CAHVORE projections are invariant to scaling and whether they should be invariant to scaling. These need to be answered conclusively before using the CAHVORE implementation in mrcal. Talk to Dima.

`LENSMODEL_SPLINED_STEREOGRAPHIC_...`

This is a stereographic model with correction factors. It is mrcal's attempt to model real-world lens behavior with more fidelity than the usual parametric models make possible. The current approach is one of many possible ways to define a rich projection function based on splined surfaces. Improved representations could potentially be implemented in the future.

Note that the idea of using a very rich representation to model lens behavior has been described in literature (for instance here and here). However, every paper I've seen models unprojection (mapping pixels to observation vectors) instead of projection (observation vectors to pixels). Projection is the usual direction, employed by every other lens model in every other toolkit, so following the papers would require rewriting lots and lots of code specifically to support this one model. mrcal's rich representation models projection, so this new model fits into the same framework as all the other models, and all the higher-level logic (differencing, uncertainty quantification, etc) continues to work with no changes.

To compute a projection using this new model, we first compute the normalized stereographic projection \(\vec u\) as in the LENSMODEL_STEREOGRAPHIC definition above:

\[ \theta \equiv \tan^{-1} \frac{\left| \vec p_{xy} \right|}{p_z} \]

\[ \vec u \equiv \frac{\vec p_{xy}}{\left| \vec p_{xy} \right|} 2 \tan\frac{\theta}{2} \]

Then we use \(\vec u\) to look-up a \(\Delta \vec u\) using two separate splined surfaces:

\[ \Delta \vec u \equiv \left[ \begin{aligned} \Delta u_x \left( \vec u \right) \\ \Delta u_y \left( \vec u \right) \end{aligned} \right] \]

and we then define the rest of the projection function:

\[\vec q = \left[ \begin{aligned} f_x \left( u_x + \Delta u_x \right) + c_x \\ f_y \left( u_y + \Delta u_y \right) + c_y \end{aligned} \right] \]

The \(\Delta \vec u\) are the off-stereographic terms. If \(\Delta \vec u = 0\), we get a plain stereographic projection.

The surfaces \(\Delta u_x\) and \(\Delta u_y\) are defined as B-splines, parametrized by the values of the "knots" (control points). These knots are arranged in a fixed grid in the space of \(\vec u\), with the grid density and extent set by the model configuration (i.e. not subject to optimization). The values at each knot are set in the intrinsics vector, and this controls the projection function.

B-spline details

We're using B-splines primarily for their local support properties: moving a knot only affects the surface in the immediate neighborhood of that knot. This makes our jacobian sparse, which is critical for rapid convergence of the optimization problem. Conversely, at any \(\vec u\), the sampled value of the spline depends only on the knots in the immediate neighborhood of \(\vec u\). A script used in the development of the splined model shows this effect:

We sampled a curve defined by two sets of cubic B-spline control points: they're the same except the one point in the center differs. We can see that the two spline-interpolated functions produce a different value only in the vicinity of the tweaked control point. And we can clearly see the radius of the effect: the sampled value of a cubic B-spline depends on the two control points on either side of the query point. A quadratic B-spline has a narrower effect: the sampled value depends on the nearest control point, and one neighboring control point on either side.

This plot shows a 1-dimension splined curve, but we have splined surfaces. To sample a spline surface:

Arrange the control points in a grid
Sample each row independently as a separate 1-dimensional B-spline
Use these row samples as control points to sample the resulting column

Processing columns first and then rows produces the same result. The same dev script from above checks this.

Splined models: fidelity and uncertainties

This splined model has many more parameters, and is far more flexible than the lean parametric models (all the other currently-supported lens models). This has several significant effects.

These models are much more capable of representing the behavior of real-world lenses than the lean models: at a certain level of precision the parametric models are always wrong. The tour of mrcal shows a real-world fit using LENSMODEL_OPENCV8 and a real-world fit using a splined model, where we can clearly see that the splined model fits the data better.

The higher parameter counts do result in higher reported uncertainties (see the tour of mrcal for examples). This is a good thing: the lean models report uncertainty estimates that are low, but do not match reality. The higher uncertainty estimates from the splined models are truthful, however. This is because the uncertainty estimate algorithm constrains the lenses to the space that's representable by a given lens model, which is a constraint that only exists on paper. Since mrcal reports the covariance matrix of any projection operation, the uncertainty can be used to pass/fail a calibration or the covariance can be propagated to whatever is using the model.

It is thus recommended to use splined models even for long lenses, which do fit the pinhole model more or less.

Splined model configuration

The configuration selects the model parameters that aren't subject to optimization. These define the high-level behavior of the spline. We have:

order: the degree of each 1D polynomial. This is either 2 (quadratic splines, C1 continuous) or 3 (cubic splines, C2 continuous). At this time, 3 (cubic splines) is recommended. I haven't yet done a thorough study on this, but empirical results tell me that quadratic splines are noticeably less flexible, and require a denser spline to fit as well as a comparable cubic spline.
Nx and Ny: The spline density. We have a Nx by Ny grid of evenly-spaced control points. The ratio of this spline grid should be selected to match the aspect ratio of the imager. Inside each spline patch we effectively have a lean parametric model. Choosing a too-sparse spline spacing will result in larger patches, which aren't able to fit real-world lenses. Choosing a denser spacing results in more parameters and a more flexible model at the cost of needing more data and slower computations. No data-driven method of choosing Nx or Ny is available at this time, but Nx=30_Ny=20 appears to work well for some very wide lenses I tested with.
fov_x_deg: The horizontal field of view, in degrees. Selects the region in the space of \(\vec u\) where the spline is well-defined. fov_y_deg is not included in the configuration: it is assumed proportional with Ny and Nx. fov_x_deg is used to compute a knots_per_u quantity, and this is applied in both the horizontal and vertical directions.

Field-of-view selection

The few knots around any given \(\vec u\) define the value of the spline function there. These knots define "spline patch", a polynomial surface that fully represents the spline function in the neighborhood of \(\vec u\). As the sample point \(\vec u\) moves around, different spline patches, selected by a different set of knots are selected. With cubic splines, each spline patch is defined by the local 4x4 grid of knots (16 knots total). With quadratic splines, each spline is defined by a 3x3 grid.

Since the knots are defined on a fixed grid, it is possible to try to sample the spline beyond the region where the knots are defined (beyond our declared field of view). In this case we use the nearest spline patch, which could sit far away from \(\vec u\). So here we still use a 4x4 grid of knots to define the spline patch, but \(\vec u\) no longer sits in the middle of these knots: because we're past the edge, and the preferred knots aren't available.

This produces continuous projections everywhere, at the cost of reduced function flexibility at the edges: the effective edge patches could be much larger that the internal patches. We can control this by selecting a wide-enough fov_x_deg to cover the full field-of-view of the camera. We then wouldn't be querying the spline beyond the knots, since those regions in space are out-of-view of the lens. fov_x_deg should be large enough to cover the field of view, but not so wide to waste knots representing invisible space. It is recommended to estimate this from the datasheet of the lens, and then to run a calibration. The mrcal-show-splined-model-correction tool can then be used to compare the valid-intrinsics region (area with sufficient calibration data) against the bounds of the spline-in-bounds region.

Splined model optimization practicalities

Core redundancy
As can be seen in the projection function above, the splined stereographic model parameters contain splined correction factors \(\Delta \vec u\) and an intrinsics core. The core variables are largely redundant with \(\Delta \vec u\): for any perturbation in the core, we can achieve a very similar change in projection behavior by bumping \(\Delta \vec u\) in a specific way. As a result, if we allow the optimization algorithm to control all the variables, the system will be under-determined, and the optimization routine will fail: complaining about a "not positive definite" (singular in this case) Hessian. At best the Hessian will be slightly non-singular, but convergence will be slow. To resolve this, the recommended sequence for optimizing splined stereographic models is:
1. Fit the best LENSMODEL_STEREOGRAPHIC model to compute an estimate of the intrinsics core
2. Refine that solution with a full LENSMODEL_SPLINED_STEREOGRAPHIC_... model, using the core we just computed, and asking the optimizer to lock down those core values. This can be done by setting the do_optimize_intrinsics_core bit to 0 in the mrcal_problem_selections_t structure passed to mrcal_optimize() in C (or passing do_optimize_intrinsics_core=False to mrcal.optimize() in Python).
This is what the mrcal-calibrate-cameras tool does.
Regularization

Another issue that comes up is the treatment of areas in the imager where no points were observed. By design, each parameter of the splined model controls projection from only a small area in space. So what happens to parameters controlling an area where no data was gathered? We have no data to suggest to the solver what values these parameters should take: they don't affect the cost function at all. Trying to optimize such a problem will result in a singular Hessian and complaints from the solver. We address this issue with regularization, to lightly pull all the \(\Delta \vec u\) terms to 0.

Another, related effect, is the interaction of extrinsics and intrinsics. Without special handling, splined stereographic solutions often produce a roll of the camera (rotation around the optical axis) to be compensated by a curl in the \(\Delta \vec u\) vector field. This isn't wrong per se, but is an unintended effect that's nice to eliminate. It looks really strange when a motion in the \(x\) direction in the camera coordinate system doesn't result in the projection moving in its \(x\) direction. We use regularization to handle this effect as well. Instead of pulling all the values of \(\Delta \vec u\) towards 0 evenly, we pull the \(\Delta \vec u\) acting tangentially much more than those acting radially. This asymmetry serves to eliminate any unnecessary curl in \(\Delta \vec u\).

Regardless of direction, these regularization terms are light. The weights are chosen to be small-enough to not noticeably affect the optimization in its fitting of the data. This may be handled differently in the future.
Uglyness at the edges

An unwelcome property of the projection function defined above, is that it allows aphysical, nonmonotonic behavior to be represented. For instance, let's look at the gradient in one particular direction.
\begin{aligned} q_x &= f_x \left( u_x + \Delta u_x \right) + c_x \\ \frac{\mathrm{d}q_x}{\mathrm{d}u_x} &\propto 1 + \frac{\mathrm{d}\Delta u_x}{\mathrm{d}u_x} \end{aligned}
We would expect \(\frac{\mathrm{d}q_x}{\mathrm{d}u_x}\) to always be positive, but as we can see, here that depends on \(\frac{\mathrm{d}\Delta u_x}{\mathrm{d}u_x}\), which could be anything since \(\Delta u_x\) is an arbitrary splined function. Most of the time we're fitting the spline into real data, so the real-world monotonic behavior will be represented. However, near the edges quite often no data is available, so the behavior is driven by regularization, and we're very likely to hit this non-monotonic behavior there. This produces very alarming-looking spline surfaces, but it's not really a problem: we get aphysical behavior in areas where we don't have data, so we have no expectations of reliable projections there. The mrcal-show-splined-model-correction tool visualizes either the bounds of the valid-intrinsics region or the bounds of the imager. In many cases we have no calibration data near the imager edges, so the spline is determined by regularization in that area, and we get odd-looking knot layouts and imager contours. A better regularization scheme or (better yet) a better representation would address this. See a tour of mrcal for examples.

mrcal lens models

Table of Contents

Representation

Intrinsics core

Models

LENSMODEL_PINHOLE

LENSMODEL_STEREOGRAPHIC

LENSMODEL_LONLAT

LENSMODEL_LATLON

LENSMODEL_OPENCV4, LENSMODEL_OPENCV5, LENSMODEL_OPENCV8, LENSMODEL_OPENCV12

LENSMODEL_CAHVOR

LENSMODEL_CAHVORE

LENSMODEL_SPLINED_STEREOGRAPHIC_...