mrcal lens models
mrcal supports a wide range of projection models. Some are intended to represent physical lenses, while others are idealized, useful for processing. All are referred to as lens models in the code and in the documentation. The representation details and projection behaviors are described here.
Representation
A mrcal lens model represents a lens independent of its pose in space. A lens model is fully specified by
- A model family (or type). This is something like LENSMODEL_PINHOLE or LENSMODEL_SPLINED_STEREOGRAPHIC
- Configuration parameters. This is a set of key/value pairs, which is required only by some model families. These values are not subject to optimization, and may affect how many optimization parameters are needed.
- Optimization parameters. These are the parameters that the optimization routine controls as it runs.
Each model family also has some metadata key/value pairs associated with it. These are inherent properties of a model family, and are not settable. At the time of this writing there are 3 metadata keys:
- has_core: True if the first 4 optimization values are the "core": \(f_x\), \(f_y\), \(c_x\), \(c_y\)
- can_project_behind_camera: True if this model is able to project vectors from behind the camera. If it cannot, then mrcal.unproject() will never report z < 0
- has_gradients: True if this model has gradients implemented
In Python, the models are identified with a string LENSMODEL_XXX where the XXX selects the specific model family and the configuration, if needed. A sample model string with a configuration: LENSMODEL_SPLINED_STEREOGRAPHIC_order=3_Nx=30_Ny=20_fov_x_deg=170. The configuration is the pairs order=3, Nx=30 and so on. At this time, model families that accept a configuration require it to be specified fully, although optional configuration keys will be added soon. Today, calling Python functions with LENSMODEL_SPLINED_STEREOGRAPHIC or LENSMODEL_SPLINED_STEREOGRAPHIC_order=3 will fail due to an incomplete configuration. The mrcal.lensmodel_metadata_and_config() function returns a dict containing the metadata and configuration for a particular model string.
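To make the structure of a model string concrete, here is a minimal sketch that splits one into its family and configuration. This is not how mrcal parses these strings: split_model_string() and FAMILIES_WITH_CONFIG are hypothetical names made up for this example, the family list contains only the one configurable family named on this page, and the sketch assumes integer configuration values, as in the sample string above. In real code, use mrcal.lensmodel_metadata_and_config() instead.

```python
import re

# Hypothetical, incomplete list: only the configurable family named on this page
FAMILIES_WITH_CONFIG = ("LENSMODEL_SPLINED_STEREOGRAPHIC",)

def split_model_string(model):
    """Split a model string into (family, configuration dict).

    Illustration only. Assumes every configuration value is an integer.
    """
    for family in FAMILIES_WITH_CONFIG:
        if model == family or model.startswith(family + "_"):
            rest = model[len(family):]
            # Keys may themselves contain underscores (fov_x_deg), so each key
            # is matched lazily up to the next "="
            pairs = re.findall(r"_([A-Za-z][A-Za-z0-9_]*?)=([0-9]+)", rest)
            return family, {key: int(value) for key, value in pairs}
    # Families without a configuration: the whole string is the family
    return model, {}

family, config = split_model_string(
    "LENSMODEL_SPLINED_STEREOGRAPHIC_order=3_Nx=30_Ny=20_fov_x_deg=170")
print(family)
print(config)
```

Note that this sketch does not check that the configuration is complete, which mrcal does require.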
In C, the model family is selected with the mrcal_lensmodel_type_t enum. The elements are the same as the Python model names, but with MRCAL_ prepended. So the sample model from above has type MRCAL_LENSMODEL_SPLINED_STEREOGRAPHIC. In C the mrcal_lensmodel_t structure contains the type and configuration. This structure is thus an analogue of the model strings, as Python sees them. So a number of C functions accepting mrcal_lensmodel_t arguments are analogous to Python functions taking model strings. For instance, the number of parameters needed to fully describe a given model can be obtained by calling mrcal.lensmodel_num_params() in Python or mrcal_lensmodel_num_params() in C.
Given a mrcal_lensmodel_t lensmodel structure of type XXX (i.e. if lensmodel.type == MRCAL_LENSMODEL_XXX) then the configuration is available in lensmodel.LENSMODEL_XXX__config, which has type mrcal_LENSMODEL_XXX__config_t. The metadata can be requested by calling this function:
mrcal_lensmodel_metadata_t mrcal_lensmodel_metadata( const mrcal_lensmodel_t* lensmodel );
Intrinsics core
Most models contain an "intrinsics core". These are 4 values that appear at the start of the parameter vector:
- \(f_x\): the focal length in the horizontal direction, in pixels
- \(f_y\): the focal length in the vertical direction, in pixels
- \(c_x\): the horizontal projection center, in pixels
- \(c_y\): the vertical projection center, in pixels
At this time all models contain a core.
Models
Currently all models represent a central projection: all observation rays intersect at a single point (the camera origin). So \(k \vec v\) projects to the same \(\vec q\) for all \(k\). This isn't strictly true for real-world lenses, so non-central projections will be supported in a future release of mrcal.
LENSMODEL_PINHOLE
This is the basic "pinhole" model with 4 parameters: the core. Projection of a point \(\vec p\) is defined as
\[\vec q = \left[ \begin{aligned} f_x \frac{p_x}{p_z} + c_x \\ f_y \frac{p_y}{p_z} + c_y \end{aligned} \right] \]
This model is defined only in front of the camera, and projects to infinity as we approach 90 degrees off the optical axis (\(p_z \rightarrow 0\)). Straight lines in space remain straight under this projection, and observations of the same plane by two pinhole cameras define a homography. This model can be used for stereo rectification, although it only works well with long lenses. Longer lenses tend to have roughly pinhole behavior, but no real-world lens follows this projection, so this exists for data processing only.
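As an illustration, the pinhole expression above fits in a few lines of numpy. This is a sketch and not mrcal's implementation: the name project_pinhole_sketch() and the intrinsics values are made up for the example. It also shows the central-projection property: scaling \(\vec p\) by any \(k > 0\) leaves \(\vec q\) unchanged.

```python
import numpy as np

def project_pinhole_sketch(p, fx, fy, cx, cy):
    """Project camera-frame points p (shape (..., 3)) to pixels (shape (..., 2)).

    Only meaningful in front of the camera (p_z > 0).
    """
    p = np.asarray(p, dtype=float)
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    return np.stack((fx * x / z + cx,
                     fy * y / z + cy), axis=-1)

# Illustrative intrinsics core: fx, fy, cx, cy
intrinsics = (1000.0, 1000.0, 320.0, 240.0)

q        = project_pinhole_sketch([0.1, -0.2,  2.0], *intrinsics)
q_scaled = project_pinhole_sketch([0.5, -1.0, 10.0], *intrinsics)  # same ray, 5x farther
print(q, q_scaled)
```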
LENSMODEL_STEREOGRAPHIC
This is another trivial model that exists for data processing, and not to represent real lenses. Like the pinhole model, this has just the 4 core parameters.
To define the projection of a point \(\vec p\), let's define the angle off the optical axis:
\[ \theta \equiv \tan^{-1} \frac{\left| \vec p_{xy} \right|}{p_z} \]
then
\[ \vec u \equiv \frac{\vec p_{xy}}{\left| \vec p_{xy} \right|} 2 \tan\frac{\theta}{2} \]
and
\[\vec q = \left[ \begin{aligned} f_x u_x + c_x \\ f_y u_y + c_y \end{aligned} \right] \]
This model is able to project behind the camera, and has a single singularity: directly opposite the optical axis. mrcal refers to \(\vec u\) as the normalized stereographic projection; we get the projection \(\vec q = \vec u\) when \(f_x = f_y = 1\) and \(c_x = c_y = 0\).
Note that the pinhole model can be defined in the same way, except the pinhole model has \(\vec u \equiv \frac{\vec p_{xy}} {\left| \vec p_{xy} \right|} \tan \theta\). And we can thus see that for long lenses the pinhole model and the stereographic model function similarly: \(\tan \theta \approx 2 \tan \frac{\theta}{2}\) as \(\theta \rightarrow 0\).
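The stereographic projection can be sketched in numpy as well. This is an illustration, not mrcal's code, and project_stereographic_sketch() is a made-up name. Instead of computing \(\theta\) explicitly, the sketch uses the half-angle identity \(\tan\frac{\theta}{2} = \frac{\left|\vec p_{xy}\right|}{\left|\vec p\right| + p_z}\), which gives \(\vec u = \frac{2 \vec p_{xy}}{\left|\vec p\right| + p_z}\) and avoids dividing by \(\left|\vec p_{xy}\right|\) on the optical axis.

```python
import numpy as np

def project_stereographic_sketch(p, fx=1.0, fy=1.0, cx=0.0, cy=0.0):
    """Stereographic projection of camera-frame points p (shape (..., 3)).

    With the default intrinsics this returns the normalized projection u.
    Singular only directly behind the camera, where |p| + p_z = 0.
    """
    p = np.asarray(p, dtype=float)
    mag = np.linalg.norm(p, axis=-1, keepdims=True)
    # Equivalent to (p_xy / |p_xy|) * 2 tan(theta/2)
    u = 2.0 * p[..., :2] / (mag + p[..., 2:3])
    return u * np.array([fx, fy]) + np.array([cx, cy])

# 45 degrees off axis, in front of and behind the camera
print(project_stereographic_sketch([1.0, 0.0,  1.0]))
print(project_stereographic_sketch([1.0, 0.0, -1.0]))

# Near the optical axis the result is close to the pinhole u = p_xy / p_z
p_narrow  = np.array([0.01, 0.0, 1.0])
u_stereo  = project_stereographic_sketch(p_narrow)
u_pinhole = p_narrow[:2] / p_narrow[2]
print(u_stereo, u_pinhole)
```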
LENSMODEL_LONLAT
This is a standard equirectangular projection. It's a trivial model useful not for representing lenses, but for describing the projection function of wide panoramic images. This works just like latitude and longitude on a globe, with a linear angular map on latitude and longitude. The 4 intrinsics core parameters are used to linearly map latitude, longitude to pixel coordinates. The full projection expression to map a camera-coordinate point \(\vec p\) to an image pixel \(\vec q\):
\[ \vec q = \left[ \begin{aligned} f_x \, \mathrm{lon} + c_x \\ f_y \, \mathrm{lat} + c_y \end{aligned} \right] = \left[ \begin{aligned} f_x \tan^{-1}\left(\frac{p_x}{p_z}\right) + c_x \\ f_y \sin^{-1}\left(\frac{p_y}{\left|\vec p\right|}\right) + c_y \end{aligned} \right] \]
So \(f_x\) and \(f_y\) specify the angular resolution, in pixels/radian.
For normal lens models the optical axis is at \(\vec p = \left[ \begin{aligned} 0 \\ 0 \\ 1 \end{aligned} \right]\), and projects to roughly the center of the image, roughly at \(\vec q = \left[ \begin{aligned} c_x \\ c_y \end{aligned} \right]\). This model has \(\mathrm{lon} = \mathrm{lat} = 0\) at the optical axis, which produces the same, usual \(\vec q\). However, this projection doesn't represent a lens and there is no "camera" or an "optical axis". The view may be centered anywhere, so \(c_x\) and \(c_y\) could be anything, even negative.
The special case of \(f_x = f_y = 1\) and \(c_x = c_y = 0\) (the default values in mrcal.project_lonlat()) produces a normalized equirectangular projection:
\[ \vec q_\mathrm{normalized} = \left[ \begin{aligned} \mathrm{lon} \\\mathrm{lat} \end{aligned} \right] \]
This projection has a singularity at the poles, approached as \(p_x \rightarrow 0\) and \(p_z \rightarrow 0\).
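A numpy sketch of the equirectangular expression follows. The name lonlat_sketch() is made up for the example, and this is not the code behind mrcal.project_lonlat(). One assumption to flag: the longitude is computed with arctan2, so that points behind the camera get a distinct longitude instead of folding onto the front hemisphere as a plain \(\tan^{-1}\left(\frac{p_x}{p_z}\right)\) would.

```python
import numpy as np

def lonlat_sketch(p, fx=1.0, fy=1.0, cx=0.0, cy=0.0):
    """Equirectangular projection of camera-frame points p (shape (..., 3)).

    With the default intrinsics this returns [lon, lat] in radians.
    Undefined at the poles, where p_x = p_z = 0.
    """
    p = np.asarray(p, dtype=float)
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    lon = np.arctan2(x, z)                          # assumption: full-range longitude
    lat = np.arcsin(y / np.linalg.norm(p, axis=-1))
    return np.stack((fx * lon + cx,
                     fy * lat + cy), axis=-1)

print(lonlat_sketch([1.0, 0.0, 1.0]))   # 45 degrees of longitude, no latitude
print(lonlat_sketch([0.0, 1.0, 1.0]))   # 45 degrees of latitude, no longitude
```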
LENSMODEL_LATLON
This is a "transverse equirectangular projection". It works just like LENSMODEL_LONLAT, but rotated 90 degrees. So instead of a globe oriented as usual with a vertical North-South axis, this projection has a horizontal North-South axis. The projected \(x\) coordinate corresponds to the latitude, and the projected \(y\) coordinate corresponds to the longitude.
As with LENSMODEL_LONLAT, lenses do not follow this model. It is useful as the core of a rectified view used in stereo processing. The full projection expression to map a camera-coordinate point \(\vec p\) to an image pixel \(\vec q\):
\[ \vec q = \left[ \begin{aligned} f_x \, \mathrm{lat} + c_x \\ f_y \, \mathrm{lon} + c_y \end{aligned} \right] = \left[ \begin{aligned} f_x \sin^{-1}\left(\frac{p_x}{\left|\vec p\right|}\right) + c_x \\ f_y \tan^{-1}\left(\frac{p_y}{p_z}\right) + c_y \end{aligned} \right] \]
As with LENSMODEL_LONLAT, \(f_x\) and \(f_y\) specify the angular resolution, in pixels/radian. And \(c_x\) and \(c_y\) specify the projection at the optical axis \(\vec p = \left[ \begin{aligned} 0 \\ 0 \\ 1 \end{aligned} \right]\).
The special case of \(f_x = f_y = 1\) and \(c_x = c_y = 0\) (the default values in mrcal.project_latlon()) produces a normalized transverse equirectangular projection:
\[ \vec q_\mathrm{normalized} = \left[ \begin{aligned} \mathrm{lat} \\\mathrm{lon} \end{aligned} \right] \]
This projection has a singularity at the poles, approached as \(p_y \rightarrow 0\) and \(p_z \rightarrow 0\).
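The transverse variant can be sketched the same way, with the roles of \(x\) and \(y\) swapped. As before, latlon_sketch() is a made-up name, this is not the code behind mrcal.project_latlon(), and the use of arctan2 for the longitude is an assumption made to cover the full angular range.

```python
import numpy as np

def latlon_sketch(p, fx=1.0, fy=1.0, cx=0.0, cy=0.0):
    """Transverse equirectangular projection of points p (shape (..., 3)).

    With the default intrinsics this returns [lat, lon] in radians.
    Undefined at the poles, where p_y = p_z = 0.
    """
    p = np.asarray(p, dtype=float)
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    lat = np.arcsin(x / np.linalg.norm(p, axis=-1))
    lon = np.arctan2(y, z)                          # assumption: full-range longitude
    return np.stack((fx * lat + cx,
                     fy * lon + cy), axis=-1)

print(latlon_sketch([1.0, 0.0, 1.0]))   # 45 degrees of latitude, along image x
print(latlon_sketch([0.0, 1.0, 1.0]))   # 45 degrees of longitude, along image y
```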
LENSMODEL_OPENCV4, LENSMODEL_OPENCV5, LENSMODEL_OPENCV8, LENSMODEL_OPENCV12
These are simple parametric models that have the given number of "distortion" parameters in addition to the 4 core parameters. The projection behavior is described in the OpenCV documentation. These do a reasonable job in representing real-world lenses, and they're compatible with many other tools. The projection function is
\begin{align*} \vec P &\equiv \frac{\vec p_{xy}}{p_z} \\ r &\equiv \left|\vec P\right| \\ \vec P_\mathrm{radial} &\equiv \frac{ 1 + k_0 r^2 + k_1 r^4 + k_4 r^6}{ 1 + k_5 r^2 + k_6 r^4 + k_7 r^6} \vec P \\ \vec P_\mathrm{tangential} &\equiv \left[ \begin{aligned} 2 k_2 P_0 P_1 &+ k_3 \left(r^2 + 2 P_0^2 \right) \\ 2 k_3 P_0 P_1 &+ k_2 \left(r^2 + 2 P_1^2 \right) \end{aligned}\right] \\ \vec P_\mathrm{thinprism} &\equiv \left[ \begin{aligned} k_8 r^2 + k_9 r^4 \\ k_{10} r^2 + k_{11} r^4 \end{aligned}\right] \\ \vec q &= \vec f_{xy} \left( \vec P_\mathrm{radial} + \vec P_\mathrm{tangential} + \vec P_\mathrm{thinprism} \right) + \vec c_{xy} \end{align*}
The parameters are \(k_i\). For any N-parameter OpenCV model the higher-order terms \(k_i\) for \(i \geq N\) are all 0. So the tangential distortion terms exist for all the models, but the thin-prism terms exist only for LENSMODEL_OPENCV12.
The radial distortion is a polynomial in LENSMODEL_OPENCV4 and LENSMODEL_OPENCV5, but a rational function for the higher-order models.
Practically speaking, LENSMODEL_OPENCV8 works decently well for wide lenses. For non-fisheye lenses, LENSMODEL_OPENCV4 and LENSMODEL_OPENCV5 work ok. I'm sure scenarios where LENSMODEL_OPENCV12 is beneficial exist, but I haven't come across them.
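The projection function above can be written out directly. The following numpy sketch is a transcription of those equations for illustration, not mrcal's or OpenCV's implementation; project_opencv_sketch() is a made-up name. It takes the 4 core values followed by N distortion values and zero-pads \(k_i\) for \(i \geq N\), so the same function covers all four models.

```python
import numpy as np

def project_opencv_sketch(p, intrinsics):
    """Project points p (shape (..., 3)) with an OpenCV-style model.

    intrinsics = [fx, fy, cx, cy, k0, k1, ...] with 4, 5, 8 or 12 k values.
    """
    intrinsics = np.asarray(intrinsics, dtype=float)
    N = intrinsics.size - 4
    if N not in (4, 5, 8, 12):
        raise ValueError("expected 4, 5, 8 or 12 distortion parameters")
    fxy, cxy = intrinsics[0:2], intrinsics[2:4]
    k = np.zeros(12)
    k[:N] = intrinsics[4:]          # k_i = 0 for i >= N

    p = np.asarray(p, dtype=float)
    P = p[..., :2] / p[..., 2:3]
    P0, P1 = P[..., 0:1], P[..., 1:2]
    r2 = P0 * P0 + P1 * P1

    radial = ((1 + k[0] * r2 + k[1] * r2**2 + k[4] * r2**3) /
              (1 + k[5] * r2 + k[6] * r2**2 + k[7] * r2**3)) * P
    tangential = np.concatenate(
        (2 * k[2] * P0 * P1 + k[3] * (r2 + 2 * P0 * P0),
         2 * k[3] * P0 * P1 + k[2] * (r2 + 2 * P1 * P1)), axis=-1)
    thinprism = np.concatenate(
        (k[8] * r2 + k[9] * r2**2,
         k[10] * r2 + k[11] * r2**2), axis=-1)
    return fxy * (radial + tangential + thinprism) + cxy

# Illustrative LENSMODEL_OPENCV4-style parameters: core, then k0..k3
q = project_opencv_sketch([1.0, 0.0, 1.0],
                          [1000.0, 1000.0, 320.0, 240.0, 0.1, 0.0, 0.01, 0.0])
print(q)
```

With all \(k_i = 0\) this reduces to the pinhole projection.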
LENSMODEL_CAHVOR
mrcal supports LENSMODEL_CAHVOR, a lens model used in a number of tools at JPL. The LENSMODEL_CAHVOR model has 5 "distortion" parameters in addition to the 4 core parameters. This support exists only for compatibility, and there's no reason to use it otherwise. This model is described in:
LENSMODEL_CAHVORE
This is an extended flavor of LENSMODEL_CAHVOR to support wider lenses. The LENSMODEL_CAHVORE model has 8 "distortion" parameters in addition to the 4 core parameters. CAHVORE is only partially supported:
- the parameter gradients aren't implemented, so it isn't currently possible to solve for a CAHVORE model
- this is a noncentral projection, so more infrastructure is required to make unproject() do something reasonable. Today, a CAHVORE unproject() will report a point at an arbitrary range.
This model is described in:
LENSMODEL_SPLINED_STEREOGRAPHIC_...
This is a stereographic model with correction factors. It is mrcal's attempt to model real-world lens behavior with more fidelity than the usual parametric models make possible.
To compute a projection using this model, we first compute the normalized stereographic projection \(\vec u\) as in the LENSMODEL_STEREOGRAPHIC definition above:
\[ \theta \equiv \tan^{-1} \frac{\left| \vec p_{xy} \right|}{p_z} \]
\[ \vec u \equiv \frac{\vec p_{xy}}{\left| \vec p_{xy} \right|} 2 \tan\frac{\theta}{2} \]
Then we use \(\vec u\) to look-up a \(\Delta \vec u\) using two separate splined surfaces:
\[ \Delta \vec u \equiv \left[ \begin{aligned} \Delta u_x \left( \vec u \right) \\ \Delta u_y \left( \vec u \right) \end{aligned} \right] \]
and we then define the rest of the projection function:
\[\vec q = \left[ \begin{aligned} f_x \left( u_x + \Delta u_x \right) + c_x \\ f_y \left( u_y + \Delta u_y \right) + c_y \end{aligned} \right] \]
The \(\Delta \vec u\) are the off-stereographic terms. If \(\Delta \vec u = 0\), we get a plain stereographic projection.
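The structure of this projection can be sketched as follows. This shows only how the pieces fit together and is not mrcal's implementation: the function name is made up, and delta_u is a hypothetical callable standing in for the two splined surfaces, whose actual representation is not described in this section. The normalized stereographic projection uses the identity \(\vec u = \frac{2 \vec p_{xy}}{\left|\vec p\right| + p_z}\), which is equivalent to the \(\theta\)-based definition above.

```python
import numpy as np

def project_corrected_stereographic_sketch(p, fx, fy, cx, cy, delta_u):
    """Stereographic projection plus a correction looked up from u.

    delta_u is a hypothetical stand-in for the splined surfaces: a callable
    taking u (shape (..., 2)) and returning an array of the same shape.
    """
    p = np.asarray(p, dtype=float)
    mag = np.linalg.norm(p, axis=-1, keepdims=True)
    u = 2.0 * p[..., :2] / (mag + p[..., 2:3])   # normalized stereographic
    du = delta_u(u)                              # off-stereographic terms
    return (u + du) * np.array([fx, fy]) + np.array([cx, cy])

p = np.array([1.0, 0.0, 1.0])

# A zero correction gives back the plain stereographic projection
q_plain = project_corrected_stereographic_sketch(
    p, 1000.0, 1000.0, 320.0, 240.0, lambda u: np.zeros_like(u))

# A made-up constant correction shifts the result by f * delta_u
q_shift = project_corrected_stereographic_sketch(
    p, 1000.0, 1000.0, 320.0, 240.0, lambda u: np.array([0.01, -0.02]) + 0.0 * u)
print(q_plain, q_shift)
```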
Much more detail about this model is available on the splined models page.