mrcal conventions

Terminology

Some terms in the documentation and sources can have ambiguous meanings, so I define them explicitly here:

  • calibration: the procedure used to compute the lens parameters and geometry of a set of cameras. Usually this involves a stationary set of cameras observing a moving object.
  • calibration object or chessboard or board: these are used more or less interchangeably. They refer to the known-geometry object observed by the cameras, with those observations used as input during calibration
  • camera model: this is used to refer to the intrinsics (lens parameters) and extrinsics (geometry) together
  • confidence: a measure of dispersion of some estimator. Higher confidence implies less dispersion. Used to describe the calibration quality. Inverse of uncertainty
  • extrinsics: the pose of a camera with respect to some fixed coordinate system
  • frames: in the context of mrcal's optimization these refer to an array of poses of the observed chessboards
  • intrinsics: the parameters describing the behavior of a lens. The pose of the lens is independent of the intrinsics
  • measurements or residuals: used more or less interchangeably. This is the vector whose norm the optimization algorithm is trying to minimize. mrcal refers to this as \(\vec x\), and it primarily contains differences between observed and predicted pixel observations
  • projection: a mapping of a point in space in the camera coordinate system to the pixel coordinate where that point would be observed by a given camera
  • pose: a 6 degree-of-freedom vector describing a position and an orientation
  • SFM: structure from motion. This is the converse of "calibration": we observe a stationary scene from a moving camera to compute the geometry of the scene
  • state: the vector of parameters that the mrcal optimization algorithm moves around as it searches for the optimum. mrcal refers to this as \(\vec p\)
  • uncertainty: a measure of dispersion of some estimator. Higher uncertainty implies more dispersion. Used to describe the calibration quality. Inverse of confidence
  • unprojection: a mapping of a pixel coordinate back to a point in space in the camera coordinate system that would produce an observation at that pixel. Unprojection is unique only up to scale

Symbols

Geometry

  • \(\vec q\) is a 2-dimensional vector representing a pixel coordinate: \(\left( x,y \right)\)
  • \(\vec v\) is a 3-dimensional vector representing a direction \(\left( x,y,z \right)\) in space. \(\vec v\) is unique only up to length. In a camera's coordinate system we have \(\vec q = \mathrm{project}\left(\vec v \right)\)
  • \(\vec p\) is a 3-dimensional vector representing a point \(\left( x,y,z \right)\) in space. Unlike \(\vec v\), \(\vec p\) has a defined range (its distance from the origin). Like \(\vec v\) we have \(\vec q = \mathrm{project}\left(\vec p \right)\); a sketch follows this list
  • \(\vec u\) is a 2-dimensional vector representing a normalized stereographic projection
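
For instance, a minimal sketch of these relationships, using mrcal.project() and mrcal.unproject() with an arbitrary placeholder pinhole model:

    import numpy as np
    import mrcal

    # arbitrary placeholder pinhole intrinsics: fx, fy, cx, cy
    lensmodel  = 'LENSMODEL_PINHOLE'
    intrinsics = np.array((1000., 1000., 320., 240.))

    p = np.array((0.1, -0.2, 10.))               # a point in camera coordinates
    q = mrcal.project(p, lensmodel, intrinsics)  # the pixel it projects to, shape (2,)

    # unprojection recovers the direction only: v is parallel to p,
    # but its length is arbitrary
    v = mrcal.unproject(q, lensmodel, intrinsics)
    assert np.allclose(np.cross(v, p), 0)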

Optimization

The core of the mrcal calibration routine is a nonlinear least-squares optimization

\[ \min_{\vec p} E = \min_{\vec p} \left \Vert \vec x \left( \vec p \right) \right \Vert ^2 \]

Here we have

  • \(\vec p\) is the vector of parameters being optimized. It's clear from context whether \(\vec p\) refers to some point in space or to the optimization state vector.
  • \(\vec x\) is the vector of measurements describing the error of the solution at some hypothesis \(\vec p\)
  • \(E\) is the scalar cost function being optimized: \(E \equiv \left \Vert \vec x \right \Vert ^2\)
  • \(J\) is the Jacobian matrix \(\frac{ \partial \vec x }{ \partial \vec p }\). Usually this matrix is large and sparse. A sketch of accessing these objects from Python follows this list.
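
Assuming a previous calibration stored its result in a .cameramodel file ('camera0.cameramodel' is a placeholder name), all of these objects can be recomputed from Python:

    import mrcal

    # a model saved by a calibration carries the full set of optimization
    # inputs that produced it
    model = mrcal.cameramodel('camera0.cameramodel')
    optimization_inputs = model.optimization_inputs()

    # evaluate the optimization callback at the stored optimum: p is the
    # state vector, x the measurement vector and J = dx/dp (sparse)
    p, x, J = mrcal.optimizer_callback(**optimization_inputs)[:3]

    E = x.dot(x)   # the cost function: the norm-squared of the measurements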

Camera coordinate system

mrcal uses right-handed coordinate systems. No convention is assumed for the world coordinate system. The canonical camera coordinate system has \(x\) and \(y\) aligned with the pixel coordinates in an image: \(x\) points to the right and \(y\) points down. \(z\) then points forward to complete the right-handed coordinate system, so a point 1m directly ahead of the camera has camera coordinates \(\left( 0,0,1 \right)\).

Transformation naming

We describe a transformation as a mapping from the representation of a point in one coordinate system to the representation of the same point in another coordinate system. T_AB is a transformation from coordinate system B to coordinate system A. These chain together nicely: if we know the transformation between A and B and the transformation between B and C, we can map a point represented in C to its representation in A: x_A = T_AB T_BC x_C = T_AC x_C. So T_AC = T_AB T_BC.
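
To make the chaining concrete, here is a minimal sketch using 4x4 homogeneous transforms with made-up values (mrcal's own pose representations appear in the next section):

    import numpy as np

    def T(R, t):
        '''Build a 4x4 homogeneous transform from a rotation and a translation'''
        out = np.eye(4)
        out[:3,:3] = R
        out[:3, 3] = t
        return out

    # made-up transforms: T_AB maps B -> A and T_BC maps C -> B
    T_AB = T(np.eye(3), np.array((1., 0., 0.)))
    T_BC = T(np.eye(3), np.array((0., 2., 0.)))

    T_AC = T_AB @ T_BC                  # chains to map C -> A

    x_C = np.array((0., 0., 3., 1.))    # a homogeneous point represented in C
    x_A = T_AC @ x_C
    assert np.allclose(x_A, T_AB @ (T_BC @ x_C))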

Pose representation

Various parts of the toolkit have preferred representations of pose, and mrcal has functions to convert between them. Available representations are:

  • Rt: a (4,3) numpy array with a (3,3) rotation matrix concatenated with a (1,3) translation vector:

    \[ \begin{bmatrix} R \\ \vec t^T \end{bmatrix} \]

    This form is easy to work with, but there are implied constraints: most (4,3) numpy arrays are not valid Rt transformations.

  • rt: a (6,) numpy array with a (3,) vector representing a Rodrigues rotation concatenated with another (3,) vector, representing a translation:

    \[ \left[ \vec r^T \quad \vec t^T \right] \]

    This form requires more computations to deal with, but has no implied constraints: any (6,) numpy array is a valid rt transformation. Thus this is the form used inside the mrcal optimization routine.

Each of these represents a transformation rotate(x) + t.

Since a pose represents a transformation between two coordinate systems, the toolkit generally refers to a pose as something like Rt_AB, which is an Rt-represented transformation to convert a point from a representation in the coordinate system B to a representation in coordinate system A.

A Rodrigues rotation vector r represents a rotation of length(r) radians around an axis in the direction of r. Converting between R and r is done via the Rodrigues rotation formula, using the mrcal.r_from_R() and mrcal.R_from_r() functions. To convert full poses rather than just rotations, use mrcal.rt_from_Rt() and mrcal.Rt_from_rt().
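
A short sketch of these representations and conversions, with arbitrary pose values:

    import numpy as np
    import mrcal

    # an arbitrary rt pose: a (3,) Rodrigues rotation, then a (3,) translation
    rt_AB = np.array((0.1, -0.2, 0.3,  1., 2., 3.))
    Rt_AB = mrcal.Rt_from_rt(rt_AB)     # shape (6,) -> shape (4,3)

    R = Rt_AB[:3,:]                     # the (3,3) rotation matrix
    r = mrcal.r_from_R(R)               # back to the Rodrigues vector
    assert np.allclose(r, rt_AB[:3])

    # transformations compose: Rt_AC = Rt_AB Rt_BC
    rt_BC = np.array((0., 0.2, 0.,  -1., 0., 0.))
    Rt_AC = mrcal.compose_Rt(Rt_AB, mrcal.Rt_from_rt(rt_BC))

    # applying a pose to a point: rotate(p_B) + t
    p_B = np.array((1., 2., 3.))
    p_A = mrcal.transform_point_Rt(Rt_AB, p_B)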

There are several functions to work with unit quaternions as a rotation representation, but they are lightly used and exist only for compatibility with other tools; mrcal itself does not use quaternions.

Linear algebra

mrcal follows the usual linear algebra convention of column vectors. So rotating a vector using a rotation matrix is a matrix-vector multiplication operation: \(\vec b = R \vec a\) where both \(\vec a\) and \(\vec b\) are column vectors.

However, numpy prints vectors (1-dimensional objects) as row vectors, so the code treats 1-dimensional objects as row vectors. In the code the above rotation is thus implemented equivalently as \(\vec b^T = \vec a^T R^T\). The mrcal.rotate_point_R() and mrcal.transform_point_Rt() functions handle this transparently.
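
A quick sketch of this equivalence, using an arbitrary rotation:

    import numpy as np
    import mrcal

    R = mrcal.R_from_r(np.array((0., 0., np.pi/2.)))  # 90 degrees about z
    a = np.array((1., 0., 0.))

    b_mrcal = mrcal.rotate_point_R(R, a)  # handles the 1-dimensional storage
    b_col   = R @ a                       # the column-vector convention
    b_row   = a @ R.T                     # the transposed form the code uses
    assert np.allclose(b_mrcal, b_col) and np.allclose(b_mrcal, b_row)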

A similar issue is that numpy follows the linear algebra convention of indexing arrays with (index_row, index_column) and of representing array sizes as (height, width). This runs against the other convention of referring to pixel coordinates as (x,y) and image dimensions as (width, height). Whenever possible, mrcal places the x coordinate first, but when interacting directly with numpy it must place the y coordinate first. In every case the choice being made is clearly documented, so pay attention to the docs when in doubt.
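
For instance, reading the image value at a pixel coordinate \(\vec q\) requires reversing the order of the indices:

    import numpy as np

    q = np.array((100, 20))                        # mrcal convention: (x, y)

    image = np.zeros((480, 640), dtype=np.uint8)   # numpy convention: (height, width)
    value = image[q[1], q[0]]                      # numpy indexes (row, col) = (y, x)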

When computing gradients, mrcal places the dependent variables in the leading dimensions and the independent variables in the trailing dimensions. So if we have \(\vec b = R \vec a\), then

\[ R = \frac{ \partial \vec b }{ \partial \vec a } = \left[ \begin{aligned} \frac{ \partial b_0 }{ \partial \vec a } \\ \frac{ \partial b_1 }{ \partial \vec a } \\ \frac{ \partial b_2 }{ \partial \vec a } \end{aligned} \right] = \left[ \frac{ \partial \vec b }{ \partial a_0 } \quad \frac{ \partial \vec b }{ \partial a_1 } \quad \frac{ \partial \vec b }{ \partial a_2 } \right] \]

Here \(\frac{ \partial b_i }{ \partial \vec a }\) is a (1,3) row vector, and \(\frac{ \partial \vec b }{ \partial a_i }\) is a (3,1) column vector.
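
This convention is visible in the gradient-returning modes of the mrcal transformation functions. A sketch, using an arbitrary rotation:

    import numpy as np
    import mrcal

    R = mrcal.R_from_r(np.array((0.1, -0.2, 0.3)))
    a = np.array((1., 2., 3.))

    # with get_gradients=True we get the result and its gradients; the
    # dependent variable's dimensions lead and the independent ones trail
    b, db_dR, db_da = mrcal.rotate_point_R(R, a, get_gradients=True)

    print(db_da.shape)   # (3,3): d(Ra)/da = R
    print(db_dR.shape)   # (3,3,3): each b_i against each element of R
    assert np.allclose(db_da, R)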

Implementation

The core of mrcal is written in C, but most of the API is currently available only in Python. The Python wrapping is done with the numpysane_pywrap library, which makes it simple to produce consistent Python interfaces, and provides broadcasting support.

The Python layer uses numpy and numpysane heavily. All the plotting is done with gnuplotlib. OpenCV is used a bit, but only from the Python layer: its C APIs are gone, and its C++ APIs are unstable.