%!TEX root = ceres-solver.tex
\chapter{Modeling}
\label{chapter:api}
\section{\texttt{CostFunction}}
Given parameter blocks $\left[x_{i_1}, \hdots , x_{i_k}\right]$, a \texttt{CostFunction} is responsible for computing
a vector of residuals and, if asked, a vector of Jacobian matrices, i.e., given $\left[x_{i_1}, \hdots , x_{i_k}\right]$, compute the vector $f_i\left(x_{i_1},\hdots,x_{i_k}\right)$ and the matrices
\begin{equation}
J_{ij} = \frac{\partial}{\partial x_{i_j}}f_i\left(x_{i_1},\hdots,x_{i_k}\right),\quad \forall j \in \{1, \hdots, k\}
\end{equation}
\begin{minted}{c++}
class CostFunction {
 public:
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) const = 0;
  const vector<int16>& parameter_block_sizes() const;
  int num_residuals() const;

 protected:
  vector<int16>* mutable_parameter_block_sizes();
  void set_num_residuals(int num_residuals);
};
\end{minted}

The signature of the function (number and sizes of input parameter blocks and number of outputs)
is stored in \texttt{parameter\_block\_sizes\_} and \texttt{num\_residuals\_} respectively. User
code inheriting from this class is expected to set these two members with the
corresponding accessors. This information is verified by \texttt{Problem}
when the cost function is added with \texttt{Problem::AddResidualBlock}.

The most important method here is \texttt{Evaluate}. It implements the residual and Jacobian computation.

\texttt{parameters} is an array of pointers to arrays containing the various parameter blocks. \texttt{parameters} has the same number of elements as \texttt{parameter\_block\_sizes\_}, and the parameter blocks are in the same order as \texttt{parameter\_block\_sizes\_}.

\texttt{residuals} is an array of size \texttt{num\_residuals\_}.

\texttt{jacobians} is an array of size \texttt{parameter\_block\_sizes\_.size()} containing pointers to storage for Jacobian matrices corresponding to each parameter block. The Jacobian matrices are in the same order as \texttt{parameter\_block\_sizes\_}. \texttt{jacobians[i]} is an array that contains \texttt{num\_residuals\_} $\times$ \texttt{parameter\_block\_sizes\_[i]} elements. Each Jacobian matrix is stored in row-major order, i.e.,

\begin{equation}
\texttt{jacobians[i][r * parameter\_block\_sizes\_[i] + c]} =
\frac{\partial \texttt{residuals[r]}}{\partial \texttt{parameters[i][c]}}
\end{equation}

If \texttt{jacobians} is \texttt{NULL}, then no derivatives are returned; this is the case when computing cost only. If \texttt{jacobians[i]} is \texttt{NULL}, then the Jacobian matrix corresponding to the $i^{\textrm{th}}$ parameter block must not be returned; this is the case when the parameter block is marked constant.
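
For illustration, below is a minimal sketch of a \texttt{CostFunction} with a single residual $f(x) = 1 - x^\top x$ and one 2-dimensional parameter block. The class and the residual expression are illustrative, not part of Ceres; note how the constructor records the signature via the protected accessors, and how \texttt{Evaluate} honors \texttt{NULL} Jacobian pointers.
\begin{minted}{c++}
class MySimpleCostFunction : public CostFunction {
 public:
  MySimpleCostFunction() {
    set_num_residuals(1);                           // one residual
    mutable_parameter_block_sizes()->push_back(2);  // one block of size 2
  }
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) const {
    const double* x = parameters[0];
    residuals[0] = 1.0 - x[0] * x[0] - x[1] * x[1];
    // Only fill the Jacobian when it is asked for.
    if (jacobians != NULL && jacobians[0] != NULL) {
      // Row-major: jacobians[0][r * parameter_block_sizes_[0] + c].
      jacobians[0][0] = -2.0 * x[0];
      jacobians[0][1] = -2.0 * x[1];
    }
    return true;
  }
};
\end{minted}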

\section{\texttt{SizedCostFunction}}
If the sizes of the parameter blocks and the size of the residual vector are known at compile time (this is the common case), Ceres provides \texttt{SizedCostFunction}, where these values can be specified as template parameters.
\begin{minted}{c++}
template<int kNumResiduals,
         int N0 = 0, int N1 = 0, int N2 = 0, int N3 = 0, int N4 = 0, int N5 = 0>
class SizedCostFunction : public CostFunction {
 public:
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) const = 0;
};
\end{minted}
In this case the user only needs to implement the \texttt{Evaluate} method.
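
For example, the cost function sketched earlier, rewritten as a \texttt{SizedCostFunction} with 1 residual and one parameter block of size 2, might look as follows (again an illustrative sketch, not part of Ceres):
\begin{minted}{c++}
class MySimpleCostFunction : public SizedCostFunction<1, 2> {
 public:
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) const {
    const double* x = parameters[0];
    residuals[0] = 1.0 - x[0] * x[0] - x[1] * x[1];
    if (jacobians != NULL && jacobians[0] != NULL) {
      jacobians[0][0] = -2.0 * x[0];
      jacobians[0][1] = -2.0 * x[1];
    }
    return true;
  }
};
\end{minted}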

\section{\texttt{AutoDiffCostFunction}}
Even defining a \texttt{SizedCostFunction} can be a tedious affair if complicated derivative computations are involved. To this end, Ceres provides automatic differentiation.

To get an auto differentiated cost function, you must define a class with a
templated \texttt{operator()} (a functor) that computes the cost function in terms of
the template parameter \texttt{T}. The autodiff framework substitutes appropriate
\texttt{Jet} objects for \texttt{T} in order to compute the derivative when necessary, but
this is hidden, and you should write the function as if \texttt{T} were a scalar type
(e.g. a double-precision floating point number).

The function must write the computed value in the last argument (the only
non-\texttt{const} one) and return \texttt{true} to indicate success.

For example, consider a scalar error $e = k - x^\top y$, where both $x$ and $y$ are
two-dimensional vector parameters and $k$ is a constant. The form of this error, which is the
difference between a constant and an expression, is a common pattern in least
squares problems. For example, the value $x^\top y$ might be the model expectation
for a series of measurements, where there is an instance of the cost function
for each measurement $k$.

The actual cost added to the total problem is $e^2$, or $(k - x^\top y)^2$; however,
the squaring is implicitly done by the optimization framework.

To write an auto-differentiable cost function for the above model, first
define the object
\begin{minted}{c++}
class MyScalarCostFunction {
 public:
  MyScalarCostFunction(double k) : k_(k) {}
  template <typename T>
  bool operator()(const T* const x, const T* const y, T* e) const {
    e[0] = T(k_) - x[0] * y[0] - x[1] * y[1];
    return true;
  }

 private:
  double k_;
};
\end{minted}

Note that in the declaration of \texttt{operator()} the input parameters \texttt{x} and \texttt{y} come
first, and are passed as const pointers to arrays of \texttt{T}. If there were three
input parameters, then the third input parameter would come after \texttt{y}. The
output is always the last parameter, and is also a pointer to an array. In
the example above, \texttt{e} is a scalar, so only \texttt{e[0]} is set.

Then, given this class definition, the auto differentiated cost function for
it can be constructed as follows.

\begin{minted}{c++}
CostFunction* cost_function
    = new AutoDiffCostFunction<MyScalarCostFunction, 1, 2, 2>(
        new MyScalarCostFunction(1.0));              ^  ^  ^
                                                     |  |  |
                         Dimension of residual ------+  |  |
                         Dimension of x ----------------+  |
                         Dimension of y -------------------+
\end{minted}

In this example, there is usually an instance for each measurement of $k$.

In the instantiation above, the template parameters following
\texttt{MyScalarCostFunction}, \texttt{<1, 2, 2>}, describe the functor as computing a
1-dimensional output from two arguments, both 2-dimensional.

The framework can currently accommodate cost functions of up to 6 independent
variables, and there is no limit on the dimensionality of each of them.

\textbf{WARNING 1} Since the functor will get instantiated with different types for
\texttt{T}, you must convert from other numeric types to \texttt{T} before mixing
computations with other variables of type \texttt{T}. In the example above, this is
seen where instead of using \texttt{k\_} directly, \texttt{k\_} is wrapped with \texttt{T(k\_)}.

\textbf{WARNING 2} A common beginner's error when first using \texttt{AutoDiffCostFunction} is to get the sizing wrong. In particular, there is a tendency to
set the template parameters to (dimension of residual, number of parameters)
instead of passing a dimension parameter for {\em every parameter block}. In the
example above, that would be \texttt{<MyScalarCostFunction, 1, 2>}, which is missing
the \texttt{2} as the last template argument.

\subsection{Theory \& Implementation}
TBD

\section{\texttt{NumericDiffCostFunction}}
To get a numerically differentiated cost function, define a subclass of
\texttt{CostFunction} such that the \texttt{Evaluate} function ignores the \texttt{jacobians}
parameter. The numeric differentiation wrapper will fill in the Jacobians array
if necessary by repeatedly calling the \texttt{Evaluate} method with
small changes to the appropriate parameters, and computing the slope. For
performance, the numeric differentiation wrapper class is templated on the
concrete cost function, even though it could be implemented only in terms of
the virtual \texttt{CostFunction} interface.
\begin{minted}{c++}
template <typename CostFunctionNoJacobian,
          NumericDiffMethod method = CENTRAL, int M = 0,
          int N0 = 0, int N1 = 0, int N2 = 0, int N3 = 0, int N4 = 0, int N5 = 0>
class NumericDiffCostFunction
    : public SizedCostFunction<M, N0, N1, N2, N3, N4, N5> {
};
\end{minted}

The numerically differentiated version of a cost function can be constructed as follows:
\begin{minted}{c++}
CostFunction* cost_function
    = new NumericDiffCostFunction<MyCostFunction, CENTRAL, 1, 4, 8>(
        new MyCostFunction(...), TAKE_OWNERSHIP);
\end{minted}
where \texttt{MyCostFunction} has 1 residual and 2 parameter blocks with sizes 4 and 8
respectively. Look at the tests for a more detailed example.
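
For concreteness, a sketch of such a \texttt{MyCostFunction} is shown below, with the constructor and its arguments omitted and an illustrative residual expression. The \texttt{jacobians} argument is deliberately ignored, since the wrapper computes the derivatives by finite differences:
\begin{minted}{c++}
class MyCostFunction : public CostFunction {
 public:
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) const {
    const double* x = parameters[0];  // block of size 4
    const double* y = parameters[1];  // block of size 8
    double dot = 0.0;
    for (int i = 0; i < 4; ++i) {
      dot += x[i] * y[i];
    }
    residuals[0] = 1.0 - dot;  // jacobians is never touched
    return true;
  }
};
\end{minted}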

The central difference method is considerably more accurate, at the cost of
twice as many function evaluations as forward differences. Consider using
central differences to begin with, and only after that works, try forward
differences to improve performance.

\section{\texttt{LossFunction}}
For least squares problems where the minimization may encounter
input terms that contain outliers, that is, completely bogus
measurements, it is important to use a loss function that reduces
their influence.

Consider a structure from motion problem. The unknowns are 3D
points and camera parameters, and the measurements are image
coordinates describing the expected reprojected position for a
point in a camera. For example, we want to model the geometry of a
street scene with fire hydrants and cars, observed by a moving
camera with unknown parameters, and the only 3D points we care
about are the pointy tippy-tops of the fire hydrants. Our magic
image processing algorithm, which is responsible for producing the
measurements that are input to Ceres, has found and matched all
such tippy-tops in all image frames, except that in one of the
frames it mistook a car's headlight for a hydrant. If we didn't do
anything special, the residual for the erroneous measurement would
pull the entire solution away from the optimum in order to reduce
the large error that would otherwise be attributed to the wrong
measurement.

Using a robust loss function, the cost for large residuals is
reduced. In the example above, this leads to outlier terms getting
down-weighted so they do not overly influence the final solution.

\begin{minted}{c++}
class LossFunction {
 public:
  virtual void Evaluate(double s, double out[3]) const = 0;
};
\end{minted}

The key method is \texttt{Evaluate}, which, given a non-negative scalar \texttt{s}, computes
\begin{align}
    \texttt{out} = \begin{bmatrix}\rho(s), & \rho'(s), & \rho''(s)\end{bmatrix}
\end{align}

Here the convention is that the contribution of a term to the cost function is given by $\frac{1}{2}\rho(s)$, where $s = \|f_i\|^2$. Calling the method with a negative value of $s$ is an error; implementations are not required to handle that case.

Most sane choices of $\rho$ satisfy:
\begin{align}
   \rho(0) &= 0\\
   \rho'(0) &= 1\\
   \rho'(s) &< 1 \text{ in the outlier region}\\
   \rho''(s) &< 0 \text{ in the outlier region}
\end{align}
so that they mimic the squared cost for small residuals.

\subsection{Scaling}
Given one robustifier $\rho(s)$, one can change the length scale at which robustification takes
place by adding a scale factor $a > 0$, which gives us $\rho(s,a) = a^2 \rho(s / a^2)$, with first and second derivatives $\rho'(s / a^2)$ and $(1 / a^2) \rho''(s / a^2)$ respectively.
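
As a sketch (this wrapper is illustrative, not the Ceres API), the scaled loss and its derivatives can be computed from an existing \texttt{LossFunction} as follows:
\begin{minted}{c++}
class ScaledLossSketch : public LossFunction {
 public:
  ScaledLossSketch(const LossFunction* rho, double a)
      : rho_(rho), a2_(a * a) {}
  virtual void Evaluate(double s, double out[3]) const {
    double r[3];
    rho_->Evaluate(s / a2_, r);
    out[0] = a2_ * r[0];  // a^2 rho(s / a^2)
    out[1] = r[1];        // rho'(s / a^2), by the chain rule
    out[2] = r[2] / a2_;  // (1 / a^2) rho''(s / a^2)
  }
 private:
  const LossFunction* rho_;
  const double a2_;
};
\end{minted}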


\begin{figure}[hbt]
\includegraphics[width=\textwidth]{loss.pdf}
\caption{Shape of the various common loss functions.}
\label{fig:loss}
\end{figure}


The reason for the appearance of squaring is that $a$ is in the units of the residual vector norm whereas $s$ is a squared norm. For applications it is more convenient to specify $a$ than
its square.

Here are some common loss functions implemented in Ceres. For simplicity we describe their unscaled versions. Figure~\ref{fig:loss} illustrates their shape graphically.

\begin{align}
 \rho(s)&=s \tag{\texttt{NullLoss}}\\
 \rho(s) &= \begin{cases}
       s & s \le 1\\
       2 \sqrt{s} - 1 & s > 1
   \end{cases} \tag{\texttt{HuberLoss}}\\
 \rho(s) &= 2 (\sqrt{1+s} - 1) \tag{\texttt{SoftLOneLoss}}\\
 \rho(s) &= \log(1 + s) \tag{\texttt{CauchyLoss}}
\end{align}
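
As an illustration, a sketch of the unscaled \texttt{HuberLoss} evaluation is given below, filling in $\rho(s)$, $\rho'(s)$ and $\rho''(s)$ as the \texttt{LossFunction} interface requires (the actual Ceres implementation handles the scaled version and may differ in detail):
\begin{minted}{c++}
#include <cmath>

class HuberLossSketch : public LossFunction {
 public:
  virtual void Evaluate(double s, double out[3]) const {
    if (s <= 1.0) {
      out[0] = s;    // rho(s)   = s
      out[1] = 1.0;  // rho'(s)  = 1
      out[2] = 0.0;  // rho''(s) = 0
    } else {
      const double r = std::sqrt(s);
      out[0] = 2.0 * r - 1.0;   // rho(s)   = 2 sqrt(s) - 1
      out[1] = 1.0 / r;         // rho'(s)  = s^{-1/2}
      out[2] = -0.5 / (s * r);  // rho''(s) = -(1/2) s^{-3/2}
    }
  }
};
\end{minted}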

\subsection{Theory \& Implementation}
TBD

\section{\texttt{LocalParameterization}}
Sometimes the parameters $x$ can overparameterize a problem. In
that case it is desirable to choose a parameterization to remove
the null directions of the cost. More generally, if $x$ lies on a
manifold of a smaller dimension than the ambient space that it is
embedded in, then it is numerically and computationally more
effective to optimize it using a parameterization that lives in
the tangent space of that manifold at each point.

For example, a sphere in three dimensions is a two dimensional
manifold, embedded in a three dimensional space. At each point on
the sphere, the plane tangent to it defines a two dimensional
tangent space. For a cost function defined on this sphere, given a
point $x$, moving in the direction normal to the sphere at that
point is not useful. Thus a better way to parameterize a point on
a sphere is to optimize over a two dimensional vector $\Delta x$ in the
tangent space at that point, and then "move" to
the point $x + \Delta x$, where the move operation involves projecting
back onto the sphere. Doing so removes a redundant dimension from
the optimization, making it numerically more robust and efficient.

More generally we can define a function
\begin{equation}
  x' = \boxplus(x, \Delta x),
\end{equation}
where $x'$ has the same size as $x$, and $\Delta x$ is of size less
than or equal to that of $x$. The function $\boxplus$ generalizes the
definition of vector addition. Thus it satisfies the identity
\begin{equation}
  \boxplus(x, 0) = x,\quad \forall x.
\end{equation}

Instances of \texttt{LocalParameterization} implement the $\boxplus$ operation and its derivative with respect to $\Delta x$ at $\Delta x = 0$.

\begin{minted}{c++}
class LocalParameterization {
 public:
  virtual ~LocalParameterization() {}
  virtual bool Plus(const double* x,
                    const double* delta,
                    double* x_plus_delta) const = 0;
  virtual bool ComputeJacobian(const double* x, double* jacobian) const = 0;
  virtual int GlobalSize() const = 0;
  virtual int LocalSize() const = 0;
};
\end{minted}

\texttt{GlobalSize} is the dimension of the ambient space in which the parameter block $x$ lives. \texttt{LocalSize} is the size of the tangent space that $\Delta x$ lives in. \texttt{Plus} implements $\boxplus(x,\Delta x)$ and \texttt{ComputeJacobian} computes the Jacobian matrix
\begin{equation}
  J = \left . \frac{\partial }{\partial \Delta x} \boxplus(x,\Delta x)\right|_{\Delta x = 0}
\end{equation}
in row-major form.

A trivial version of $\boxplus$ is when $\Delta x$ is of the same size as $x$,
and

\begin{equation}
  \boxplus(x, \Delta x) = x + \Delta x
\end{equation}

A more interesting case is when $x$ is a two dimensional vector, and the
user wishes to hold the first coordinate constant. Then, $\Delta x$ is a
scalar and $\boxplus$ is defined as

\begin{equation}
  \boxplus(x, \Delta x) = x + \left[ \begin{array}{c} 0 \\ 1
                                     \end{array} \right] \Delta x
\end{equation}

\texttt{SubsetParameterization} generalizes this construction to hold any part of a parameter block constant.
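
A sketch of the two-dimensional example above as a \texttt{LocalParameterization} (illustrative; in practice one would use \texttt{SubsetParameterization}):
\begin{minted}{c++}
class HoldFirstCoordinateConstant : public LocalParameterization {
 public:
  virtual ~HoldFirstCoordinateConstant() {}
  virtual bool Plus(const double* x,
                    const double* delta,
                    double* x_plus_delta) const {
    x_plus_delta[0] = x[0];             // frozen coordinate
    x_plus_delta[1] = x[1] + delta[0];  // free coordinate
    return true;
  }
  // Row-major 2x1 Jacobian of Plus with respect to delta at delta = 0.
  virtual bool ComputeJacobian(const double* x, double* jacobian) const {
    jacobian[0] = 0.0;
    jacobian[1] = 1.0;
    return true;
  }
  virtual int GlobalSize() const { return 2; }
  virtual int LocalSize() const { return 1; }
};
\end{minted}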


Another example that occurs commonly in Structure from Motion problems
is when camera rotations are parameterized using a quaternion. There,
it is useful only to make updates orthogonal to that 4-vector defining
the quaternion. One way to do this is to let $\Delta x$ be a 3
dimensional vector and define $\boxplus$ to be

\begin{equation}
  \boxplus(x, \Delta x) =
\left[
\cos(|\Delta x|), \frac{\sin\left(|\Delta x|\right)}{|\Delta x|} \Delta x
\right] * x
\label{eq:quaternion}
\end{equation}
The multiplication between the two 4-vectors on the right hand
side is the standard quaternion product. \texttt{QuaternionParameterization} is an implementation of~\eqref{eq:quaternion}.
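
To make~\eqref{eq:quaternion} concrete, here is a sketch of the \texttt{Plus} operation, with quaternions stored as $[w, x, y, z]$ 4-vectors; the actual \texttt{QuaternionParameterization} may differ in detail:
\begin{minted}{c++}
#include <cmath>

bool QuaternionPlusSketch(const double x[4],
                          const double delta[3],
                          double x_plus_delta[4]) {
  const double norm_delta = std::sqrt(delta[0] * delta[0] +
                                      delta[1] * delta[1] +
                                      delta[2] * delta[2]);
  if (norm_delta == 0.0) {
    for (int i = 0; i < 4; ++i) x_plus_delta[i] = x[i];
    return true;
  }
  const double sin_by_norm = std::sin(norm_delta) / norm_delta;
  const double q[4] = {std::cos(norm_delta),     // w
                       sin_by_norm * delta[0],   // x
                       sin_by_norm * delta[1],   // y
                       sin_by_norm * delta[2]};  // z
  // Standard quaternion product q * x.
  x_plus_delta[0] = q[0] * x[0] - q[1] * x[1] - q[2] * x[2] - q[3] * x[3];
  x_plus_delta[1] = q[0] * x[1] + q[1] * x[0] + q[2] * x[3] - q[3] * x[2];
  x_plus_delta[2] = q[0] * x[2] - q[1] * x[3] + q[2] * x[0] + q[3] * x[1];
  x_plus_delta[3] = q[0] * x[3] + q[1] * x[2] - q[2] * x[1] + q[3] * x[0];
  return true;
}
\end{minted}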
345
Sameer Agarwald3eaa482012-05-29 23:47:57 -0700346\clearpage
347
Keir Mierle8ebb0732012-04-30 23:09:08 -0700348\section{\texttt{Problem}}
\begin{minted}{c++}
class Problem {
 public:
  struct Options {
    Options();
    Ownership cost_function_ownership;
    Ownership loss_function_ownership;
    Ownership local_parameterization_ownership;
  };

  Problem();
  explicit Problem(const Options& options);
  ~Problem();

  ResidualBlockId AddResidualBlock(CostFunction* cost_function,
                                   LossFunction* loss_function,
                                   const vector<double*>& parameter_blocks);

  void AddParameterBlock(double* values, int size);
  void AddParameterBlock(double* values,
                         int size,
                         LocalParameterization* local_parameterization);

  void SetParameterBlockConstant(double* values);
  void SetParameterBlockVariable(double* values);
  void SetParameterization(double* values,
                           LocalParameterization* local_parameterization);

  int NumParameterBlocks() const;
  int NumParameters() const;
  int NumResidualBlocks() const;
  int NumResiduals() const;
};
\end{minted}

The \texttt{Problem} object holds the robustified non-linear least squares problem~\eqref{eq:ceresproblem}. To create a least squares problem, use the \texttt{Problem::AddResidualBlock} and \texttt{Problem::AddParameterBlock} methods.

For example, a problem containing 3 parameter blocks of sizes 3, 4 and 5
respectively and two residual blocks of size 2 and 6:

\begin{minted}{c++}
double x1[] = { 1.0, 2.0, 3.0 };
double x2[] = { 1.0, 2.0, 3.0, 5.0 };
double x3[] = { 1.0, 2.0, 3.0, 6.0, 7.0 };

Problem problem;
problem.AddResidualBlock(new MyUnaryCostFunction(...), NULL, x1);
problem.AddResidualBlock(new MyBinaryCostFunction(...), NULL, x2, x3);
\end{minted}


\texttt{AddResidualBlock}, as the name implies, adds a residual block to the problem. It adds a cost function, an optional loss function, and connects the cost function to a set of parameter blocks.

The cost
function carries with it information about the sizes of the
parameter blocks it expects. The function checks that these match
the sizes of the parameter blocks listed in \texttt{parameter\_blocks}. The
program aborts if a mismatch is detected. \texttt{loss\_function} can be
\texttt{NULL}, in which case the cost of the term is just the squared norm
of the residuals.

The user has the option of explicitly adding the parameter blocks
using \texttt{AddParameterBlock}. This causes additional correctness
checking; however, \texttt{AddResidualBlock} implicitly adds the parameter
blocks if they are not present, so calling \texttt{AddParameterBlock}
explicitly is not required.


\texttt{Problem} by default takes ownership of the
\texttt{cost\_function} and \texttt{loss\_function} pointers. These objects remain
live for the life of the \texttt{Problem} object. If the user wishes to
keep control over the destruction of these objects, then they can
do this by setting the corresponding enums in the \texttt{Options} struct.


Note that even though \texttt{Problem} takes ownership of \texttt{cost\_function}
and \texttt{loss\_function}, it does not preclude the user from re-using
them in another residual block. The destructor takes care to call
delete on each \texttt{cost\_function} or \texttt{loss\_function} pointer only once,
regardless of how many residual blocks refer to them.

\texttt{AddParameterBlock} explicitly adds a parameter block to the \texttt{Problem}. Optionally it allows the user to associate a \texttt{LocalParameterization} object with the parameter block too. Repeated calls with the same arguments are ignored. Repeated
calls with the same double pointer but a different size result in undefined behaviour.
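
For example, continuing the snippet above (a usage sketch; \texttt{q} is a hypothetical quaternion parameter block):
\begin{minted}{c++}
double q[4] = { 1.0, 0.0, 0.0, 0.0 };
problem.AddParameterBlock(q, 4, new QuaternionParameterization());
\end{minted}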

You can set any parameter block to be constant using

\texttt{Problem::SetParameterBlockConstant}

and undo this using

\texttt{Problem::SetParameterBlockVariable}.

In fact, you can set any number of parameter blocks to be constant, and Ceres is smart enough to figure out which part of the problem you have constructed depends on the parameter blocks that are free to change, and only spends time solving it. For example, if you constructed a problem with a million parameter blocks and 2 million residual blocks, then set all but one parameter block constant, and suppose only 10 residual blocks depend on this one non-constant parameter block, then the computational effort Ceres spends in solving this problem will be the same as if you had defined a problem with one parameter block and 10 residual blocks.
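
For example, continuing the earlier snippet (a usage sketch):
\begin{minted}{c++}
problem.SetParameterBlockConstant(x2);  // hold x2 fixed while solving
// ... solve ...
problem.SetParameterBlockVariable(x2);  // make x2 free again
\end{minted}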

\texttt{Problem} by default takes ownership of the
\texttt{cost\_function}, \texttt{loss\_function} and \texttt{local\_parameterization} pointers. These objects remain
live for the life of the \texttt{Problem} object. If the user wishes to
keep control over the destruction of these objects, then they can
do this by setting the corresponding enums in the \texttt{Options} struct. Even though \texttt{Problem} takes ownership of these pointers, it does not preclude the user from re-using them in another residual or parameter block. The destructor takes care to call
delete on each pointer only once.