.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-solving:

=======
Solving
=======


Introduction
============

Effective use of Ceres requires some familiarity with the basic
components of a nonlinear least squares solver, so before we describe
how to configure and use the solver, we will take a brief look at how
some of the core optimization algorithms in Ceres work.

Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
variables, and
:math:`F(x) = \left[f_1(x), ... , f_{m}(x) \right]^{\top}` be an
:math:`m`-dimensional function of :math:`x`. We are interested in
solving the following optimization problem [#f1]_ .

.. math:: \arg \min_x \frac{1}{2}\|F(x)\|^2\ .
   :label: nonlinsq

Here, the Jacobian :math:`J(x)` of :math:`F(x)` is an :math:`m\times
n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)` and the
gradient vector :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2 = J(x)^\top
F(x)`. Since the efficient global minimization of :eq:`nonlinsq` for
general :math:`F(x)` is an intractable problem, we will have to settle
for finding a local minimum.

The general strategy when solving non-linear optimization problems is
to solve a sequence of approximations to the original problem
[NocedalWright]_. At each iteration, the approximation is solved to
determine a correction :math:`\Delta x` to the vector :math:`x`. For
non-linear least squares, an approximation can be constructed by using
the linearization :math:`F(x+\Delta x) \approx F(x) + J(x)\Delta x`,
which leads to the following linear least squares problem:

.. math:: \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
   :label: linearapprox

Unfortunately, naively solving a sequence of these problems and
updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that
may not converge. To get a convergent algorithm, we need to control
the size of the step :math:`\Delta x`. Depending on how the size of
the step :math:`\Delta x` is controlled, non-linear optimization
algorithms can be divided into two major categories [NocedalWright]_.

1. **Trust Region** The trust region approach approximates the
   objective function using a model function (often a quadratic) over
   a subset of the search space known as the trust region. If the
   model function succeeds in minimizing the true objective function,
   the trust region is expanded; otherwise, it is contracted and the
   model optimization problem is solved again.

2. **Line Search** The line search approach first finds a descent
   direction along which the objective function will be reduced and
   then computes a step size that decides how far to move along that
   direction. The descent direction can be computed by various
   methods, such as gradient descent, Newton's method and quasi-Newton
   methods. The step size can be determined either exactly or
   inexactly.

Trust region methods are in some sense dual to line search methods:
trust region methods first choose a step size (the size of the trust
region) and then a step direction, while line search methods first
choose a step direction and then a step size. Ceres implements
multiple algorithms in both categories.

.. _section-trust-region-methods:

Trust Region Methods
====================

The basic trust region algorithm looks something like this.

   1. Given an initial point :math:`x` and a trust region radius :math:`\mu`.
   2. :math:`\arg \min_{\Delta x} \frac{1}{2}\|J(x)\Delta
      x + F(x)\|^2` s.t. :math:`\|D(x)\Delta x\|^2 \le \mu`
   3. :math:`\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 -
      \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 -
      \|F(x)\|^2}`
   4. if :math:`\rho > \epsilon` then :math:`x = x + \Delta x`.
   5. if :math:`\rho > \eta_1` then :math:`\mu = 2 \mu`
   6. else if :math:`\rho < \eta_2` then :math:`\mu = 0.5 * \mu`
   7. Goto 2.

Here, :math:`\mu` is the trust region radius, :math:`D(x)` is some
matrix used to define a metric on the domain of :math:`F(x)` and
:math:`\rho` measures the quality of the step :math:`\Delta x`, i.e.,
how well the linear model predicted the decrease in the value of the
non-linear objective. The idea is to increase or decrease the radius
of the trust region depending on how well the linearization predicts
the behavior of the non-linear objective, which in turn is reflected
in the value of :math:`\rho`.

The key computational step in a trust-region algorithm is the solution
of the constrained optimization problem

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2\quad \text{such that}\quad \|D(x)\Delta x\|^2 \le \mu
   :label: trp

There are a number of different ways of solving this problem, each
giving rise to a different concrete trust-region algorithm. Currently,
Ceres implements two trust-region algorithms, Levenberg-Marquardt
and Dogleg. The user can choose between them by setting
:member:`Solver::Options::trust_region_strategy_type`.

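For example, a minimal sketch (assuming a ``Problem`` named
``problem`` has already been constructed elsewhere) of selecting the
strategy could look like:

.. code-block:: c++

  // Sketch only: choose the trust region strategy before solving.
  ceres::Solver::Options options;
  options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;
  // options.trust_region_strategy_type = ceres::DOGLEG;
  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);
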
.. rubric:: Footnotes

.. [#f1] At the level of the non-linear solver, the block
         structure is not relevant, therefore our discussion here is
         in terms of an optimization problem defined over a state
         vector of size :math:`n`.


.. _section-levenberg-marquardt:

Levenberg-Marquardt
-------------------

The Levenberg-Marquardt algorithm [Levenberg]_ [Marquardt]_ is the
most popular algorithm for solving non-linear least squares problems.
It was also the first trust region algorithm to be developed
[Levenberg]_ [Marquardt]_. Ceres implements an exact step [Madsen]_
and an inexact step variant of the Levenberg-Marquardt algorithm
[WrightHolt]_ [NashSofer]_.

It can be shown that the solution to :eq:`trp` can be obtained by
solving an unconstrained optimization problem of the form

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 +\lambda \|D(x)\Delta x\|^2

where :math:`\lambda` is a Lagrange multiplier that is inversely
related to :math:`\mu`. In Ceres, we solve for

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
   :label: lsqr

The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.

Before going further, let us make some notational simplifications. We
will assume that the matrix :math:`\sqrt{\mu} D` has been concatenated
at the bottom of the matrix :math:`J` and similarly a vector of zeros
has been added to the bottom of the vector :math:`f` and the rest of
our discussion will be in terms of :math:`J` and :math:`f`, i.e., the
linear least squares problem.

.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
   :label: simple

For all but the smallest problems the solution of :eq:`simple` in
each iteration of the Levenberg-Marquardt algorithm is the dominant
computational cost in Ceres. Ceres provides a number of different
options for solving :eq:`simple`. There are two major classes of
methods - factorization and iterative.

The factorization methods are based on computing an exact solution of
:eq:`lsqr` using a Cholesky or a QR factorization and lead to an exact
step Levenberg-Marquardt algorithm. But it is not clear if an exact
solution of :eq:`lsqr` is necessary at each step of the LM algorithm
to solve :eq:`nonlinsq`. In fact, we have already seen evidence
that this may not be the case, as :eq:`lsqr` is itself a regularized
version of :eq:`linearapprox`. Indeed, it is possible to
construct non-linear optimization algorithms in which the linearized
problem is solved approximately. These algorithms are known as inexact
Newton or truncated Newton methods [NocedalWright]_.

An inexact Newton method requires two ingredients. First, a cheap
method for approximately solving systems of linear
equations. Typically an iterative linear solver like the Conjugate
Gradients method is used for this
purpose [NocedalWright]_. Second, a termination rule for
the iterative solver. A typical termination rule is of the form

.. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
   :label: inexact

Here, :math:`k` indicates the Levenberg-Marquardt iteration number and
:math:`0 < \eta_k <1` is known as the forcing sequence. [WrightHolt]_
prove that a truncated Levenberg-Marquardt algorithm that uses an
inexact Newton step based on :eq:`inexact` converges for any
sequence :math:`\eta_k \leq \eta_0 < 1` and the rate of convergence
depends on the choice of the forcing sequence :math:`\eta_k`.

Ceres supports both exact and inexact step solution strategies. When
the user chooses a factorization based linear solver, the exact step
Levenberg-Marquardt algorithm is used. When the user chooses an
iterative linear solver, the inexact step Levenberg-Marquardt
algorithm is used.

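As a sketch of how this coupling looks in practice (assuming an
already constructed ``problem``), the choice of linear solver is all
that is needed to switch between the two variants:

.. code-block:: c++

  // Sketch only: a factorization based solver gives the exact step
  // algorithm, an iterative solver gives the inexact step algorithm.
  ceres::Solver::Options options;
  options.linear_solver_type = ceres::DENSE_QR;           // exact step
  // options.linear_solver_type = ceres::ITERATIVE_SCHUR; // inexact step
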
.. _section-dogleg:

Dogleg
------

Another strategy for solving the trust region problem :eq:`trp` was
introduced by M. J. D. Powell. The key idea there is to compute two
vectors

.. math::

        \Delta x^{\text{Gauss-Newton}} &= \arg \min_{\Delta x}\frac{1}{2} \|J(x)\Delta x + f(x)\|^2.\\
        \Delta x^{\text{Cauchy}} &= -\frac{\|g(x)\|^2}{\|J(x)g(x)\|^2}g(x).

Note that the vector :math:`\Delta x^{\text{Gauss-Newton}}` is the
solution to :eq:`linearapprox` and :math:`\Delta
x^{\text{Cauchy}}` is the vector that minimizes the linear
approximation if we restrict ourselves to moving along the direction
of the gradient. Dogleg methods find a vector :math:`\Delta x`
defined by :math:`\Delta x^{\text{Gauss-Newton}}` and :math:`\Delta
x^{\text{Cauchy}}` that solves the trust region problem. Ceres
supports two variants that can be chosen by setting
:member:`Solver::Options::dogleg_type`.

``TRADITIONAL_DOGLEG``, as described by Powell, constructs two line
segments using the Gauss-Newton and Cauchy vectors and finds the point
farthest along this line shaped like a dogleg (hence the name) that is
contained in the trust-region. For more details on the exact reasoning
and computations, please see Madsen et al [Madsen]_.

``SUBSPACE_DOGLEG`` is a more sophisticated method that considers the
entire two dimensional subspace spanned by these two vectors and finds
the point that minimizes the trust region problem in this subspace
[ByrdSchnabel]_.

The key advantage of the Dogleg method over Levenberg-Marquardt is
that if the step computation for a particular choice of :math:`\mu`
does not result in sufficient decrease in the value of the objective
function, Levenberg-Marquardt solves the linear approximation from
scratch with a smaller value of :math:`\mu`. Dogleg, on the other
hand, only needs to compute the interpolation between the Gauss-Newton
and the Cauchy vectors, as neither of them depend on the value of
:math:`\mu`.

The Dogleg method can only be used with the exact factorization based
linear solvers.

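A short sketch of selecting the Dogleg strategy and its variant:

.. code-block:: c++

  // Sketch only: Dogleg requires a factorization based linear solver.
  ceres::Solver::Options options;
  options.trust_region_strategy_type = ceres::DOGLEG;
  options.dogleg_type = ceres::SUBSPACE_DOGLEG;
  options.linear_solver_type = ceres::DENSE_QR;
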
.. _section-inner-iterations:

Inner Iterations
----------------

Some non-linear least squares problems have additional structure in
the way the parameter blocks interact such that it is beneficial to
modify the way the trust region step is computed. For example,
consider the following regression problem

.. math:: y = a_1 e^{b_1 x} + a_2 e^{b_2 x^2 + c_1}


Given a set of pairs :math:`\{(x_i, y_i)\}`, the user wishes to estimate
:math:`a_1, a_2, b_1, b_2`, and :math:`c_1`.

Notice that the expression is linear in :math:`a_1` and
:math:`a_2`, and given any value for :math:`b_1, b_2` and :math:`c_1`,
it is possible to use linear regression to estimate the optimal values
of :math:`a_1` and :math:`a_2`. It is possible to analytically
eliminate the variables :math:`a_1` and :math:`a_2` from the problem
entirely. Problems like these are known as separable least squares
problems and the most famous algorithm for solving them is the Variable
Projection algorithm invented by Golub & Pereyra [GolubPereyra]_.

Similar structure can be found in the matrix factorization with
missing data problem. There the corresponding algorithm is known as
Wiberg's algorithm [Wiberg]_.

Ruhe & Wedin present an analysis of various algorithms for solving
separable non-linear least squares problems and refer to *Variable
Projection* as Algorithm I in their paper [RuheWedin]_.

Implementing Variable Projection is tedious and expensive. Ruhe &
Wedin present a simpler algorithm with comparable convergence
properties, which they call Algorithm II. Algorithm II performs an
additional optimization step to estimate :math:`a_1` and :math:`a_2`
exactly after computing a successful Newton step.


This idea can be generalized to cases where the residual is not
linear in :math:`a_1` and :math:`a_2`, i.e.,

.. math:: y = f_1(a_1, e^{b_1 x}) + f_2(a_2, e^{b_2 x^2 + c_1})

In this case, we solve for the trust region step for the full
problem, and then use it as the starting point to further optimize
just :math:`a_1` and :math:`a_2`. For the linear case, this amounts
to doing a single linear least squares solve. For non-linear
problems, any method for solving the :math:`a_1` and :math:`a_2`
optimization problems will do. The only constraint on :math:`a_1` and
:math:`a_2` (if they are two different parameter blocks) is that they
do not co-occur in a residual block.

This idea can be further generalized by not just optimizing
:math:`(a_1, a_2)`, but decomposing the graph corresponding to the
Hessian matrix's sparsity structure into a collection of
non-overlapping independent sets and optimizing each of them.

Setting :member:`Solver::Options::use_inner_iterations` to ``true``
enables the use of this non-linear generalization of Ruhe & Wedin's
Algorithm II. This version of Ceres has a higher iteration
complexity, but also displays better convergence behavior per
iteration.

Setting :member:`Solver::Options::num_threads` to the maximum number
possible is highly recommended.

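A minimal sketch of enabling inner iterations (the thread count shown
is just an example value):

.. code-block:: c++

  // Sketch only: enable the non-linear generalization of Ruhe &
  // Wedin's Algorithm II and multi-threaded Jacobian evaluation.
  ceres::Solver::Options options;
  options.use_inner_iterations = true;
  options.num_threads = 4;
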
.. _section-non-monotonic-steps:

Non-monotonic Steps
-------------------

Note that the basic trust-region algorithm described in
:ref:`section-trust-region-methods` is a descent algorithm in that it
only accepts a point if it strictly reduces the value of the objective
function.

Relaxing this requirement allows the algorithm to be more efficient in
the long term at the cost of some local increase in the value of the
objective function.

This is because allowing for non-decreasing objective function values
in a principled manner allows the algorithm to *jump over boulders* as
the method is not restricted to move into narrow valleys while
preserving its convergence properties.

Setting :member:`Solver::Options::use_nonmonotonic_steps` to ``true``
enables the non-monotonic trust region algorithm as described by Conn,
Gould & Toint in [Conn]_.

Even though the value of the objective function may be larger
than the minimum value encountered over the course of the
optimization, the final parameters returned to the user are the
ones corresponding to the minimum cost over all iterations.

The option to take non-monotonic steps is available for all trust
region strategies.

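A minimal sketch of enabling non-monotonic steps:

.. code-block:: c++

  // Sketch only: allow the trust region algorithm to accept steps
  // that temporarily increase the objective function value.
  ceres::Solver::Options options;
  options.use_nonmonotonic_steps = true;
  options.max_consecutive_nonmonotonic_steps = 5;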

.. _section-line-search-methods:

Line Search Methods
===================

**The implementation of line search algorithms in Ceres Solver is
fairly new and not very well tested, so for now this part of the
solver should be considered beta quality. We welcome reports of your
experiences both good and bad on the mailing list.**

A basic line search algorithm looks something like this:

   1. Given an initial point :math:`x`
   2. :math:`\Delta x = -H^{-1}(x) g(x)`
   3. :math:`\arg \min_\mu \frac{1}{2} \| F(x + \mu \Delta x) \|^2`
   4. :math:`x = x + \mu \Delta x`
   5. Goto 2.

Here :math:`H(x)` is some approximation to the Hessian of the
objective function, and :math:`g(x)` is the gradient at
:math:`x`. Depending on the choice of :math:`H(x)` we get a variety of
different search directions :math:`\Delta x`.

Step 3, which is a one dimensional optimization or `Line Search` along
:math:`\Delta x`, is what gives this class of methods its name.

Different line search algorithms differ in their choice of the search
direction :math:`\Delta x` and the method used for one dimensional
optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
primary source of computational complexity in these
methods. Currently, Ceres Solver supports four choices of search
directions, all aimed at large scale problems.

1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
   be the identity matrix. This is not a good search direction for
   anything but the simplest of the problems. It is only included here
   for completeness.

2. ``NONLINEAR_CONJUGATE_GRADIENT`` A generalization of the Conjugate
   Gradient method to non-linear functions. The generalization can be
   performed in a number of different ways, resulting in a variety of
   search directions. Ceres Solver currently supports
   ``FLETCHER_REEVES``, ``POLAK_RIBIERE`` and ``HESTENES_STIEFEL``
   directions.

3. ``BFGS`` A generalization of the Secant method to multiple
   dimensions in which a full, dense approximation to the inverse
   Hessian is maintained and used to compute a quasi-Newton step
   [NocedalWright]_. BFGS is currently the best known general
   quasi-Newton algorithm.

4. ``LBFGS`` A limited memory approximation to the full ``BFGS``
   method in which the last `M` iterations are used to approximate the
   inverse Hessian used to compute a quasi-Newton step [Nocedal]_,
   [ByrdNocedal]_.

Currently Ceres Solver supports both a backtracking and interpolation
based Armijo line search algorithm, and a sectioning / zoom
interpolation (strong) Wolfe condition line search algorithm.
However, note that in order for the assumptions underlying the
``BFGS`` and ``LBFGS`` methods to be guaranteed to be satisfied, the
Wolfe line search algorithm should be used.

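A minimal sketch of switching to a line search minimizer:

.. code-block:: c++

  // Sketch only: use L-BFGS with a strong Wolfe line search instead
  // of the default trust region minimizer.
  ceres::Solver::Options options;
  options.minimizer_type = ceres::LINE_SEARCH;
  options.line_search_direction_type = ceres::LBFGS;
  options.line_search_type = ceres::WOLFE;
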
.. _section-linear-solver:

LinearSolver
============

Recall that in both of the trust-region methods described above, the
key computational cost is the solution of a linear least squares
problem of the form

.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
   :label: simple2

Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top
f(x)`. For notational convenience let us also drop the dependence on
:math:`x`. Then it is easy to see that solving :eq:`simple2` is
equivalent to solving the *normal equations*.

.. math:: H \Delta x = g
   :label: normal

Ceres provides a number of different options for solving :eq:`normal`.

.. _section-qr:

``DENSE_QR``
------------

For small problems (a couple of hundred parameters and a few thousand
residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
:math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
an upper triangular matrix [TrefethenBau]_. Then it can be shown that
the solution to :eq:`normal` is given by

.. math:: \Delta x^* = -R^{-1}Q^\top f


Ceres uses ``Eigen`` 's dense QR factorization routines.

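To illustrate the linear algebra (this is just a self-contained sketch
using ``Eigen`` types with made-up values, not the Ceres internals):

.. code-block:: c++

  #include <Eigen/Dense>
  #include <iostream>

  int main() {
    // Hypothetical tiny dense problem: 4 residuals, 2 parameters.
    Eigen::MatrixXd J(4, 2);
    J << 1, 2,
         3, 4,
         5, 6,
         7, 8;
    Eigen::VectorXd f(4);
    f << 1, 1, 1, 1;
    // dx = argmin ||J dx + f||^2, computed via a dense QR factorization.
    Eigen::VectorXd dx = J.colPivHouseholderQr().solve(-f);
    std::cout << dx.transpose() << "\n";
    return 0;
  }
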
.. _section-cholesky:

``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
------------------------------------------------------

Large non-linear least squares problems are usually sparse. In such
cases, using a dense QR factorization is inefficient. Let :math:`H =
R^\top R` be the Cholesky factorization of the normal equations, where
:math:`R` is an upper triangular matrix, then the solution to
:eq:`normal` is given by

.. math::

    \Delta x^* = R^{-1} R^{-\top} g.


The observant reader will note that the :math:`R` in the Cholesky
factorization of :math:`H` is the same upper triangular matrix
:math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top
Q^\top Q R = R^\top R`. There are two variants of Cholesky
factorization -- sparse and dense.

``DENSE_NORMAL_CHOLESKY`` as the name implies performs a dense
Cholesky factorization of the normal equations. Ceres uses
``Eigen`` 's dense LDLT factorization routines.

``SPARSE_NORMAL_CHOLESKY``, as the name implies performs a sparse
Cholesky factorization of the normal equations. This leads to
substantial savings in time and memory for large sparse
problems. Ceres uses the sparse Cholesky factorization routines in
Professor Tim Davis' ``SuiteSparse`` or ``CXSparse`` packages [Chen]_.

.. _section-schur:

``DENSE_SCHUR`` & ``SPARSE_SCHUR``
----------------------------------

While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
adjustment problems, bundle adjustment problems have a special
structure, and a more efficient scheme for solving :eq:`normal`
can be constructed.

Suppose that the SfM problem consists of :math:`p` cameras and
:math:`q` points and the variable vector :math:`x` has the block
structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`, where
:math:`y` and :math:`z` correspond to camera and point parameters,
respectively. Further, let the camera blocks be of size :math:`c` and
the point blocks be of size :math:`s` (for most problems :math:`c` =
:math:`6`--:math:`9` and :math:`s = 3`). Ceres does not impose any
constancy requirement on these block sizes, but choosing them to be
constant simplifies the exposition.

A key characteristic of the bundle adjustment problem is that there is
no term :math:`f_{i}` that includes two or more point blocks. This in
turn implies that the matrix :math:`H` is of the form

.. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
   :label: hblock

where :math:`B \in \mathbb{R}^{pc\times pc}` is a block sparse matrix
with :math:`p` blocks of size :math:`c\times c` and :math:`C \in
\mathbb{R}^{qs\times qs}` is a block diagonal matrix with :math:`q` blocks
of size :math:`s\times s`. :math:`E \in \mathbb{R}^{pc\times qs}` is a
general block sparse matrix, with a block of size :math:`c\times s`
for each observation. Let us now block partition :math:`\Delta x =
[\Delta y,\Delta z]` and :math:`g=[v,w]` to restate :eq:`normal`
as the block structured linear system

.. math:: \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix}
          \right]\left[ \begin{matrix} \Delta y \\ \Delta z
          \end{matrix} \right] = \left[ \begin{matrix} v\\ w
          \end{matrix} \right]\ ,
   :label: linear2

and apply Gaussian elimination to it. As we noted above, :math:`C` is
a block diagonal matrix, with small diagonal blocks of size
:math:`s\times s`. Thus, calculating the inverse of :math:`C` by
inverting each of these blocks is cheap. This allows us to eliminate
:math:`\Delta z` by observing that :math:`\Delta z = C^{-1}(w - E^\top
\Delta y)`, giving us

.. math:: \left[B - EC^{-1}E^\top\right] \Delta y = v - EC^{-1}w\ .
   :label: schur

The matrix

.. math:: S = B - EC^{-1}E^\top

is the Schur complement of :math:`C` in :math:`H`. It is also known as
the *reduced camera matrix*, because the only variables
participating in :eq:`schur` are the ones corresponding to the
cameras. :math:`S \in \mathbb{R}^{pc\times pc}` is a block structured
symmetric positive definite matrix, with blocks of size :math:`c\times
c`. The block :math:`S_{ij}` corresponding to the pair of images
:math:`i` and :math:`j` is non-zero if and only if the two images
observe at least one common point.


Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
:math:`\Delta y`, and then back-substituting :math:`\Delta y` to
obtain the value of :math:`\Delta z`. Thus, the solution of what was
an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
and matrix-vector multiplies, and the solution of block sparse
:math:`pc\times pc` linear system :eq:`schur`. For almost all
problems, the number of cameras is much smaller than the number of
points, :math:`p \ll q`, thus solving :eq:`schur` is
significantly cheaper than solving :eq:`linear2`. This is the
*Schur complement trick* [Brown]_.

This still leaves open the question of solving :eq:`schur`. The
method of choice for solving symmetric positive definite systems
exactly is via the Cholesky factorization [TrefethenBau]_ and
depending upon the structure of the matrix, there are, in general, two
options. The first is direct factorization, where we store and factor
:math:`S` as a dense matrix [TrefethenBau]_. This method has
:math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
is only practical for problems with up to a few hundred cameras. Ceres
implements this strategy as the ``DENSE_SCHUR`` solver.


But, :math:`S` is typically a fairly sparse matrix, as most images
only see a small fraction of the scene. This leads us to the second
option: Sparse Direct Methods. These methods store :math:`S` as a
sparse matrix, use row and column re-ordering algorithms to maximize
the sparsity of the Cholesky decomposition, and focus their compute
effort on the non-zero part of the factorization [Chen]_. Sparse
direct methods, depending on the exact sparsity structure of the Schur
complement, allow bundle adjustment algorithms to significantly scale
up over those based on dense factorization. Ceres implements this
strategy as the ``SPARSE_SCHUR`` solver.

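A minimal sketch of selecting the Schur complement based solvers:

.. code-block:: c++

  // Sketch only: Schur complement based solvers for bundle adjustment.
  ceres::Solver::Options options;
  options.linear_solver_type = ceres::SPARSE_SCHUR;
  // options.linear_solver_type = ceres::DENSE_SCHUR;  // few hundred cameras
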
.. _section-cgnr:

``CGNR``
--------

For general sparse problems, if the problem is too large for
``CHOLMOD`` or a sparse linear algebra library is not linked into
Ceres, another option is the ``CGNR`` solver. This solver uses the
Conjugate Gradients solver on the *normal equations*, but without
forming the normal equations explicitly. It exploits the relation

.. math::

    H x = J^\top J x = J^\top(J x)


When the user chooses ``CGNR`` as the linear solver, Ceres
automatically switches from the exact step algorithm to an inexact
step algorithm.

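The matrix-vector product that ``CGNR`` relies on can be written as
follows (a sketch using ``Eigen`` types for illustration, not the
Ceres internals):

.. code-block:: c++

  #include <Eigen/Sparse>

  // Evaluate H * x as J^T (J x) without forming H = J^T J explicitly.
  Eigen::VectorXd HTimesX(const Eigen::SparseMatrix<double>& J,
                          const Eigen::VectorXd& x) {
    return J.transpose() * (J * x);
  }
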
.. _section-iterative_schur:

``ITERATIVE_SCHUR``
-------------------

Another option for bundle adjustment problems is to apply PCG to the
reduced camera matrix :math:`S` instead of :math:`H`. One reason to do
this is that :math:`S` is a much smaller matrix than :math:`H`, but
more importantly, it can be shown that :math:`\kappa(S)\leq
\kappa(H)`. Ceres implements PCG on :math:`S` as the
``ITERATIVE_SCHUR`` solver. When the user chooses ``ITERATIVE_SCHUR``
as the linear solver, Ceres automatically switches from the exact step
algorithm to an inexact step algorithm.

The cost of forming and storing the Schur complement :math:`S` can be
prohibitive for large problems. Indeed, for an inexact Newton solver
that computes :math:`S` and runs PCG on it, almost all of its time is
spent in constructing :math:`S`; the time spent inside the PCG
algorithm is negligible in comparison. Because PCG only needs access
to :math:`S` via its product with a vector, one way to evaluate
:math:`Sx` is to observe that

.. math::
   :label: schurtrick1

   x_1 &= E^\top x\\
   x_2 &= C^{-1} x_1\\
   x_3 &= Ex_2\\
   x_4 &= Bx\\
   Sx &= x_4 - x_3

Thus, we can run PCG on :math:`S` with the same computational effort
per iteration as PCG on :math:`H`, while reaping the benefits of a
more powerful preconditioner. In fact, we do not even need to compute
:math:`H`, :eq:`schurtrick1` can be implemented using just the columns
of :math:`J`.

Equation :eq:`schurtrick1` is closely related to *Domain
Decomposition methods* for solving large linear systems that arise in
structural engineering and partial differential equations. In the
language of Domain Decomposition, each point in a bundle adjustment
problem is a domain, and the cameras form the interface between these
domains. The iterative solution of the Schur complement then falls
within the sub-category of techniques known as Iterative
Sub-structuring [Saad]_ [Mathew]_.

.. _section-preconditioner:

Preconditioner
--------------

The convergence rate of Conjugate Gradients for
solving :eq:`normal` depends on the distribution of eigenvalues
of :math:`H` [Saad]_. A useful upper bound is
:math:`\sqrt{\kappa(H)}`, where :math:`\kappa(H)` is the condition
number of the matrix :math:`H`. For most bundle adjustment problems,
:math:`\kappa(H)` is high and a direct application of Conjugate
Gradients to :eq:`normal` results in extremely poor performance.

The solution to this problem is to replace :eq:`normal` with a
*preconditioned* system. Given a linear system :math:`Ax = b` and a
preconditioner :math:`M`, the preconditioned system is given by
:math:`M^{-1}Ax = M^{-1}b`. The resulting algorithm is known as the
Preconditioned Conjugate Gradients algorithm (PCG) and its worst case
complexity now depends on the condition number of the *preconditioned*
matrix :math:`\kappa(M^{-1}A)`.

The computational cost of using a preconditioner :math:`M` is the cost
of computing :math:`M` and evaluating the product :math:`M^{-1}y` for
arbitrary vectors :math:`y`. Thus, there are two competing factors to
consider: How much of :math:`H`'s structure is captured by :math:`M`
so that the condition number :math:`\kappa(HM^{-1})` is low, and the
computational cost of constructing and using :math:`M`. The ideal
preconditioner would be one for which :math:`\kappa(M^{-1}A)
=1`. :math:`M=A` achieves this, but it is not a practical choice, as
applying this preconditioner would require solving a linear system
equivalent to the unpreconditioned problem. It is usually the case
that the more information :math:`M` has about :math:`H`, the more
expensive it is to use. For example, Incomplete Cholesky factorization
based preconditioners have much better convergence behavior than the
Jacobi preconditioner, but are also much more expensive.


The simplest of all preconditioners is the diagonal or Jacobi
preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
block structured matrices like :math:`H` can be generalized to the
block Jacobi preconditioner.

For ``ITERATIVE_SCHUR`` there are two obvious choices for block
diagonal preconditioners for :math:`S`. The block diagonal of the
matrix :math:`B` [Mandel]_ and the block diagonal of :math:`S`, i.e.,
the block Jacobi preconditioner for :math:`S`. Ceres implements both
of these preconditioners and refers to them as ``JACOBI`` and
``SCHUR_JACOBI`` respectively.

For bundle adjustment problems arising in reconstruction from
community photo collections, more effective preconditioners can be
constructed by analyzing and exploiting the camera-point visibility
structure of the scene [KushalAgarwal]_. Ceres implements the two
visibility based preconditioners described by Kushal & Agarwal as
``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. These are fairly new
preconditioners and Ceres' implementation of them is in its early
stages and is not as mature as the other preconditioners described
above.

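A minimal sketch of pairing the iterative Schur solver with one of
these preconditioners:

.. code-block:: c++

  // Sketch only: PCG on the Schur complement with the SCHUR_JACOBI
  // preconditioner.
  ceres::Solver::Options options;
  options.linear_solver_type = ceres::ITERATIVE_SCHUR;
  options.preconditioner_type = ceres::SCHUR_JACOBI;
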
.. _section-ordering:

Ordering
--------

The order in which variables are eliminated in a linear solver can
have a significant impact on the efficiency and accuracy of the
method. For example when doing sparse Cholesky factorization, there
are matrices for which a good ordering will give a Cholesky factor
with :math:`O(n)` storage, whereas a bad ordering will result in a
completely dense factor.

Ceres allows the user to provide varying amounts of hints to the
solver about the variable elimination ordering to use. This can range
from no hints, where the solver is free to decide the best ordering
based on the user's choices like the linear solver being used, to an
exact order in which the variables should be eliminated, and a variety
of possibilities in between.

Instances of the :class:`ParameterBlockOrdering` class are used to
communicate this information to Ceres.

Formally an ordering is an ordered partitioning of the parameter
blocks. Each parameter block belongs to exactly one group, and each
group has a unique integer associated with it, that determines its
order in the set of groups. We call these groups *Elimination Groups*.

Given such an ordering, Ceres ensures that the parameter blocks in the
lowest numbered elimination group are eliminated first, and then the
parameter blocks in the next lowest numbered elimination group and so
on. Within each elimination group, Ceres is free to order the
parameter blocks as it chooses. For example, consider the linear system

.. math::

   x + y &= 3\\
   2x + 3y &= 7

There are two ways in which it can be solved. First eliminating
:math:`x` from the two equations, solving for :math:`y` and then back
substituting for :math:`x`, or first eliminating :math:`y`, solving
for :math:`x` and back substituting for :math:`y`. The user can
construct three orderings here.

1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
3. :math:`\{0: x, y\}` : Solver gets to decide the elimination order.

Thus, to have Ceres determine the ordering automatically using
heuristics, put all the variables in the same elimination group. The
identity of the group does not matter. This is the same as not
specifying an ordering at all. To control the ordering for every
variable, create an elimination group per variable, ordering them in
the desired order.

If the user is using one of the Schur solvers (``DENSE_SCHUR``,
``SPARSE_SCHUR``, ``ITERATIVE_SCHUR``) and chooses to specify an
ordering, it must have one important property. The lowest numbered
elimination group must form an independent set in the graph
corresponding to the Hessian, or in other words, no two parameter
blocks in the first elimination group should co-occur in the same
residual block. For the best performance, this elimination group
should be as large as possible. For standard bundle adjustment
problems, this corresponds to the first elimination group containing
all the 3d points, and the second containing all the camera parameter
blocks.

If the user leaves the choice to Ceres, then the solver uses an
approximate maximum independent set algorithm to identify the first
elimination group [LiSaad]_.

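A minimal sketch of specifying a bundle adjustment style ordering,
where ``points`` and ``cameras`` are hypothetical containers of
parameter block pointers:

.. code-block:: c++

  // Sketch only: points go in group 0 (eliminated first), cameras in
  // group 1.
  ceres::ParameterBlockOrdering* ordering = new ceres::ParameterBlockOrdering;
  for (double* point : points) {
    ordering->AddElementToGroup(point, 0);
  }
  for (double* camera : cameras) {
    ordering->AddElementToGroup(camera, 1);
  }
  ceres::Solver::Options options;
  options.linear_solver_ordering = ordering;
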
.. _section-solver-options:

:class:`Solver::Options`
------------------------

.. class:: Solver::Options

   :class:`Solver::Options` controls the overall behavior of the
   solver. We list the various settings and their default values below.


.. member:: MinimizerType Solver::Options::minimizer_type

   Default: ``TRUST_REGION``

   Choose between ``LINE_SEARCH`` and ``TRUST_REGION`` algorithms. See
   :ref:`section-trust-region-methods` and
   :ref:`section-line-search-methods` for more details.

.. member:: LineSearchDirectionType Solver::Options::line_search_direction_type

   Default: ``LBFGS``

   Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
   ``BFGS`` and ``LBFGS``.

.. member:: LineSearchType Solver::Options::line_search_type

   Default: ``WOLFE``

   Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
   Note that in order for the assumptions underlying the ``BFGS`` and
   ``LBFGS`` line search direction algorithms to be guaranteed to be
   satisfied, the ``WOLFE`` line search should be used.

.. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type

   Default: ``FLETCHER_REEVES``

   Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIERE`` and
   ``HESTENES_STIEFEL``.

.. member:: int Solver::Options::max_lbfgs_rank

   Default: 20

   The L-BFGS Hessian approximation is a low rank approximation to the
   inverse of the Hessian matrix. The rank of the approximation
   determines (linearly) the space and time complexity of using the
   approximation. The higher the rank, the better the quality of the
   approximation. The increase in quality is, however, bounded for a
   number of reasons.

   1. The method only uses secant information and not actual
      derivatives.

   2. The Hessian approximation is constrained to be positive
      definite.

   So increasing this rank to a large number will cost time and space
   complexity without the corresponding increase in solution
   quality. There are no hard and fast rules for choosing the maximum
   rank. The best choice usually requires some problem specific
   experimentation.

.. member:: bool Solver::Options::use_approximate_eigenvalue_bfgs_scaling

   Default: ``false``

   As part of the ``BFGS`` update step / ``LBFGS`` right-multiply
   step, the initial inverse Hessian approximation is taken to be the
   Identity. However, [Oren]_ showed that using instead :math:`I *
   \gamma`, where :math:`\gamma` is a scalar chosen to approximate an
   eigenvalue of the true inverse Hessian can result in improved
   convergence in a wide variety of cases. Setting
   ``use_approximate_eigenvalue_bfgs_scaling`` to true enables this
   scaling in ``BFGS`` (before first iteration) and ``LBFGS`` (at each
   iteration).

   Precisely, approximate eigenvalue scaling equates to

   .. math:: \gamma = \frac{y_k' s_k}{y_k' y_k}

   With:

   .. math:: y_k = \nabla f_{k+1} - \nabla f_k
   .. math:: s_k = x_{k+1} - x_k

   Where :math:`f()` is the line search objective and :math:`x` the
   vector of parameter values [NocedalWright]_.

   It is important to note that approximate eigenvalue scaling does
   **not** *always* improve convergence, and that it can in fact
   *significantly* degrade performance for certain classes of problem,
   which is why it is disabled by default. In particular it can
   degrade performance when the sensitivity of the problem to different
   parameters varies significantly, as in this case a single scalar
   factor fails to capture this variation and detrimentally downscales
   parts of the Jacobian approximation which correspond to
   low-sensitivity parameters. It can also reduce the robustness of the
   solution to errors in the Jacobians.

.. member:: LineSearchInterpolationType Solver::Options::line_search_interpolation_type

   Default: ``CUBIC``

   Degree of the polynomial used to approximate the objective
   function. Valid values are ``BISECTION``, ``QUADRATIC`` and
   ``CUBIC``.

.. member:: double Solver::Options::min_line_search_step_size

   The line search terminates if:

   .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}

   where :math:`\|\cdot\|_\infty` refers to the max norm, and
   :math:`\Delta x_k` is the step change in the parameter values at
   the :math:`k`-th iteration.

.. member:: double Solver::Options::line_search_sufficient_function_decrease

   Default: ``1e-4``

   Solving the line search problem exactly is computationally
   prohibitive. Fortunately, line search based optimization algorithms
   can still guarantee convergence if instead of an exact solution,
   the line search algorithm returns a solution which decreases the
   value of the objective function sufficiently. More precisely, we
   are looking for a step size s.t.

   .. math:: f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]

   This condition is known as the Armijo condition.

.. member:: double Solver::Options::max_line_search_step_contraction

   Default: ``1e-3``

   In each iteration of the line search,

   .. math:: \text{new_step_size} >= \text{max_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: double Solver::Options::min_line_search_step_contraction

   Default: ``0.6``

   In each iteration of the line search,

   .. math:: \text{new_step_size} <= \text{min_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: int Solver::Options::max_num_line_search_step_size_iterations

   Default: ``20``

   Maximum number of trial step size iterations during each line
   search. If a step size satisfying the search conditions cannot be
   found within this number of trials, the line search will stop.

   As this is an 'artificial' constraint (one imposed by the user, not
   the underlying math), if ``WOLFE`` line search is being used, *and*
   points satisfying the Armijo sufficient (function) decrease
   condition have been found during the current search (in :math:`<=`
   ``max_num_line_search_step_size_iterations``), then the step size
   with the lowest function value which satisfies the Armijo condition
   will be returned as the new valid step, even though it does *not*
   satisfy the strong Wolfe conditions. This behaviour protects
   against early termination of the optimizer at a sub-optimal point.

.. member:: int Solver::Options::max_num_line_search_direction_restarts

   Default: ``5``

   Maximum number of restarts of the line search direction algorithm
   before terminating the optimization. Restarts of the line search
   direction algorithm occur when the current algorithm fails to
   produce a new descent direction. This typically indicates a
   numerical failure, or a breakdown in the validity of the
   approximations used.

.. member:: double Solver::Options::line_search_sufficient_curvature_decrease

   Default: ``0.9``

   The strong Wolfe conditions consist of the Armijo sufficient
   decrease condition, and an additional requirement that the
   step size be chosen s.t. the *magnitude* ('strong' Wolfe
   conditions) of the gradient along the search direction
   decreases sufficiently. Precisely, this second condition
   is that we seek a step size s.t.

   .. math:: \|f'(\text{step_size})\| <= \text{sufficient_curvature_decrease} * \|f'(0)\|

   Where :math:`f()` is the line search objective and :math:`f'()` is the derivative
   of :math:`f` with respect to the step size: :math:`\frac{d f}{d~\text{step size}}`.

.. member:: double Solver::Options::max_line_search_step_expansion

   Default: ``10.0``

   During the bracketing phase of a Wolfe line search, the step size
   is increased until either a point satisfying the Wolfe conditions
   is found, or an upper bound for a bracket containing a point
   satisfying the conditions is found. Precisely, at each iteration
   of the expansion:

   .. math:: \text{new_step_size} <= \text{max_step_expansion} * \text{step_size}

   By definition for expansion

   .. math:: \text{max_step_expansion} > 1.0

.. member:: TrustRegionStrategyType Solver::Options::trust_region_strategy_type

   Default: ``LEVENBERG_MARQUARDT``

   The trust region step computation algorithm used by
   Ceres. Currently ``LEVENBERG_MARQUARDT`` and ``DOGLEG`` are the two
   valid choices. See :ref:`section-levenberg-marquardt` and
   :ref:`section-dogleg` for more details.

.. member:: DoglegType Solver::Options::dogleg_type

   Default: ``TRADITIONAL_DOGLEG``

   Ceres supports two different dogleg strategies, the
   ``TRADITIONAL_DOGLEG`` method by Powell and the ``SUBSPACE_DOGLEG``
   method described by [ByrdSchnabel]_ . See :ref:`section-dogleg`
   for more details.

.. member:: bool Solver::Options::use_nonmonotonic_steps

   Default: ``false``

   Relax the requirement that the trust-region algorithm take strictly
   decreasing steps. See :ref:`section-non-monotonic-steps` for more
   details.

.. member:: int Solver::Options::max_consecutive_nonmonotonic_steps

   Default: ``5``

   The window size used by the step selection algorithm to accept
   non-monotonic steps.

.. member:: int Solver::Options::max_num_iterations

   Default: ``50``

   Maximum number of iterations for which the solver should run.

.. member:: double Solver::Options::max_solver_time_in_seconds

   Default: ``1e6``

   Maximum amount of time for which the solver should run.

.. member:: int Solver::Options::num_threads

   Default: ``1``

   Number of threads used by Ceres to evaluate the Jacobian.

.. member:: double Solver::Options::initial_trust_region_radius

   Default: ``1e4``

   The size of the initial trust region. When the
   ``LEVENBERG_MARQUARDT`` strategy is used, the reciprocal of this
   number is the initial regularization parameter.

.. member:: double Solver::Options::max_trust_region_radius

   Default: ``1e16``

   The trust region radius is not allowed to grow beyond this value.

.. member:: double Solver::Options::min_trust_region_radius

   Default: ``1e-32``

   The solver terminates when the trust region becomes smaller than
   this value.

.. member:: double Solver::Options::min_relative_decrease

   Default: ``1e-3``

   Lower threshold for relative decrease before a trust-region step is
   accepted.

.. member:: double Solver::Options::min_lm_diagonal

   Default: ``1e6``

   The ``LEVENBERG_MARQUARDT`` strategy uses a diagonal matrix to
   regularize the trust region step. This is the lower bound on
   the values of this diagonal matrix.

.. member:: double Solver::Options::max_lm_diagonal

   Default: ``1e32``

   The ``LEVENBERG_MARQUARDT`` strategy uses a diagonal matrix to
   regularize the trust region step. This is the upper bound on
   the values of this diagonal matrix.

.. member:: int Solver::Options::max_num_consecutive_invalid_steps

   Default: ``5``

   The step returned by a trust region strategy can sometimes be
   numerically invalid, usually because of conditioning
   issues. Instead of crashing or stopping the optimization, the
   optimizer can go ahead and try solving with a smaller trust
   region/better conditioned problem. This parameter sets the number
   of consecutive retries before the minimizer gives up.

.. member:: double Solver::Options::function_tolerance

   Default: ``1e-6``

   Solver terminates if

   .. math:: \frac{|\Delta \text{cost}|}{\text{cost}} < \text{function_tolerance}

   where, :math:`\Delta \text{cost}` is the change in objective
   function value (up or down) in the current iteration of
   Levenberg-Marquardt.

.. member:: double Solver::Options::gradient_tolerance

   Default: ``1e-10``

   Solver terminates if

   .. math:: \frac{\|g(x)\|_\infty}{\|g(x_0)\|_\infty} < \text{gradient_tolerance}

   where :math:`\|\cdot\|_\infty` refers to the max norm, and :math:`x_0` is
   the vector of initial parameter values.

.. member:: double Solver::Options::parameter_tolerance

   Default: ``1e-8``

   Solver terminates if

   .. math:: \|\Delta x\| < (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}

   where :math:`\Delta x` is the step computed by the linear solver in
   the current iteration of Levenberg-Marquardt.

.. member:: LinearSolverType Solver::Options::linear_solver_type

   Default: ``SPARSE_NORMAL_CHOLESKY`` / ``DENSE_QR``

   Type of linear solver used to compute the solution to the linear
   least squares problem in each iteration of the Levenberg-Marquardt
   algorithm. If Ceres is built with ``SuiteSparse`` linked in then
   the default is ``SPARSE_NORMAL_CHOLESKY``, it is ``DENSE_QR``
   otherwise.

.. member:: PreconditionerType Solver::Options::preconditioner_type

   Default: ``JACOBI``

   The preconditioner used by the iterative linear solver. The default
   is the block Jacobi preconditioner. Valid values are (in increasing
   order of complexity) ``IDENTITY``, ``JACOBI``, ``SCHUR_JACOBI``,
   ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. See
   :ref:`section-preconditioner` for more details.

.. member:: SparseLinearAlgebraLibrary Solver::Options::sparse_linear_algebra_library

   Default: ``SUITE_SPARSE``

   Ceres supports the use of two sparse linear algebra libraries,
   ``SuiteSparse``, which is enabled by setting this parameter to
   ``SUITE_SPARSE`` and ``CXSparse``, which can be selected by setting
   this parameter to ``CX_SPARSE``. ``SuiteSparse`` is a
   sophisticated and complex sparse linear algebra library and should
   be used in general. If your needs/platforms prevent you from using
   ``SuiteSparse``, consider using ``CXSparse``, which is a much
   smaller, easier to build library. As can be expected, its
   performance on large problems is not comparable to that of
   ``SuiteSparse``.

.. member:: int Solver::Options::num_linear_solver_threads

   Default: ``1``

   Number of threads used by the linear solver.

Sameer Agarwal3d87b722013-02-02 00:49:31 -08001171.. member:: ParameterBlockOrdering* Solver::Options::linear_solver_ordering
1172
1173 Default: ``NULL``
1174
1175 An instance of the ordering object informs the solver about the
1176 desired order in which parameter blocks should be eliminated by the
1177 linear solvers. See section~\ref{sec:ordering`` for more details.
1178
1179 If ``NULL``, the solver is free to choose an ordering that it
Sameer Agarwal09396322013-05-28 22:29:36 -07001180 thinks is best.
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001181
1182 See :ref:`section-ordering` for more details.
1183
Sameer Agarwal09396322013-05-28 22:29:36 -07001184.. member:: bool Solver::Options::use_post_ordering
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001185
Sameer Agarwal09396322013-05-28 22:29:36 -07001186 Default: ``false``
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001187
Sameer Agarwal09396322013-05-28 22:29:36 -07001188 Sparse Cholesky factorization algorithms use a fill-reducing
1189 ordering to permute the columns of the Jacobian matrix. There are
1190 two ways of doing this.
1191
1192 1. Compute the Jacobian matrix in some order and then have the
1193 factorization algorithm permute the columns of the Jacobian.
1194
Sameer Agarwalf0b071b2013-05-31 13:22:51 -07001195 2. Compute the Jacobian with its columns already permuted.
Sameer Agarwal09396322013-05-28 22:29:36 -07001196
   The first option incurs a significant memory penalty: the
   factorization algorithm has to make a copy of the permuted Jacobian
   matrix. Ceres therefore pre-permutes the columns of the Jacobian
   matrix and, generally speaking, there is no performance penalty for
   doing so.
1202
1203 In some rare cases, it is worth using a more complicated reordering
1204 algorithm which has slightly better runtime performance at the
1205 expense of an extra copy of the Jacobian matrix. Setting
1206 ``use_postordering`` to ``true`` enables this tradeoff.
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001207
Sameer Agarwaleeedd2e2013-07-07 23:04:31 -07001208.. member:: int Solver::Options::min_linear_solver_iterations
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001209
1210 Default: ``1``
1211
1212 Minimum number of iterations used by the linear solver. This only
1213 makes sense when the linear solver is an iterative solver, e.g.,
1214 ``ITERATIVE_SCHUR`` or ``CGNR``.
1215
Sameer Agarwaleeedd2e2013-07-07 23:04:31 -07001216.. member:: int Solver::Options::max_linear_solver_iterations
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001217
1218 Default: ``500``
1219
   Maximum number of iterations used by the linear solver. This only
1221 makes sense when the linear solver is an iterative solver, e.g.,
1222 ``ITERATIVE_SCHUR`` or ``CGNR``.
1223
1224.. member:: double Solver::Options::eta
1225
1226 Default: ``1e-1``
1227
1228 Forcing sequence parameter. The truncated Newton solver uses this
1229 number to control the relative accuracy with which the Newton step
1230 is computed. This constant is passed to
1231 ``ConjugateGradientsSolver`` which uses it to terminate the
1232 iterations when
1233
1234 .. math:: \frac{Q_i - Q_{i-1}}{Q_i} < \frac{\eta}{i}
1235
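A sketch of configuring an inexact Newton solve using these options (the
values are illustrative):

.. code-block:: c++

   Solver::Options options;
   options.linear_solver_type = ITERATIVE_SCHUR;
   options.preconditioner_type = SCHUR_JACOBI;
   options.min_linear_solver_iterations = 1;
   options.max_linear_solver_iterations = 100;
   options.eta = 1e-2;  // tighter forcing sequence than the default 1e-1
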
1236.. member:: bool Solver::Options::jacobi_scaling
1237
1238 Default: ``true``
1239
1240 ``true`` means that the Jacobian is scaled by the norm of its
1241 columns before being passed to the linear solver. This improves the
1242 numerical conditioning of the normal equations.
1243
Sameer Agarwal09396322013-05-28 22:29:36 -07001244.. member:: bool Solver::Options::use_inner_iterations
1245
1246 Default: ``false``
1247
1248 Use a non-linear version of a simplified variable projection
1249 algorithm. Essentially this amounts to doing a further optimization
1250 on each Newton/Trust region step using a coordinate descent
1251 algorithm. For more details, see :ref:`section-inner-iterations`.
1252
.. member:: double Solver::Options::inner_iteration_tolerance
1254
1255 Default: ``1e-3``
1256
1257 Generally speaking, inner iterations make significant progress in
1258 the early stages of the solve and then their contribution drops
1259 down sharply, at which point the time spent doing inner iterations
1260 is not worth it.
1261
1262 Once the relative decrease in the objective function due to inner
1263 iterations drops below ``inner_iteration_tolerance``, the use of
1264 inner iterations in subsequent trust region minimizer iterations is
1265 disabled.
1266
1267.. member:: ParameterBlockOrdering* Solver::Options::inner_iteration_ordering
1268
1269 Default: ``NULL``
1270
   If :member:`Solver::Options::use_inner_iterations` is true, then the
   user has two choices.
Sameer Agarwal09396322013-05-28 22:29:36 -07001273
1274 1. Let the solver heuristically decide which parameter blocks to
1275 optimize in each inner iteration. To do this, set
1276 :member:`Solver::Options::inner_iteration_ordering` to ``NULL``.
1277
   2. Specify a collection of ordered independent sets. The lower
1279 numbered groups are optimized before the higher number groups
1280 during the inner optimization phase. Each group must be an
1281 independent set. Not all parameter blocks need to be included in
1282 the ordering.
1283
1284 See :ref:`section-ordering` for more details.
1285
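A sketch of enabling inner iterations, either with the automatically
chosen ordering or with a user supplied one (``point1`` is a hypothetical
parameter block pointer):

.. code-block:: c++

   options.use_inner_iterations = true;
   options.inner_iteration_tolerance = 1e-3;

   // Either let the solver choose the independent sets (the default) ...
   options.inner_iteration_ordering = NULL;

   // ... or supply an ordering; group 0 is optimized before group 1.
   // ParameterBlockOrdering* ordering = new ParameterBlockOrdering;
   // ordering->AddElementToGroup(point1, 0);
   // options.inner_iteration_ordering = ordering;
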
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001286.. member:: LoggingType Solver::Options::logging_type
1287
1288 Default: ``PER_MINIMIZER_ITERATION``
1289
1290.. member:: bool Solver::Options::minimizer_progress_to_stdout
1291
1292 Default: ``false``
1293
1294 By default the :class:`Minimizer` progress is logged to ``STDERR``
1295 depending on the ``vlog`` level. If this flag is set to true, and
1296 :member:`Solver::Options::logging_type` is not ``SILENT``, the logging
1297 output is sent to ``STDOUT``.
1298
Sameer Agarwal44376392013-06-03 09:20:49 -07001299 For ``TRUST_REGION_MINIMIZER`` the progress display looks like
1300
1301 .. code-block:: bash
1302
1303 0: f: 1.250000e+01 d: 0.00e+00 g: 5.00e+00 h: 0.00e+00 rho: 0.00e+00 mu: 1.00e+04 li: 0 it: 6.91e-06 tt: 1.91e-03
1304 1: f: 1.249750e-07 d: 1.25e+01 g: 5.00e-04 h: 5.00e+00 rho: 1.00e+00 mu: 3.00e+04 li: 1 it: 2.81e-05 tt: 1.99e-03
1305 2: f: 1.388518e-16 d: 1.25e-07 g: 1.67e-08 h: 5.00e-04 rho: 1.00e+00 mu: 9.00e+04 li: 1 it: 1.00e-05 tt: 2.01e-03
1306
1307 Here
1308
1309 #. ``f`` is the value of the objective function.
1310 #. ``d`` is the change in the value of the objective function if
1311 the step computed in this iteration is accepted.
1312 #. ``g`` is the max norm of the gradient.
1313 #. ``h`` is the change in the parameter vector.
1314 #. ``rho`` is the ratio of the actual change in the objective
      function value to the change in the value of the trust
1316 region model.
1317 #. ``mu`` is the size of the trust region radius.
1318 #. ``li`` is the number of linear solver iterations used to compute
1319 the trust region step. For direct/factorization based solvers it
1320 is always 1, for iterative solvers like ``ITERATIVE_SCHUR`` it
1321 is the number of iterations of the Conjugate Gradients
1322 algorithm.
   #. ``it`` is the time taken by the current iteration.
   #. ``tt`` is the total time taken by the minimizer.
1325
1326 For ``LINE_SEARCH_MINIMIZER`` the progress display looks like
1327
1328 .. code-block:: bash
1329
1330 0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e: 0 it: 2.98e-02 tt: 8.50e-02
1331 1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e: 1 it: 4.54e-02 tt: 1.31e-01
1332 2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e: 1 it: 4.96e-02 tt: 1.81e-01
1333
1334 Here
1335
1336 #. ``f`` is the value of the objective function.
1337 #. ``d`` is the change in the value of the objective function if
1338 the step computed in this iteration is accepted.
1339 #. ``g`` is the max norm of the gradient.
1340 #. ``h`` is the change in the parameter vector.
1341 #. ``s`` is the optimal step length computed by the line search.
   #. ``e`` is the number of function evaluations used by the line
      search to compute the step.
   #. ``it`` is the time taken by the current iteration.
   #. ``tt`` is the total time taken by the minimizer.
1344
Sameer Agarwalc4a32912013-06-13 22:00:48 -07001345.. member:: vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001346
1347 Default: ``empty``
1348
Sameer Agarwalc4a32912013-06-13 22:00:48 -07001349 List of iterations at which the trust region minimizer should dump
1350 the trust region problem. Useful for testing and benchmarking. If
1351 ``empty``, no problems are dumped.
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001352
Sameer Agarwalc4a32912013-06-13 22:00:48 -07001353.. member:: string Solver::Options::trust_region_problem_dump_directory
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001354
1355 Default: ``/tmp``
1356
   Directory to which the problems should be written. Should be
1358 non-empty if
1359 :member:`Solver::Options::trust_region_minimizer_iterations_to_dump` is
1360 non-empty and
   :member:`Solver::Options::trust_region_problem_dump_format` is not
1362 ``CONSOLE``.
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001363
Sameer Agarwalc4a32912013-06-13 22:00:48 -07001364.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001365
1366 Default: ``TEXTFILE``
1367
Sameer Agarwalc4a32912013-06-13 22:00:48 -07001368 The format in which trust region problems should be logged when
1369 :member:`Solver::Options::trust_region_minimizer_iterations_to_dump`
1370 is non-empty. There are three options:
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001371
1372 * ``CONSOLE`` prints the linear least squares problem in a human
1373 readable format to ``stderr``. The Jacobian is printed as a
1374 dense matrix. The vectors :math:`D`, :math:`x` and :math:`f` are
1375 printed as dense vectors. This should only be used for small
1376 problems.
1377
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001378 * ``TEXTFILE`` Write out the linear least squares problem to the
Sameer Agarwalc4a32912013-06-13 22:00:48 -07001379 directory pointed to by
1380 :member:`Solver::Options::trust_region_problem_dump_directory` as
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001381 text files which can be read into ``MATLAB/Octave``. The Jacobian
1382 is dumped as a text file containing :math:`(i,j,s)` triplets, the
1383 vectors :math:`D`, `x` and `f` are dumped as text files
1384 containing a list of their values.
1385
Sameer Agarwalc4a32912013-06-13 22:00:48 -07001386 A ``MATLAB/Octave`` script called
1387 ``ceres_solver_iteration_???.m`` is also output, which can be
1388 used to parse and load the problem into memory.
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001389
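For example, to dump the trust region problems at the first two iterations
(a sketch; the directory is arbitrary):

.. code-block:: c++

   options.trust_region_minimizer_iterations_to_dump.push_back(1);
   options.trust_region_minimizer_iterations_to_dump.push_back(2);
   options.trust_region_problem_dump_directory = "/tmp/ceres_dumps";
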
1390.. member:: bool Solver::Options::check_gradients
1391
1392 Default: ``false``
1393
1394 Check all Jacobians computed by each residual block with finite
1395 differences. This is expensive since it involves computing the
1396 derivative by normal means (e.g. user specified, autodiff, etc),
1397 then also computing it using finite differences. The results are
1398 compared, and if they differ substantially, details are printed to
1399 the log.
1400
1401.. member:: double Solver::Options::gradient_check_relative_precision
1402
   Default: ``1e-8``
1404
   Precision to check for in the gradient checker. If the relative
   difference between an element of a user supplied Jacobian and its
   finite difference estimate exceeds this number, then the Jacobian
   for that cost term is dumped.
1408
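A sketch of enabling the check (the tolerance value is illustrative):

.. code-block:: c++

   options.check_gradients = true;
   // Loosen or tighten depending on how noisy the residuals are.
   options.gradient_check_relative_precision = 1e-6;
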
1409.. member:: double Solver::Options::numeric_derivative_relative_step_size
1410
1411 Default: ``1e-6``
1412
1413 Relative shift used for taking numeric derivatives. For finite
1414 differencing, each dimension is evaluated at slightly shifted
1415 values, e.g., for forward differences, the numerical derivative is
1416
1417 .. math::
1418
1419 \delta &= numeric\_derivative\_relative\_step\_size\\
1420 \Delta f &= \frac{f((1 + \delta) x) - f(x)}{\delta x}
1421
1422 The finite differencing is done along each dimension. The reason to
1423 use a relative (rather than absolute) step size is that this way,
1424 numeric differentiation works for functions where the arguments are
1425 typically large (e.g. :math:`10^9`) and when the values are small
1426 (e.g. :math:`10^{-5}`). It is possible to construct *torture cases*
1427 which break this finite difference heuristic, but they do not come
1428 up often in practice.
1429
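A minimal sketch of the relative-step forward difference described above
for a scalar function; Ceres' own implementation operates on vector valued
functions and is more careful, so this is purely illustrative:

.. code-block:: c++

   #include <algorithm>
   #include <cmath>

   double ForwardDifference(double (*f)(double), double x,
                            double relative_step_size) {
     // Scale the step with |x| so that both very large and very small
     // arguments are handled; guard against x == 0.
     const double h = relative_step_size * std::max(std::abs(x), 1e-12);
     return (f(x + h) - f(x)) / h;
   }
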
.. member:: vector<IterationCallback*> Solver::Options::callbacks
1431
1432 Callbacks that are executed at the end of each iteration of the
1433 :class:`Minimizer`. They are executed in the order that they are
1434 specified in this vector. By default, parameter blocks are updated
1435 only at the end of the optimization, i.e when the
1436 :class:`Minimizer` terminates. This behavior is controlled by
   :member:`Solver::Options::update_state_every_iteration`. If the user
   wishes to have access to the updated parameter blocks when his/her
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001439 callbacks are executed, then set
1440 :member:`Solver::Options::update_state_every_iteration` to true.
1441
1442 The solver does NOT take ownership of these pointers.
1443
1444.. member:: bool Solver::Options::update_state_every_iteration
1445
1446 Default: ``false``
1447
1448 Normally the parameter blocks are only updated when the solver
   terminates. Setting this to true updates them in every
1450 iteration. This setting is useful when building an interactive
1451 application using Ceres and using an :class:`IterationCallback`.
1452
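A sketch of registering a user defined callback (here a hypothetical
callback that stops the solve once the cost is small enough):

.. code-block:: c++

   class SmallCostCallback : public IterationCallback {
    public:
     virtual CallbackReturnType operator()(const IterationSummary& summary) {
       return summary.cost < 1e-10 ? SOLVER_TERMINATE_SUCCESSFULLY
                                   : SOLVER_CONTINUE;
     }
   };

   SmallCostCallback callback;
   options.callbacks.push_back(&callback);
   // Needed only if the callback inspects the parameter blocks.
   options.update_state_every_iteration = true;
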
1453.. member:: string Solver::Options::solver_log
1454
1455 Default: ``empty``
1456
1457 If non-empty, a summary of the execution of the solver is recorded
   to this file. This file is used for recording and analyzing Ceres'
1459 performance. Currently, only the iteration number, total time and
1460 the objective function value are logged. The format of this file is
1461 expected to change over time as the performance evaluation
1462 framework is fleshed out.
1463
1464:class:`ParameterBlockOrdering`
1465-------------------------------
1466
1467.. class:: ParameterBlockOrdering
1468
Sameer Agarwal09396322013-05-28 22:29:36 -07001469 ``ParameterBlockOrdering`` is a class for storing and manipulating
1470 an ordered collection of groups/sets with the following semantics:
1471
1472 Group IDs are non-negative integer values. Elements are any type
1473 that can serve as a key in a map or an element of a set.
1474
1475 An element can only belong to one group at a time. A group may
1476 contain an arbitrary number of elements.
1477
1478 Groups are ordered by their group id.
1479
1480.. function:: bool ParameterBlockOrdering::AddElementToGroup(const double* element, const int group)
1481
1482 Add an element to a group. If a group with this id does not exist,
1483 one is created. This method can be called any number of times for
1484 the same element. Group ids should be non-negative numbers. Return
1485 value indicates if adding the element was a success.
1486
1487.. function:: void ParameterBlockOrdering::Clear()
1488
1489 Clear the ordering.
1490
1491.. function:: bool ParameterBlockOrdering::Remove(const double* element)
1492
1493 Remove the element, no matter what group it is in. If the element
1494 is not a member of any group, calling this method will result in a
1495 crash. Return value indicates if the element was actually removed.
1496
1497.. function:: void ParameterBlockOrdering::Reverse()
1498
1499 Reverse the order of the groups in place.
1500
1501.. function:: int ParameterBlockOrdering::GroupId(const double* element) const
1502
1503 Return the group id for the element. If the element is not a member
1504 of any group, return -1.
1505
1506.. function:: bool ParameterBlockOrdering::IsMember(const double* element) const
1507
1508 True if there is a group containing the parameter block.
1509
1510.. function:: int ParameterBlockOrdering::GroupSize(const int group) const
1511
   Return the number of elements in the group. This function always
   succeeds, i.e., implicitly there exists a (possibly empty) group for
   every integer, so the size of a group that was never created is zero.
1514
1515.. function:: int ParameterBlockOrdering::NumElements() const
1516
1517 Number of elements in the ordering.
1518
1519.. function:: int ParameterBlockOrdering::NumGroups() const
1520
1521 Number of groups with one or more elements.
1522
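A short sketch of manipulating an ordering, where ``x``, ``y`` and ``z``
are hypothetical parameter block pointers:

.. code-block:: c++

   ParameterBlockOrdering ordering;
   ordering.AddElementToGroup(x, 0);
   ordering.AddElementToGroup(y, 0);
   ordering.AddElementToGroup(z, 1);

   CHECK_EQ(ordering.NumGroups(), 2);
   CHECK_EQ(ordering.GroupId(z), 1);
   ordering.Remove(y);
   CHECK_EQ(ordering.GroupSize(0), 1);
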
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001523
1524:class:`IterationCallback`
1525--------------------------
1526
Sameer Agarwalebbb9842013-05-26 12:40:12 -07001527.. class:: IterationSummary
1528
1529 :class:`IterationSummary` describes the state of the optimizer
Sameer Agarwalf0b071b2013-05-31 13:22:51 -07001530 after each iteration of the minimization. Note that all times are
1531 wall times.
Sameer Agarwalebbb9842013-05-26 12:40:12 -07001532
1533 .. code-block:: c++
1534
1535 struct IterationSummary {
1536 // Current iteration number.
1537 int32 iteration;
1538
1539 // Step was numerically valid, i.e., all values are finite and the
1540 // step reduces the value of the linearized model.
1541 //
1542 // Note: step_is_valid is false when iteration = 0.
1543 bool step_is_valid;
1544
1545 // Step did not reduce the value of the objective function
1546 // sufficiently, but it was accepted because of the relaxed
1547 // acceptance criterion used by the non-monotonic trust region
1548 // algorithm.
1549 //
1550 // Note: step_is_nonmonotonic is false when iteration = 0;
1551 bool step_is_nonmonotonic;
1552
1553 // Whether or not the minimizer accepted this step or not. If the
1554 // ordinary trust region algorithm is used, this means that the
1555 // relative reduction in the objective function value was greater
1556 // than Solver::Options::min_relative_decrease. However, if the
1557 // non-monotonic trust region algorithm is used
1558 // (Solver::Options:use_nonmonotonic_steps = true), then even if the
1559 // relative decrease is not sufficient, the algorithm may accept the
1560 // step and the step is declared successful.
1561 //
1562 // Note: step_is_successful is false when iteration = 0.
1563 bool step_is_successful;
1564
1565 // Value of the objective function.
1566 double cost;
1567
1568 // Change in the value of the objective function in this
1569 // iteration. This can be positive or negative.
1570 double cost_change;
1571
1572 // Infinity norm of the gradient vector.
1573 double gradient_max_norm;
1574
1575 // 2-norm of the size of the step computed by the optimization
1576 // algorithm.
1577 double step_norm;
1578
1579 // For trust region algorithms, the ratio of the actual change in
1580 // cost and the change in the cost of the linearized approximation.
1581 double relative_decrease;
1582
1583 // Size of the trust region at the end of the current iteration. For
1584 // the Levenberg-Marquardt algorithm, the regularization parameter
1585 // mu = 1.0 / trust_region_radius.
1586 double trust_region_radius;
1587
1588 // For the inexact step Levenberg-Marquardt algorithm, this is the
1589 // relative accuracy with which the Newton(LM) step is solved. This
1590 // number affects only the iterative solvers capable of solving
1591 // linear systems inexactly. Factorization-based exact solvers
1592 // ignore it.
1593 double eta;
1594
     // Step size computed by the line search algorithm.
1596 double step_size;
1597
1598 // Number of function evaluations used by the line search algorithm.
1599 int line_search_function_evaluations;
1600
1601 // Number of iterations taken by the linear solver to solve for the
1602 // Newton step.
1603 int linear_solver_iterations;
1604
1605 // Time (in seconds) spent inside the minimizer loop in the current
1606 // iteration.
1607 double iteration_time_in_seconds;
1608
1609 // Time (in seconds) spent inside the trust region step solver.
1610 double step_solver_time_in_seconds;
1611
1612 // Time (in seconds) since the user called Solve().
1613 double cumulative_time_in_seconds;
1614 };
1615
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001616.. class:: IterationCallback
1617
Sameer Agarwalebbb9842013-05-26 12:40:12 -07001618 .. code-block:: c++
1619
1620 class IterationCallback {
1621 public:
1622 virtual ~IterationCallback() {}
1623 virtual CallbackReturnType operator()(const IterationSummary& summary) = 0;
1624 };
1625
1626 Interface for specifying callbacks that are executed at the end of
1627 each iteration of the Minimizer. The solver uses the return value of
1628 ``operator()`` to decide whether to continue solving or to
1629 terminate. The user can return three values.
1630
1631 #. ``SOLVER_ABORT`` indicates that the callback detected an abnormal
1632 situation. The solver returns without updating the parameter
1633 blocks (unless ``Solver::Options::update_state_every_iteration`` is
1634 set true). Solver returns with ``Solver::Summary::termination_type``
1635 set to ``USER_ABORT``.
1636
1637 #. ``SOLVER_TERMINATE_SUCCESSFULLY`` indicates that there is no need
1638 to optimize anymore (some user specified termination criterion
1639 has been met). Solver returns with
      ``Solver::Summary::termination_type`` set to ``USER_SUCCESS``.
1641
1642 #. ``SOLVER_CONTINUE`` indicates that the solver should continue
1643 optimizing.
1644
1645 For example, the following ``IterationCallback`` is used internally
1646 by Ceres to log the progress of the optimization.
1647
1648 .. code-block:: c++
1649
1650 class LoggingCallback : public IterationCallback {
1651 public:
1652 explicit LoggingCallback(bool log_to_stdout)
1653 : log_to_stdout_(log_to_stdout) {}
1654
1655 ~LoggingCallback() {}
1656
1657 CallbackReturnType operator()(const IterationSummary& summary) {
1658 const char* kReportRowFormat =
1659 "% 4d: f:% 8e d:% 3.2e g:% 3.2e h:% 3.2e "
1660 "rho:% 3.2e mu:% 3.2e eta:% 3.2e li:% 3d";
1661 string output = StringPrintf(kReportRowFormat,
1662 summary.iteration,
1663 summary.cost,
1664 summary.cost_change,
1665 summary.gradient_max_norm,
1666 summary.step_norm,
1667 summary.relative_decrease,
1668 summary.trust_region_radius,
1669 summary.eta,
1670 summary.linear_solver_iterations);
1671 if (log_to_stdout_) {
1672 cout << output << endl;
1673 } else {
1674 VLOG(1) << output;
1675 }
1676 return SOLVER_CONTINUE;
1677 }
1678
1679 private:
1680 const bool log_to_stdout_;
1681 };
1682
1683
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001684
1685:class:`CRSMatrix`
1686------------------
1687
1688.. class:: CRSMatrix
1689
Sameer Agarwalebbb9842013-05-26 12:40:12 -07001690 .. code-block:: c++
1691
1692 struct CRSMatrix {
1693 int num_rows;
1694 int num_cols;
1695 vector<int> cols;
1696 vector<int> rows;
1697 vector<double> values;
1698 };
1699
1700 A compressed row sparse matrix used primarily for communicating the
1701 Jacobian matrix to the user.
1702
1703 A compressed row matrix stores its contents in three arrays,
1704 ``rows``, ``cols`` and ``values``.
1705
1706 ``rows`` is a ``num_rows + 1`` sized array that points into the ``cols`` and
1707 ``values`` array. For each row ``i``:
1708
1709 ``cols[rows[i]]`` ... ``cols[rows[i + 1] - 1]`` are the indices of the
1710 non-zero columns of row ``i``.
1711
1712 ``values[rows[i]]`` ... ``values[rows[i + 1] - 1]`` are the values of the
1713 corresponding entries.
1714
1715 ``cols`` and ``values`` contain as many entries as there are
1716 non-zeros in the matrix.
1717
1718 e.g, consider the 3x4 sparse matrix
1719
1720 .. code-block:: c++
1721
1722 0 10 0 4
1723 0 2 -3 2
1724 1 2 0 0
1725
1726 The three arrays will be:
1727
1728 .. code-block:: c++
1729
1730 -row0- ---row1--- -row2-
1731 rows = [ 0, 2, 5, 7]
1732 cols = [ 1, 3, 1, 2, 3, 0, 1]
1733 values = [10, 4, 2, -3, 2, 1, 2]
1734
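For instance, the following sketch walks over the non-zero entries of a
``CRSMatrix`` row by row:

.. code-block:: c++

   #include <cstdio>

   void PrintNonZeros(const CRSMatrix& m) {
     for (int row = 0; row < m.num_rows; ++row) {
       // Entries of this row live in [rows[row], rows[row + 1]).
       for (int idx = m.rows[row]; idx < m.rows[row + 1]; ++idx) {
         printf("(%d, %d) = %f\n", row, m.cols[idx], m.values[idx]);
       }
     }
   }
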
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001735
1736:class:`Solver::Summary`
1737------------------------
1738
1739.. class:: Solver::Summary
1740
Sameer Agarwalf0b071b2013-05-31 13:22:51 -07001741 Note that all times reported in this struct are wall times.
1742
Sameer Agarwalebbb9842013-05-26 12:40:12 -07001743 .. code-block:: c++
1744
1745 struct Summary {
1746 // A brief one line description of the state of the solver after
1747 // termination.
1748 string BriefReport() const;
1749
1750 // A full multiline description of the state of the solver after
1751 // termination.
1752 string FullReport() const;
1753
1754 // Minimizer summary -------------------------------------------------
1755 MinimizerType minimizer_type;
1756
1757 SolverTerminationType termination_type;
1758
1759 // If the solver did not run, or there was a failure, a
1760 // description of the error.
1761 string error;
1762
1763 // Cost of the problem before and after the optimization. See
1764 // problem.h for definition of the cost of a problem.
1765 double initial_cost;
1766 double final_cost;
1767
1768 // The part of the total cost that comes from residual blocks that
1769 // were held fixed by the preprocessor because all the parameter
1770 // blocks that they depend on were fixed.
1771 double fixed_cost;
1772
1773 vector<IterationSummary> iterations;
1774
1775 int num_successful_steps;
1776 int num_unsuccessful_steps;
1777 int num_inner_iteration_steps;
1778
1779 // When the user calls Solve, before the actual optimization
1780 // occurs, Ceres performs a number of preprocessing steps. These
1781 // include error checks, memory allocations, and reorderings. This
1782 // time is accounted for as preprocessing time.
1783 double preprocessor_time_in_seconds;
1784
1785 // Time spent in the TrustRegionMinimizer.
1786 double minimizer_time_in_seconds;
1787
1788 // After the Minimizer is finished, some time is spent in
1789 // re-evaluating residuals etc. This time is accounted for in the
1790 // postprocessor time.
1791 double postprocessor_time_in_seconds;
1792
     // Sum total of all time spent inside Ceres when Solve is called.
1794 double total_time_in_seconds;
1795
1796 double linear_solver_time_in_seconds;
1797 double residual_evaluation_time_in_seconds;
1798 double jacobian_evaluation_time_in_seconds;
1799 double inner_iteration_time_in_seconds;
1800
1801 // Preprocessor summary.
1802 int num_parameter_blocks;
1803 int num_parameters;
1804 int num_effective_parameters;
1805 int num_residual_blocks;
1806 int num_residuals;
1807
1808 int num_parameter_blocks_reduced;
1809 int num_parameters_reduced;
1810 int num_effective_parameters_reduced;
1811 int num_residual_blocks_reduced;
1812 int num_residuals_reduced;
1813
1814 int num_eliminate_blocks_given;
1815 int num_eliminate_blocks_used;
1816
1817 int num_threads_given;
1818 int num_threads_used;
1819
1820 int num_linear_solver_threads_given;
1821 int num_linear_solver_threads_used;
1822
1823 LinearSolverType linear_solver_type_given;
1824 LinearSolverType linear_solver_type_used;
1825
1826 vector<int> linear_solver_ordering_given;
1827 vector<int> linear_solver_ordering_used;
1828
1829 bool inner_iterations_given;
1830 bool inner_iterations_used;
1831
1832 vector<int> inner_iteration_ordering_given;
1833 vector<int> inner_iteration_ordering_used;
1834
1835 PreconditionerType preconditioner_type;
1836
1837 TrustRegionStrategyType trust_region_strategy_type;
1838 DoglegType dogleg_type;
1839
1840 SparseLinearAlgebraLibraryType sparse_linear_algebra_library;
1841
1842 LineSearchDirectionType line_search_direction_type;
1843 LineSearchType line_search_type;
1844 int max_lbfgs_rank;
1845 };
1846
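In practice the summary is obtained by passing a ``Solver::Summary`` to
``Solve`` and then examining it. A minimal sketch, where ``problem`` is an
already constructed and populated :class:`Problem`:

.. code-block:: c++

   Solver::Options options;
   // ... configure options ...
   Solver::Summary summary;
   Solve(options, &problem, &summary);
   std::cout << summary.FullReport() << "\n";
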
1847
Sameer Agarwal1f17f562013-06-06 10:30:50 -07001848Covariance Estimation
1849=====================
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001850
Sameer Agarwal1f17f562013-06-06 10:30:50 -07001851Background
1852----------
Sameer Agarwal3d87b722013-02-02 00:49:31 -08001853
Sameer Agarwal1f17f562013-06-06 10:30:50 -07001854One way to assess the quality of the solution returned by a
1855non-linear least squares solve is to analyze the covariance of the
1856solution.
1857
1858Let us consider the non-linear regression problem
1859
1860.. math:: y = f(x) + N(0, I)
1861
1862i.e., the observation :math:`y` is a random non-linear function of the
1863independent variable :math:`x` with mean :math:`f(x)` and identity
1864covariance. Then the maximum likelihood estimate of :math:`x` given
1865observations :math:`y` is the solution to the non-linear least squares
1866problem:
1867
1868.. math:: x^* = \arg \min_x \|f(x)\|^2
1869
1870And the covariance of :math:`x^*` is given by
1871
1872.. math:: C(x^*) = \left(J'(x^*)J(x^*)\right)^{-1}
1873
1874Here :math:`J(x^*)` is the Jacobian of :math:`f` at :math:`x^*`. The
1875above formula assumes that :math:`J(x^*)` has full column rank.
1876
1877If :math:`J(x^*)` is rank deficient, then the covariance matrix :math:`C(x^*)`
1878is also rank deficient and is given by the Moore-Penrose pseudo inverse.
1879
1880.. math:: C(x^*) = \left(J'(x^*)J(x^*)\right)^{\dagger}
1881
1882Note that in the above, we assumed that the covariance matrix for
1883:math:`y` was identity. This is an important assumption. If this is
1884not the case and we have
1885
1886.. math:: y = f(x) + N(0, S)
1887
1888Where :math:`S` is a positive semi-definite matrix denoting the
1889covariance of :math:`y`, then the maximum likelihood problem to be
1890solved is
1891
1892.. math:: x^* = \arg \min_x f'(x) S^{-1} f(x)
1893
1894and the corresponding covariance estimate of :math:`x^*` is given by
1895
1896.. math:: C(x^*) = \left(J'(x^*) S^{-1} J(x^*)\right)^{-1}
1897
1898So, if it is the case that the observations being fitted to have a
1899covariance matrix not equal to identity, then it is the user's
1900responsibility that the corresponding cost functions are correctly
1901scaled, e.g. in the above case the cost function for this problem
1902should evaluate :math:`S^{-1/2} f(x)` instead of just :math:`f(x)`,
1903where :math:`S^{-1/2}` is the inverse square root of the covariance
1904matrix :math:`S`.
1905
1906Gauge Invariance
1907----------------
1908
1909In structure from motion (3D reconstruction) problems, the
reconstruction is ambiguous up to a similarity transform. This is
1911known as a *Gauge Ambiguity*. Handling Gauges correctly requires the
1912use of SVD or custom inversion algorithms. For small problems the
1913user can use the dense algorithm. For more details see the work of
1914Kanatani & Morris [KanataniMorris]_.
1915
1916
1917:class:`Covariance`
1918-------------------
1919
1920:class:`Covariance` allows the user to evaluate the covariance for a
1921non-linear least squares problem and provides random access to its
1922blocks. The computation assumes that the cost functions compute
1923residuals such that their covariance is identity.
1924
1925Since the computation of the covariance matrix requires computing the
1926inverse of a potentially large matrix, this can involve a rather large
1927amount of time and memory. However, it is usually the case that the
1928user is only interested in a small part of the covariance
1929matrix. Quite often just the block diagonal. :class:`Covariance`
1930allows the user to specify the parts of the covariance matrix that she
1931is interested in and then uses this information to only compute and
1932store those parts of the covariance matrix.
1933
1934Rank of the Jacobian
1935--------------------
1936
1937As we noted above, if the Jacobian is rank deficient, then the inverse
1938of :math:`J'J` is not defined and instead a pseudo inverse needs to be
1939computed.
1940
The rank deficiency in :math:`J` can be *structural* -- columns which
are always known to be zero -- or *numerical* -- depending on the exact
values in the Jacobian.
1944
1945Structural rank deficiency occurs when the problem contains parameter
1946blocks that are constant. This class correctly handles structural rank
1947deficiency like that.
1948
1949Numerical rank deficiency, where the rank of the matrix cannot be
1950predicted by its sparsity structure and requires looking at its
1951numerical values is more complicated. Here again there are two
1952cases.
1953
1954 a. The rank deficiency arises from overparameterization. e.g., a
1955 four dimensional quaternion used to parameterize :math:`SO(3)`,
1956 which is a three dimensional manifold. In cases like this, the
1957 user should use an appropriate
1958 :class:`LocalParameterization`. Not only will this lead to better
1959 numerical behaviour of the Solver, it will also expose the rank
1960 deficiency to the :class:`Covariance` object so that it can
1961 handle it correctly.
1962
1963 b. More general numerical rank deficiency in the Jacobian requires
1964 the computation of the so called Singular Value Decomposition
1965 (SVD) of :math:`J'J`. We do not know how to do this for large
1966 sparse matrices efficiently. For small and moderate sized
1967 problems this is done using dense linear algebra.
1968
1969
1970:class:`Covariance::Options`
1971
1972.. class:: Covariance::Options
1973
1974.. member:: int Covariance::Options::num_threads
1975
1976 Default: ``1``
1977
1978 Number of threads to be used for evaluating the Jacobian and
1979 estimation of covariance.
1980
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07001981.. member:: CovarianceAlgorithmType Covariance::Options::algorithm_type
Sameer Agarwal1f17f562013-06-06 10:30:50 -07001982
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07001983 Default: ``SPARSE_QR`` or ``DENSE_SVD``
Sameer Agarwal1f17f562013-06-06 10:30:50 -07001984
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07001985 Ceres supports three different algorithms for covariance
1986 estimation, which represent different tradeoffs in speed, accuracy
1987 and reliability.
Sameer Agarwal1f17f562013-06-06 10:30:50 -07001988
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07001989 1. ``DENSE_SVD`` uses ``Eigen``'s ``JacobiSVD`` to perform the
1990 computations. It computes the singular value decomposition
Sameer Agarwal1f17f562013-06-06 10:30:50 -07001991
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07001992 .. math:: U S V^\top = J
1993
1994 and then uses it to compute the pseudo inverse of J'J as
1995
      .. math:: (J'J)^{\dagger} = V (S^{\dagger})^2 V^\top
1997
1998 It is an accurate but slow method and should only be used for
1999 small to moderate sized problems. It can handle full-rank as
2000 well as rank deficient Jacobians.
2001
2002 2. ``SPARSE_CHOLESKY`` uses the ``CHOLMOD`` sparse Cholesky
2003 factorization library to compute the decomposition :
2004
2005 .. math:: R^\top R = J^\top J
2006
2007 and then
2008
2009 .. math:: \left(J^\top J\right)^{-1} = \left(R^\top R\right)^{-1}
2010
      It is a fast algorithm for sparse matrices that should be used
      when the Jacobian matrix J is well conditioned. For
      ill-conditioned matrices, this algorithm can fail
      unpredictably. This is
2014 because Cholesky factorization is not a rank-revealing
2015 factorization, i.e., it cannot reliably detect when the matrix
2016 being factorized is not of full
2017 rank. ``SuiteSparse``/``CHOLMOD`` supplies a heuristic for
2018 checking if the matrix is rank deficient (cholmod_rcond), but it
2019 is only a heuristic and can have both false positive and false
2020 negatives.
2021
2022 Recent versions of ``SuiteSparse`` (>= 4.2.0) provide a much more
2023 efficient method for solving for rows of the covariance
2024 matrix. Therefore, if you are doing ``SPARSE_CHOLESKY``, we strongly
2025 recommend using a recent version of ``SuiteSparse``.
2026
2027 3. ``SPARSE_QR`` uses the ``SuiteSparseQR`` sparse QR factorization
2028 library to compute the decomposition
2029
2030 .. math::
2031
2032 QR &= J\\
2033 \left(J^\top J\right)^{-1} &= \left(R^\top R\right)^{-1}
2034
2035 It is a moderately fast algorithm for sparse matrices, which at
2036 the price of more time and memory than the ``SPARSE_CHOLESKY``
2037 algorithm is numerically better behaved and is rank revealing,
2038 i.e., it can reliably detect when the Jacobian matrix is rank
2039 deficient.
2040
   Neither ``SPARSE_CHOLESKY`` nor ``SPARSE_QR`` is capable of computing
   the covariance if the Jacobian is rank deficient.
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002043
.. member:: double Covariance::Options::min_reciprocal_condition_number
2045
2046 Default: :math:`10^{-14}`
2047
2048 If the Jacobian matrix is near singular, then inverting :math:`J'J`
   will result in unreliable results, e.g., if
2050
2051 .. math::
2052
2053 J = \begin{bmatrix}
2054 1.0& 1.0 \\
2055 1.0& 1.0000001
2056 \end{bmatrix}
2057
2058 which is essentially a rank deficient matrix, we have
2059
2060 .. math::
2061
2062 (J'J)^{-1} = \begin{bmatrix}
2063 2.0471e+14& -2.0471e+14 \\
      -2.0471e+14&  2.0471e+14
2065 \end{bmatrix}
2066
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002067
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002068 This is not a useful result. Therefore, by default
2069 :func:`Covariance::Compute` will return ``false`` if a rank
2070 deficient Jacobian is encountered. How rank deficiency is detected
2071 depends on the algorithm being used.
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002072
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002073 1. ``DENSE_SVD``
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002074
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002075 .. math:: \frac{\sigma_{\text{min}}}{\sigma_{\text{max}}} < \sqrt{\text{min_reciprocal_condition_number}}
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002076
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002077 where :math:`\sigma_{\text{min}}` and
      :math:`\sigma_{\text{max}}` are the minimum and maximum
2079 singular values of :math:`J` respectively.
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002080
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002081 2. ``SPARSE_CHOLESKY``
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002082
      .. math:: \text{cholmod_rcond} < \text{min_reciprocal_condition_number}
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002084
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002085 Here cholmod_rcond is a crude estimate of the reciprocal
2086 condition number of :math:`J^\top J` by using the maximum and
2087 minimum diagonal entries of the Cholesky factor :math:`R`. There
2088 are no theoretical guarantees associated with this test. It can
2089 give false positives and negatives. Use at your own risk. The
2090 default value of ``min_reciprocal_condition_number`` has been
2091 set to a conservative value, and sometimes the
2092 :func:`Covariance::Compute` may return false even if it is
2093 possible to estimate the covariance reliably. In such cases, the
2094 user should exercise their judgement before lowering the value
2095 of ``min_reciprocal_condition_number``.
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002096
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002097 3. ``SPARSE_QR``
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002098
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002099 .. math:: \operatorname{rank}(J) < \operatorname{num\_col}(J)
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002100
      Here :math:`\operatorname{rank}(J)` is the estimate of the
      rank of :math:`J` returned by the ``SuiteSparseQR`` algorithm. It is
2103 a fairly reliable indication of rank deficiency.
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002104
2105.. member:: int Covariance::Options::null_space_rank
2106
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002107 When using ``DENSE_SVD``, the user has more control in dealing
2108 with singular and near singular covariance matrices.
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002109
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002110 As mentioned above, when the covariance matrix is near singular,
2111 instead of computing the inverse of :math:`J'J`, the Moore-Penrose
2112 pseudoinverse of :math:`J'J` should be computed.
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002113
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002114 If :math:`J'J` has the eigen decomposition :math:`(\lambda_i,
   e_i)`, where :math:`\lambda_i` is the :math:`i^\textrm{th}`
2116 eigenvalue and :math:`e_i` is the corresponding eigenvector, then
2117 the inverse of :math:`J'J` is
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002118
Sameer Agarwal42be9ca2013-07-18 08:02:08 -07002119 .. math:: (J'J)^{-1} = \sum_i \frac{1}{\lambda_i} e_i e_i'
2120
2121 and computing the pseudo inverse involves dropping terms from this
2122 sum that correspond to small eigenvalues.
2123
2124 How terms are dropped is controlled by
2125 `min_reciprocal_condition_number` and `null_space_rank`.
2126
2127 If `null_space_rank` is non-negative, then the smallest
2128 `null_space_rank` eigenvalue/eigenvectors are dropped irrespective
2129 of the magnitude of :math:`\lambda_i`. If the ratio of the
2130 smallest non-zero eigenvalue to the largest eigenvalue in the
   truncated matrix is still below ``min_reciprocal_condition_number``,
   then ``Covariance::Compute()`` will fail and return ``false``.
2133
2134 Setting `null_space_rank = -1` drops all terms for which
2135
2136 .. math:: \frac{\lambda_i}{\lambda_{\textrm{max}}} < \textrm{min_reciprocal_condition_number}
2137
2138 This option has no effect on ``SPARSE_QR`` and ``SPARSE_CHOLESKY``
2139 algorithms.
Sameer Agarwal1f17f562013-06-06 10:30:50 -07002140
2141.. member:: bool Covariance::Options::apply_loss_function
2142
2143 Default: `true`
2144
2145 Even though the residual blocks in the problem may contain loss
2146 functions, setting ``apply_loss_function`` to false will turn off
2147 the application of the loss function to the output of the cost
2148 function and in turn its effect on the covariance.
2149
2150.. class:: Covariance
2151
2152 :class:`Covariance::Options` as the name implies is used to control
2153 the covariance estimation algorithm. Covariance estimation is a
2154 complicated and numerically sensitive procedure. Please read the
2155 entire documentation for :class:`Covariance::Options` before using
2156 :class:`Covariance`.
2157
2158.. function:: bool Covariance::Compute(const vector<pair<const double*, const double*> >& covariance_blocks, Problem* problem)
2159
2160 Compute a part of the covariance matrix.
2161
   The vector ``covariance_blocks`` indexes into the covariance
2163 matrix block-wise using pairs of parameter blocks. This allows the
2164 covariance estimation algorithm to only compute and store these
2165 blocks.
2166
2167 Since the covariance matrix is symmetric, if the user passes
2168 ``<block1, block2>``, then ``GetCovarianceBlock`` can be called with
2169 ``block1``, ``block2`` as well as ``block2``, ``block1``.
2170
2171 ``covariance_blocks`` cannot contain duplicates. Bad things will
2172 happen if they do.
2173
2174 Note that the list of ``covariance_blocks`` is only used to
2175 determine what parts of the covariance matrix are computed. The
   full Jacobian is used to do the computation, i.e., the list does not
   affect which parts of the Jacobian are used in the computation.
2178
2179 The return value indicates the success or failure of the covariance
2180 computation. Please see the documentation for
2181 :class:`Covariance::Options` for more on the conditions under which
2182 this function returns ``false``.
2183
2184.. function:: bool GetCovarianceBlock(const double* parameter_block1, const double* parameter_block2, double* covariance_block) const
2185
2186 Return the block of the covariance matrix corresponding to
2187 ``parameter_block1`` and ``parameter_block2``.
2188
2189 Compute must be called before the first call to ``GetCovarianceBlock``
2190 and the pair ``<parameter_block1, parameter_block2>`` OR the pair
2191 ``<parameter_block2, parameter_block1>`` must have been present in the
2192 vector covariance_blocks when ``Compute`` was called. Otherwise
2193 ``GetCovarianceBlock`` will return false.
2194
2195 ``covariance_block`` must point to a memory location that can store
2196 a ``parameter_block1_size x parameter_block2_size`` matrix. The
2197 returned covariance will be a row-major matrix.
2198
2199Example Usage
2200-------------
2201
2202.. code-block:: c++
2203
2204 double x[3];
2205 double y[2];
2206
2207 Problem problem;
2208 problem.AddParameterBlock(x, 3);
2209 problem.AddParameterBlock(y, 2);
2210 <Build Problem>
2211 <Solve Problem>
2212
2213 Covariance::Options options;
2214 Covariance covariance(options);
2215
2216 vector<pair<const double*, const double*> > covariance_blocks;
2217 covariance_blocks.push_back(make_pair(x, x));
2218 covariance_blocks.push_back(make_pair(y, y));
2219 covariance_blocks.push_back(make_pair(x, y));
2220
2221 CHECK(covariance.Compute(covariance_blocks, &problem));
2222
2223 double covariance_xx[3 * 3];
2224 double covariance_yy[2 * 2];
2225 double covariance_xy[3 * 2];
      covariance.GetCovarianceBlock(x, x, covariance_xx);
      covariance.GetCovarianceBlock(y, y, covariance_yy);
      covariance.GetCovarianceBlock(x, y, covariance_xy);
2229