.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-solving:

=======
Solving
=======


Introduction
============

Effective use of Ceres requires some familiarity with the basic
components of a nonlinear least squares solver, so before we describe
how to configure and use the solver, we will take a brief look at how
some of the core optimization algorithms in Ceres work.

Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
variables, and
:math:`F(x) = \left[f_1(x), ... , f_{m}(x) \right]^{\top}` be an
:math:`m`-dimensional function of :math:`x`. We are interested in
solving the following optimization problem [#f1]_.

.. math:: \arg \min_x \frac{1}{2}\|F(x)\|^2\ .
   :label: nonlinsq

Here, the Jacobian :math:`J(x)` of :math:`F(x)` is an :math:`m\times
n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)` and the
gradient vector is :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2 = J(x)^\top
F(x)`. Since the efficient global minimization of :eq:`nonlinsq` for
general :math:`F(x)` is an intractable problem, we will have to settle
for finding a local minimum.

The general strategy when solving non-linear optimization problems is
to solve a sequence of approximations to the original problem
[NocedalWright]_. At each iteration, the approximation is solved to
determine a correction :math:`\Delta x` to the vector :math:`x`. For
non-linear least squares, an approximation can be constructed by using
the linearization :math:`F(x+\Delta x) \approx F(x) + J(x)\Delta x`,
which leads to the following linear least squares problem:

.. math:: \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
   :label: linearapprox

Unfortunately, naively solving a sequence of these problems and
updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that
may not converge. To get a convergent algorithm, we need to control
the size of the step :math:`\Delta x`. Depending on how the size of
the step :math:`\Delta x` is controlled, non-linear optimization
algorithms can be divided into two major categories [NocedalWright]_.

1. **Trust Region** The trust region approach approximates the
   objective function using a model function (often a quadratic)
   over a subset of the search space known as the trust region. If the
   model function succeeds in minimizing the true objective function
   the trust region is expanded; otherwise it is contracted and the
   model optimization problem is solved again.

2. **Line Search** The line search approach first finds a descent
   direction along which the objective function will be reduced and
   then computes a step size that decides how far to move along
   that direction. The descent direction can be computed by various
   methods, such as gradient descent, Newton's method and quasi-Newton
   methods. The step size can be determined either exactly or
   inexactly.

Trust region methods are in some sense dual to line search methods:
trust region methods first choose a step size (the size of the trust
region) and then a step direction while line search methods first
choose a step direction and then a step size. Ceres implements
multiple algorithms in both categories.

.. _section-trust-region-methods:

Trust Region Methods
====================

The basic trust region algorithm looks something like this.

1. Given an initial point :math:`x` and a trust region radius :math:`\mu`.
2. :math:`\arg \min_{\Delta x} \frac{1}{2}\|J(x)\Delta
   x + F(x)\|^2` s.t. :math:`\|D(x)\Delta x\|^2 \le \mu`
3. :math:`\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 -
   \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 -
   \|F(x)\|^2}`
4. if :math:`\rho > \epsilon` then :math:`x = x + \Delta x`.
5. if :math:`\rho > \eta_1` then :math:`\mu = 2 \mu`
6. else if :math:`\rho < \eta_2` then :math:`\mu = 0.5 \mu`
7. Goto 2.

Here, :math:`\mu` is the trust region radius, :math:`D(x)` is some
matrix used to define a metric on the domain of :math:`F(x)` and
:math:`\rho` measures the quality of the step :math:`\Delta x`, i.e.,
how well did the linear model predict the decrease in the value of the
non-linear objective. The idea is to increase or decrease the radius
of the trust region depending on how well the linearization predicts
the behavior of the non-linear objective, which in turn is reflected
in the value of :math:`\rho`.

The key computational step in a trust-region algorithm is the solution
of the constrained optimization problem

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2\quad \text{such that}\quad \|D(x)\Delta x\|^2 \le \mu
   :label: trp

There are a number of different ways of solving this problem, each
giving rise to a different concrete trust-region algorithm. Currently,
Ceres implements two trust-region algorithms - Levenberg-Marquardt
and Dogleg. The user can choose between them by setting
:member:`Solver::Options::trust_region_strategy_type`.
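
A minimal sketch of selecting the trust region strategy; it assumes
``ceres/ceres.h`` has been included and a ``ceres::Problem`` named
``problem`` has been built elsewhere:

.. code-block:: c++

  ceres::Solver::Options options;
  // Select the trust region step computation algorithm;
  // DOGLEG is the other valid choice.
  options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;

  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);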

.. rubric:: Footnotes

.. [#f1] At the level of the non-linear solver, the block
         structure is not relevant, therefore our discussion here is
         in terms of an optimization problem defined over a state
         vector of size :math:`n`.


.. _section-levenberg-marquardt:

Levenberg-Marquardt
-------------------

The Levenberg-Marquardt algorithm [Levenberg]_ [Marquardt]_ is the
most popular algorithm for solving non-linear least squares problems.
It was also the first trust region algorithm to be developed
[Levenberg]_ [Marquardt]_. Ceres implements an exact step [Madsen]_
and an inexact step variant of the Levenberg-Marquardt algorithm
[WrightHolt]_ [NashSofer]_.

It can be shown that the solution to :eq:`trp` can be obtained by
solving an unconstrained optimization problem of the form

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 +\lambda  \|D(x)\Delta x\|^2

where :math:`\lambda` is a Lagrange multiplier that is inversely
related to :math:`\mu`. In Ceres, we solve for

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
   :label: lsqr

The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.

Before going further, let us make some notational simplifications. We
will assume that the matrix :math:`\sqrt{\mu} D` has been concatenated
at the bottom of the matrix :math:`J` and similarly a vector of zeros
has been added to the bottom of the vector :math:`f` and the rest of
our discussion will be in terms of :math:`J` and :math:`f`, i.e., the
linear least squares problem

.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
   :label: simple

For all but the smallest problems the solution of :eq:`simple` in
each iteration of the Levenberg-Marquardt algorithm is the dominant
computational cost in Ceres. Ceres provides a number of different
options for solving :eq:`simple`. There are two major classes of
methods - factorization and iterative.

The factorization methods are based on computing an exact solution of
:eq:`lsqr` using a Cholesky or a QR factorization and lead to an exact
step Levenberg-Marquardt algorithm. But it is not clear if an exact
solution of :eq:`lsqr` is necessary at each step of the LM algorithm
to solve :eq:`nonlinsq`. In fact, we have already seen evidence
that this may not be the case, as :eq:`lsqr` is itself a regularized
version of :eq:`linearapprox`. Indeed, it is possible to
construct non-linear optimization algorithms in which the linearized
problem is solved approximately. These algorithms are known as inexact
Newton or truncated Newton methods [NocedalWright]_.

An inexact Newton method requires two ingredients. First, a cheap
method for approximately solving systems of linear
equations. Typically an iterative linear solver like the Conjugate
Gradients method is used for this
purpose [NocedalWright]_. Second, a termination rule for
the iterative solver. A typical termination rule is of the form

.. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
   :label: inexact

Here, :math:`k` indicates the Levenberg-Marquardt iteration number and
:math:`0 < \eta_k <1` is known as the forcing sequence. [WrightHolt]_
prove that a truncated Levenberg-Marquardt algorithm that uses an
inexact Newton step based on :eq:`inexact` converges for any
sequence :math:`\eta_k \leq \eta_0 < 1` and the rate of convergence
depends on the choice of the forcing sequence :math:`\eta_k`.

Ceres supports both exact and inexact step solution strategies. When
the user chooses a factorization based linear solver, the exact step
Levenberg-Marquardt algorithm is used. When the user chooses an
iterative linear solver, the inexact step Levenberg-Marquardt
algorithm is used.
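
The choice between the two is thus implicit in the linear solver; a
sketch, assuming ``problem`` already exists:

.. code-block:: c++

  ceres::Solver::Options options;
  // A factorization based linear solver yields the exact step
  // variant of Levenberg-Marquardt ...
  options.linear_solver_type = ceres::DENSE_QR;
  // ... while an iterative linear solver, e.g. CGNR, yields the
  // inexact step variant.
  // options.linear_solver_type = ceres::CGNR;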

.. _section-dogleg:

Dogleg
------

Another strategy for solving the trust region problem :eq:`trp` was
introduced by M. J. D. Powell. The key idea there is to compute two
vectors

.. math::

        \Delta x^{\text{Gauss-Newton}} &= \arg \min_{\Delta x}\frac{1}{2} \|J(x)\Delta x + f(x)\|^2.\\
        \Delta x^{\text{Cauchy}} &= -\frac{\|g(x)\|^2}{\|J(x)g(x)\|^2}g(x).

Note that the vector :math:`\Delta x^{\text{Gauss-Newton}}` is the
solution to :eq:`linearapprox` and :math:`\Delta
x^{\text{Cauchy}}` is the vector that minimizes the linear
approximation if we restrict ourselves to moving along the direction
of the gradient. The Dogleg method finds a vector :math:`\Delta x`
defined by :math:`\Delta x^{\text{Gauss-Newton}}` and :math:`\Delta
x^{\text{Cauchy}}` that solves the trust region problem. Ceres
supports two variants that can be chosen by setting
:member:`Solver::Options::dogleg_type`.

``TRADITIONAL_DOGLEG``, as described by Powell, constructs two line
segments using the Gauss-Newton and Cauchy vectors and finds the point
farthest along this line shaped like a dogleg (hence the name) that is
contained in the trust region. For more details on the exact reasoning
and computations, please see Madsen et al [Madsen]_.

``SUBSPACE_DOGLEG`` is a more sophisticated method that considers the
entire two dimensional subspace spanned by these two vectors and finds
the point that minimizes the trust region problem in this subspace
[ByrdSchnabel]_.

The key advantage of Dogleg over Levenberg-Marquardt is that if the
step computation for a particular choice of :math:`\mu` does not
result in sufficient decrease in the value of the objective function,
Levenberg-Marquardt solves the linear approximation from scratch with
a smaller value of :math:`\mu`. Dogleg, on the other hand, only needs
to compute the interpolation between the Gauss-Newton and the Cauchy
vectors, as neither of them depends on the value of :math:`\mu`.

The Dogleg method can only be used with the exact factorization based
linear solvers.
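
A sketch of enabling the Dogleg strategy (``problem`` is assumed to
have been built elsewhere):

.. code-block:: c++

  ceres::Solver::Options options;
  options.trust_region_strategy_type = ceres::DOGLEG;
  // TRADITIONAL_DOGLEG is the default; SUBSPACE_DOGLEG is the
  // other valid choice.
  options.dogleg_type = ceres::SUBSPACE_DOGLEG;
  // Dogleg requires an exact factorization based linear solver.
  options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;

  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);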

.. _section-inner-iterations:

Inner Iterations
----------------

Some non-linear least squares problems have additional structure in
the way the parameter blocks interact such that it is beneficial to
modify the way the trust region step is computed. For example,
consider the following regression problem

.. math:: y = a_1 e^{b_1 x} + a_2 e^{b_2 x^2 + c_1}


Given a set of pairs :math:`\{(x_i, y_i)\}`, the user wishes to estimate
:math:`a_1, a_2, b_1, b_2`, and :math:`c_1`.

Notice that the right-hand side of this equation is linear in
:math:`a_1` and :math:`a_2`, and given any value for :math:`b_1, b_2`
and :math:`c_1`, it is possible to use linear regression to estimate
the optimal values of :math:`a_1` and :math:`a_2`. Indeed, it is
possible to analytically eliminate the variables :math:`a_1` and
:math:`a_2` from the problem entirely. Problems like these are known
as separable least squares problems and the most famous algorithm for
solving them is the Variable Projection algorithm invented by Golub &
Pereyra [GolubPereyra]_.

Similar structure can be found in the matrix factorization with
missing data problem. There the corresponding algorithm is known as
Wiberg's algorithm [Wiberg]_.

Ruhe & Wedin present an analysis of various algorithms for solving
separable non-linear least squares problems and refer to *Variable
Projection* as Algorithm I in their paper [RuheWedin]_.

Implementing Variable Projection is tedious and expensive. Ruhe &
Wedin present a simpler algorithm with comparable convergence
properties, which they call Algorithm II. Algorithm II performs an
additional optimization step to estimate :math:`a_1` and :math:`a_2`
exactly after computing a successful Newton step.


This idea can be generalized to cases where the residual is not
linear in :math:`a_1` and :math:`a_2`, i.e.,

.. math:: y = f_1(a_1, e^{b_1 x}) + f_2(a_2, e^{b_2 x^2 + c_1})

In this case, we solve for the trust region step for the full problem,
and then use it as the starting point to further optimize just
:math:`a_1` and :math:`a_2`. For the linear case, this amounts to
doing a single linear least squares solve. For non-linear problems,
any method for solving the :math:`a_1` and :math:`a_2` optimization
problems will do. The only constraint on :math:`a_1` and :math:`a_2`
(if they are two different parameter blocks) is that they do not
co-occur in a residual block.

This idea can be further generalized, by not just optimizing
:math:`(a_1, a_2)`, but decomposing the graph corresponding to the
Hessian matrix's sparsity structure into a collection of
non-overlapping independent sets and optimizing each of them.

Setting :member:`Solver::Options::use_inner_iterations` to ``true``
enables the use of this non-linear generalization of Ruhe & Wedin's
Algorithm II. This version of Ceres has a higher cost per iteration,
but also displays better convergence behavior per iteration.

Setting :member:`Solver::Options::num_threads` to the maximum number
possible is highly recommended.
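
A sketch of enabling inner iterations (problem construction omitted):

.. code-block:: c++

  ceres::Solver::Options options;
  options.use_inner_iterations = true;
  // Assuming a hypothetical four core machine.
  options.num_threads = 4;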

.. _section-non-monotonic-steps:

Non-monotonic Steps
-------------------

Note that the basic trust region algorithm described in
:ref:`section-trust-region-methods` is a descent algorithm in that it
only accepts a point if it strictly reduces the value of the objective
function.

Relaxing this requirement allows the algorithm to be more efficient in
the long term at the cost of some local increase in the value of the
objective function.

This is because allowing for non-decreasing objective function values
in a principled manner allows the algorithm to *jump over boulders* as
the method is not restricted to move into narrow valleys while
preserving its convergence properties.

Setting :member:`Solver::Options::use_nonmonotonic_steps` to ``true``
enables the non-monotonic trust region algorithm as described by Conn,
Gould & Toint in [Conn]_.

Even though the value of the objective function may be larger
than the minimum value encountered over the course of the
optimization, the final parameters returned to the user are the
ones corresponding to the minimum cost over all iterations.

The option to take non-monotonic steps is available for all trust
region strategies.
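
For example, a sketch of enabling non-monotonic steps:

.. code-block:: c++

  ceres::Solver::Options options;
  options.use_nonmonotonic_steps = true;
  // Window size used by the step selection algorithm; 5 is the
  // default value.
  options.max_consecutive_nonmonotonic_steps = 5;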

.. _section-line-search-methods:

Line Search Methods
===================

**The implementation of line search algorithms in Ceres Solver is
fairly new and not very well tested, so for now this part of the
solver should be considered beta quality. We welcome reports of your
experiences both good and bad on the mailing list.**

The basic line search algorithm looks something like this.

1. Given an initial point :math:`x`
2. :math:`\Delta x = -H^{-1}(x) g(x)`
3. :math:`\arg \min_\mu \frac{1}{2} \| F(x + \mu \Delta x) \|^2`
4. :math:`x = x + \mu \Delta x`
5. Goto 2.

Here :math:`H(x)` is some approximation to the Hessian of the
objective function, and :math:`g(x)` is the gradient at
:math:`x`. Depending on the choice of :math:`H(x)` we get a variety of
different search directions :math:`\Delta x`.

Step 3, which is a one dimensional optimization or `Line Search` along
:math:`\Delta x`, is what gives this class of methods its name.

Different line search algorithms differ in their choice of the search
direction :math:`\Delta x` and the method used for one dimensional
optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
primary source of computational complexity in these
methods. Currently, Ceres Solver supports four choices of search
directions, all aimed at large scale problems.

1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
   be the identity matrix. This is not a good search direction for
   anything but the simplest of the problems. It is only included here
   for completeness.

2. ``NONLINEAR_CONJUGATE_GRADIENT`` A generalization of the Conjugate
   Gradient method to non-linear functions. The generalization can be
   performed in a number of different ways, resulting in a variety of
   search directions. Ceres Solver currently supports
   ``FLETCHER_REEVES``, ``POLAK_RIBIRERE`` and ``HESTENES_STIEFEL``
   directions.

3. ``BFGS`` A generalization of the Secant method to multiple
   dimensions in which a full, dense approximation to the inverse
   Hessian is maintained and used to compute a quasi-Newton step
   [NocedalWright]_. BFGS is currently the best known general
   quasi-Newton algorithm.

4. ``LBFGS`` A limited memory approximation to the full ``BFGS``
   method in which the last ``M`` iterations are used to approximate
   the inverse Hessian used to compute a quasi-Newton step [Nocedal]_,
   [ByrdNocedal]_.

Currently Ceres Solver supports both a backtracking and interpolation
based Armijo line search algorithm, and a sectioning / zoom
interpolation (strong) Wolfe condition line search algorithm.
However, note that in order for the assumptions underlying the
``BFGS`` and ``LBFGS`` methods to be guaranteed to be satisfied, the
Wolfe line search algorithm should be used.
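
A minimal sketch of configuring a line search minimizer (``problem``
assumed to exist):

.. code-block:: c++

  ceres::Solver::Options options;
  options.minimizer_type = ceres::LINE_SEARCH;
  options.line_search_direction_type = ceres::LBFGS;
  // The Wolfe line search is required for the assumptions
  // underlying BFGS/LBFGS to be guaranteed to hold.
  options.line_search_type = ceres::WOLFE;

  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);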

.. _section-linear-solver:

LinearSolver
============

Recall that in both of the trust-region methods described above, the
key computational cost is the solution of a linear least squares
problem of the form

.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
   :label: simple2

Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top
f(x)`. For notational convenience let us also drop the dependence on
:math:`x`. Then it is easy to see that solving :eq:`simple2` is
equivalent to solving the *normal equations*

.. math:: H \Delta x = g
   :label: normal

Ceres provides a number of different options for solving :eq:`normal`.

.. _section-qr:

``DENSE_QR``
------------

For small problems (a couple of hundred parameters and a few thousand
residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
:math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
an upper triangular matrix [TrefethenBau]_. Then it can be shown that
the solution to :eq:`normal` is given by

.. math:: \Delta x^* = -R^{-1}Q^\top f


Ceres uses ``Eigen`` 's dense QR factorization routines.

.. _section-cholesky:

``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
------------------------------------------------------

Large non-linear least squares problems are usually sparse. In such
cases, using a dense QR factorization is inefficient. Let :math:`H =
R^\top R` be the Cholesky factorization of the normal equations, where
:math:`R` is an upper triangular matrix, then the solution to
:eq:`normal` is given by

.. math::

    \Delta x^* = R^{-1} R^{-\top} g.


The observant reader will note that the :math:`R` in the Cholesky
factorization of :math:`H` is the same upper triangular matrix
:math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top
Q^\top Q R = R^\top R`. There are two variants of Cholesky
factorization -- sparse and dense.

``DENSE_NORMAL_CHOLESKY``, as the name implies, performs a dense
Cholesky factorization of the normal equations. Ceres uses
``Eigen`` 's dense LDLT factorization routines.

``SPARSE_NORMAL_CHOLESKY``, as the name implies, performs a sparse
Cholesky factorization of the normal equations. This leads to
substantial savings in time and memory for large sparse
problems. Ceres uses the sparse Cholesky factorization routines in
Professor Tim Davis' ``SuiteSparse`` or ``CXSparse`` packages [Chen]_.
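
A sketch of selecting the sparse variant (this assumes Ceres was
built with ``SuiteSparse`` or ``CXSparse`` support):

.. code-block:: c++

  ceres::Solver::Options options;
  options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;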

.. _section-schur:

``DENSE_SCHUR`` & ``SPARSE_SCHUR``
----------------------------------

While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
adjustment problems, bundle adjustment problems have a special
structure, and a more efficient scheme for solving :eq:`normal`
can be constructed.

Suppose that the SfM problem consists of :math:`p` cameras and
:math:`q` points and the variable vector :math:`x` has the block
structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`, where
:math:`y` and :math:`z` correspond to camera and point parameters,
respectively. Further, let the camera blocks be of size :math:`c` and
the point blocks be of size :math:`s` (for most problems :math:`c` =
6--9 and :math:`s = 3`). Ceres does not impose any constancy
requirement on these block sizes, but choosing them to be constant
simplifies the exposition.

A key characteristic of the bundle adjustment problem is that there is
no term :math:`f_{i}` that includes two or more point blocks. This in
turn implies that the matrix :math:`H` is of the form

.. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
   :label: hblock

where :math:`B \in \mathbb{R}^{pc\times pc}` is a block sparse matrix
with :math:`p` blocks of size :math:`c\times c` and :math:`C \in
\mathbb{R}^{qs\times qs}` is a block diagonal matrix with :math:`q` blocks
of size :math:`s\times s`. :math:`E \in \mathbb{R}^{pc\times qs}` is a
general block sparse matrix, with a block of size :math:`c\times s`
for each observation. Let us now block partition :math:`\Delta x =
[\Delta y,\Delta z]` and :math:`g=[v,w]` to restate :eq:`normal`
as the block structured linear system

.. math:: \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix}
          \right]\left[ \begin{matrix} \Delta y \\ \Delta z
          \end{matrix} \right] = \left[ \begin{matrix} v\\ w
          \end{matrix} \right]\ ,
   :label: linear2

and apply Gaussian elimination to it. As we noted above, :math:`C` is
a block diagonal matrix, with small diagonal blocks of size
:math:`s\times s`. Thus, calculating the inverse of :math:`C` by
inverting each of these blocks is cheap. This allows us to eliminate
:math:`\Delta z` by observing that :math:`\Delta z = C^{-1}(w - E^\top
\Delta y)`, giving us

.. math:: \left[B - EC^{-1}E^\top\right] \Delta y = v - EC^{-1}w\ .
   :label: schur

The matrix

.. math:: S = B - EC^{-1}E^\top

is the Schur complement of :math:`C` in :math:`H`. It is also known as
the *reduced camera matrix*, because the only variables
participating in :eq:`schur` are the ones corresponding to the
cameras. :math:`S \in \mathbb{R}^{pc\times pc}` is a block structured
symmetric positive definite matrix, with blocks of size :math:`c\times
c`. The block :math:`S_{ij}` corresponding to the pair of images
:math:`i` and :math:`j` is non-zero if and only if the two images
observe at least one common point.


Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
:math:`\Delta y`, and then back-substituting :math:`\Delta y` to
obtain the value of :math:`\Delta z`. Thus, the solution of what was
an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
and matrix-vector multiplies, and the solution of the block sparse
:math:`pc\times pc` linear system :eq:`schur`. For almost all
problems, the number of cameras is much smaller than the number of
points, :math:`p \ll q`, thus solving :eq:`schur` is
significantly cheaper than solving :eq:`linear2`. This is the
*Schur complement trick* [Brown]_.

This still leaves open the question of solving :eq:`schur`. The
method of choice for solving symmetric positive definite systems
exactly is via the Cholesky factorization [TrefethenBau]_ and
depending upon the structure of the matrix, there are, in general, two
options. The first is direct factorization, where we store and factor
:math:`S` as a dense matrix [TrefethenBau]_. This method has
:math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
is only practical for problems with up to a few hundred cameras. Ceres
implements this strategy as the ``DENSE_SCHUR`` solver.


But, :math:`S` is typically a fairly sparse matrix, as most images
only see a small fraction of the scene. This leads us to the second
option: Sparse Direct Methods. These methods store :math:`S` as a
sparse matrix, use row and column re-ordering algorithms to maximize
the sparsity of the Cholesky decomposition, and focus their compute
effort on the non-zero part of the factorization [Chen]_. Sparse
direct methods, depending on the exact sparsity structure of the Schur
complement, allow bundle adjustment algorithms to significantly scale
up over those based on dense factorization. Ceres implements this
strategy as the ``SPARSE_SCHUR`` solver.
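
A sketch of choosing between the two Schur based factorization
solvers:

.. code-block:: c++

  ceres::Solver::Options options;
  // For problems with up to a few hundred cameras.
  options.linear_solver_type = ceres::DENSE_SCHUR;
  // For larger bundle adjustment problems use SPARSE_SCHUR instead.
  // options.linear_solver_type = ceres::SPARSE_SCHUR;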

.. _section-cgnr:

``CGNR``
--------

For general sparse problems, if the problem is too large for
``CHOLMOD`` or a sparse linear algebra library is not linked into
Ceres, another option is the ``CGNR`` solver. This solver uses the
Conjugate Gradients solver on the *normal equations*, but without
forming the normal equations explicitly. It exploits the relation

.. math::
    H x = J^\top J x = J^\top(J x)


When the user chooses ``CGNR`` as the linear solver, Ceres
automatically switches from the exact step algorithm to an inexact
step algorithm.
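
Selecting this solver is a one line change, sketched below:

.. code-block:: c++

  ceres::Solver::Options options;
  options.linear_solver_type = ceres::CGNR;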

.. _section-iterative_schur:

``ITERATIVE_SCHUR``
-------------------

Another option for bundle adjustment problems is to apply PCG to the
reduced camera matrix :math:`S` instead of :math:`H`. One reason to do
this is that :math:`S` is a much smaller matrix than :math:`H`, but
more importantly, it can be shown that :math:`\kappa(S)\leq
\kappa(H)`. Ceres implements PCG on :math:`S` as the
``ITERATIVE_SCHUR`` solver. When the user chooses ``ITERATIVE_SCHUR``
as the linear solver, Ceres automatically switches from the exact step
algorithm to an inexact step algorithm.

The cost of forming and storing the Schur complement :math:`S` can be
prohibitive for large problems. Indeed, for an inexact Newton solver
that computes :math:`S` and runs PCG on it, almost all of its time is
spent in constructing :math:`S`; the time spent inside the PCG
algorithm is negligible in comparison. Because PCG only needs access
to :math:`S` via its product with a vector, one way to evaluate
:math:`Sx` is to observe that

.. math::
   :label: schurtrick1

   x_1 &= E^\top x\\
   x_2 &= C^{-1} x_1\\
   x_3 &= Ex_2\\
   x_4 &= Bx\\
   Sx &= x_4 - x_3

Thus, we can run PCG on :math:`S` with the same computational effort
per iteration as PCG on :math:`H`, while reaping the benefits of a
more powerful preconditioner. In fact, we do not even need to compute
:math:`H`, :eq:`schurtrick1` can be implemented using just the columns
of :math:`J`.

Equation :eq:`schurtrick1` is closely related to *Domain
Decomposition methods* for solving large linear systems that arise in
structural engineering and partial differential equations. In the
language of Domain Decomposition, each point in a bundle adjustment
problem is a domain, and the cameras form the interface between these
domains. The iterative solution of the Schur complement then falls
within the sub-category of techniques known as Iterative
Sub-structuring [Saad]_ [Mathew]_.

.. _section-preconditioner:

Preconditioner
--------------

The convergence rate of Conjugate Gradients for
solving :eq:`normal` depends on the distribution of eigenvalues
of :math:`H` [Saad]_. A useful upper bound is
:math:`\sqrt{\kappa(H)}`, where :math:`\kappa(H)` is the condition
number of the matrix :math:`H`. For most bundle adjustment problems,
:math:`\kappa(H)` is high and a direct application of Conjugate
Gradients to :eq:`normal` results in extremely poor performance.

The solution to this problem is to replace :eq:`normal` with a
*preconditioned* system. Given a linear system, :math:`Ax =b` and a
preconditioner :math:`M` the preconditioned system is given by
:math:`M^{-1}Ax = M^{-1}b`. The resulting algorithm is known as the
Preconditioned Conjugate Gradients algorithm (PCG) and its worst case
complexity now depends on the condition number of the *preconditioned*
matrix :math:`\kappa(M^{-1}A)`.

The computational cost of using a preconditioner :math:`M` is the cost
of computing :math:`M` and evaluating the product :math:`M^{-1}y` for
arbitrary vectors :math:`y`. Thus, there are two competing factors to
consider: How much of :math:`H`'s structure is captured by :math:`M`
so that the condition number :math:`\kappa(HM^{-1})` is low, and the
computational cost of constructing and using :math:`M`. The ideal
preconditioner would be one for which :math:`\kappa(M^{-1}A)
=1`. :math:`M=A` achieves this, but it is not a practical choice, as
applying this preconditioner would require solving a linear system
equivalent to the unpreconditioned problem. It is usually the case
that the more information :math:`M` has about :math:`H`, the more
expensive it is to use. For example, Incomplete Cholesky factorization
based preconditioners have much better convergence behavior than the
Jacobi preconditioner, but are also much more expensive.


The simplest of all preconditioners is the diagonal or Jacobi
preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
block structured matrices like :math:`H` can be generalized to the
block Jacobi preconditioner.

For ``ITERATIVE_SCHUR`` there are two obvious choices for block
diagonal preconditioners for :math:`S`. The block diagonal of the
matrix :math:`B` [Mandel]_ and the block diagonal of :math:`S`, i.e.,
the block Jacobi preconditioner for :math:`S`. Ceres implements both
of these preconditioners and refers to them as ``JACOBI`` and
``SCHUR_JACOBI`` respectively.

For bundle adjustment problems arising in reconstruction from
community photo collections, more effective preconditioners can be
constructed by analyzing and exploiting the camera-point visibility
structure of the scene [KushalAgarwal]_. Ceres implements the two
visibility based preconditioners described by Kushal & Agarwal as
``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. These are fairly new
preconditioners and Ceres' implementation of them is in its early
stages and is not as mature as the other preconditioners described
above.
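
A sketch of pairing ``ITERATIVE_SCHUR`` with a preconditioner:

.. code-block:: c++

  ceres::Solver::Options options;
  options.linear_solver_type = ceres::ITERATIVE_SCHUR;
  // JACOBI, CLUSTER_JACOBI and CLUSTER_TRIDIAGONAL are the other
  // choices discussed above.
  options.preconditioner_type = ceres::SCHUR_JACOBI;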

.. _section-ordering:

Ordering
--------

The order in which variables are eliminated in a linear solver can
have a significant impact on the efficiency and accuracy of the
method. For example when doing sparse Cholesky factorization, there
are matrices for which a good ordering will give a Cholesky factor
with :math:`O(n)` storage, whereas a bad ordering will result in a
completely dense factor.

Ceres allows the user to provide varying amounts of hints to the
solver about the variable elimination ordering to use. This can range
from no hints, where the solver is free to decide the best ordering
based on the user's choices like the linear solver being used, to an
exact order in which the variables should be eliminated, and a variety
of possibilities in between.

Instances of the :class:`ParameterBlockOrdering` class are used to
communicate this information to Ceres.

Formally an ordering is an ordered partitioning of the parameter
blocks. Each parameter block belongs to exactly one group, and each
group has a unique integer associated with it, that determines its
order in the set of groups. We call these groups *Elimination Groups*.

Given such an ordering, Ceres ensures that the parameter blocks in the
lowest numbered elimination group are eliminated first, and then the
parameter blocks in the next lowest numbered elimination group and so
on. Within each elimination group, Ceres is free to order the
parameter blocks as it chooses. For example, consider the linear system

.. math::
  x + y &= 3\\
  2x + 3y &= 7

There are two ways in which it can be solved. First eliminating
:math:`x` from the two equations, solving for :math:`y` and then back
substituting for :math:`x`, or first eliminating :math:`y`, solving
for :math:`x` and back substituting for :math:`y`. The user can
construct three orderings here.

1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
3. :math:`\{0: x, y\}` : Solver gets to decide the elimination order.

Thus, to have Ceres determine the ordering automatically using
heuristics, put all the variables in the same elimination group. The
identity of the group does not matter. This is the same as not
specifying an ordering at all. To control the ordering for every
variable, create an elimination group per variable, ordering them in
the desired order.

If the user is using one of the Schur solvers (``DENSE_SCHUR``,
``SPARSE_SCHUR``, ``ITERATIVE_SCHUR``) and chooses to specify an
ordering, it must have one important property. The lowest numbered
elimination group must form an independent set in the graph
corresponding to the Hessian, or in other words, no two parameter
blocks in the first elimination group should co-occur in the same
residual block. For the best performance, this elimination group
should be as large as possible. For standard bundle adjustment
problems, this corresponds to the first elimination group containing
all the 3D points, and the second containing all the camera parameter
blocks.

If the user leaves the choice to Ceres, then the solver uses an
approximate maximum independent set algorithm to identify the first
elimination group [LiSaad]_.
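
As an illustration, a sketch of specifying the standard bundle
adjustment ordering; ``points``, ``cameras``, ``num_points`` and
``num_cameras`` are hypothetical stand-ins for the user's parameter
blocks:

.. code-block:: c++

  ceres::ParameterBlockOrdering* ordering =
      new ceres::ParameterBlockOrdering;
  // Eliminate all the points (group 0) before the cameras (group 1).
  for (int i = 0; i < num_points; ++i) {
    ordering->AddElementToGroup(points[i], 0);
  }
  for (int i = 0; i < num_cameras; ++i) {
    ordering->AddElementToGroup(cameras[i], 1);
  }

  ceres::Solver::Options options;
  options.linear_solver_type = ceres::SPARSE_SCHUR;
  options.linear_solver_ordering = ordering;

Note that the ownership semantics of ``linear_solver_ordering`` have
varied across Ceres versions, so consult the header for the version
in use.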

.. _section-solver-options:

:class:`Solver::Options`
------------------------

.. class:: Solver::Options

   :class:`Solver::Options` controls the overall behavior of the
   solver. We list the various settings and their default values below.
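
For orientation, a sketch of typical usage; the construction of
``problem`` is assumed to happen elsewhere:

.. code-block:: c++

  #include <iostream>
  #include "ceres/ceres.h"

  // ... build a ceres::Problem named problem ...

  ceres::Solver::Options options;
  options.max_num_iterations = 100;
  options.linear_solver_type = ceres::DENSE_QR;
  options.minimizer_progress_to_stdout = true;

  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);
  std::cout << summary.BriefReport() << "\n";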

.. member:: MinimizerType Solver::Options::minimizer_type

   Default: ``TRUST_REGION``

   Choose between ``LINE_SEARCH`` and ``TRUST_REGION`` algorithms. See
   :ref:`section-trust-region-methods` and
   :ref:`section-line-search-methods` for more details.

.. member:: LineSearchDirectionType Solver::Options::line_search_direction_type

   Default: ``LBFGS``

   Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
   ``BFGS`` and ``LBFGS``.

.. member:: LineSearchType Solver::Options::line_search_type

   Default: ``WOLFE``

   Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
   Note that in order for the assumptions underlying the ``BFGS`` and
   ``LBFGS`` line search direction algorithms to be guaranteed to be
   satisfied, the ``WOLFE`` line search should be used.

.. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type

   Default: ``FLETCHER_REEVES``

   Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIRERE`` and
   ``HESTENES_STIEFEL``.

.. member:: int Solver::Options::max_lbfgs_rank

   Default: 20

   The L-BFGS Hessian approximation is a low rank approximation to the
   inverse of the Hessian matrix. The rank of the approximation
   determines (linearly) the space and time complexity of using the
   approximation. The higher the rank, the better the quality of the
   approximation. The increase in quality is, however, bounded for a
   number of reasons.

   1. The method only uses secant information and not actual
      derivatives.

   2. The Hessian approximation is constrained to be positive
      definite.

   So increasing this rank to a large number will cost time and space
   complexity without the corresponding increase in solution
   quality. There are no hard and fast rules for choosing the maximum
   rank. The best choice usually requires some problem specific
   experimentation.

.. member:: bool Solver::Options::use_approximate_eigenvalue_bfgs_scaling

   Default: ``false``

   As part of the ``BFGS`` update step / ``LBFGS`` right-multiply
   step, the initial inverse Hessian approximation is taken to be the
   identity. However, [Oren]_ showed that using instead :math:`I *
   \gamma`, where :math:`\gamma` is a scalar chosen to approximate an
   eigenvalue of the true inverse Hessian can result in improved
   convergence in a wide variety of cases. Setting
   ``use_approximate_eigenvalue_bfgs_scaling`` to true enables this
   scaling in ``BFGS`` (before first iteration) and ``LBFGS`` (at each
   iteration).

   Precisely, approximate eigenvalue scaling equates to

   .. math:: \gamma = \frac{y_k' s_k}{y_k' y_k}

   With:

   .. math:: y_k = \nabla f_{k+1} - \nabla f_k
   .. math:: s_k = x_{k+1} - x_k

   Where :math:`f()` is the line search objective and :math:`x` the
   vector of parameter values [NocedalWright]_.

   It is important to note that approximate eigenvalue scaling does
   **not** *always* improve convergence, and that it can in fact
   *significantly* degrade performance for certain classes of problem,
   which is why it is disabled by default. In particular it can
   degrade performance when the sensitivity of the problem to different
   parameters varies significantly, as in this case a single scalar
   factor fails to capture this variation and detrimentally downscales
   parts of the Jacobian approximation which correspond to
   low-sensitivity parameters. It can also reduce the robustness of the
   solution to errors in the Jacobians.

.. member:: LineSearchInterpolationType Solver::Options::line_search_interpolation_type

   Default: ``CUBIC``

   Degree of the polynomial used to approximate the objective
   function. Valid values are ``BISECTION``, ``QUADRATIC`` and
   ``CUBIC``.

.. member:: double Solver::Options::min_line_search_step_size

   The line search terminates if:

   .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}

   where :math:`\|\cdot\|_\infty` refers to the max norm, and
   :math:`\Delta x_k` is the step change in the parameter values at
   the :math:`k`-th iteration.

.. member:: double Solver::Options::line_search_sufficient_function_decrease

   Default: ``1e-4``

   Solving the line search problem exactly is computationally
   prohibitive. Fortunately, line search based optimization algorithms
   can still guarantee convergence if instead of an exact solution,
   the line search algorithm returns a solution which decreases the
   value of the objective function sufficiently. More precisely, we
   are looking for a step size s.t.

   .. math:: f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]

   This condition is known as the Armijo condition.

.. member:: double Solver::Options::max_line_search_step_contraction

   Default: ``1e-3``

   In each iteration of the line search,

   .. math:: \text{new_step_size} \geq \text{max_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: double Solver::Options::min_line_search_step_contraction

   Default: ``0.6``

   In each iteration of the line search,

   .. math:: \text{new_step_size} \leq \text{min_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: int Solver::Options::max_num_line_search_step_size_iterations

   Default: ``20``
| 933 | |
Sameer Agarwal | c5bcfc0 | 2013-07-19 15:50:27 -0700 | [diff] [blame] | 934 | Maximum number of trial step size iterations during each line |
| 935 | search, if a step size satisfying the search conditions cannot be |
| 936 | found within this number of trials, the line search will stop. |
Alex Stewart | 9aa0e3c | 2013-07-05 20:22:37 +0100 | [diff] [blame] | 937 | |
Sameer Agarwal | c5bcfc0 | 2013-07-19 15:50:27 -0700 | [diff] [blame] | 938 | As this is an 'artificial' constraint (one imposed by the user, not |
| 939 | the underlying math), if ``WOLFE`` line search is being used, *and* |
| 940 | points satisfying the Armijo sufficient (function) decrease |
| 941 | condition have been found during the current search (in :math:`<=` |
| 942 | ``max_num_line_search_step_size_iterations``). Then, the step size |
| 943 | with the lowest function value which satisfies the Armijo condition |
| 944 | will be returned as the new valid step, even though it does *not* |
| 945 | satisfy the strong Wolfe conditions. This behaviour protects |
| 946 | against early termination of the optimizer at a sub-optimal point. |
Alex Stewart | 9aa0e3c | 2013-07-05 20:22:37 +0100 | [diff] [blame] | 947 | |
| 948 | .. member:: int Solver::Options::max_num_line_search_direction_restarts |
| 949 | |
| 950 | Default: ``5`` |
| 951 | |
Sameer Agarwal | c5bcfc0 | 2013-07-19 15:50:27 -0700 | [diff] [blame] | 952 | Maximum number of restarts of the line search direction algorithm |
| 953 | before terminating the optimization. Restarts of the line search |
| 954 | direction algorithm occur when the current algorithm fails to |
| 955 | produce a new descent direction. This typically indicates a |
| 956 | numerical failure, or a breakdown in the validity of the |
| 957 | approximations used. |
Alex Stewart | 9aa0e3c | 2013-07-05 20:22:37 +0100 | [diff] [blame] | 958 | |
| 959 | .. member:: double Solver::Options::line_search_sufficient_curvature_decrease |
| 960 | |
| 961 | Default: ``0.9`` |
| 962 | |
The strong Wolfe conditions consist of the Armijo sufficient
decrease condition, and an additional requirement that the
step size be chosen such that the *magnitude* ('strong' Wolfe
conditions) of the gradient along the search direction
decreases sufficiently. Precisely, this second condition
is that we seek a step size such that

.. math:: \|f'(\text{step_size})\| \leq \text{sufficient_curvature_decrease} * \|f'(0)\|

where :math:`f()` is the line search objective and :math:`f'()` is the derivative
of :math:`f` with respect to the step size: :math:`\frac{d f}{d~\text{step size}}`.
| 974 | |
| 975 | .. member:: double Solver::Options::max_line_search_step_expansion |
| 976 | |
| 977 | Default: ``10.0`` |
| 978 | |
During the bracketing phase of a Wolfe line search, the step size
is increased until either a point satisfying the Wolfe conditions
is found, or an upper bound for a bracket containing a point
satisfying the conditions is found. Precisely, at each iteration
of the expansion:

.. math:: \text{new_step_size} \leq \text{max_step_expansion} * \text{step_size}

By definition, for expansion:

.. math:: \text{max_step_expansion} > 1.0
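
Putting the line search parameters above together, a minimal
configuration sketch follows; the values are illustrative, not
recommendations, and only use members documented in this section.

.. code-block:: c++

  ceres::Solver::Options options;
  options.minimizer_type = ceres::LINE_SEARCH;

  // Bounds on how fast the bracketing/zoom phases may shrink or
  // grow the trial step size in each line search iteration.
  options.max_line_search_step_contraction = 1e-3;
  options.min_line_search_step_contraction = 0.6;
  options.max_line_search_step_expansion = 10.0;

  // Give up on the current line search after this many trial steps.
  options.max_num_line_search_step_size_iterations = 20;

  // Strong Wolfe curvature condition constant.
  options.line_search_sufficient_curvature_decrease = 0.9;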
Sameer Agarwal | 0924401 | 2013-06-30 14:33:23 -0700 | [diff] [blame] | 990 | |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 991 | .. member:: TrustRegionStrategyType Solver::Options::trust_region_strategy_type |
| 992 | |
| 993 | Default: ``LEVENBERG_MARQUARDT`` |
| 994 | |
| 995 | The trust region step computation algorithm used by |
| 996 | Ceres. Currently ``LEVENBERG_MARQUARDT`` and ``DOGLEG`` are the two |
| 997 | valid choices. See :ref:`section-levenberg-marquardt` and |
| 998 | :ref:`section-dogleg` for more details. |
| 999 | |
| 1000 | .. member:: DoglegType Solver::Options::dogleg_type |
| 1001 | |
| 1002 | Default: ``TRADITIONAL_DOGLEG`` |
| 1003 | |
Ceres supports two different dogleg strategies:
``TRADITIONAL_DOGLEG``, the method described by Powell, and the
``SUBSPACE_DOGLEG`` method described by [ByrdSchnabel]_ . See
:ref:`section-dogleg` for more details.
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1008 | |
| 1009 | .. member:: bool Solver::Options::use_nonmonotonic_steps |
| 1010 | |
| 1011 | Default: ``false`` |
| 1012 | |
| 1013 | Relax the requirement that the trust-region algorithm take strictly |
| 1014 | decreasing steps. See :ref:`section-non-monotonic-steps` for more |
| 1015 | details. |
| 1016 | |
| 1017 | .. member:: int Solver::Options::max_consecutive_nonmonotonic_steps |
| 1018 | |
| 1019 | Default: ``5`` |
| 1020 | |
| 1021 | The window size used by the step selection algorithm to accept |
| 1022 | non-monotonic steps. |
| 1023 | |
| 1024 | .. member:: int Solver::Options::max_num_iterations |
| 1025 | |
| 1026 | Default: ``50`` |
| 1027 | |
| 1028 | Maximum number of iterations for which the solver should run. |
| 1029 | |
| 1030 | .. member:: double Solver::Options::max_solver_time_in_seconds |
| 1031 | |
| 1032 | Default: ``1e6`` |

Maximum amount of time for which the solver should run.
| 1034 | |
| 1035 | .. member:: int Solver::Options::num_threads |
| 1036 | |
| 1037 | Default: ``1`` |
| 1038 | |
| 1039 | Number of threads used by Ceres to evaluate the Jacobian. |
| 1040 | |
| 1041 | .. member:: double Solver::Options::initial_trust_region_radius |
| 1042 | |
| 1043 | Default: ``1e4`` |
| 1044 | |
| 1045 | The size of the initial trust region. When the |
| 1046 | ``LEVENBERG_MARQUARDT`` strategy is used, the reciprocal of this |
| 1047 | number is the initial regularization parameter. |
| 1048 | |
| 1049 | .. member:: double Solver::Options::max_trust_region_radius |
| 1050 | |
| 1051 | Default: ``1e16`` |
| 1052 | |
| 1053 | The trust region radius is not allowed to grow beyond this value. |
| 1054 | |
| 1055 | .. member:: double Solver::Options::min_trust_region_radius |
| 1056 | |
| 1057 | Default: ``1e-32`` |
| 1058 | |
The solver terminates when the trust region becomes smaller than
this value.
| 1061 | |
| 1062 | .. member:: double Solver::Options::min_relative_decrease |
| 1063 | |
| 1064 | Default: ``1e-3`` |
| 1065 | |
| 1066 | Lower threshold for relative decrease before a trust-region step is |
Sameer Agarwal | ebbb984 | 2013-05-26 12:40:12 -0700 | [diff] [blame] | 1067 | accepted. |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1068 | |
Sameer Agarwal | eeedd2e | 2013-07-07 23:04:31 -0700 | [diff] [blame] | 1069 | .. member:: double Solver::Options::min_lm_diagonal |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1070 | |
Default: ``1e-6``

The ``LEVENBERG_MARQUARDT`` strategy uses a diagonal matrix to
regularize the trust region step. This is the lower bound on
the values of this diagonal matrix.
| 1076 | |
Sameer Agarwal | eeedd2e | 2013-07-07 23:04:31 -0700 | [diff] [blame] | 1077 | .. member:: double Solver::Options::max_lm_diagonal |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1078 | |
| 1079 | Default: ``1e32`` |
| 1080 | |
The ``LEVENBERG_MARQUARDT`` strategy uses a diagonal matrix to
regularize the trust region step. This is the upper bound on
the values of this diagonal matrix.
| 1084 | |
| 1085 | .. member:: int Solver::Options::max_num_consecutive_invalid_steps |
| 1086 | |
| 1087 | Default: ``5`` |
| 1088 | |
| 1089 | The step returned by a trust region strategy can sometimes be |
| 1090 | numerically invalid, usually because of conditioning |
| 1091 | issues. Instead of crashing or stopping the optimization, the |
| 1092 | optimizer can go ahead and try solving with a smaller trust |
| 1093 | region/better conditioned problem. This parameter sets the number |
| 1094 | of consecutive retries before the minimizer gives up. |
| 1095 | |
| 1096 | .. member:: double Solver::Options::function_tolerance |
| 1097 | |
| 1098 | Default: ``1e-6`` |
| 1099 | |
| 1100 | Solver terminates if |
| 1101 | |
.. math:: \frac{|\Delta \text{cost}|}{\text{cost}} < \text{function_tolerance}

where :math:`\Delta \text{cost}` is the change in objective
function value (up or down) in the current iteration of
Levenberg-Marquardt.
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1107 | |
| 1108 | .. member:: double Solver::Options::gradient_tolerance |
| 1109 | |
| 1110 | Default: ``1e-10`` |
| 1111 | |
| 1112 | Solver terminates if |
| 1113 | |
| 1114 | .. math:: \frac{\|g(x)\|_\infty}{\|g(x_0)\|_\infty} < \text{gradient_tolerance} |
| 1115 | |
| 1116 | where :math:`\|\cdot\|_\infty` refers to the max norm, and :math:`x_0` is |
| 1117 | the vector of initial parameter values. |
| 1118 | |
| 1119 | .. member:: double Solver::Options::parameter_tolerance |
| 1120 | |
| 1121 | Default: ``1e-8`` |
| 1122 | |
| 1123 | Solver terminates if |
| 1124 | |
| 1125 | .. math:: \|\Delta x\| < (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance} |
| 1126 | |
Sameer Agarwal | c5bcfc0 | 2013-07-19 15:50:27 -0700 | [diff] [blame] | 1127 | where :math:`\Delta x` is the step computed by the linear solver in |
| 1128 | the current iteration of Levenberg-Marquardt. |
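
As a sketch, the three convergence tolerances are often adjusted
together; the values below are arbitrary illustrations, not
recommendations:

.. code-block:: c++

  ceres::Solver::Options options;
  // Stop when the relative change in the objective is tiny.
  options.function_tolerance = 1e-10;
  // Stop when the gradient has shrunk by ten orders of magnitude.
  options.gradient_tolerance = 1e-12;
  // Stop when the step is tiny relative to the parameter vector.
  options.parameter_tolerance = 1e-10;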
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1129 | |
| 1130 | .. member:: LinearSolverType Solver::Options::linear_solver_type |
| 1131 | |
| 1132 | Default: ``SPARSE_NORMAL_CHOLESKY`` / ``DENSE_QR`` |
| 1133 | |
| 1134 | Type of linear solver used to compute the solution to the linear |
| 1135 | least squares problem in each iteration of the Levenberg-Marquardt |
algorithm. If Ceres is built with ``SuiteSparse`` linked in, then
the default is ``SPARSE_NORMAL_CHOLESKY``; otherwise it is
``DENSE_QR``.
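
For example, a sketch of typical choices for different problem
shapes; all enum values below are existing Ceres types, but the
pairings are illustrative guidance, not hard rules:

.. code-block:: c++

  ceres::Solver::Options options;

  // Small, dense problems.
  options.linear_solver_type = ceres::DENSE_QR;

  // General sparse problems (needs SuiteSparse or CXSparse).
  options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;

  // Bundle adjustment style problems with many points and cameras.
  options.linear_solver_type = ceres::ITERATIVE_SCHUR;
  options.preconditioner_type = ceres::SCHUR_JACOBI;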
| 1139 | |
| 1140 | .. member:: PreconditionerType Solver::Options::preconditioner_type |
| 1141 | |
| 1142 | Default: ``JACOBI`` |
| 1143 | |
| 1144 | The preconditioner used by the iterative linear solver. The default |
| 1145 | is the block Jacobi preconditioner. Valid values are (in increasing |
| 1146 | order of complexity) ``IDENTITY``, ``JACOBI``, ``SCHUR_JACOBI``, |
| 1147 | ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. See |
| 1148 | :ref:`section-preconditioner` for more details. |
| 1149 | |
| 1150 | .. member:: SparseLinearAlgebraLibrary Solver::Options::sparse_linear_algebra_library |
| 1151 | |
Default: ``SUITE_SPARSE``

Ceres supports the use of two sparse linear algebra libraries:
``SuiteSparse``, which is enabled by setting this parameter to
``SUITE_SPARSE``, and ``CXSparse``, which can be selected by setting
this parameter to ``CX_SPARSE``. ``SuiteSparse`` is a
| 1158 | sophisticated and complex sparse linear algebra library and should |
| 1159 | be used in general. If your needs/platforms prevent you from using |
| 1160 | ``SuiteSparse``, consider using ``CXSparse``, which is a much |
| 1161 | smaller, easier to build library. As can be expected, its |
| 1162 | performance on large problems is not comparable to that of |
| 1163 | ``SuiteSparse``. |
| 1164 | |
| 1165 | .. member:: int Solver::Options::num_linear_solver_threads |
| 1166 | |
| 1167 | Default: ``1`` |
| 1168 | |
| 1169 | Number of threads used by the linear solver. |
| 1170 | |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1171 | .. member:: ParameterBlockOrdering* Solver::Options::linear_solver_ordering |
| 1172 | |
| 1173 | Default: ``NULL`` |
| 1174 | |
An instance of the ordering object informs the solver about the
desired order in which parameter blocks should be eliminated by the
linear solvers.

If ``NULL``, the solver is free to choose an ordering that it
thinks is best.

See :ref:`section-ordering` for more details.
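
For instance, in a bundle adjustment problem one may want all the
point blocks eliminated before the camera blocks. A hypothetical
sketch, where ``points`` and ``cameras`` are vectors of parameter
block pointers already added to the problem:

.. code-block:: c++

  ceres::ParameterBlockOrdering* ordering =
      new ceres::ParameterBlockOrdering;
  // Group 0 (points) is eliminated before group 1 (cameras).
  for (size_t i = 0; i < points.size(); ++i) {
    ordering->AddElementToGroup(points[i], 0);
  }
  for (size_t i = 0; i < cameras.size(); ++i) {
    ordering->AddElementToGroup(cameras[i], 1);
  }
  options.linear_solver_ordering = ordering;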
| 1183 | |
.. member:: bool Solver::Options::use_postordering
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1185 | |
Sameer Agarwal | 0939632 | 2013-05-28 22:29:36 -0700 | [diff] [blame] | 1186 | Default: ``false`` |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1187 | |
Sameer Agarwal | 0939632 | 2013-05-28 22:29:36 -0700 | [diff] [blame] | 1188 | Sparse Cholesky factorization algorithms use a fill-reducing |
| 1189 | ordering to permute the columns of the Jacobian matrix. There are |
| 1190 | two ways of doing this. |
| 1191 | |
| 1192 | 1. Compute the Jacobian matrix in some order and then have the |
| 1193 | factorization algorithm permute the columns of the Jacobian. |
| 1194 | |
Sameer Agarwal | f0b071b | 2013-05-31 13:22:51 -0700 | [diff] [blame] | 1195 | 2. Compute the Jacobian with its columns already permuted. |
Sameer Agarwal | 0939632 | 2013-05-28 22:29:36 -0700 | [diff] [blame] | 1196 | |
The first option incurs a significant memory penalty, since the
factorization algorithm has to make a copy of the permuted Jacobian
matrix. Thus Ceres pre-permutes the columns of the Jacobian matrix
by default, and generally speaking, there is no performance penalty
for doing so.
| 1202 | |
| 1203 | In some rare cases, it is worth using a more complicated reordering |
| 1204 | algorithm which has slightly better runtime performance at the |
| 1205 | expense of an extra copy of the Jacobian matrix. Setting |
| 1206 | ``use_postordering`` to ``true`` enables this tradeoff. |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1207 | |
Sameer Agarwal | eeedd2e | 2013-07-07 23:04:31 -0700 | [diff] [blame] | 1208 | .. member:: int Solver::Options::min_linear_solver_iterations |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1209 | |
| 1210 | Default: ``1`` |
| 1211 | |
| 1212 | Minimum number of iterations used by the linear solver. This only |
| 1213 | makes sense when the linear solver is an iterative solver, e.g., |
| 1214 | ``ITERATIVE_SCHUR`` or ``CGNR``. |
| 1215 | |
Sameer Agarwal | eeedd2e | 2013-07-07 23:04:31 -0700 | [diff] [blame] | 1216 | .. member:: int Solver::Options::max_linear_solver_iterations |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1217 | |
| 1218 | Default: ``500`` |
| 1219 | |
Maximum number of iterations used by the linear solver. This only
| 1221 | makes sense when the linear solver is an iterative solver, e.g., |
| 1222 | ``ITERATIVE_SCHUR`` or ``CGNR``. |
| 1223 | |
| 1224 | .. member:: double Solver::Options::eta |
| 1225 | |
| 1226 | Default: ``1e-1`` |
| 1227 | |
| 1228 | Forcing sequence parameter. The truncated Newton solver uses this |
| 1229 | number to control the relative accuracy with which the Newton step |
| 1230 | is computed. This constant is passed to |
| 1231 | ``ConjugateGradientsSolver`` which uses it to terminate the |
| 1232 | iterations when |
| 1233 | |
.. math:: \frac{Q_i - Q_{i-1}}{Q_i} < \frac{\eta}{i}

where :math:`Q_i` is the value of the quadratic model of the
objective function at the :math:`i^\textrm{th}` iteration of the
conjugate gradients algorithm.
| 1235 | |
| 1236 | .. member:: bool Solver::Options::jacobi_scaling |
| 1237 | |
| 1238 | Default: ``true`` |
| 1239 | |
| 1240 | ``true`` means that the Jacobian is scaled by the norm of its |
| 1241 | columns before being passed to the linear solver. This improves the |
| 1242 | numerical conditioning of the normal equations. |
| 1243 | |
Sameer Agarwal | 0939632 | 2013-05-28 22:29:36 -0700 | [diff] [blame] | 1244 | .. member:: bool Solver::Options::use_inner_iterations |
| 1245 | |
| 1246 | Default: ``false`` |
| 1247 | |
| 1248 | Use a non-linear version of a simplified variable projection |
| 1249 | algorithm. Essentially this amounts to doing a further optimization |
| 1250 | on each Newton/Trust region step using a coordinate descent |
| 1251 | algorithm. For more details, see :ref:`section-inner-iterations`. |
| 1252 | |
.. member:: double Solver::Options::inner_iteration_tolerance
| 1254 | |
| 1255 | Default: ``1e-3`` |
| 1256 | |
| 1257 | Generally speaking, inner iterations make significant progress in |
| 1258 | the early stages of the solve and then their contribution drops |
| 1259 | down sharply, at which point the time spent doing inner iterations |
| 1260 | is not worth it. |
| 1261 | |
| 1262 | Once the relative decrease in the objective function due to inner |
| 1263 | iterations drops below ``inner_iteration_tolerance``, the use of |
| 1264 | inner iterations in subsequent trust region minimizer iterations is |
| 1265 | disabled. |
| 1266 | |
| 1267 | .. member:: ParameterBlockOrdering* Solver::Options::inner_iteration_ordering |
| 1268 | |
| 1269 | Default: ``NULL`` |
| 1270 | |
If :member:`Solver::Options::use_inner_iterations` is true, then
the user has two choices.
Sameer Agarwal | 0939632 | 2013-05-28 22:29:36 -0700 | [diff] [blame] | 1273 | |
| 1274 | 1. Let the solver heuristically decide which parameter blocks to |
| 1275 | optimize in each inner iteration. To do this, set |
| 1276 | :member:`Solver::Options::inner_iteration_ordering` to ``NULL``. |
| 1277 | |
2. Specify a collection of ordered independent sets. The lower
   numbered groups are optimized before the higher number groups
   during the inner optimization phase. Each group must be an
   independent set. Not all parameter blocks need to be included in
   the ordering; see the sketch below.
| 1283 | |
| 1284 | See :ref:`section-ordering` for more details. |
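
A hypothetical sketch of the second option, reusing the bundle
adjustment style ``points`` vector from before. In a typical bundle
adjustment problem no residual involves two different points, so
the points form an independent set:

.. code-block:: c++

  options.use_inner_iterations = true;
  ceres::ParameterBlockOrdering* inner_ordering =
      new ceres::ParameterBlockOrdering;
  for (size_t i = 0; i < points.size(); ++i) {
    inner_ordering->AddElementToGroup(points[i], 0);
  }
  options.inner_iteration_ordering = inner_ordering;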
| 1285 | |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1286 | .. member:: LoggingType Solver::Options::logging_type |
| 1287 | |
| 1288 | Default: ``PER_MINIMIZER_ITERATION`` |
| 1289 | |
| 1290 | .. member:: bool Solver::Options::minimizer_progress_to_stdout |
| 1291 | |
| 1292 | Default: ``false`` |
| 1293 | |
| 1294 | By default the :class:`Minimizer` progress is logged to ``STDERR`` |
| 1295 | depending on the ``vlog`` level. If this flag is set to true, and |
| 1296 | :member:`Solver::Options::logging_type` is not ``SILENT``, the logging |
| 1297 | output is sent to ``STDOUT``. |
| 1298 | |
Sameer Agarwal | 4437639 | 2013-06-03 09:20:49 -0700 | [diff] [blame] | 1299 | For ``TRUST_REGION_MINIMIZER`` the progress display looks like |
| 1300 | |
| 1301 | .. code-block:: bash |
| 1302 | |
| 1303 | 0: f: 1.250000e+01 d: 0.00e+00 g: 5.00e+00 h: 0.00e+00 rho: 0.00e+00 mu: 1.00e+04 li: 0 it: 6.91e-06 tt: 1.91e-03 |
| 1304 | 1: f: 1.249750e-07 d: 1.25e+01 g: 5.00e-04 h: 5.00e+00 rho: 1.00e+00 mu: 3.00e+04 li: 1 it: 2.81e-05 tt: 1.99e-03 |
| 1305 | 2: f: 1.388518e-16 d: 1.25e-07 g: 1.67e-08 h: 5.00e-04 rho: 1.00e+00 mu: 9.00e+04 li: 1 it: 1.00e-05 tt: 2.01e-03 |
| 1306 | |
| 1307 | Here |
| 1308 | |
| 1309 | #. ``f`` is the value of the objective function. |
| 1310 | #. ``d`` is the change in the value of the objective function if |
| 1311 | the step computed in this iteration is accepted. |
| 1312 | #. ``g`` is the max norm of the gradient. |
| 1313 | #. ``h`` is the change in the parameter vector. |
#. ``rho`` is the ratio of the actual change in the objective
   function value to the change in the value of the trust
   region model.
| 1317 | #. ``mu`` is the size of the trust region radius. |
| 1318 | #. ``li`` is the number of linear solver iterations used to compute |
| 1319 | the trust region step. For direct/factorization based solvers it |
| 1320 | is always 1, for iterative solvers like ``ITERATIVE_SCHUR`` it |
| 1321 | is the number of iterations of the Conjugate Gradients |
| 1322 | algorithm. |
#. ``it`` is the time taken by the current iteration.
#. ``tt`` is the total time taken by the minimizer.
| 1325 | |
| 1326 | For ``LINE_SEARCH_MINIMIZER`` the progress display looks like |
| 1327 | |
| 1328 | .. code-block:: bash |
| 1329 | |
| 1330 | 0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e: 0 it: 2.98e-02 tt: 8.50e-02 |
| 1331 | 1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e: 1 it: 4.54e-02 tt: 1.31e-01 |
| 1332 | 2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e: 1 it: 4.96e-02 tt: 1.81e-01 |
| 1333 | |
| 1334 | Here |
| 1335 | |
| 1336 | #. ``f`` is the value of the objective function. |
| 1337 | #. ``d`` is the change in the value of the objective function if |
| 1338 | the step computed in this iteration is accepted. |
| 1339 | #. ``g`` is the max norm of the gradient. |
| 1340 | #. ``h`` is the change in the parameter vector. |
| 1341 | #. ``s`` is the optimal step length computed by the line search. |
#. ``it`` is the time taken by the current iteration.
#. ``tt`` is the total time taken by the minimizer.
| 1344 | |
Sameer Agarwal | c4a3291 | 2013-06-13 22:00:48 -0700 | [diff] [blame] | 1345 | .. member:: vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1346 | |
| 1347 | Default: ``empty`` |
| 1348 | |
Sameer Agarwal | c4a3291 | 2013-06-13 22:00:48 -0700 | [diff] [blame] | 1349 | List of iterations at which the trust region minimizer should dump |
| 1350 | the trust region problem. Useful for testing and benchmarking. If |
| 1351 | ``empty``, no problems are dumped. |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1352 | |
Sameer Agarwal | c4a3291 | 2013-06-13 22:00:48 -0700 | [diff] [blame] | 1353 | .. member:: string Solver::Options::trust_region_problem_dump_directory |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1354 | |
| 1355 | Default: ``/tmp`` |
| 1356 | |
Directory to which the problems should be written. Should be
non-empty if
| 1359 | :member:`Solver::Options::trust_region_minimizer_iterations_to_dump` is |
| 1360 | non-empty and |
| 1361 | :member:`Solver::Options::trust_region_problem_dump_format_type` is not |
| 1362 | ``CONSOLE``. |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1363 | |
Sameer Agarwal | c4a3291 | 2013-06-13 22:00:48 -0700 | [diff] [blame] | 1364 | .. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1365 | |
| 1366 | Default: ``TEXTFILE`` |
| 1367 | |
Sameer Agarwal | c4a3291 | 2013-06-13 22:00:48 -0700 | [diff] [blame] | 1368 | The format in which trust region problems should be logged when |
| 1369 | :member:`Solver::Options::trust_region_minimizer_iterations_to_dump` |
| 1370 | is non-empty. There are three options: |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1371 | |
| 1372 | * ``CONSOLE`` prints the linear least squares problem in a human |
| 1373 | readable format to ``stderr``. The Jacobian is printed as a |
| 1374 | dense matrix. The vectors :math:`D`, :math:`x` and :math:`f` are |
| 1375 | printed as dense vectors. This should only be used for small |
| 1376 | problems. |
| 1377 | |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1378 | * ``TEXTFILE`` Write out the linear least squares problem to the |
Sameer Agarwal | c4a3291 | 2013-06-13 22:00:48 -0700 | [diff] [blame] | 1379 | directory pointed to by |
| 1380 | :member:`Solver::Options::trust_region_problem_dump_directory` as |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1381 | text files which can be read into ``MATLAB/Octave``. The Jacobian |
| 1382 | is dumped as a text file containing :math:`(i,j,s)` triplets, the |
| 1383 | vectors :math:`D`, `x` and `f` are dumped as text files |
| 1384 | containing a list of their values. |
| 1385 | |
Sameer Agarwal | c4a3291 | 2013-06-13 22:00:48 -0700 | [diff] [blame] | 1386 | A ``MATLAB/Octave`` script called |
| 1387 | ``ceres_solver_iteration_???.m`` is also output, which can be |
| 1388 | used to parse and load the problem into memory. |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1389 | |
| 1390 | .. member:: bool Solver::Options::check_gradients |
| 1391 | |
| 1392 | Default: ``false`` |
| 1393 | |
| 1394 | Check all Jacobians computed by each residual block with finite |
| 1395 | differences. This is expensive since it involves computing the |
| 1396 | derivative by normal means (e.g. user specified, autodiff, etc), |
| 1397 | then also computing it using finite differences. The results are |
| 1398 | compared, and if they differ substantially, details are printed to |
| 1399 | the log. |
| 1400 | |
| 1401 | .. member:: double Solver::Options::gradient_check_relative_precision |
| 1402 | |
Default: ``1e-8``
| 1404 | |
Precision to check for in the gradient checker. If the relative
difference between an element of a user-supplied Jacobian and its
finite difference estimate exceeds this number, then the Jacobian
for that cost term is dumped.
| 1408 | |
| 1409 | .. member:: double Solver::Options::numeric_derivative_relative_step_size |
| 1410 | |
| 1411 | Default: ``1e-6`` |
| 1412 | |
| 1413 | Relative shift used for taking numeric derivatives. For finite |
| 1414 | differencing, each dimension is evaluated at slightly shifted |
| 1415 | values, e.g., for forward differences, the numerical derivative is |
| 1416 | |
| 1417 | .. math:: |
| 1418 | |
\delta &= \text{numeric_derivative_relative_step_size} \\
\Delta f &= \frac{f((1 + \delta) x) - f(x)}{\delta x}
| 1421 | |
| 1422 | The finite differencing is done along each dimension. The reason to |
| 1423 | use a relative (rather than absolute) step size is that this way, |
| 1424 | numeric differentiation works for functions where the arguments are |
| 1425 | typically large (e.g. :math:`10^9`) and when the values are small |
| 1426 | (e.g. :math:`10^{-5}`). It is possible to construct *torture cases* |
| 1427 | which break this finite difference heuristic, but they do not come |
| 1428 | up often in practice. |
| 1429 | |
.. member:: vector<IterationCallback*> Solver::Options::callbacks
| 1431 | |
| 1432 | Callbacks that are executed at the end of each iteration of the |
| 1433 | :class:`Minimizer`. They are executed in the order that they are |
| 1434 | specified in this vector. By default, parameter blocks are updated |
only at the end of the optimization, i.e., when the
:class:`Minimizer` terminates. This behavior is controlled by
:member:`Solver::Options::update_state_every_iteration`. If the user
wishes to have access to the updated parameter blocks when his/her
callbacks are executed, then set
:member:`Solver::Options::update_state_every_iteration` to true.
| 1441 | |
| 1442 | The solver does NOT take ownership of these pointers. |
| 1443 | |
| 1444 | .. member:: bool Solver::Options::update_state_every_iteration |
| 1445 | |
| 1446 | Default: ``false`` |
| 1447 | |
| 1448 | Normally the parameter blocks are only updated when the solver |
terminates. Setting this to true updates them in every
| 1450 | iteration. This setting is useful when building an interactive |
| 1451 | application using Ceres and using an :class:`IterationCallback`. |
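
For example, assuming a hypothetical ``VisualizationCallback``
subclass of :class:`IterationCallback` that redraws the current
solution, the two options are typically used together:

.. code-block:: c++

  VisualizationCallback callback;  // hypothetical user-defined callback
  options.callbacks.push_back(&callback);  // solver does not take ownership
  // Ensure the callback sees freshly updated parameter values.
  options.update_state_every_iteration = true;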
| 1452 | |
| 1453 | .. member:: string Solver::Options::solver_log |
| 1454 | |
| 1455 | Default: ``empty`` |
| 1456 | |
| 1457 | If non-empty, a summary of the execution of the solver is recorded |
to this file. This file is used for recording and analyzing Ceres'
| 1459 | performance. Currently, only the iteration number, total time and |
| 1460 | the objective function value are logged. The format of this file is |
| 1461 | expected to change over time as the performance evaluation |
| 1462 | framework is fleshed out. |
| 1463 | |
| 1464 | :class:`ParameterBlockOrdering` |
| 1465 | ------------------------------- |
| 1466 | |
| 1467 | .. class:: ParameterBlockOrdering |
| 1468 | |
Sameer Agarwal | 0939632 | 2013-05-28 22:29:36 -0700 | [diff] [blame] | 1469 | ``ParameterBlockOrdering`` is a class for storing and manipulating |
| 1470 | an ordered collection of groups/sets with the following semantics: |
| 1471 | |
| 1472 | Group IDs are non-negative integer values. Elements are any type |
| 1473 | that can serve as a key in a map or an element of a set. |
| 1474 | |
| 1475 | An element can only belong to one group at a time. A group may |
| 1476 | contain an arbitrary number of elements. |
| 1477 | |
| 1478 | Groups are ordered by their group id. |
| 1479 | |
| 1480 | .. function:: bool ParameterBlockOrdering::AddElementToGroup(const double* element, const int group) |
| 1481 | |
| 1482 | Add an element to a group. If a group with this id does not exist, |
| 1483 | one is created. This method can be called any number of times for |
| 1484 | the same element. Group ids should be non-negative numbers. Return |
| 1485 | value indicates if adding the element was a success. |
| 1486 | |
| 1487 | .. function:: void ParameterBlockOrdering::Clear() |
| 1488 | |
| 1489 | Clear the ordering. |
| 1490 | |
| 1491 | .. function:: bool ParameterBlockOrdering::Remove(const double* element) |
| 1492 | |
| 1493 | Remove the element, no matter what group it is in. If the element |
| 1494 | is not a member of any group, calling this method will result in a |
| 1495 | crash. Return value indicates if the element was actually removed. |
| 1496 | |
| 1497 | .. function:: void ParameterBlockOrdering::Reverse() |
| 1498 | |
| 1499 | Reverse the order of the groups in place. |
| 1500 | |
| 1501 | .. function:: int ParameterBlockOrdering::GroupId(const double* element) const |
| 1502 | |
| 1503 | Return the group id for the element. If the element is not a member |
| 1504 | of any group, return -1. |
| 1505 | |
| 1506 | .. function:: bool ParameterBlockOrdering::IsMember(const double* element) const |
| 1507 | |
| 1508 | True if there is a group containing the parameter block. |
| 1509 | |
| 1510 | .. function:: int ParameterBlockOrdering::GroupSize(const int group) const |
| 1511 | |
Number of elements in the group with id ``group``. This function
always succeeds, i.e., implicitly there exists a group for every
integer, and the size of an absent group is zero.
| 1514 | |
| 1515 | .. function:: int ParameterBlockOrdering::NumElements() const |
| 1516 | |
| 1517 | Number of elements in the ordering. |
| 1518 | |
| 1519 | .. function:: int ParameterBlockOrdering::NumGroups() const |
| 1520 | |
| 1521 | Number of groups with one or more elements. |
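
A small sketch of the API, using two stand-in parameter blocks
``x`` and ``y``:

.. code-block:: c++

  double x[3];
  double y[2];

  ceres::ParameterBlockOrdering ordering;
  ordering.AddElementToGroup(x, 0);
  ordering.AddElementToGroup(y, 1);
  // ordering.NumGroups() == 2 and ordering.GroupId(x) == 0.

  ordering.Remove(x);
  // x no longer belongs to any group: ordering.GroupId(x) == -1.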
| 1522 | |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1523 | |
| 1524 | :class:`IterationCallback` |
| 1525 | -------------------------- |
| 1526 | |
Sameer Agarwal | ebbb984 | 2013-05-26 12:40:12 -0700 | [diff] [blame] | 1527 | .. class:: IterationSummary |
| 1528 | |
| 1529 | :class:`IterationSummary` describes the state of the optimizer |
Sameer Agarwal | f0b071b | 2013-05-31 13:22:51 -0700 | [diff] [blame] | 1530 | after each iteration of the minimization. Note that all times are |
| 1531 | wall times. |
Sameer Agarwal | ebbb984 | 2013-05-26 12:40:12 -0700 | [diff] [blame] | 1532 | |
| 1533 | .. code-block:: c++ |
| 1534 | |
| 1535 | struct IterationSummary { |
| 1536 | // Current iteration number. |
| 1537 | int32 iteration; |
| 1538 | |
| 1539 | // Step was numerically valid, i.e., all values are finite and the |
| 1540 | // step reduces the value of the linearized model. |
| 1541 | // |
| 1542 | // Note: step_is_valid is false when iteration = 0. |
| 1543 | bool step_is_valid; |
| 1544 | |
| 1545 | // Step did not reduce the value of the objective function |
| 1546 | // sufficiently, but it was accepted because of the relaxed |
| 1547 | // acceptance criterion used by the non-monotonic trust region |
| 1548 | // algorithm. |
| 1549 | // |
| 1550 | // Note: step_is_nonmonotonic is false when iteration = 0; |
| 1551 | bool step_is_nonmonotonic; |
| 1552 | |
| 1553 | // Whether or not the minimizer accepted this step or not. If the |
| 1554 | // ordinary trust region algorithm is used, this means that the |
| 1555 | // relative reduction in the objective function value was greater |
| 1556 | // than Solver::Options::min_relative_decrease. However, if the |
| 1557 | // non-monotonic trust region algorithm is used |
| 1558 | // (Solver::Options:use_nonmonotonic_steps = true), then even if the |
| 1559 | // relative decrease is not sufficient, the algorithm may accept the |
| 1560 | // step and the step is declared successful. |
| 1561 | // |
| 1562 | // Note: step_is_successful is false when iteration = 0. |
| 1563 | bool step_is_successful; |
| 1564 | |
| 1565 | // Value of the objective function. |
| 1566 | double cost; |
| 1567 | |
| 1568 | // Change in the value of the objective function in this |
| 1569 | // iteration. This can be positive or negative. |
| 1570 | double cost_change; |
| 1571 | |
| 1572 | // Infinity norm of the gradient vector. |
| 1573 | double gradient_max_norm; |
| 1574 | |
| 1575 | // 2-norm of the size of the step computed by the optimization |
| 1576 | // algorithm. |
| 1577 | double step_norm; |
| 1578 | |
| 1579 | // For trust region algorithms, the ratio of the actual change in |
| 1580 | // cost and the change in the cost of the linearized approximation. |
| 1581 | double relative_decrease; |
| 1582 | |
| 1583 | // Size of the trust region at the end of the current iteration. For |
| 1584 | // the Levenberg-Marquardt algorithm, the regularization parameter |
| 1585 | // mu = 1.0 / trust_region_radius. |
| 1586 | double trust_region_radius; |
| 1587 | |
| 1588 | // For the inexact step Levenberg-Marquardt algorithm, this is the |
| 1589 | // relative accuracy with which the Newton(LM) step is solved. This |
| 1590 | // number affects only the iterative solvers capable of solving |
| 1591 | // linear systems inexactly. Factorization-based exact solvers |
| 1592 | // ignore it. |
| 1593 | double eta; |
| 1594 | |
// Step size computed by the line search algorithm.
| 1596 | double step_size; |
| 1597 | |
| 1598 | // Number of function evaluations used by the line search algorithm. |
| 1599 | int line_search_function_evaluations; |
| 1600 | |
| 1601 | // Number of iterations taken by the linear solver to solve for the |
| 1602 | // Newton step. |
| 1603 | int linear_solver_iterations; |
| 1604 | |
| 1605 | // Time (in seconds) spent inside the minimizer loop in the current |
| 1606 | // iteration. |
| 1607 | double iteration_time_in_seconds; |
| 1608 | |
| 1609 | // Time (in seconds) spent inside the trust region step solver. |
| 1610 | double step_solver_time_in_seconds; |
| 1611 | |
| 1612 | // Time (in seconds) since the user called Solve(). |
| 1613 | double cumulative_time_in_seconds; |
| 1614 | }; |
| 1615 | |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1616 | .. class:: IterationCallback |
| 1617 | |
Sameer Agarwal | ebbb984 | 2013-05-26 12:40:12 -0700 | [diff] [blame] | 1618 | .. code-block:: c++ |
| 1619 | |
| 1620 | class IterationCallback { |
| 1621 | public: |
| 1622 | virtual ~IterationCallback() {} |
| 1623 | virtual CallbackReturnType operator()(const IterationSummary& summary) = 0; |
| 1624 | }; |
| 1625 | |
| 1626 | Interface for specifying callbacks that are executed at the end of |
| 1627 | each iteration of the Minimizer. The solver uses the return value of |
| 1628 | ``operator()`` to decide whether to continue solving or to |
| 1629 | terminate. The user can return three values. |
| 1630 | |
| 1631 | #. ``SOLVER_ABORT`` indicates that the callback detected an abnormal |
| 1632 | situation. The solver returns without updating the parameter |
| 1633 | blocks (unless ``Solver::Options::update_state_every_iteration`` is |
| 1634 | set true). Solver returns with ``Solver::Summary::termination_type`` |
| 1635 | set to ``USER_ABORT``. |
| 1636 | |
| 1637 | #. ``SOLVER_TERMINATE_SUCCESSFULLY`` indicates that there is no need |
| 1638 | to optimize anymore (some user specified termination criterion |
| 1639 | has been met). Solver returns with |
``Solver::Summary::termination_type`` set to ``USER_SUCCESS``.
| 1641 | |
| 1642 | #. ``SOLVER_CONTINUE`` indicates that the solver should continue |
| 1643 | optimizing. |
| 1644 | |
| 1645 | For example, the following ``IterationCallback`` is used internally |
| 1646 | by Ceres to log the progress of the optimization. |
| 1647 | |
| 1648 | .. code-block:: c++ |
| 1649 | |
| 1650 | class LoggingCallback : public IterationCallback { |
| 1651 | public: |
| 1652 | explicit LoggingCallback(bool log_to_stdout) |
| 1653 | : log_to_stdout_(log_to_stdout) {} |
| 1654 | |
| 1655 | ~LoggingCallback() {} |
| 1656 | |
| 1657 | CallbackReturnType operator()(const IterationSummary& summary) { |
| 1658 | const char* kReportRowFormat = |
| 1659 | "% 4d: f:% 8e d:% 3.2e g:% 3.2e h:% 3.2e " |
| 1660 | "rho:% 3.2e mu:% 3.2e eta:% 3.2e li:% 3d"; |
| 1661 | string output = StringPrintf(kReportRowFormat, |
| 1662 | summary.iteration, |
| 1663 | summary.cost, |
| 1664 | summary.cost_change, |
| 1665 | summary.gradient_max_norm, |
| 1666 | summary.step_norm, |
| 1667 | summary.relative_decrease, |
| 1668 | summary.trust_region_radius, |
| 1669 | summary.eta, |
| 1670 | summary.linear_solver_iterations); |
| 1671 | if (log_to_stdout_) { |
| 1672 | cout << output << endl; |
| 1673 | } else { |
| 1674 | VLOG(1) << output; |
| 1675 | } |
| 1676 | return SOLVER_CONTINUE; |
| 1677 | } |
| 1678 | |
| 1679 | private: |
| 1680 | const bool log_to_stdout_; |
| 1681 | }; |
| 1682 | |
| 1683 | |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1684 | |
| 1685 | :class:`CRSMatrix` |
| 1686 | ------------------ |
| 1687 | |
| 1688 | .. class:: CRSMatrix |
| 1689 | |
Sameer Agarwal | ebbb984 | 2013-05-26 12:40:12 -0700 | [diff] [blame] | 1690 | .. code-block:: c++ |
| 1691 | |
| 1692 | struct CRSMatrix { |
| 1693 | int num_rows; |
| 1694 | int num_cols; |
| 1695 | vector<int> cols; |
| 1696 | vector<int> rows; |
| 1697 | vector<double> values; |
| 1698 | }; |
| 1699 | |
| 1700 | A compressed row sparse matrix used primarily for communicating the |
| 1701 | Jacobian matrix to the user. |
| 1702 | |
| 1703 | A compressed row matrix stores its contents in three arrays, |
| 1704 | ``rows``, ``cols`` and ``values``. |
| 1705 | |
| 1706 | ``rows`` is a ``num_rows + 1`` sized array that points into the ``cols`` and |
| 1707 | ``values`` array. For each row ``i``: |
| 1708 | |
| 1709 | ``cols[rows[i]]`` ... ``cols[rows[i + 1] - 1]`` are the indices of the |
| 1710 | non-zero columns of row ``i``. |
| 1711 | |
| 1712 | ``values[rows[i]]`` ... ``values[rows[i + 1] - 1]`` are the values of the |
| 1713 | corresponding entries. |
| 1714 | |
| 1715 | ``cols`` and ``values`` contain as many entries as there are |
| 1716 | non-zeros in the matrix. |
| 1717 | |
e.g., consider the 3x4 sparse matrix
| 1719 | |
| 1720 | .. code-block:: c++ |
| 1721 | |
| 1722 | 0 10 0 4 |
| 1723 | 0 2 -3 2 |
| 1724 | 1 2 0 0 |
| 1725 | |
| 1726 | The three arrays will be: |
| 1727 | |
| 1728 | .. code-block:: c++ |
| 1729 | |
| 1730 | -row0- ---row1--- -row2- |
| 1731 | rows = [ 0, 2, 5, 7] |
| 1732 | cols = [ 1, 3, 1, 2, 3, 0, 1] |
| 1733 | values = [10, 4, 2, -3, 2, 1, 2] |
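
A short sketch of how to visit every non-zero entry using the
layout described above (``PrintNonZeros`` is a hypothetical helper,
shown only for illustration):

.. code-block:: c++

  #include <cstdio>

  void PrintNonZeros(const ceres::CRSMatrix& m) {
    for (int row = 0; row < m.num_rows; ++row) {
      // The entries of this row live in [rows[row], rows[row + 1]).
      for (int idx = m.rows[row]; idx < m.rows[row + 1]; ++idx) {
        printf("(%d, %d) = %f\n", row, m.cols[idx], m.values[idx]);
      }
    }
  }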
| 1734 | |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1735 | |
| 1736 | :class:`Solver::Summary` |
| 1737 | ------------------------ |
| 1738 | |
| 1739 | .. class:: Solver::Summary |
| 1740 | |
Sameer Agarwal | f0b071b | 2013-05-31 13:22:51 -0700 | [diff] [blame] | 1741 | Note that all times reported in this struct are wall times. |
| 1742 | |
Sameer Agarwal | ebbb984 | 2013-05-26 12:40:12 -0700 | [diff] [blame] | 1743 | .. code-block:: c++ |
| 1744 | |
| 1745 | struct Summary { |
| 1746 | // A brief one line description of the state of the solver after |
| 1747 | // termination. |
| 1748 | string BriefReport() const; |
| 1749 | |
| 1750 | // A full multiline description of the state of the solver after |
| 1751 | // termination. |
| 1752 | string FullReport() const; |
| 1753 | |
| 1754 | // Minimizer summary ------------------------------------------------- |
| 1755 | MinimizerType minimizer_type; |
| 1756 | |
| 1757 | SolverTerminationType termination_type; |
| 1758 | |
| 1759 | // If the solver did not run, or there was a failure, a |
| 1760 | // description of the error. |
| 1761 | string error; |
| 1762 | |
| 1763 | // Cost of the problem before and after the optimization. See |
| 1764 | // problem.h for definition of the cost of a problem. |
| 1765 | double initial_cost; |
| 1766 | double final_cost; |
| 1767 | |
| 1768 | // The part of the total cost that comes from residual blocks that |
| 1769 | // were held fixed by the preprocessor because all the parameter |
| 1770 | // blocks that they depend on were fixed. |
| 1771 | double fixed_cost; |
| 1772 | |
| 1773 | vector<IterationSummary> iterations; |
| 1774 | |
| 1775 | int num_successful_steps; |
| 1776 | int num_unsuccessful_steps; |
| 1777 | int num_inner_iteration_steps; |
| 1778 | |
| 1779 | // When the user calls Solve, before the actual optimization |
| 1780 | // occurs, Ceres performs a number of preprocessing steps. These |
| 1781 | // include error checks, memory allocations, and reorderings. This |
| 1782 | // time is accounted for as preprocessing time. |
| 1783 | double preprocessor_time_in_seconds; |
| 1784 | |
| 1785 | // Time spent in the TrustRegionMinimizer. |
| 1786 | double minimizer_time_in_seconds; |
| 1787 | |
| 1788 | // After the Minimizer is finished, some time is spent in |
| 1789 | // re-evaluating residuals etc. This time is accounted for in the |
| 1790 | // postprocessor time. |
| 1791 | double postprocessor_time_in_seconds; |
| 1792 | |
// Sum total of all time spent inside Ceres when Solve is called.
| 1794 | double total_time_in_seconds; |
| 1795 | |
| 1796 | double linear_solver_time_in_seconds; |
| 1797 | double residual_evaluation_time_in_seconds; |
| 1798 | double jacobian_evaluation_time_in_seconds; |
| 1799 | double inner_iteration_time_in_seconds; |
| 1800 | |
| 1801 | // Preprocessor summary. |
| 1802 | int num_parameter_blocks; |
| 1803 | int num_parameters; |
| 1804 | int num_effective_parameters; |
| 1805 | int num_residual_blocks; |
| 1806 | int num_residuals; |
| 1807 | |
| 1808 | int num_parameter_blocks_reduced; |
| 1809 | int num_parameters_reduced; |
| 1810 | int num_effective_parameters_reduced; |
| 1811 | int num_residual_blocks_reduced; |
| 1812 | int num_residuals_reduced; |
| 1813 | |
| 1814 | int num_eliminate_blocks_given; |
| 1815 | int num_eliminate_blocks_used; |
| 1816 | |
| 1817 | int num_threads_given; |
| 1818 | int num_threads_used; |
| 1819 | |
| 1820 | int num_linear_solver_threads_given; |
| 1821 | int num_linear_solver_threads_used; |
| 1822 | |
| 1823 | LinearSolverType linear_solver_type_given; |
| 1824 | LinearSolverType linear_solver_type_used; |
| 1825 | |
| 1826 | vector<int> linear_solver_ordering_given; |
| 1827 | vector<int> linear_solver_ordering_used; |
| 1828 | |
| 1829 | bool inner_iterations_given; |
| 1830 | bool inner_iterations_used; |
| 1831 | |
| 1832 | vector<int> inner_iteration_ordering_given; |
| 1833 | vector<int> inner_iteration_ordering_used; |
| 1834 | |
| 1835 | PreconditionerType preconditioner_type; |
| 1836 | |
| 1837 | TrustRegionStrategyType trust_region_strategy_type; |
| 1838 | DoglegType dogleg_type; |
| 1839 | |
| 1840 | SparseLinearAlgebraLibraryType sparse_linear_algebra_library; |
| 1841 | |
| 1842 | LineSearchDirectionType line_search_direction_type; |
| 1843 | LineSearchType line_search_type; |
| 1844 | int max_lbfgs_rank; |
| 1845 | }; |
| 1846 | |
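A typical call site, once the problem has been constructed, looks
like the following sketch (``problem`` is assumed to be a fully
populated :class:`Problem`):

.. code-block:: c++

  ceres::Solver::Options options;
  options.max_num_iterations = 100;

  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);

  std::cout << summary.FullReport() << "\n";
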
| 1847 | |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 1848 | Covariance Estimation |
| 1849 | ===================== |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1850 | |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 1851 | Background |
| 1852 | ---------- |
Sameer Agarwal | 3d87b72 | 2013-02-02 00:49:31 -0800 | [diff] [blame] | 1853 | |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 1854 | One way to assess the quality of the solution returned by a |
| 1855 | non-linear least squares solve is to analyze the covariance of the |
| 1856 | solution. |
| 1857 | |
| 1858 | Let us consider the non-linear regression problem |
| 1859 | |
| 1860 | .. math:: y = f(x) + N(0, I) |
| 1861 | |
| 1862 | i.e., the observation :math:`y` is a random non-linear function of the |
| 1863 | independent variable :math:`x` with mean :math:`f(x)` and identity |
| 1864 | covariance. Then the maximum likelihood estimate of :math:`x` given |
| 1865 | observations :math:`y` is the solution to the non-linear least squares |
| 1866 | problem: |
| 1867 | |
| 1868 | .. math:: x^* = \arg \min_x \|f(x)\|^2 |
| 1869 | |
| 1870 | And the covariance of :math:`x^*` is given by |
| 1871 | |
| 1872 | .. math:: C(x^*) = \left(J'(x^*)J(x^*)\right)^{-1} |
| 1873 | |
| 1874 | Here :math:`J(x^*)` is the Jacobian of :math:`f` at :math:`x^*`. The |
| 1875 | above formula assumes that :math:`J(x^*)` has full column rank. |
| 1876 | |
| 1877 | If :math:`J(x^*)` is rank deficient, then the covariance matrix :math:`C(x^*)` |
| 1878 | is also rank deficient and is given by the Moore-Penrose pseudo inverse. |
| 1879 | |
| 1880 | .. math:: C(x^*) = \left(J'(x^*)J(x^*)\right)^{\dagger} |
| 1881 | |
| 1882 | Note that in the above, we assumed that the covariance matrix for |
| 1883 | :math:`y` was identity. This is an important assumption. If this is |
| 1884 | not the case and we have |
| 1885 | |
| 1886 | .. math:: y = f(x) + N(0, S) |
| 1887 | |
| 1888 | Where :math:`S` is a positive semi-definite matrix denoting the |
| 1889 | covariance of :math:`y`, then the maximum likelihood problem to be |
| 1890 | solved is |
| 1891 | |
| 1892 | .. math:: x^* = \arg \min_x f'(x) S^{-1} f(x) |
| 1893 | |
| 1894 | and the corresponding covariance estimate of :math:`x^*` is given by |
| 1895 | |
| 1896 | .. math:: C(x^*) = \left(J'(x^*) S^{-1} J(x^*)\right)^{-1} |
| 1897 | |
| 1898 | So, if it is the case that the observations being fitted to have a |
| 1899 | covariance matrix not equal to identity, then it is the user's |
| 1900 | responsibility that the corresponding cost functions are correctly |
| 1901 | scaled, e.g. in the above case the cost function for this problem |
| 1902 | should evaluate :math:`S^{-1/2} f(x)` instead of just :math:`f(x)`, |
| 1903 | where :math:`S^{-1/2}` is the inverse square root of the covariance |
| 1904 | matrix :math:`S`. |
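
Concretely, for a scalar residual with known standard deviation
:math:`\sigma` (the scalar analogue of :math:`S^{1/2}`), a
hypothetical cost functor that performs this scaling might look
like the sketch below. Such a functor would typically be wrapped in
an ``AutoDiffCostFunction<ScaledResidual, 1, 1>`` before being
added to the problem.

.. code-block:: c++

  // Computes r(x) = (measurement - x) / sigma, so that the scaled
  // residual has unit variance. For a general covariance matrix S,
  // multiply the residual vector by S^{-1/2} instead.
  struct ScaledResidual {
    ScaledResidual(double measurement, double sigma)
        : measurement_(measurement), sigma_(sigma) {}

    template <typename T>
    bool operator()(const T* const x, T* residual) const {
      residual[0] = (T(measurement_) - x[0]) / T(sigma_);
      return true;
    }

    double measurement_;
    double sigma_;
  };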
| 1905 | |
| 1906 | Gauge Invariance |
| 1907 | ---------------- |
| 1908 | |
| 1909 | In structure from motion (3D reconstruction) problems, the |
reconstruction is ambiguous up to a similarity transform. This is
| 1911 | known as a *Gauge Ambiguity*. Handling Gauges correctly requires the |
| 1912 | use of SVD or custom inversion algorithms. For small problems the |
| 1913 | user can use the dense algorithm. For more details see the work of |
| 1914 | Kanatani & Morris [KanataniMorris]_. |
| 1915 | |
| 1916 | |
| 1917 | :class:`Covariance` |
| 1918 | ------------------- |
| 1919 | |
| 1920 | :class:`Covariance` allows the user to evaluate the covariance for a |
| 1921 | non-linear least squares problem and provides random access to its |
| 1922 | blocks. The computation assumes that the cost functions compute |
| 1923 | residuals such that their covariance is identity. |
| 1924 | |
| 1925 | Since the computation of the covariance matrix requires computing the |
| 1926 | inverse of a potentially large matrix, this can involve a rather large |
| 1927 | amount of time and memory. However, it is usually the case that the |
| 1928 | user is only interested in a small part of the covariance |
| 1929 | matrix. Quite often just the block diagonal. :class:`Covariance` |
| 1930 | allows the user to specify the parts of the covariance matrix that she |
| 1931 | is interested in and then uses this information to only compute and |
| 1932 | store those parts of the covariance matrix. |
| 1933 | |
| 1934 | Rank of the Jacobian |
| 1935 | -------------------- |
| 1936 | |
| 1937 | As we noted above, if the Jacobian is rank deficient, then the inverse |
| 1938 | of :math:`J'J` is not defined and instead a pseudo inverse needs to be |
| 1939 | computed. |
| 1940 | |
| 1941 | The rank deficiency in :math:`J` can be *structural* -- columns |
| 1942 | which are always known to be zero or *numerical* -- depending on the |
| 1943 | exact values in the Jacobian. |
| 1944 | |
| 1945 | Structural rank deficiency occurs when the problem contains parameter |
blocks that are constant. The :class:`Covariance` class correctly
handles structural rank deficiency of this kind.
| 1948 | |
Numerical rank deficiency, where the rank of the matrix cannot be
predicted by its sparsity structure and requires looking at its
numerical values, is more complicated. Here again there are two
cases.
| 1953 | |
| 1954 | a. The rank deficiency arises from overparameterization. e.g., a |
| 1955 | four dimensional quaternion used to parameterize :math:`SO(3)`, |
| 1956 | which is a three dimensional manifold. In cases like this, the |
| 1957 | user should use an appropriate |
| 1958 | :class:`LocalParameterization`. Not only will this lead to better |
| 1959 | numerical behaviour of the Solver, it will also expose the rank |
| 1960 | deficiency to the :class:`Covariance` object so that it can |
| 1961 | handle it correctly. |
| 1962 | |
| 1963 | b. More general numerical rank deficiency in the Jacobian requires |
| 1964 | the computation of the so called Singular Value Decomposition |
| 1965 | (SVD) of :math:`J'J`. We do not know how to do this for large |
| 1966 | sparse matrices efficiently. For small and moderate sized |
| 1967 | problems this is done using dense linear algebra. |
| 1968 | |
| 1969 | |
:class:`Covariance::Options`
----------------------------
| 1971 | |
| 1972 | .. class:: Covariance::Options |
| 1973 | |
| 1974 | .. member:: int Covariance::Options::num_threads |
| 1975 | |
| 1976 | Default: ``1`` |
| 1977 | |
| 1978 | Number of threads to be used for evaluating the Jacobian and |
| 1979 | estimation of covariance. |
| 1980 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 1981 | .. member:: CovarianceAlgorithmType Covariance::Options::algorithm_type |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 1982 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 1983 | Default: ``SPARSE_QR`` or ``DENSE_SVD`` |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 1984 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 1985 | Ceres supports three different algorithms for covariance |
| 1986 | estimation, which represent different tradeoffs in speed, accuracy |
| 1987 | and reliability. |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 1988 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 1989 | 1. ``DENSE_SVD`` uses ``Eigen``'s ``JacobiSVD`` to perform the |
| 1990 | computations. It computes the singular value decomposition |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 1991 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 1992 | .. math:: U S V^\top = J |
| 1993 | |
and then uses it to compute the pseudo inverse of :math:`J'J` as

.. math:: (J'J)^{\dagger} = V \left(S^{\dagger}\right)^2 V^\top
| 1997 | |
| 1998 | It is an accurate but slow method and should only be used for |
| 1999 | small to moderate sized problems. It can handle full-rank as |
| 2000 | well as rank deficient Jacobians. |
| 2001 | |
| 2002 | 2. ``SPARSE_CHOLESKY`` uses the ``CHOLMOD`` sparse Cholesky |
factorization library to compute the decomposition:
| 2004 | |
| 2005 | .. math:: R^\top R = J^\top J |
| 2006 | |
| 2007 | and then |
| 2008 | |
| 2009 | .. math:: \left(J^\top J\right)^{-1} = \left(R^\top R\right)^{-1} |
| 2010 | |
It is a fast algorithm for sparse matrices that should be used
when the Jacobian matrix J is well conditioned. For
ill-conditioned matrices, this algorithm can fail
unpredictably. This is because Cholesky factorization is not a
rank-revealing factorization, i.e., it cannot reliably detect
when the matrix being factorized is not of full
rank. ``SuiteSparse``/``CHOLMOD`` supplies a heuristic for
checking if the matrix is rank deficient (cholmod_rcond), but it
is only a heuristic and can have both false positives and false
negatives.
| 2021 | |
| 2022 | Recent versions of ``SuiteSparse`` (>= 4.2.0) provide a much more |
| 2023 | efficient method for solving for rows of the covariance |
| 2024 | matrix. Therefore, if you are doing ``SPARSE_CHOLESKY``, we strongly |
| 2025 | recommend using a recent version of ``SuiteSparse``. |
| 2026 | |
| 2027 | 3. ``SPARSE_QR`` uses the ``SuiteSparseQR`` sparse QR factorization |
| 2028 | library to compute the decomposition |
| 2029 | |
| 2030 | .. math:: |
| 2031 | |
| 2032 | QR &= J\\ |
| 2033 | \left(J^\top J\right)^{-1} &= \left(R^\top R\right)^{-1} |
| 2034 | |
| 2035 | It is a moderately fast algorithm for sparse matrices, which at |
| 2036 | the price of more time and memory than the ``SPARSE_CHOLESKY`` |
| 2037 | algorithm is numerically better behaved and is rank revealing, |
| 2038 | i.e., it can reliably detect when the Jacobian matrix is rank |
| 2039 | deficient. |
| 2040 | |
Neither ``SPARSE_CHOLESKY`` nor ``SPARSE_QR`` is capable of
computing the covariance if the Jacobian is rank deficient.
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2043 | |
.. member:: double Covariance::Options::min_reciprocal_condition_number
| 2045 | |
| 2046 | Default: :math:`10^{-14}` |
| 2047 | |
| 2048 | If the Jacobian matrix is near singular, then inverting :math:`J'J` |
will result in unreliable results, e.g., if
| 2050 | |
| 2051 | .. math:: |
| 2052 | |
| 2053 | J = \begin{bmatrix} |
| 2054 | 1.0& 1.0 \\ |
| 2055 | 1.0& 1.0000001 |
| 2056 | \end{bmatrix} |
| 2057 | |
| 2058 | which is essentially a rank deficient matrix, we have |
| 2059 | |
| 2060 | .. math:: |
| 2061 | |
(J'J)^{-1} = \begin{bmatrix}
              2.0471e+14 & -2.0471e+14 \\
             -2.0471e+14 &  2.0471e+14
             \end{bmatrix}
| 2066 | |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2067 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 2068 | This is not a useful result. Therefore, by default |
| 2069 | :func:`Covariance::Compute` will return ``false`` if a rank |
| 2070 | deficient Jacobian is encountered. How rank deficiency is detected |
| 2071 | depends on the algorithm being used. |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2072 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 2073 | 1. ``DENSE_SVD`` |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2074 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 2075 | .. math:: \frac{\sigma_{\text{min}}}{\sigma_{\text{max}}} < \sqrt{\text{min_reciprocal_condition_number}} |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2076 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 2077 | where :math:`\sigma_{\text{min}}` and |
:math:`\sigma_{\text{max}}` are the minimum and maximum
| 2079 | singular values of :math:`J` respectively. |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2080 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 2081 | 2. ``SPARSE_CHOLESKY`` |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2082 | |
.. math:: \text{cholmod_rcond} < \text{min_reciprocal_condition_number}
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2084 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 2085 | Here cholmod_rcond is a crude estimate of the reciprocal |
| 2086 | condition number of :math:`J^\top J` by using the maximum and |
| 2087 | minimum diagonal entries of the Cholesky factor :math:`R`. There |
| 2088 | are no theoretical guarantees associated with this test. It can |
| 2089 | give false positives and negatives. Use at your own risk. The |
| 2090 | default value of ``min_reciprocal_condition_number`` has been |
| 2091 | set to a conservative value, and sometimes the |
| 2092 | :func:`Covariance::Compute` may return false even if it is |
| 2093 | possible to estimate the covariance reliably. In such cases, the |
| 2094 | user should exercise their judgement before lowering the value |
| 2095 | of ``min_reciprocal_condition_number``. |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2096 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 2097 | 3. ``SPARSE_QR`` |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2098 | |
Sameer Agarwal | 42be9ca | 2013-07-18 08:02:08 -0700 | [diff] [blame] | 2099 | .. math:: \operatorname{rank}(J) < \operatorname{num\_col}(J) |
Sameer Agarwal | 1f17f56 | 2013-06-06 10:30:50 -0700 | [diff] [blame] | 2100 | |
Here :math:`\operatorname{rank}(J)` is the estimate of the
rank of :math:`J` returned by the ``SuiteSparseQR`` algorithm. It
is a fairly reliable indication of rank deficiency.

.. member:: int Covariance::Options::null_space_rank

When using ``DENSE_SVD``, the user has more control in dealing with
singular and near-singular covariance matrices.
As mentioned above, when the covariance matrix is near singular,
instead of computing the inverse of :math:`J'J`, the Moore-Penrose
pseudoinverse of :math:`J'J` should be computed.

If :math:`J'J` has the eigen decomposition :math:`(\lambda_i, e_i)`,
where :math:`\lambda_i` is the :math:`i^\textrm{th}` eigenvalue and
:math:`e_i` is the corresponding eigenvector, then the inverse of
:math:`J'J` is

.. math:: (J'J)^{-1} = \sum_i \frac{1}{\lambda_i} e_i e_i'

and computing the pseudoinverse involves dropping terms from this
sum that correspond to small eigenvalues.

How terms are dropped is controlled by
``min_reciprocal_condition_number`` and ``null_space_rank``.

If ``null_space_rank`` is non-negative, then the
eigenvalue/eigenvector pairs corresponding to the smallest
``null_space_rank`` eigenvalues are dropped irrespective of the
magnitude of :math:`\lambda_i`. If the ratio of the smallest
non-zero eigenvalue to the largest eigenvalue in the truncated
matrix is still below ``min_reciprocal_condition_number``, then
:func:`Covariance::Compute` will fail and return ``false``.

Setting ``null_space_rank = -1`` drops all terms for which

.. math:: \frac{\lambda_i}{\lambda_{\textrm{max}}} < \textrm{min_reciprocal_condition_number}

This option has no effect on the ``SPARSE_QR`` and
``SPARSE_CHOLESKY`` algorithms.
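
To make the truncation concrete, the following is a minimal sketch
of the truncated pseudoinverse formula above, using Eigen. It is
purely illustrative, not Ceres' internal implementation, and only
shows the effect of a non-negative ``null_space_rank``; the function
name is ours.

.. code-block:: c++

  #include "Eigen/Dense"

  // Computes the truncated pseudoinverse of the symmetric matrix jtj
  // (i.e. J'J) by dropping the null_space_rank smallest
  // eigenvalue/eigenvector pairs and summing (1 / lambda_i) e_i e_i'
  // over the rest. Assumes null_space_rank >= 0.
  Eigen::MatrixXd TruncatedPseudoInverse(const Eigen::MatrixXd& jtj,
                                         int null_space_rank) {
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> eig(jtj);
    // eigenvalues() is sorted in increasing order, so the first
    // null_space_rank entries are the ones being dropped.
    const Eigen::VectorXd& lambda = eig.eigenvalues();
    const Eigen::MatrixXd& e = eig.eigenvectors();
    Eigen::MatrixXd inverse =
        Eigen::MatrixXd::Zero(jtj.rows(), jtj.cols());
    for (int i = null_space_rank; i < lambda.size(); ++i) {
      inverse += (1.0 / lambda(i)) * e.col(i) * e.col(i).transpose();
    }
    return inverse;
  }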

.. member:: bool Covariance::Options::apply_loss_function

Default: ``true``

Even though the residual blocks in the problem may contain loss
functions, setting ``apply_loss_function`` to ``false`` turns off
the application of the loss function to the output of the cost
function, and in turn its effect on the covariance.
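
For example, to estimate the covariance of the unweighted residuals
(a minimal sketch using only the options and constructor documented
on this page):

.. code-block:: c++

  // Ignore the loss functions when computing the covariance.
  Covariance::Options options;
  options.apply_loss_function = false;
  Covariance covariance(options);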

.. class:: Covariance

:class:`Covariance::Options`, as the name implies, is used to
control the covariance estimation algorithm. Covariance estimation
is a complicated and numerically sensitive procedure. Please read
the entire documentation for :class:`Covariance::Options` before
using :class:`Covariance`.

.. function:: bool Covariance::Compute(const vector<pair<const double*, const double*> >& covariance_blocks, Problem* problem)

Compute a part of the covariance matrix.

The vector ``covariance_blocks`` indexes into the covariance matrix
block-wise using pairs of parameter blocks. This allows the
covariance estimation algorithm to only compute and store these
blocks.

Since the covariance matrix is symmetric, if the user passes
``<block1, block2>``, then ``GetCovarianceBlock`` can be called with
``block1, block2`` as well as ``block2, block1``.

``covariance_blocks`` cannot contain duplicates. Bad things will
happen if it does.

Note that the list of ``covariance_blocks`` is only used to
determine which parts of the covariance matrix are computed. The
full Jacobian is used to do the computation, i.e., it has no impact
on what part of the Jacobian is used for computation.

The return value indicates the success or failure of the covariance
computation. Please see the documentation for
:class:`Covariance::Options` for more on the conditions under which
this function returns ``false``.

.. function:: bool Covariance::GetCovarianceBlock(const double* parameter_block1, const double* parameter_block2, double* covariance_block) const

Return the block of the covariance matrix corresponding to
``parameter_block1`` and ``parameter_block2``.

``Covariance::Compute`` must be called before the first call to
``GetCovarianceBlock``, and the pair ``<parameter_block1,
parameter_block2>`` OR the pair ``<parameter_block2,
parameter_block1>`` must have been present in the vector
``covariance_blocks`` when ``Compute`` was called. Otherwise
``GetCovarianceBlock`` will return ``false``.

``covariance_block`` must point to a memory location that can store
a ``parameter_block1_size x parameter_block2_size`` matrix. The
returned covariance will be a row-major matrix.

Example Usage
-------------

.. code-block:: c++

  double x[3];
  double y[2];

  Problem problem;
  problem.AddParameterBlock(x, 3);
  problem.AddParameterBlock(y, 2);
  <Build Problem>
  <Solve Problem>

  Covariance::Options options;
  Covariance covariance(options);

  vector<pair<const double*, const double*> > covariance_blocks;
  covariance_blocks.push_back(make_pair(x, x));
  covariance_blocks.push_back(make_pair(y, y));
  covariance_blocks.push_back(make_pair(x, y));

  CHECK(covariance.Compute(covariance_blocks, &problem));

  double covariance_xx[3 * 3];
  double covariance_yy[2 * 2];
  double covariance_xy[3 * 2];
  covariance.GetCovarianceBlock(x, x, covariance_xx);
  covariance.GetCovarianceBlock(y, y, covariance_yy);
  covariance.GetCovarianceBlock(x, y, covariance_xy);

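Since the returned covariance blocks are row-major arrays, one
convenient (though entirely optional) way to consume them is to wrap
them in an ``Eigen::Map``. Eigen is not required by the
:class:`Covariance` API; this is just a sketch continuing the
example above, assuming ``<iostream>`` and the Eigen headers are
included.

.. code-block:: c++

  // View the raw row-major 3x2 array as an Eigen matrix, without
  // copying the data.
  Eigen::Map<Eigen::Matrix<double, 3, 2, Eigen::RowMajor> >
      xy(covariance_xy);
  std::cout << "Covariance(x, y):\n" << xy << "\n";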