.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-solving:

==========
Solver API
==========

Introduction
============

Effective use of Ceres requires some familiarity with the basic
components of a nonlinear least squares solver, so before we describe
how to configure the solver, we will begin by taking a brief look at
how some of the core optimization algorithms in Ceres work and the
various linear solvers and preconditioners that power it.

Sameer Agarwal3d87b722013-02-02 00:49:31 -080022
23Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
24variables, and
25:math:`F(x) = \left[f_1(x), ... , f_{m}(x) \right]^{\top}` be a
26:math:`m`-dimensional function of :math:`x`. We are interested in
27solving the following optimization problem [#f1]_ .
28
29.. math:: \arg \min_x \frac{1}{2}\|F(x)\|^2\ .
30 :label: nonlinsq
31
32Here, the Jacobian :math:`J(x)` of :math:`F(x)` is an :math:`m\times
33n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)` and the
34gradient vector :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2 = J(x)^\top
Sameer Agarwalfbbea462013-02-15 11:25:03 -080035F(x)`. Since the efficient global minimization of :eq:`nonlinsq` for
36general :math:`F(x)` is an intractable problem, we will have to settle
37for finding a local minimum.
Sameer Agarwal3d87b722013-02-02 00:49:31 -080038
The general strategy when solving non-linear optimization problems is
to solve a sequence of approximations to the original problem
[NocedalWright]_. At each iteration, the approximation is solved to
determine a correction :math:`\Delta x` to the vector :math:`x`. For
non-linear least squares, an approximation can be constructed by using
the linearization :math:`F(x+\Delta x) \approx F(x) + J(x)\Delta x`,
which leads to the following linear least squares problem:

.. math:: \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
   :label: linearapprox

Unfortunately, naively solving a sequence of these problems and
updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that
may not converge. To get a convergent algorithm, we need to control
the size of the step :math:`\Delta x`. Depending on how the size of
the step :math:`\Delta x` is controlled, non-linear optimization
algorithms can be divided into two major categories [NocedalWright]_.

1. **Trust Region** The trust region approach approximates the
   objective function using a model function (often a quadratic)
   over a subset of the search space known as the trust region. If the
   model function succeeds in minimizing the true objective function,
   the trust region is expanded; otherwise it is contracted and the
   model optimization problem is solved again.

2. **Line Search** The line search approach first finds a descent
   direction along which the objective function will be reduced and
   then computes a step size that decides how far to move along
   that direction. The descent direction can be computed by various
   methods, such as gradient descent, Newton's method and Quasi-Newton
   methods. The step size can be determined either exactly or
   inexactly.

Trust region methods are in some sense dual to line search methods:
trust region methods first choose a step size (the size of the trust
region) and then a step direction, while line search methods first
choose a step direction and then a step size.

Ceres implements multiple algorithms in both categories.

.. _section-trust-region-methods:

Trust Region Methods
====================

The basic trust region algorithm looks something like this.

   1. Given an initial point :math:`x` and a trust region radius :math:`\mu`.
   2. :math:`\arg \min_{\Delta x} \frac{1}{2}\|J(x)\Delta
      x + F(x)\|^2` s.t. :math:`\|D(x)\Delta x\|^2 \le \mu`
   3. :math:`\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 -
      \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 -
      \|F(x)\|^2}`
   4. if :math:`\rho > \epsilon` then :math:`x = x + \Delta x`.
   5. if :math:`\rho > \eta_1` then :math:`\mu = 2 \mu`
   6. else if :math:`\rho < \eta_2` then :math:`\mu = 0.5 * \mu`
   7. Goto 2.

Here, :math:`\mu` is the trust region radius, :math:`D(x)` is some
matrix used to define a metric on the domain of :math:`F(x)` and
:math:`\rho` measures the quality of the step :math:`\Delta x`, i.e.,
how well did the linear model predict the decrease in the value of the
non-linear objective. The idea is to increase or decrease the radius
of the trust region depending on how well the linearization predicts
the behavior of the non-linear objective, which in turn is reflected
in the value of :math:`\rho`.

The key computational step in a trust-region algorithm is the solution
of the constrained optimization problem

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2\quad \text{such that}\quad \|D(x)\Delta x\|^2 \le \mu
   :label: trp

There are a number of different ways of solving this problem, each
giving rise to a different concrete trust-region algorithm. Currently,
Ceres implements two trust-region algorithms - Levenberg-Marquardt
and Dogleg. The user can choose between them by setting
:member:`Solver::Options::trust_region_strategy_type`.

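For example, a minimal sketch of how this choice might be made in
code; the ``problem`` object is assumed to have been constructed
elsewhere:

.. code-block:: c++

   ceres::Solver::Options options;
   // Use Levenberg-Marquardt (the default) ...
   options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;
   // ... or switch to the Dogleg strategy instead.
   // options.trust_region_strategy_type = ceres::DOGLEG;

   ceres::Solver::Summary summary;
   ceres::Solve(options, &problem, &summary);  // problem built elsewhere.
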
.. rubric:: Footnotes

.. [#f1] At the level of the non-linear solver, the block structure is
   not relevant, therefore our discussion here is in terms of an
   optimization problem defined over a state vector of size :math:`n`.

.. _section-levenberg-marquardt:

Levenberg-Marquardt
-------------------

The Levenberg-Marquardt algorithm [Levenberg]_ [Marquardt]_ is the
most popular algorithm for solving non-linear least squares problems.
It was also the first trust region algorithm to be developed
[Levenberg]_ [Marquardt]_. Ceres implements an exact step [Madsen]_
and an inexact step variant of the Levenberg-Marquardt algorithm
[WrightHolt]_ [NashSofer]_.

It can be shown that the solution to :eq:`trp` can be obtained by
solving an unconstrained optimization problem of the form

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \lambda \|D(x)\Delta x\|^2

where :math:`\lambda` is a Lagrange multiplier that is inversely
related to :math:`\mu`. In Ceres, we solve for

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
   :label: lsqr

The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.

Before going further, let us make some notational simplifications. We
will assume that the matrix :math:`\sqrt{\mu} D` has been concatenated
at the bottom of the matrix :math:`J` and similarly a vector of zeros
has been added to the bottom of the vector :math:`f` and the rest of
our discussion will be in terms of :math:`J` and :math:`f`, i.e., the
linear least squares problem.

.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
   :label: simple

For all but the smallest problems the solution of :eq:`simple` in
each iteration of the Levenberg-Marquardt algorithm is the dominant
computational cost in Ceres. Ceres provides a number of different
options for solving :eq:`simple`. There are two major classes of
methods - factorization and iterative.

The factorization methods are based on computing an exact solution of
:eq:`lsqr` using a Cholesky or a QR factorization and lead to an exact
step Levenberg-Marquardt algorithm. But it is not clear if an exact
solution of :eq:`lsqr` is necessary at each step of the LM algorithm
to solve :eq:`nonlinsq`. In fact, we have already seen evidence
that this may not be the case, as :eq:`lsqr` is itself a regularized
version of :eq:`linearapprox`. Indeed, it is possible to
construct non-linear optimization algorithms in which the linearized
problem is solved approximately. These algorithms are known as inexact
Newton or truncated Newton methods [NocedalWright]_.

An inexact Newton method requires two ingredients. First, a cheap
method for approximately solving systems of linear
equations. Typically an iterative linear solver like the Conjugate
Gradients method is used for this
purpose [NocedalWright]_. Second, a termination rule for
the iterative solver. A typical termination rule is of the form

.. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
   :label: inexact

Here, :math:`k` indicates the Levenberg-Marquardt iteration number and
:math:`0 < \eta_k <1` is known as the forcing sequence. [WrightHolt]_
prove that a truncated Levenberg-Marquardt algorithm that uses an
inexact Newton step based on :eq:`inexact` converges for any
sequence :math:`\eta_k \leq \eta_0 < 1` and the rate of convergence
depends on the choice of the forcing sequence :math:`\eta_k`.

Ceres supports both exact and inexact step solution strategies. When
the user chooses a factorization based linear solver, the exact step
Levenberg-Marquardt algorithm is used. When the user chooses an
iterative linear solver, the inexact step Levenberg-Marquardt
algorithm is used.

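As a sketch, the choice between the two variants is therefore made
implicitly through the linear solver; the option names used below are
the ones documented later in this chapter:

.. code-block:: c++

   ceres::Solver::Options options;
   // A factorization based solver gives the exact step variant ...
   options.linear_solver_type = ceres::DENSE_QR;
   // ... while an iterative solver such as ITERATIVE_SCHUR or CGNR
   // gives the inexact (truncated) step variant.
   // options.linear_solver_type = ceres::ITERATIVE_SCHUR;
   // options.eta = 1e-2;  // Controls the forcing sequence.
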
.. _section-dogleg:

Dogleg
------

Another strategy for solving the trust region problem :eq:`trp` was
introduced by M. J. D. Powell. The key idea there is to compute two
vectors

.. math::

   \Delta x^{\text{Gauss-Newton}} &= \arg \min_{\Delta x}\frac{1}{2} \|J(x)\Delta x + f(x)\|^2.\\
   \Delta x^{\text{Cauchy}} &= -\frac{\|g(x)\|^2}{\|J(x)g(x)\|^2}g(x).

Note that the vector :math:`\Delta x^{\text{Gauss-Newton}}` is the
solution to :eq:`linearapprox` and :math:`\Delta
x^{\text{Cauchy}}` is the vector that minimizes the linear
approximation if we restrict ourselves to moving along the direction
of the gradient. The Dogleg method finds a vector :math:`\Delta x`
defined by :math:`\Delta x^{\text{Gauss-Newton}}` and :math:`\Delta
x^{\text{Cauchy}}` that solves the trust region problem. Ceres
supports two variants that can be chosen by setting
:member:`Solver::Options::dogleg_type`.

``TRADITIONAL_DOGLEG``, as described by Powell, constructs two line
segments using the Gauss-Newton and Cauchy vectors and finds the point
farthest along this line shaped like a dogleg (hence the name) that is
contained in the trust region. For more details on the exact reasoning
and computations, please see Madsen et al [Madsen]_.

``SUBSPACE_DOGLEG`` is a more sophisticated method that considers the
entire two dimensional subspace spanned by these two vectors and finds
the point that minimizes the trust region problem in this subspace
[ByrdSchnabel]_.

The key advantage of Dogleg over Levenberg-Marquardt is that if the
step computation for a particular choice of :math:`\mu` does not
result in sufficient decrease in the value of the objective function,
Levenberg-Marquardt solves the linear approximation from scratch with
a smaller value of :math:`\mu`. Dogleg, on the other hand, only needs
to compute the interpolation between the Gauss-Newton and the Cauchy
vectors, as neither of them depend on the value of :math:`\mu`.

The Dogleg method can only be used with the exact factorization based
linear solvers.

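A minimal sketch of a Dogleg configuration, using one of the
factorization based linear solvers as required above:

.. code-block:: c++

   ceres::Solver::Options options;
   options.trust_region_strategy_type = ceres::DOGLEG;
   options.dogleg_type = ceres::SUBSPACE_DOGLEG;  // or ceres::TRADITIONAL_DOGLEG
   options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;  // factorization based
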
.. _section-inner-iterations:

Inner Iterations
----------------

Some non-linear least squares problems have additional structure in
the way the parameter blocks interact, and it can be beneficial to
modify the way the trust region step is computed. For example,
consider the following regression problem

.. math:: y = a_1 e^{b_1 x} + a_2 e^{b_2 x^2 + c_1}


Given a set of pairs :math:`\{(x_i, y_i)\}`, the user wishes to estimate
:math:`a_1, a_2, b_1, b_2`, and :math:`c_1`.

Notice that the expression on the right is linear in :math:`a_1` and
:math:`a_2`, and given any value for :math:`b_1, b_2` and :math:`c_1`,
it is possible to use linear regression to estimate the optimal values
of :math:`a_1` and :math:`a_2`. Indeed, it is possible to analytically
eliminate the variables :math:`a_1` and :math:`a_2` from the problem
entirely. Problems like these are known as separable least squares
problems and the most famous algorithm for solving them is the Variable
Projection algorithm invented by Golub & Pereyra [GolubPereyra]_.

Similar structure can be found in the matrix factorization with
missing data problem. There the corresponding algorithm is known as
Wiberg's algorithm [Wiberg]_.

Ruhe & Wedin present an analysis of various algorithms for solving
separable non-linear least squares problems and refer to *Variable
Projection* as Algorithm I in their paper [RuheWedin]_.

Implementing Variable Projection is tedious and expensive. Ruhe &
Wedin present a simpler algorithm with comparable convergence
properties, which they call Algorithm II. Algorithm II performs an
additional optimization step to estimate :math:`a_1` and :math:`a_2`
exactly after computing a successful Newton step.

This idea can be generalized to cases where the residual is not
linear in :math:`a_1` and :math:`a_2`, i.e.,

.. math:: y = f_1(a_1, e^{b_1 x}) + f_2(a_2, e^{b_2 x^2 + c_1})

In this case, we solve for the trust region step for the full problem,
and then use it as the starting point to further optimize just
:math:`a_1` and :math:`a_2`. For the linear case, this amounts to
doing a single linear least squares solve. For non-linear problems,
any method for solving the :math:`a_1` and :math:`a_2` optimization
problems will do. The only constraint on :math:`a_1` and :math:`a_2`
(if they are two different parameter blocks) is that they do not
co-occur in a residual block.

This idea can be further generalized, by not just optimizing
:math:`(a_1, a_2)`, but decomposing the graph corresponding to the
Hessian matrix's sparsity structure into a collection of
non-overlapping independent sets and optimizing each of them.

Setting :member:`Solver::Options::use_inner_iterations` to ``true``
enables the use of this non-linear generalization of Ruhe & Wedin's
Algorithm II. This version of Ceres has a higher iteration
complexity, but also displays better convergence behavior per
iteration.

Setting :member:`Solver::Options::num_threads` to the maximum number
possible is highly recommended.

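A minimal sketch of enabling inner iterations; leaving
:member:`Solver::Options::inner_iteration_ordering` at its default
lets the solver pick the parameter blocks heuristically:

.. code-block:: c++

   ceres::Solver::Options options;
   options.use_inner_iterations = true;
   // inner_iteration_ordering stays NULL, so Ceres chooses the blocks.
   options.num_threads = 4;  // Use as many threads as the machine allows.
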
.. _section-non-monotonic-steps:

Non-monotonic Steps
-------------------

Note that the basic trust-region algorithm described in
:ref:`section-trust-region-methods` is a descent algorithm in that it
only accepts a point if it strictly reduces the value of the objective
function.

Relaxing this requirement allows the algorithm to be more efficient in
the long term at the cost of some local increase in the value of the
objective function.

This is because allowing for non-decreasing objective function values
in a principled manner allows the algorithm to *jump over boulders* as
the method is not restricted to move into narrow valleys while
preserving its convergence properties.

Setting :member:`Solver::Options::use_nonmonotonic_steps` to ``true``
enables the non-monotonic trust region algorithm as described by Conn,
Gould & Toint in [Conn]_.

Even though the value of the objective function may be larger
than the minimum value encountered over the course of the
optimization, the final parameters returned to the user are the
ones corresponding to the minimum cost over all iterations.

The option to take non-monotonic steps is available for all trust
region strategies.

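A minimal sketch of enabling this behavior:

.. code-block:: c++

   ceres::Solver::Options options;
   options.use_nonmonotonic_steps = true;
   // Window over which the objective is allowed to increase.
   options.max_consecutive_nonmonotonic_steps = 5;
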
.. _section-line-search-methods:

Line Search Methods
===================

**The implementation of line search algorithms in Ceres Solver is
fairly new and not very well tested, so for now this part of the
solver should be considered beta quality. We welcome reports of your
experiences both good and bad on the mailing list.**

The basic line search algorithm looks something like this.

   1. Given an initial point :math:`x`
   2. :math:`\Delta x = -H^{-1}(x) g(x)`
   3. :math:`\arg \min_\mu \frac{1}{2} \| F(x + \mu \Delta x) \|^2`
   4. :math:`x = x + \mu \Delta x`
   5. Goto 2.

Here :math:`H(x)` is some approximation to the Hessian of the
objective function, and :math:`g(x)` is the gradient at
:math:`x`. Depending on the choice of :math:`H(x)` we get a variety of
different search directions :math:`\Delta x`.

Step 3, which is a one dimensional optimization or *line search* along
:math:`\Delta x`, is what gives this class of methods its name.

Different line search algorithms differ in their choice of the search
direction :math:`\Delta x` and the method used for one dimensional
optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
primary source of computational complexity in these
methods. Currently, Ceres Solver supports three choices of search
directions, all aimed at large scale problems.

1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
   be the identity matrix. This is not a good search direction for
   anything but the simplest of problems. It is only included here
   for completeness.

2. ``NONLINEAR_CONJUGATE_GRADIENT`` A generalization of the Conjugate
   Gradient method to non-linear functions. The generalization can be
   performed in a number of different ways, resulting in a variety of
   search directions. Ceres Solver currently supports
   ``FLETCHER_REEVES``, ``POLAK_RIBIERE`` and ``HESTENES_STIEFEL``
   directions.

3. ``LBFGS`` In this method, a limited memory approximation to the
   inverse Hessian is maintained and used to compute a quasi-Newton
   step [Nocedal]_, [ByrdNocedal]_.

Currently Ceres Solver uses a backtracking and interpolation based
Armijo line search algorithm.

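A minimal sketch of a line search configuration using these options:

.. code-block:: c++

   ceres::Solver::Options options;
   options.minimizer_type = ceres::LINE_SEARCH;
   options.line_search_direction_type = ceres::LBFGS;
   // For nonlinear conjugate gradients instead:
   // options.line_search_direction_type = ceres::NONLINEAR_CONJUGATE_GRADIENT;
   // options.nonlinear_conjugate_gradient_type = ceres::FLETCHER_REEVES;
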
.. _section-linear-solver:

LinearSolver
============

Recall that in both of the trust-region methods described above, the
key computational cost is the solution of a linear least squares
problem of the form

.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
   :label: simple2

Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top
f(x)`. For notational convenience let us also drop the dependence on
:math:`x`. Then it is easy to see that solving :eq:`simple2` is
equivalent to solving the *normal equations*.

.. math:: H \Delta x = g
   :label: normal

Ceres provides a number of different options for solving :eq:`normal`.

.. _section-qr:

``DENSE_QR``
------------

For small problems (a couple of hundred parameters and a few thousand
residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
:math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
an upper triangular matrix [TrefethenBau]_. Then it can be shown that
the solution to :eq:`normal` is given by

.. math:: \Delta x^* = -R^{-1}Q^\top f

Ceres uses ``Eigen`` 's dense QR factorization routines.

.. _section-cholesky:

``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
------------------------------------------------------

Large non-linear least squares problems are usually sparse. In such
cases, using a dense QR factorization is inefficient. Let :math:`H =
R^\top R` be the Cholesky factorization of the normal equations, where
:math:`R` is an upper triangular matrix, then the solution to
:eq:`normal` is given by

.. math::

   \Delta x^* = R^{-1} R^{-\top} g.


The observant reader will note that the :math:`R` in the Cholesky
factorization of :math:`H` is the same upper triangular matrix
:math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top
Q^\top Q R = R^\top R`. There are two variants of Cholesky
factorization -- sparse and dense.

``DENSE_NORMAL_CHOLESKY`` as the name implies performs a dense
Cholesky factorization of the normal equations. Ceres uses
``Eigen`` 's dense LDLT factorization routines.

``SPARSE_NORMAL_CHOLESKY``, as the name implies performs a sparse
Cholesky factorization of the normal equations. This leads to
substantial savings in time and memory for large sparse
problems. Ceres uses the sparse Cholesky factorization routines in
Professor Tim Davis' ``SuiteSparse`` or ``CXSparse`` packages [Chen]_.

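A minimal sketch of selecting the sparse Cholesky path, assuming Ceres
was built with one of these libraries:

.. code-block:: c++

   ceres::Solver::Options options;
   options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
   options.sparse_linear_algebra_library = ceres::SUITE_SPARSE;  // or ceres::CX_SPARSE
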
.. _section-schur:

``DENSE_SCHUR`` & ``SPARSE_SCHUR``
----------------------------------

While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
adjustment problems, bundle adjustment problems have a special
structure, and a more efficient scheme for solving :eq:`normal`
can be constructed.

Suppose that the SfM problem consists of :math:`p` cameras and
:math:`q` points and the variable vector :math:`x` has the block
structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`, where
:math:`y` and :math:`z` correspond to camera and point parameters,
respectively. Further, let the camera blocks be of size :math:`c` and
the point blocks be of size :math:`s` (for most problems :math:`c` =
:math:`6`--:math:`9` and :math:`s = 3`). Ceres does not impose any
constancy requirement on these block sizes, but choosing them to be
constant simplifies the exposition.

A key characteristic of the bundle adjustment problem is that there is
no term :math:`f_{i}` that includes two or more point blocks. This in
turn implies that the matrix :math:`H` is of the form

.. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
   :label: hblock

where, :math:`B \in \mathbb{R}^{pc\times pc}` is a block sparse matrix
with :math:`p` blocks of size :math:`c\times c` and :math:`C \in
\mathbb{R}^{qs\times qs}` is a block diagonal matrix with :math:`q` blocks
of size :math:`s\times s`. :math:`E \in \mathbb{R}^{pc\times qs}` is a
general block sparse matrix, with a block of size :math:`c\times s`
for each observation. Let us now block partition :math:`\Delta x =
[\Delta y,\Delta z]` and :math:`g=[v,w]` to restate :eq:`normal`
as the block structured linear system

.. math:: \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix}
          \right]\left[ \begin{matrix} \Delta y \\ \Delta z
          \end{matrix} \right] = \left[ \begin{matrix} v\\ w
          \end{matrix} \right]\ ,
   :label: linear2

and apply Gaussian elimination to it. As we noted above, :math:`C` is
a block diagonal matrix, with small diagonal blocks of size
:math:`s\times s`. Thus, calculating the inverse of :math:`C` by
inverting each of these blocks is cheap. This allows us to eliminate
:math:`\Delta z` by observing that :math:`\Delta z = C^{-1}(w - E^\top
\Delta y)`, giving us

.. math:: \left[B - EC^{-1}E^\top\right] \Delta y = v - EC^{-1}w\ .
   :label: schur

The matrix

.. math:: S = B - EC^{-1}E^\top

is the Schur complement of :math:`C` in :math:`H`. It is also known as
the *reduced camera matrix*, because the only variables
participating in :eq:`schur` are the ones corresponding to the
cameras. :math:`S \in \mathbb{R}^{pc\times pc}` is a block structured
symmetric positive definite matrix, with blocks of size :math:`c\times
c`. The block :math:`S_{ij}` corresponding to the pair of images
:math:`i` and :math:`j` is non-zero if and only if the two images
observe at least one common point.

Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
:math:`\Delta y`, and then back-substituting :math:`\Delta y` to
obtain the value of :math:`\Delta z`. Thus, the solution of what was
an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
and matrix-vector multiplies, and the solution of the block sparse
:math:`pc\times pc` linear system :eq:`schur`. For almost all
problems, the number of cameras is much smaller than the number of
points, :math:`p \ll q`, thus solving :eq:`schur` is
significantly cheaper than solving :eq:`linear2`. This is the
*Schur complement trick* [Brown]_.

This still leaves open the question of solving :eq:`schur`. The
method of choice for solving symmetric positive definite systems
exactly is via the Cholesky factorization [TrefethenBau]_ and
depending upon the structure of the matrix, there are, in general, two
options. The first is direct factorization, where we store and factor
:math:`S` as a dense matrix [TrefethenBau]_. This method has
:math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
is only practical for problems with up to a few hundred cameras. Ceres
implements this strategy as the ``DENSE_SCHUR`` solver.

But, :math:`S` is typically a fairly sparse matrix, as most images
only see a small fraction of the scene. This leads us to the second
option: Sparse Direct Methods. These methods store :math:`S` as a
sparse matrix, use row and column re-ordering algorithms to maximize
the sparsity of the Cholesky decomposition, and focus their compute
effort on the non-zero part of the factorization [Chen]_. Sparse
direct methods, depending on the exact sparsity structure of the Schur
complement, allow bundle adjustment algorithms to significantly scale
up over those based on dense factorization. Ceres implements this
strategy as the ``SPARSE_SCHUR`` solver.

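A minimal sketch of selecting a Schur based solver for a bundle
adjustment problem; the elimination ordering can be left to Ceres (see
:ref:`section-ordering`):

.. code-block:: c++

   ceres::Solver::Options options;
   // DENSE_SCHUR for up to a few hundred cameras,
   // SPARSE_SCHUR for larger, sparser problems.
   options.linear_solver_type = ceres::SPARSE_SCHUR;
   // options.linear_solver_type = ceres::DENSE_SCHUR;
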
.. _section-cgnr:

``CGNR``
--------

For general sparse problems, if the problem is too large for
``CHOLMOD`` or a sparse linear algebra library is not linked into
Ceres, another option is the ``CGNR`` solver. This solver uses the
Conjugate Gradients solver on the *normal equations*, but without
forming the normal equations explicitly. It exploits the relation

.. math::

   H x = J^\top J x = J^\top(J x)

When the user chooses ``CGNR`` as the linear solver, Ceres
automatically switches from the exact step algorithm to an inexact
step algorithm.

.. _section-iterative_schur:

``ITERATIVE_SCHUR``
-------------------

Another option for bundle adjustment problems is to apply PCG to the
reduced camera matrix :math:`S` instead of :math:`H`. One reason to do
this is that :math:`S` is a much smaller matrix than :math:`H`, but
more importantly, it can be shown that :math:`\kappa(S)\leq
\kappa(H)`. Ceres implements PCG on :math:`S` as the
``ITERATIVE_SCHUR`` solver. When the user chooses ``ITERATIVE_SCHUR``
as the linear solver, Ceres automatically switches from the exact step
algorithm to an inexact step algorithm.

The cost of forming and storing the Schur complement :math:`S` can be
prohibitive for large problems. Indeed, for an inexact Newton solver
that computes :math:`S` and runs PCG on it, almost all of its time is
spent in constructing :math:`S`; the time spent inside the PCG
algorithm is negligible in comparison. Because PCG only needs access
to :math:`S` via its product with a vector, one way to evaluate
:math:`Sx` is to observe that

.. math::
   :label: schurtrick1

   x_1 &= E^\top x\\
   x_2 &= C^{-1} x_1\\
   x_3 &= Ex_2\\
   x_4 &= Bx\\
   Sx &= x_4 - x_3

Thus, we can run PCG on :math:`S` with the same computational effort
per iteration as PCG on :math:`H`, while reaping the benefits of a
more powerful preconditioner. In fact, we do not even need to compute
:math:`H`; :eq:`schurtrick1` can be implemented using just the columns
of :math:`J`.

Equation :eq:`schurtrick1` is closely related to *Domain
Decomposition methods* for solving large linear systems that arise in
structural engineering and partial differential equations. In the
language of Domain Decomposition, each point in a bundle adjustment
problem is a domain, and the cameras form the interface between these
domains. The iterative solution of the Schur complement then falls
within the sub-category of techniques known as Iterative
Sub-structuring [Saad]_ [Mathew]_.

.. _section-preconditioner:

Preconditioner
--------------

The convergence rate of Conjugate Gradients for
solving :eq:`normal` depends on the distribution of eigenvalues
of :math:`H` [Saad]_. A useful upper bound is
:math:`\sqrt{\kappa(H)}`, where, :math:`\kappa(H)` is the condition
number of the matrix :math:`H`. For most bundle adjustment problems,
:math:`\kappa(H)` is high and a direct application of Conjugate
Gradients to :eq:`normal` results in extremely poor performance.

The solution to this problem is to replace :eq:`normal` with a
*preconditioned* system. Given a linear system, :math:`Ax =b` and a
preconditioner :math:`M` the preconditioned system is given by
:math:`M^{-1}Ax = M^{-1}b`. The resulting algorithm is known as the
Preconditioned Conjugate Gradients algorithm (PCG) and its worst case
complexity now depends on the condition number of the *preconditioned*
matrix :math:`\kappa(M^{-1}A)`.

The computational cost of using a preconditioner :math:`M` is the cost
of computing :math:`M` and evaluating the product :math:`M^{-1}y` for
arbitrary vectors :math:`y`. Thus, there are two competing factors to
consider: How much of :math:`H`'s structure is captured by :math:`M`
so that the condition number :math:`\kappa(HM^{-1})` is low, and the
computational cost of constructing and using :math:`M`. The ideal
preconditioner would be one for which :math:`\kappa(M^{-1}A)
=1`. :math:`M=A` achieves this, but it is not a practical choice, as
applying this preconditioner would require solving a linear system
equivalent to the unpreconditioned problem. It is usually the case
that the more information :math:`M` has about :math:`H`, the more
expensive it is to use. For example, Incomplete Cholesky factorization
based preconditioners have much better convergence behavior than the
Jacobi preconditioner, but are also much more expensive.

The simplest of all preconditioners is the diagonal or Jacobi
preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
block structured matrices like :math:`H` can be generalized to the
block Jacobi preconditioner.

For ``ITERATIVE_SCHUR`` there are two obvious choices for block
diagonal preconditioners for :math:`S`. The block diagonal of the
matrix :math:`B` [Mandel]_ and the block diagonal of :math:`S`, i.e.,
the block Jacobi preconditioner for :math:`S`. Ceres implements both
of these preconditioners and refers to them as ``JACOBI`` and
``SCHUR_JACOBI`` respectively.

For bundle adjustment problems arising in reconstruction from
community photo collections, more effective preconditioners can be
constructed by analyzing and exploiting the camera-point visibility
structure of the scene [KushalAgarwal]_. Ceres implements the two
visibility based preconditioners described by Kushal & Agarwal as
``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. These are fairly new
preconditioners and Ceres' implementation of them is in its early
stages and is not as mature as the other preconditioners described
above.

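A minimal sketch of pairing ``ITERATIVE_SCHUR`` with one of these
preconditioners:

.. code-block:: c++

   ceres::Solver::Options options;
   options.linear_solver_type = ceres::ITERATIVE_SCHUR;
   options.preconditioner_type = ceres::SCHUR_JACOBI;
   // Alternatives: ceres::JACOBI, ceres::CLUSTER_JACOBI,
   // ceres::CLUSTER_TRIDIAGONAL.
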
.. _section-ordering:

Ordering
--------

The order in which variables are eliminated in a linear solver can
have a significant impact on the efficiency and accuracy of the
method. For example when doing sparse Cholesky factorization, there
are matrices for which a good ordering will give a Cholesky factor
with :math:`O(n)` storage, whereas a bad ordering will result in a
completely dense factor.

Ceres allows the user to provide varying amounts of hints to the
solver about the variable elimination ordering to use. This can range
from no hints, where the solver is free to decide the best ordering
based on the user's choices like the linear solver being used, to an
exact order in which the variables should be eliminated, and a variety
of possibilities in between.

Instances of the :class:`ParameterBlockOrdering` class are used to
communicate this information to Ceres.

Formally an ordering is an ordered partitioning of the parameter
blocks. Each parameter block belongs to exactly one group, and each
group has a unique integer associated with it, that determines its
order in the set of groups. We call these groups *Elimination Groups*.

Given such an ordering, Ceres ensures that the parameter blocks in the
lowest numbered elimination group are eliminated first, and then the
parameter blocks in the next lowest numbered elimination group and so
on. Within each elimination group, Ceres is free to order the
parameter blocks as it chooses. For example, consider the linear system

.. math::

   x + y &= 3\\
   2x + 3y &= 7

There are two ways in which it can be solved. First eliminating
:math:`x` from the two equations, solving for :math:`y` and then back
substituting for :math:`x`, or first eliminating :math:`y`, solving
for :math:`x` and back substituting for :math:`y`. The user can
construct three orderings here.

1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
3. :math:`\{0: x, y\}` : Solver gets to decide the elimination order.

Thus, to have Ceres determine the ordering automatically using
heuristics, put all the variables in the same elimination group. The
identity of the group does not matter. This is the same as not
specifying an ordering at all. To control the ordering for every
variable, create an elimination group per variable, ordering them in
the desired order.

If the user is using one of the Schur solvers (``DENSE_SCHUR``,
``SPARSE_SCHUR``, ``ITERATIVE_SCHUR``) and chooses to specify an
ordering, it must have one important property. The lowest numbered
elimination group must form an independent set in the graph
corresponding to the Hessian, or in other words, no two parameter
blocks in the first elimination group should co-occur in the same
residual block. For the best performance, this elimination group
should be as large as possible. For standard bundle adjustment
problems, this corresponds to the first elimination group containing
all the 3d points, and the second containing all the camera parameter
blocks.

If the user leaves the choice to Ceres, then the solver uses an
approximate maximum independent set algorithm to identify the first
elimination group [LiSaad]_.

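A hedged sketch of specifying such an ordering for a bundle adjustment
problem; ``points``, ``cameras``, ``num_points`` and ``num_cameras``
are assumed to describe the user's parameter blocks, and ``options``
is the :class:`Solver::Options` instance being configured:

.. code-block:: c++

   ceres::ParameterBlockOrdering* ordering = new ceres::ParameterBlockOrdering;
   // points[i] and cameras[i] are assumed to point to the parameter blocks.
   // Eliminate all the points (group 0) before the cameras (group 1).
   for (int i = 0; i < num_points; ++i) {
     ordering->AddElementToGroup(points[i], 0);
   }
   for (int i = 0; i < num_cameras; ++i) {
     ordering->AddElementToGroup(cameras[i], 1);
   }
   options.linear_solver_ordering = ordering;
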
.. _section-solver-options:

:class:`Solver::Options`
------------------------

.. class:: Solver::Options

   :class:`Solver::Options` controls the overall behavior of the
   solver. We list the various settings and their default values below.

Sameer Agarwalfbbea462013-02-15 11:25:03 -0800773
774.. member:: MinimizerType Solver::Options::minimizer_type
775
776 Default: ``TRUST_REGION``
777
778 Choose between ``LINE_SEARCH`` and ``TRUST_REGION`` algorithms. See
779 :ref:`section-trust-region-methods` and
780 :ref:`section-line-search-methods` for more details.
781
782.. member:: LineSearchDirectionType Solver::Options::line_search_direction_type
783
784 Default: ``LBFGS``
785
786 Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``
787 and ``LBFGS``.
788
789.. member:: LineSearchType Solver::Options::line_search_type
790
791 Default: ``ARMIJO``
792
793 ``ARMIJO`` is the only choice right now.
794
Sameer Agarwalfa21df82013-02-18 08:48:52 -0800795.. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type
Sameer Agarwalfbbea462013-02-15 11:25:03 -0800796
797 Default: ``FLETCHER_REEVES``
798
799 Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIRERE`` and
800 ``HESTENES_STIEFEL``.
801
802.. member:: int Solver::Options::max_lbfs_rank
803
804 Default: 20
805
806 The LBFGS hessian approximation is a low rank approximation to the
807 inverse of the Hessian matrix. The rank of the approximation
808 determines (linearly) the space and time complexity of using the
809 approximation. Higher the rank, the better is the quality of the
810 approximation. The increase in quality is however is bounded for a
811 number of reasons.
812
813 1. The method only uses secant information and not actual
814 derivatives.
815
816 2. The Hessian approximation is constrained to be positive
817 definite.
818
819 So increasing this rank to a large number will cost time and space
820 complexity without the corresponding increase in solution
821 quality. There are no hard and fast rules for choosing the maximum
822 rank. The best choice usually requires some problem specific
823 experimentation.
824
.. member:: TrustRegionStrategyType Solver::Options::trust_region_strategy_type

   Default: ``LEVENBERG_MARQUARDT``

   The trust region step computation algorithm used by
   Ceres. Currently ``LEVENBERG_MARQUARDT`` and ``DOGLEG`` are the two
   valid choices. See :ref:`section-levenberg-marquardt` and
   :ref:`section-dogleg` for more details.

.. member:: DoglegType Solver::Options::dogleg_type

   Default: ``TRADITIONAL_DOGLEG``

   Ceres supports two different dogleg strategies, the
   ``TRADITIONAL_DOGLEG`` method by Powell and the ``SUBSPACE_DOGLEG``
   method described by [ByrdSchnabel]_ . See :ref:`section-dogleg`
   for more details.

.. member:: bool Solver::Options::use_nonmonotonic_steps

   Default: ``false``

   Relax the requirement that the trust-region algorithm take strictly
   decreasing steps. See :ref:`section-non-monotonic-steps` for more
   details.

.. member:: int Solver::Options::max_consecutive_nonmonotonic_steps

   Default: ``5``

   The window size used by the step selection algorithm to accept
   non-monotonic steps.

.. member:: int Solver::Options::max_num_iterations

   Default: ``50``

   Maximum number of iterations for which the solver should run.

.. member:: double Solver::Options::max_solver_time_in_seconds

   Default: ``1e6``

   Maximum amount of time for which the solver should run.

.. member:: int Solver::Options::num_threads

   Default: ``1``

   Number of threads used by Ceres to evaluate the Jacobian.

.. member:: double Solver::Options::initial_trust_region_radius

   Default: ``1e4``

   The size of the initial trust region. When the
   ``LEVENBERG_MARQUARDT`` strategy is used, the reciprocal of this
   number is the initial regularization parameter.

.. member:: double Solver::Options::max_trust_region_radius

   Default: ``1e16``

   The trust region radius is not allowed to grow beyond this value.

.. member:: double Solver::Options::min_trust_region_radius

   Default: ``1e-32``

   The solver terminates when the trust region becomes smaller than
   this value.

.. member:: double Solver::Options::min_relative_decrease

   Default: ``1e-3``

   Lower threshold for relative decrease before a trust-region step is
   accepted.

.. member:: double Solver::Options::lm_min_diagonal

   Default: ``1e-6``

   The ``LEVENBERG_MARQUARDT`` strategy uses a diagonal matrix to
   regularize the trust region step. This is the lower bound on
   the values of this diagonal matrix.

.. member:: double Solver::Options::lm_max_diagonal

   Default: ``1e32``

   The ``LEVENBERG_MARQUARDT`` strategy uses a diagonal matrix to
   regularize the trust region step. This is the upper bound on
   the values of this diagonal matrix.

.. member:: int Solver::Options::max_num_consecutive_invalid_steps

   Default: ``5``

   The step returned by a trust region strategy can sometimes be
   numerically invalid, usually because of conditioning
   issues. Instead of crashing or stopping the optimization, the
   optimizer can go ahead and try solving with a smaller trust
   region/better conditioned problem. This parameter sets the number
   of consecutive retries before the minimizer gives up.

.. member:: double Solver::Options::function_tolerance

   Default: ``1e-6``

   Solver terminates if

   .. math:: \frac{|\Delta \text{cost}|}{\text{cost}} < \text{function_tolerance}

   where, :math:`\Delta \text{cost}` is the change in objective function
   value (up or down) in the current iteration of Levenberg-Marquardt.

.. member:: double Solver::Options::gradient_tolerance

   Default: ``1e-10``

   Solver terminates if

   .. math:: \frac{\|g(x)\|_\infty}{\|g(x_0)\|_\infty} < \text{gradient_tolerance}

   where :math:`\|\cdot\|_\infty` refers to the max norm, and :math:`x_0` is
   the vector of initial parameter values.

.. member:: double Solver::Options::parameter_tolerance

   Default: ``1e-8``

   Solver terminates if

   .. math:: \|\Delta x\| < (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}

   where :math:`\Delta x` is the step computed by the linear solver in the
   current iteration of Levenberg-Marquardt.

.. member:: LinearSolverType Solver::Options::linear_solver_type

   Default: ``SPARSE_NORMAL_CHOLESKY`` / ``DENSE_QR``

   Type of linear solver used to compute the solution to the linear
   least squares problem in each iteration of the Levenberg-Marquardt
   algorithm. If Ceres is built with ``SuiteSparse`` linked in then
   the default is ``SPARSE_NORMAL_CHOLESKY``, it is ``DENSE_QR``
   otherwise.

.. member:: PreconditionerType Solver::Options::preconditioner_type

   Default: ``JACOBI``

   The preconditioner used by the iterative linear solver. The default
   is the block Jacobi preconditioner. Valid values are (in increasing
   order of complexity) ``IDENTITY``, ``JACOBI``, ``SCHUR_JACOBI``,
   ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. See
   :ref:`section-preconditioner` for more details.

.. member:: SparseLinearAlgebraLibrary Solver::Options::sparse_linear_algebra_library

   Default: ``SUITE_SPARSE``

   Ceres supports the use of two sparse linear algebra libraries,
   ``SuiteSparse``, which is enabled by setting this parameter to
   ``SUITE_SPARSE`` and ``CXSparse``, which can be selected by setting
   this parameter to ``CX_SPARSE``. ``SuiteSparse`` is a
   sophisticated and complex sparse linear algebra library and should
   be used in general. If your needs/platforms prevent you from using
   ``SuiteSparse``, consider using ``CXSparse``, which is a much
   smaller, easier to build library. As can be expected, its
   performance on large problems is not comparable to that of
   ``SuiteSparse``.

.. member:: int Solver::Options::num_linear_solver_threads

   Default: ``1``

   Number of threads used by the linear solver.

.. member:: bool Solver::Options::use_inner_iterations

   Default: ``false``

   Use a non-linear version of a simplified variable projection
   algorithm. Essentially this amounts to doing a further optimization
   on each Newton/Trust region step using a coordinate descent
   algorithm. For more details, see :ref:`section-inner-iterations`.

.. member:: ParameterBlockOrdering* Solver::Options::inner_iteration_ordering

   Default: ``NULL``

   If :member:`Solver::Options::use_inner_iterations` is true, then the
   user has two choices.

   1. Let the solver heuristically decide which parameter blocks to
      optimize in each inner iteration. To do this, set
      :member:`Solver::Options::inner_iteration_ordering` to ``NULL``.

   2. Specify a collection of ordered independent sets. The lower
      numbered groups are optimized before the higher numbered groups
      during the inner optimization phase. Each group must be an
      independent set.

   See :ref:`section-ordering` for more details.

.. member:: ParameterBlockOrdering* Solver::Options::linear_solver_ordering

   Default: ``NULL``

   An instance of the ordering object informs the solver about the
   desired order in which parameter blocks should be eliminated by the
   linear solvers. See :ref:`section-ordering` for more details.

   If ``NULL``, the solver is free to choose an ordering that it
   thinks is best. Note: currently, this option only has an effect on
   the Schur type solvers, support for the ``SPARSE_NORMAL_CHOLESKY``
   solver is forthcoming.

.. member:: bool Solver::Options::use_block_amd

   Default: ``true``

   By virtue of the modeling layer in Ceres being block oriented, all
   the matrices used by Ceres are also block oriented. When doing
   sparse direct factorization of these matrices, the fill-reducing
   ordering algorithms can either be run on the block or the scalar
   form of these matrices. Running it on the block form exposes more
   of the super-nodal structure of the matrix to the Cholesky
   factorization routines. This leads to substantial gains in
   factorization performance. Setting this parameter to ``true``
   enables the use of a block oriented Approximate Minimum Degree
   ordering algorithm. Setting it to ``false`` uses a scalar AMD
   algorithm. This option only makes sense when using
   :member:`Solver::Options::sparse_linear_algebra_library` = ``SUITE_SPARSE``
   as it uses the ``AMD`` package that is part of ``SuiteSparse``.

.. member:: int Solver::Options::linear_solver_min_num_iterations

   Default: ``1``

   Minimum number of iterations used by the linear solver. This only
   makes sense when the linear solver is an iterative solver, e.g.,
   ``ITERATIVE_SCHUR`` or ``CGNR``.

.. member:: int Solver::Options::linear_solver_max_num_iterations

   Default: ``500``

   Maximum number of iterations used by the linear solver. This only
   makes sense when the linear solver is an iterative solver, e.g.,
   ``ITERATIVE_SCHUR`` or ``CGNR``.

.. member:: double Solver::Options::eta

   Default: ``1e-1``

   Forcing sequence parameter. The truncated Newton solver uses this
   number to control the relative accuracy with which the Newton step
   is computed. This constant is passed to
   ``ConjugateGradientsSolver`` which uses it to terminate the
   iterations when

   .. math:: \frac{Q_i - Q_{i-1}}{Q_i} < \frac{\eta}{i}

.. member:: bool Solver::Options::jacobi_scaling

   Default: ``true``

   ``true`` means that the Jacobian is scaled by the norm of its
   columns before being passed to the linear solver. This improves the
   numerical conditioning of the normal equations.

.. member:: LoggingType Solver::Options::logging_type

   Default: ``PER_MINIMIZER_ITERATION``

.. member:: bool Solver::Options::minimizer_progress_to_stdout

   Default: ``false``

   By default the :class:`Minimizer` progress is logged to ``STDERR``
   depending on the ``vlog`` level. If this flag is set to true, and
   :member:`Solver::Options::logging_type` is not ``SILENT``, the logging
   output is sent to ``STDOUT``.

.. member:: bool Solver::Options::return_initial_residuals

   Default: ``false``

.. member:: bool Solver::Options::return_final_residuals

   Default: ``false``

   If true, the vectors :member:`Solver::Summary::initial_residuals` and
   :member:`Solver::Summary::final_residuals` are filled with the residuals
   before and after the optimization. The entries of these vectors are
   in the order in which ResidualBlocks were added to the Problem
   object.

.. member:: bool Solver::Options::return_initial_gradient

   Default: ``false``

.. member:: bool Solver::Options::return_final_gradient

   Default: ``false``

   If true, the vectors :member:`Solver::Summary::initial_gradient` and
   :member:`Solver::Summary::final_gradient` are filled with the gradient
   before and after the optimization. The entries of these vectors are
   in the order in which ParameterBlocks were added to the Problem
   object.

   Since :member:`Problem::AddResidualBlock` adds ParameterBlocks to
   the :class:`Problem` automatically if they do not already exist,
   if you wish to have explicit control over the ordering of the
   vectors, then use :member:`Problem::AddParameterBlock` to
   explicitly add the ParameterBlocks in the order desired.

.. member:: bool Solver::Options::return_initial_jacobian

   Default: ``false``

.. member:: bool Solver::Options::return_final_jacobian

   Default: ``false``

   If ``true``, the Jacobian matrices before and after the
   optimization are returned in
   :member:`Solver::Summary::initial_jacobian` and
   :member:`Solver::Summary::final_jacobian` respectively.

   The rows of these matrices are in the same order in which the
   ResidualBlocks were added to the Problem object. The columns are in
   the same order in which the ParameterBlocks were added to the
   Problem object.

   Since :member:`Problem::AddResidualBlock` adds ParameterBlocks to
   the :class:`Problem` automatically if they do not already exist,
   if you wish to have explicit control over the ordering of the
   vectors, then use :member:`Problem::AddParameterBlock` to
   explicitly add the ParameterBlocks in the order desired.

   The Jacobian matrices are stored as compressed row sparse
   matrices. Please see ``include/ceres/crs_matrix.h`` for more
   details of the format.

.. member:: vector<int> Solver::Options::lsqp_iterations_to_dump

   Default: ``empty``

   List of iterations at which the optimizer should dump the linear
   least squares problem to disk. Useful for testing and
   benchmarking. If ``empty``, no problems are dumped.

.. member:: string Solver::Options::lsqp_dump_directory

   Default: ``/tmp``

   If :member:`Solver::Options::lsqp_iterations_to_dump` is non-empty, then
   this setting determines the directory to which the files containing
   the linear least squares problems are written.

.. member:: DumpFormatType Solver::Options::lsqp_dump_format

   Default: ``TEXTFILE``

   The format in which linear least squares problems should be logged
   when :member:`Solver::Options::lsqp_iterations_to_dump` is non-empty.
   There are three options:

   * ``CONSOLE`` prints the linear least squares problem in a human
     readable format to ``stderr``. The Jacobian is printed as a
     dense matrix. The vectors :math:`D`, :math:`x` and :math:`f` are
     printed as dense vectors. This should only be used for small
     problems.

   * ``PROTOBUF`` Write out the linear least squares problem to the
     directory pointed to by :member:`Solver::Options::lsqp_dump_directory` as
     a protocol buffer. ``linear_least_squares_problems.h/cc``
     contains routines for loading these problems. For details on the
     on disk format used, see ``matrix.proto``. The files are named
     ``lm_iteration_???.lsqp``. This requires that ``protobuf`` be
     linked into Ceres Solver.

   * ``TEXTFILE`` Write out the linear least squares problem to the
     directory pointed to by :member:`Solver::Options::lsqp_dump_directory` as
     text files which can be read into ``MATLAB/Octave``. The Jacobian
     is dumped as a text file containing :math:`(i,j,s)` triplets, the
     vectors :math:`D`, :math:`x` and :math:`f` are dumped as text files
     containing a list of their values.

     A ``MATLAB/Octave`` script called ``lm_iteration_???.m`` is also
     output, which can be used to parse and load the problem into memory.

.. member:: bool Solver::Options::check_gradients

   Default: ``false``

   Check all Jacobians computed by each residual block with finite
   differences. This is expensive since it involves computing the
   derivative by normal means (e.g. user specified, autodiff, etc),
   then also computing it using finite differences. The results are
   compared, and if they differ substantially, details are printed to
   the log.

.. member:: double Solver::Options::gradient_check_relative_precision

   Default: ``1e-8``

   Precision to check for in the gradient checker. If the relative
   difference between an element in a Jacobian exceeds this number,
   then the Jacobian for that cost term is dumped.

.. member:: double Solver::Options::numeric_derivative_relative_step_size

   Default: ``1e-6``

   Relative shift used for taking numeric derivatives. For finite
   differencing, each dimension is evaluated at slightly shifted
   values, e.g., for forward differences, the numerical derivative is

   .. math::

      \delta &= numeric\_derivative\_relative\_step\_size\\
      \Delta f &= \frac{f((1 + \delta) x) - f(x)}{\delta x}

   The finite differencing is done along each dimension. The reason to
   use a relative (rather than absolute) step size is that this way,
   numeric differentiation works for functions where the arguments are
   typically large (e.g. :math:`10^9`) and when the values are small
   (e.g. :math:`10^{-5}`). It is possible to construct *torture cases*
   which break this finite difference heuristic, but they do not come
   up often in practice.

.. member:: vector<IterationCallback*> Solver::Options::callbacks

   Callbacks that are executed at the end of each iteration of the
   :class:`Minimizer`. They are executed in the order that they are
   specified in this vector. By default, parameter blocks are updated
   only at the end of the optimization, i.e. when the
   :class:`Minimizer` terminates. This behavior is controlled by
   :member:`Solver::Options::update_state_every_iteration`. If the user
   wishes to have access to the updated parameter blocks when his/her
   callbacks are executed, then set
   :member:`Solver::Options::update_state_every_iteration` to true.

   The solver does NOT take ownership of these pointers.

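   A hedged sketch of a minimal callback; the exact fields available on
   the :class:`IterationSummary` object passed to the callback are
   assumptions here and should be checked against
   ``include/ceres/iteration_callback.h``. ``options`` is the
   :class:`Solver::Options` instance being configured:

   .. code-block:: c++

      class LoggingCallback : public ceres::IterationCallback {
       public:
        virtual ceres::CallbackReturnType operator()(
            const ceres::IterationSummary& summary) {
          // Log the iteration number and the current cost (assumed fields).
          std::cout << summary.iteration << ": " << summary.cost << "\n";
          return ceres::SOLVER_CONTINUE;
        }
      };

      LoggingCallback callback;
      options.callbacks.push_back(&callback);
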
.. member:: bool Solver::Options::update_state_every_iteration

   Default: ``false``

   Normally the parameter blocks are only updated when the solver
   terminates. Setting this to true updates them in every
   iteration. This setting is useful when building an interactive
   application using Ceres and using an :class:`IterationCallback`.

.. member:: string Solver::Options::solver_log

   Default: ``empty``

   If non-empty, a summary of the execution of the solver is recorded
   to this file. This file is used for recording Ceres'
   performance. Currently, only the iteration number, total time and
   the objective function value are logged. The format of this file is
   expected to change over time as the performance evaluation
   framework is fleshed out.

:class:`ParameterBlockOrdering`
-------------------------------

.. class:: ParameterBlockOrdering

   TBD

:class:`IterationCallback`
--------------------------

.. class:: IterationCallback

   TBD

:class:`CRSMatrix`
------------------

.. class:: CRSMatrix

   TBD

:class:`Solver::Summary`
------------------------

.. class:: Solver::Summary

   TBD

:class:`GradientChecker`
------------------------

.. class:: GradientChecker