Update documentation.
Change-Id: Idb03741fab9facbbbda85d5a82723f0b4c1c6c60
diff --git a/.gitignore b/.gitignore
index 17c8045..0f0d1c7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -19,3 +19,4 @@
docs/minted.sty
obj/
jni/libs/
+.buildinfo
diff --git a/README b/README
index 14fd355..14be3fe 100644
--- a/README
+++ b/README
@@ -1,3 +1,3 @@
Ceres Solver - A non-linear least squares minimizer
==================================================
-Please see ceres-solver.pdf in docs/ for a tutorial and reference.
+Please see docs/html/index.html for a tutorial and reference.
diff --git a/docs/source/bibliography.rst b/docs/source/bibliography.rst
index a1a9883..e80e483 100644
--- a/docs/source/bibliography.rst
+++ b/docs/source/bibliography.rst
@@ -15,7 +15,11 @@
multiple station analytical stereo triangulation**, Technical
Report 43, Patrick Airforce Base, Florida, 1958.
-.. [Byrd] R.H. Byrd, R.B. Schnabel, and G.A. Shultz, **Approximate
+.. [ByrdNocedal] R. H. Byrd, J. Nocedal, R. B. Schnabel,
+ **Representations of Quasi-Newton Matrices and their use in Limited
+   Memory Methods**, *Mathematical Programming*, 63(4):129--156, 1994.
+
+.. [ByrdSchnabel] R.H. Byrd, R.B. Schnabel, and G.A. Shultz, **Approximate
solution of the trust region problem by minimization over
two dimensional subspaces**, *Mathematical programming*,
40(1):247–263, 1988.
@@ -65,6 +69,9 @@
within a truncated newton method**, *Operations Research Letters*,
9(4):219–221, 1990.
+.. [Nocedal] J. Nocedal, **Updating Quasi-Newton Matrices with Limited
+ Storage**, *Mathematics of Computation*, 35(151): 773--782, 1980.
+
.. [NocedalWright] J. Nocedal & S. Wright, **Numerical Optimization**,
Springer, 2004.
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 35b2e51..7fa6641 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -166,8 +166,6 @@
# Output file base name for HTML help builder.
htmlhelp_basename = 'CeresSolverdoc'
-mathjax_path = "mathjax/MathJax.js?config=TeX-AMS_HTML"
-
# -- Options for LaTeX output --------------------------------------------------
latex_elements = {
diff --git a/docs/source/solving.rst b/docs/source/solving.rst
index 43eedee..fb48bb3 100644
--- a/docs/source/solving.rst
+++ b/docs/source/solving.rst
@@ -9,16 +9,16 @@
Solver API
==========
+
+Introduction
+============
+
Effective use of Ceres requires some familiarity with the basic
components of a nonlinear least squares solver, so before we describe
how to configure the solver, we will begin by taking a brief look at
how some of the core optimization algorithms in Ceres work and the
various linear solvers and preconditioners that power it.
-.. _section-trust-region-methods:
-
-Trust Region Methods
---------------------
Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
variables, and
@@ -32,10 +32,9 @@
Here, the Jacobian :math:`J(x)` of :math:`F(x)` is an :math:`m\times
n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)` and the
gradient vector :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2 = J(x)^\top
-F(x)`. Since the efficient global minimization of :eq:`nonlinsq` for general
-:math:`F(x)` is an intractable problem, we will have to settle for
-finding a local minimum.
-
+F(x)`. Since the efficient global minimization of :eq:`nonlinsq` for
+general :math:`F(x)` is an intractable problem, we will have to settle
+for finding a local minimum.
+
The general strategy when solving non-linear optimization problems is
to solve a sequence of approximations to the original problem
@@ -43,31 +42,57 @@
determine a correction :math:`\Delta x` to the vector :math:`x`. For
non-linear least squares, an approximation can be constructed by using
the linearization :math:`F(x+\Delta x) \approx F(x) + J(x)\Delta x`,
-which leads to the following linear least squares problem:
+which leads to the following linear least squares problem:
.. math:: \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
:label: linearapprox
Unfortunately, naively solving a sequence of these problems and
-updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that may not
-converge. To get a convergent algorithm, we need to control the size
-of the step :math:`\Delta x`. And this is where the idea of a trust-region
-comes in.
+updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that
+may not converge. To get a convergent algorithm, we need to control
+the size of the step :math:`\Delta x`. Depending on how the size of
+the step :math:`\Delta x` is controlled, non-linear optimization
+algorithms can be divided into two major categories [NocedalWright]_.
-.. Algorithm~\ref{alg:trust-region} describes the basic trust-region
-.. loop for non-linear least squares problems.
+1. **Trust Region** The trust region approach approximates the
+   objective function using a model function (often a quadratic) over
+   a subset of the search space known as the trust region. If the
+   model function succeeds in minimizing the true objective function,
+   the trust region is expanded; otherwise it is contracted and the
+   model optimization problem is solved again.
-.. \begin{algorithm} \caption{The basic trust-region
- algorithm.\label{alg:trust-region}} \begin{algorithmic} \REQUIRE
- Initial point `x` and a trust region radius `\mu`. \LOOP
- \STATE{Solve `\arg \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x +
- F(x)\|^2` s.t. `\|D(x)\Delta x\|^2 \le \mu`} \STATE{`\rho =
- \frac{\displaystyle \|F(x + \Delta x)\|^2 -
- \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 - \|F(x)\|^2}`}
- \IF {`\rho > \epsilon`} \STATE{`x = x + \Delta x`} \ENDIF \IF {`\rho
- > \eta_1`} \STATE{`\rho = 2 * \rho`} \ELSE \IF {`\rho < \eta_2`}
- \STATE {`\rho = 0.5 * \rho`} \ENDIF \ENDIF \ENDLOOP
- \end{algorithmic} \end{algorithm}
+2. **Line Search** The line search approach first finds a descent
+   direction along which the objective function will be reduced and
+   then computes a step size that decides how far to move along that
+   direction. The descent direction can be computed by various
+   methods, such as gradient descent, Newton's method and quasi-Newton
+   methods. The step size can be determined either exactly or
+   inexactly.
+
+Trust region methods are in some sense dual to line search methods:
+trust region methods first choose a step size (the size of the trust
+region) and then a step direction, while line search methods first
+choose a step direction and then a step size.
+
+Ceres implements multiple algorithms in both categories.
+
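+The choice between the two families is controlled by
+``Solver::Options::minimizer_type``, which is documented later in this
+document. A minimal sketch of selecting the minimizer (a fragment,
+assuming the usual ``ceres/ceres.h`` header):
+
+.. code-block:: c++
+
+  #include "ceres/ceres.h"
+
+  ceres::Solver::Options options;
+  // The trust region minimizer is the default.
+  options.minimizer_type = ceres::TRUST_REGION;
+  // Switch to the line search minimizer instead.
+  // options.minimizer_type = ceres::LINE_SEARCH;
+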
+.. _section-trust-region-methods:
+
+Trust Region Methods
+====================
+
+The basic trust region algorithm looks something like this.
+
+ 1. Given an initial point :math:`x` and a trust region radius :math:`\mu`.
+ 2. Solve :math:`\arg \min_{\Delta x} \frac{1}{2}\|J(x)\Delta
+    x + F(x)\|^2` s.t. :math:`\|D(x)\Delta x\|^2 \le \mu`
+ 3. :math:`\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 -
+    \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 -
+    \|F(x)\|^2}`
+ 4. if :math:`\rho > \epsilon` then :math:`x = x + \Delta x`.
+ 5. if :math:`\rho > \eta_1` then :math:`\mu = 2 \mu`
+ 6. else if :math:`\rho < \eta_2` then :math:`\mu = 0.5 \mu`
+ 7. Goto 2.
Here, :math:`\mu` is the trust region radius, :math:`D(x)` is some
matrix used to define a metric on the domain of :math:`F(x)` and
@@ -97,10 +122,11 @@
in terms of an optimization problem defined over a state
vector of size :math:`n`.
+
.. _section-levenberg-marquardt:
Levenberg-Marquardt
-^^^^^^^^^^^^^^^^^^^
+-------------------
The Levenberg-Marquardt algorithm [Levenberg]_ [Marquardt]_ is the
most popular algorithm for solving non-linear least squares problems.
@@ -176,7 +202,7 @@
.. _section-dogleg:
Dogleg
-^^^^^^
+------
Another strategy for solving the trust region problem :eq:`trp` was
introduced by M. J. D. Powell. The key idea there is to compute two
@@ -206,7 +232,7 @@
``SUBSPACE_DOGLEG`` is a more sophisticated method that considers the
entire two dimensional subspace spanned by these two vectors and finds
the point that minimizes the trust region problem in this
-subspace [Byrd]_.
+subspace [ByrdSchnabel]_.
The key advantage of the Dogleg over Levenberg Marquardt is that if
the step computation for a particular choice of :math:`\mu` does not
@@ -222,7 +248,7 @@
.. _section-inner-iterations:
Inner Iterations
-^^^^^^^^^^^^^^^^
+----------------
Some non-linear least squares problems have additional structure in
the way the parameter blocks interact that it is beneficial to modify
@@ -289,7 +315,7 @@
.. _section-non-monotonic-steps:
Non-monotonic Steps
-^^^^^^^^^^^^^^^^^^^
+-------------------
Note that the basic trust-region algorithm described in
Algorithm~\ref{alg:trust-region} is a descent algorithm in that they
@@ -314,14 +340,66 @@
optimization, the final parameters returned to the user are the
ones corresponding to the minimum cost over all iterations.
-The option to take non-monotonic is available for all trust region
-strategies.
+The option to take non-monotonic steps is available for all trust
+region strategies.
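+
+As a minimal sketch, non-monotonic steps are enabled through a single
+flag on :class:`Solver::Options` (``use_nonmonotonic_steps``, described
+later in this document):
+
+.. code-block:: c++
+
+  ceres::Solver::Options options;
+  // Allow the trust region minimizer to temporarily accept steps that
+  // increase the objective; the parameters returned to the user are
+  // still the ones with the minimum cost over all iterations.
+  options.use_nonmonotonic_steps = true;
+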
+.. _section-line-search-methods:
+
+Line Search Methods
+===================
+
+**The implementation of line search algorithms in Ceres Solver is
+fairly new and not very well tested, so for now this part of the
+solver should be considered beta quality. We welcome reports of your
+experiences, both good and bad, on the mailing list.**
+
+The basic line search algorithm looks something like this.
+
+ 1. Given an initial point :math:`x`
+ 2. :math:`\Delta x = -H^{-1}(x) g(x)`
+ 3. :math:`\arg \min_\mu \frac{1}{2} \| F(x + \mu \Delta x) \|^2`
+ 4. :math:`x = x + \mu \Delta x`
+ 5. Goto 2.
+
+Here :math:`H(x)` is some approximation to the Hessian of the
+objective function, and :math:`g(x)` is the gradient at
+:math:`x`. Depending on the choice of :math:`H(x)` we get a variety of
+different search directions :math:`\Delta x`.
+
+Step 3, which is a one dimensional optimization or `Line Search` along
+:math:`\Delta x`, is what gives this class of methods its name.
+
+Different line search algorithms differ in their choice of the search
+direction :math:`\Delta x` and the method used for one dimensional
+optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
+primary source of computational complexity in these
+methods. Currently, Ceres Solver supports three choices of search
+directions, all aimed at large scale problems.
+
+1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
+ be the identity matrix. This is not a good search direction for
+   anything but the simplest of problems. It is only included here
+ for completeness.
+
+2. ``NONLINEAR_CONJUGATE_GRADIENT`` A generalization of the Conjugate
+ Gradient method to non-linear functions. The generalization can be
+ performed in a number of different ways, resulting in a variety of
+ search directions. Ceres Solver currently supports
+ ``FLETCHER_REEVES``, ``POLAK_RIBIRERE`` and ``HESTENES_STIEFEL``
+ directions.
+
+3. ``LBFGS`` In this method, a limited memory approximation to the
+ inverse Hessian is maintained and used to compute a quasi-Newton
+ step [Nocedal]_, [ByrdNocedal]_.
+
+Currently, Ceres Solver uses a backtracking- and interpolation-based
+Armijo line search algorithm.
+
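+A minimal sketch of configuring the line search minimizer via
+:class:`Solver::Options` (the individual options are documented later
+in this document):
+
+.. code-block:: c++
+
+  ceres::Solver::Options options;
+  options.minimizer_type = ceres::LINE_SEARCH;
+  // LBFGS is the default search direction.
+  options.line_search_direction_type = ceres::LBFGS;
+  // Alternatively, use nonlinear conjugate gradients with the
+  // Fletcher-Reeves update rule.
+  // options.line_search_direction_type = ceres::NONLINEAR_CONJUGATE_GRADIENT;
+  // options.nonlinear_conjugate_gradient_type = ceres::FLETCHER_REEVES;
+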
.. _section-linear-solver:
LinearSolver
-------------
+============
Recall that in both of the trust-region methods described above, the
key computational cost is the solution of a linear least squares
@@ -343,7 +421,7 @@
.. _section-qr:
``DENSE_QR``
-^^^^^^^^^^^^
+------------
For small problems (a couple of hundred parameters and a few thousand
residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
@@ -360,7 +438,7 @@
.. _section-cholesky:
``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+------------------------------------------------------
Large non-linear least square problems are usually sparse. In such
cases, using a dense QR factorization is inefficient. Let :math:`H =
@@ -393,7 +471,7 @@
.. _section-schur:
``DENSE_SCHUR`` & ``SPARSE_SCHUR``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+----------------------------------
While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
adjustment problems, bundle adjustment problem have a special
@@ -493,7 +571,7 @@
.. _section-cgnr:
``CGNR``
-^^^^^^^^
+--------
For general sparse problems, if the problem is too large for
``CHOLMOD`` or a sparse linear algebra library is not linked into
@@ -512,7 +590,7 @@
.. _section-iterative_schur:
``ITERATIVE_SCHUR``
-^^^^^^^^^^^^^^^^^^^
+-------------------
Another option for bundle adjustment problems is to apply PCG to the
reduced camera matrix :math:`S` instead of :math:`H`. One reason to do
@@ -692,6 +770,58 @@
:class:`Solver::Options` controls the overall behavior of the
solver. We list the various settings and their default values below.
+
+.. member:: MinimizerType Solver::Options::minimizer_type
+
+ Default: ``TRUST_REGION``
+
+ Choose between ``LINE_SEARCH`` and ``TRUST_REGION`` algorithms. See
+ :ref:`section-trust-region-methods` and
+ :ref:`section-line-search-methods` for more details.
+
+.. member:: LineSearchDirectionType Solver::Options::line_search_direction_type
+
+ Default: ``LBFGS``
+
+ Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``
+ and ``LBFGS``.
+
+.. member:: LineSearchType Solver::Options::line_search_type
+
+ Default: ``ARMIJO``
+
+ ``ARMIJO`` is the only choice right now.
+
+.. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type
+
+ Default: ``FLETCHER_REEVES``
+
+ Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIRERE`` and
+ ``HESTENES_STIEFEL``.
+
+.. member:: int Solver::Options::max_lbfgs_rank
+
+ Default: 20
+
+  The LBFGS Hessian approximation is a low rank approximation to the
+  inverse of the Hessian matrix. The rank of the approximation
+  determines (linearly) the space and time complexity of using the
+  approximation. The higher the rank, the better the quality of the
+  approximation. The increase in quality is however bounded for a
+  number of reasons.
+
+ 1. The method only uses secant information and not actual
+ derivatives.
+
+ 2. The Hessian approximation is constrained to be positive
+ definite.
+
+  So increasing this rank to a large number will increase the time
+  and space complexity without a corresponding increase in solution
+  quality. There are no hard and fast rules for choosing the maximum
+  rank. The best choice usually requires some problem specific
+  experimentation.
+
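+  As an illustration only (using the option names documented above), a
+  larger rank can be tried when memory is not a constraint:
+
+  .. code-block:: c++
+
+    ceres::Solver::Options options;
+    options.minimizer_type = ceres::LINE_SEARCH;
+    options.line_search_direction_type = ceres::LBFGS;
+    // A higher rank (the default is 20) gives a better, but more
+    // expensive, approximation to the inverse Hessian.
+    options.max_lbfgs_rank = 40;
+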
.. member:: TrustRegionStrategyType Solver::Options::trust_region_strategy_type
Default: ``LEVENBERG_MARQUARDT``
@@ -707,7 +837,7 @@
Ceres supports two different dogleg strategies.
``TRADITIONAL_DOGLEG`` method by Powell and the ``SUBSPACE_DOGLEG``
- method described by [Byrd]_. See :ref:`section-dogleg` for more
+ method described by [ByrdSchnabel]_. See :ref:`section-dogleg` for more
details.
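+
+  As an illustration only (assuming this is the usual
+  ``Solver::Options::dogleg_type`` member that holds these values), the
+  subspace variant can be selected as follows:
+
+  .. code-block:: c++
+
+    ceres::Solver::Options options;
+    options.trust_region_strategy_type = ceres::DOGLEG;
+    // SUBSPACE_DOGLEG minimizes the trust region problem over the two
+    // dimensional subspace spanned by the two dogleg vectors.
+    options.dogleg_type = ceres::SUBSPACE_DOGLEG;
+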
.. member:: bool Solver::Options::use_nonmonotonic_steps
diff --git a/docs/source/version_history.rst b/docs/source/version_history.rst
index 010b477..dbfc758 100644
--- a/docs/source/version_history.rst
+++ b/docs/source/version_history.rst
@@ -37,11 +37,19 @@
speeds up the solver a bit.
#. Automatic differentiation with a dynamic number of parameter
- blocks. (Based on an initial implementation by Thad Hughes).
+ blocks. (Based on an idea by Thad Hughes).
+
+#. Sped up problem construction and destruction.
+
+#. Added matrix adapters to ``rotation.h`` so that the rotation matrix
+ routines can work with row and column major matrices. (Markus Moll)
Bug Fixes
---------
+#. Fixed a bug in ``solver_impl.cc`` residual evaluation. (Markus
+ Moll)
+
#. Fixed variadic evaluation bug in ``AutoDiff``.
#. Fixed ``SolverImpl`` tests.