Update documentation.

Change-Id: Idb03741fab9facbbbda85d5a82723f0b4c1c6c60
diff --git a/.gitignore b/.gitignore
index 17c8045..0f0d1c7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -19,3 +19,4 @@
 docs/minted.sty
 obj/
 jni/libs/
+.buildinfo
diff --git a/README b/README
index 14fd355..14be3fe 100644
--- a/README
+++ b/README
@@ -1,3 +1,3 @@
 Ceres Solver - A non-linear least squares minimizer
 ==================================================
-Please see ceres-solver.pdf in docs/ for a tutorial and reference.
+Please see docs/html/index.html for a tutorial and reference.
diff --git a/docs/source/bibliography.rst b/docs/source/bibliography.rst
index a1a9883..e80e483 100644
--- a/docs/source/bibliography.rst
+++ b/docs/source/bibliography.rst
@@ -15,7 +15,11 @@
    multiple station analytical stereo triangulation**,  Technical
    Report 43, Patrick Airforce Base, Florida, 1958.
 
-.. [Byrd] R.H. Byrd, R.B. Schnabel, and G.A. Shultz, **Approximate
+.. [ByrdNocedal] R. H. Byrd, J. Nocedal, R. B. Schnabel,
+   **Representations of Quasi-Newton Matrices and their use in Limited
+   Memory Methods**, *Mathematical Programming* 63(4):129–156, 1994.
+
+.. [ByrdSchnabel] R.H. Byrd, R.B. Schnabel, and G.A. Shultz, **Approximate
    solution of the trust region problem by minimization over
    two dimensional subspaces**, *Mathematical programming*,
    40(1):247–263, 1988.
@@ -65,6 +69,9 @@
    within a truncated newton method**, *Operations Research Letters*,
    9(4):219–221, 1990.
 
+.. [Nocedal] J. Nocedal, **Updating Quasi-Newton Matrices with Limited
+   Storage**, *Mathematics of Computation*, 35(151):773–782, 1980.
+
 .. [NocedalWright] J. Nocedal & S. Wright, **Numerical Optimization**,
    Springer, 2004.
 
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 35b2e51..7fa6641 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -166,8 +166,6 @@
 # Output file base name for HTML help builder.
 htmlhelp_basename = 'CeresSolverdoc'
 
-mathjax_path = "mathjax/MathJax.js?config=TeX-AMS_HTML"
-
 # -- Options for LaTeX output --------------------------------------------------
 
 latex_elements = {
diff --git a/docs/source/solving.rst b/docs/source/solving.rst
index 43eedee..fb48bb3 100644
--- a/docs/source/solving.rst
+++ b/docs/source/solving.rst
@@ -9,16 +9,16 @@
 Solver API
 ==========
 
+
+Introduction
+============
+
 Effective use of Ceres requires some familiarity with the basic
 components of a nonlinear least squares solver, so before we describe
 how to configure the solver, we will begin by taking a brief look at
 how some of the core optimization algorithms in Ceres work and the
 various linear solvers and preconditioners that power it.
 
-.. _section-trust-region-methods:
-
-Trust Region Methods
---------------------
 
 Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
 variables, and
@@ -32,10 +32,9 @@
 Here, the Jacobian :math:`J(x)` of :math:`F(x)` is an :math:`m\times
 n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)` and the
 gradient vector :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2 = J(x)^\top
-F(x)`. Since the efficient global minimization of :eq:`nonlinsq` for general
-:math:`F(x)` is an intractable problem, we will have to settle for
-finding a local minimum.
-
+F(x)`. Since the efficient global minimization of :eq:`nonlinsq` for
+general :math:`F(x)` is an intractable problem, we will have to settle
+for finding a local minimum.
 
 The general strategy when solving non-linear optimization problems is
 to solve a sequence of approximations to the original problem
@@ -43,31 +42,57 @@
 determine a correction :math:`\Delta x` to the vector :math:`x`. For
 non-linear least squares, an approximation can be constructed by using
 the linearization :math:`F(x+\Delta x) \approx F(x) + J(x)\Delta x`,
-which leads to the following linear least squares  problem:
+which leads to the following linear least squares problem:
 
 .. math:: \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
    :label: linearapprox
 
 Unfortunately, naively solving a sequence of these problems and
-updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that may not
-converge.  To get a convergent algorithm, we need to control the size
-of the step :math:`\Delta x`. And this is where the idea of a trust-region
-comes in.
+updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that
+may not converge.  To get a convergent algorithm, we need to control
+the size of the step :math:`\Delta x`. Depending on how the size of
+the step :math:`\Delta x` is controlled, non-linear optimization
+algorithms can be divided into two major categories [NocedalWright]_.
 
-.. Algorithm~\ref{alg:trust-region} describes the basic trust-region
-.. loop for non-linear least squares problems.
+1. **Trust Region** The trust region approach approximates the
+   objective function using a model function (often a quadratic)
+   over a subset of the search space known as the trust region. If the
+   model function succeeds in minimizing the true objective function,
+   the trust region is expanded; otherwise it is contracted and the
+   model optimization problem is solved again.
 
-.. \begin{algorithm} \caption{The basic trust-region
-  algorithm.\label{alg:trust-region}} \begin{algorithmic} \REQUIRE
-  Initial point `x` and a trust region radius `\mu`.  \LOOP
-  \STATE{Solve `\arg \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x +
-  F(x)\|^2` s.t. `\|D(x)\Delta x\|^2 \le \mu`} \STATE{`\rho =
-  \frac{\displaystyle \|F(x + \Delta x)\|^2 -
-  \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 - \|F(x)\|^2}`}
-  \IF {`\rho > \epsilon`} \STATE{`x = x + \Delta x`} \ENDIF \IF {`\rho
-  > \eta_1`} \STATE{`\rho = 2 * \rho`} \ELSE \IF {`\rho < \eta_2`}
-  \STATE {`\rho = 0.5 * \rho`} \ENDIF \ENDIF \ENDLOOP
-  \end{algorithmic} \end{algorithm}
+2. **Line Search** The line search approach first finds a descent
+   direction along which the objective function will be reduced and
+   then computes a step size that decides how far to move along
+   that direction. The descent direction can be computed by various
+   methods, such as gradient descent, Newton's method and the
+   quasi-Newton method. The step size can be determined either
+   exactly or inexactly.
+
+Trust region methods are in some sense dual to line search methods:
+trust region methods first choose a step size (the size of the trust
+region) and then a step direction, while line search methods first
+choose a step direction and then a step size.
+
+Ceres implements multiple algorithms in both categories.
+
+.. _section-trust-region-methods:
+
+Trust Region Methods
+====================
+
+The basic trust region algorithm looks something like this.
+
+   1. Given an initial point :math:`x` and a trust region radius :math:`\mu`.
+   2. Solve :math:`\arg \min_{\Delta x} \frac{1}{2}\|J(x)\Delta
+      x + F(x)\|^2` s.t. :math:`\|D(x)\Delta x\|^2 \le \mu`
+   3. :math:`\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 -
+      \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 -
+      \|F(x)\|^2}`
+   4. if :math:`\rho > \epsilon` then :math:`x = x + \Delta x`.
+   5. if :math:`\rho > \eta_1` then :math:`\mu = 2 \mu`
+   6. else if :math:`\rho < \eta_2` then :math:`\mu = 0.5 \mu`
+   7. Goto 2.
 
 Here, :math:`\mu` is the trust region radius, :math:`D(x)` is some
 matrix used to define a metric on the domain of :math:`F(x)` and
@@ -97,10 +122,11 @@
          in terms of an optimization problem defined over a state
          vector of size :math:`n`.
 
+
 .. _section-levenberg-marquardt:
 
 Levenberg-Marquardt
-^^^^^^^^^^^^^^^^^^^
+-------------------
 
 The Levenberg-Marquardt algorithm [Levenberg]_ [Marquardt]_ is the
 most popular algorithm for solving non-linear least squares problems.
@@ -176,7 +202,7 @@
 .. _section-dogleg:
 
 Dogleg
-^^^^^^
+------
 
 Another strategy for solving the trust region problem :eq:`trp` was
 introduced by M. J. D. Powell. The key idea there is to compute two
@@ -206,7 +232,7 @@
 ``SUBSPACE_DOGLEG`` is a more sophisticated method that considers the
 entire two dimensional subspace spanned by these two vectors and finds
 the point that minimizes the trust region problem in this
-subspace [Byrd]_.
+subspace [ByrdSchnabel]_.
 
 The key advantage of the Dogleg over Levenberg Marquardt is that if
 the step computation for a particular choice of :math:`\mu` does not
@@ -222,7 +248,7 @@
 .. _section-inner-iterations:
 
 Inner Iterations
-^^^^^^^^^^^^^^^^
+----------------
 
 Some non-linear least squares problems have additional structure in
 the way the parameter blocks interact that it is beneficial to modify
@@ -289,7 +315,7 @@
 .. _section-non-monotonic-steps:
 
 Non-monotonic Steps
-^^^^^^^^^^^^^^^^^^^
+-------------------
 
 Note that the basic trust-region algorithm described in
 Algorithm~\ref{alg:trust-region} is a descent algorithm in that they
@@ -314,14 +340,66 @@
 optimization, the final parameters returned to the user are the
 ones corresponding to the minimum cost over all iterations.
 
-The option to take non-monotonic is available for all trust region
-strategies.
+The option to take non-monotonic steps is available for all trust
+region strategies.
 
 
+.. _section-line-search-methods:
+
+Line Search Methods
+===================
+
+**The implementation of line search algorithms in Ceres Solver is
+fairly new and not very well tested, so for now this part of the
+solver should be considered beta quality. We welcome reports of your
+experiences, both good and bad, on the mailing list.**
+
+The basic line search algorithm looks something like this.
+
+   1. Given an initial point :math:`x`
+   2. :math:`\Delta x = -H^{-1}(x) g(x)`
+   3. :math:`\arg \min_\mu \frac{1}{2} \| F(x + \mu \Delta x) \|^2`
+   4. :math:`x = x + \mu \Delta x`
+   5. Goto 2.
+
+Here :math:`H(x)` is some approximation to the Hessian of the
+objective function, and :math:`g(x)` is the gradient at
+:math:`x`. Depending on the choice of :math:`H(x)` we get a variety of
+different search directions :math:`\Delta x`.
+
+Step 3, which is a one dimensional optimization or `Line Search` along
+:math:`\Delta x`, is what gives this class of methods its name.
+
+Different line search algorithms differ in their choice of the search
+direction :math:`\Delta x` and the method used for one dimensional
+optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
+primary source of computational complexity in these
+methods. Currently, Ceres Solver supports three choices of search
+directions, all aimed at large scale problems.
+
+1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
+   be the identity matrix. This is not a good search direction for
+   anything but the simplest of problems. It is only included here
+   for completeness.
+
+2. ``NONLINEAR_CONJUGATE_GRADIENT`` A generalization of the Conjugate
+   Gradient method to non-linear functions. The generalization can be
+   performed in a number of different ways, resulting in a variety of
+   search directions. Ceres Solver currently supports
+   ``FLETCHER_REEVES``, ``POLAK_RIBIRERE`` and ``HESTENES_STIEFEL``
+   directions.
+
+3. ``LBFGS`` In this method, a limited memory approximation to the
+   inverse Hessian is maintained and used to compute a quasi-Newton
+   step [Nocedal]_, [ByrdNocedal]_.
+
+Currently Ceres Solver uses a backtracking and interpolation based
+Armijo line search algorithm.
+
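+As a rough illustration, and assuming the standard ``ceres::Solver``
+API described in the rest of this document, switching a problem over
+to the line search minimizer might look something like the sketch
+below. The ``CostFunctor`` and the toy residual :math:`f(x) = 10 - x`
+are purely illustrative and not part of the Ceres API.
+
+.. code-block:: c++
+
+   #include <iostream>
+   #include "ceres/ceres.h"
+   #include "glog/logging.h"
+
+   // A toy residual f(x) = 10 - x, used only so that the example is
+   // self contained; any cost function of your own would do here.
+   struct CostFunctor {
+     template <typename T>
+     bool operator()(const T* const x, T* residual) const {
+       residual[0] = T(10.0) - x[0];
+       return true;
+     }
+   };
+
+   int main(int argc, char** argv) {
+     google::InitGoogleLogging(argv[0]);
+
+     double x = 0.5;
+     ceres::Problem problem;
+     problem.AddResidualBlock(
+         new ceres::AutoDiffCostFunction<CostFunctor, 1, 1>(new CostFunctor),
+         NULL, &x);
+
+     ceres::Solver::Options options;
+     // Use the (beta quality) line search minimizer with LBFGS search
+     // directions instead of the default trust region minimizer.
+     options.minimizer_type = ceres::LINE_SEARCH;
+     options.line_search_direction_type = ceres::LBFGS;
+     options.minimizer_progress_to_stdout = true;
+
+     ceres::Solver::Summary summary;
+     ceres::Solve(options, &problem, &summary);
+     std::cout << summary.BriefReport() << "\n";
+     return 0;
+   }
+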
 .. _section-linear-solver:
 
 LinearSolver
-------------
+============
 
 Recall that in both of the trust-region methods described above, the
 key computational cost is the solution of a linear least squares
@@ -343,7 +421,7 @@
 .. _section-qr:
 
 ``DENSE_QR``
-^^^^^^^^^^^^
+------------
 
 For small problems (a couple of hundred parameters and a few thousand
 residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
@@ -360,7 +438,7 @@
 .. _section-cholesky:
 
 ``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+------------------------------------------------------
 
 Large non-linear least square problems are usually sparse. In such
 cases, using a dense QR factorization is inefficient. Let :math:`H =
@@ -393,7 +471,7 @@
 .. _section-schur:
 
 ``DENSE_SCHUR`` & ``SPARSE_SCHUR``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+----------------------------------
 
 While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
 adjustment problems, bundle adjustment problem have a special
@@ -493,7 +571,7 @@
 .. _section-cgnr:
 
 ``CGNR``
-^^^^^^^^
+--------
 
 For general sparse problems, if the problem is too large for
 ``CHOLMOD`` or a sparse linear algebra library is not linked into
@@ -512,7 +590,7 @@
 .. _section-iterative_schur:
 
 ``ITERATIVE_SCHUR``
-^^^^^^^^^^^^^^^^^^^
+-------------------
 
 Another option for bundle adjustment problems is to apply PCG to the
 reduced camera matrix :math:`S` instead of :math:`H`. One reason to do
@@ -692,6 +770,58 @@
   :class:`Solver::Options` controls the overall behavior of the
   solver. We list the various settings and their default values below.
 
+
+.. member:: MinimizerType Solver::Options::minimizer_type
+
+   Default: ``TRUST_REGION``
+
+   Choose between ``LINE_SEARCH`` and ``TRUST_REGION`` algorithms. See
+   :ref:`section-trust-region-methods` and
+   :ref:`section-line-search-methods` for more details.
+
+.. member:: LineSearchDirectionType Solver::Options::line_search_direction_type
+
+   Default: ``LBFGS``
+
+   Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``
+   and ``LBFGS``.
+
+.. member:: LineSearchType Solver::Options::line_search_type
+
+   Default: ``ARMIJO``
+
+   ``ARMIJO`` is the only choice right now.
+
+.. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type
+
+   Default: ``FLETCHER_REEVES``
+
+   Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIRERE`` and
+   ``HESTENES_STIEFEL``.
+
+.. member:: int Solver::Options::max_lbfgs_rank
+
+   Default: 20
+
+   The LBFGS Hessian approximation is a low rank approximation to the
+   inverse of the Hessian matrix. The rank of the approximation
+   determines (linearly) the space and time complexity of using the
+   approximation. The higher the rank, the better the quality of the
+   approximation. The increase in quality is, however, bounded for a
+   number of reasons.
+
+     1. The method only uses secant information and not actual
+        derivatives.
+
+     2. The Hessian approximation is constrained to be positive
+        definite.
+
+   So increasing this rank to a large number will cost time and space
+   without a corresponding increase in solution quality. There are no
+   hard and fast rules for choosing the maximum rank. The best choice
+   usually requires some problem specific experimentation.
+
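+As a rough sketch (the values are illustrative, not recommendations),
+the line search related options described above might be combined as
+follows:
+
+.. code-block:: c++
+
+   ceres::Solver::Options options;
+   // Use the line search minimizer instead of the trust region minimizer.
+   options.minimizer_type = ceres::LINE_SEARCH;
+   // LBFGS is the default line search direction.
+   options.line_search_direction_type = ceres::LBFGS;
+   // Only relevant when the direction type is NONLINEAR_CONJUGATE_GRADIENT.
+   options.nonlinear_conjugate_gradient_type = ceres::FLETCHER_REEVES;
+   // Only relevant when the direction type is LBFGS.
+   options.max_lbfgs_rank = 20;
+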
 .. member:: TrustRegionStrategyType Solver::Options::trust_region_strategy_type
 
    Default: ``LEVENBERG_MARQUARDT``
@@ -707,7 +837,7 @@
 
    Ceres supports two different dogleg strategies.
    ``TRADITIONAL_DOGLEG`` method by Powell and the ``SUBSPACE_DOGLEG``
-   method described by [Byrd]_.  See :ref:`section-dogleg` for more
+   method described by [ByrdSchnabel]_.  See :ref:`section-dogleg` for more
    details.
 
 .. member:: bool Solver::Options::use_nonmonotonic_steps
diff --git a/docs/source/version_history.rst b/docs/source/version_history.rst
index 010b477..dbfc758 100644
--- a/docs/source/version_history.rst
+++ b/docs/source/version_history.rst
@@ -37,11 +37,19 @@
    speeds up the solver a bit.
 
 #. Automatic differenatiation with a dynamic number of parameter
-   blocks. (Based on an initial implementation by Thad Hughes).
+   blocks. (Based on an idea by Thad Hughes).
+
+#. Sped up problem construction and destruction.
+
+#. Added matrix adapters to ``rotation.h`` so that the rotation matrix
+   routines can work with row and column major matrices. (Markus Moll)
 
 Bug Fixes
 ---------
 
+#. Fixed a bug in ``solver_impl.cc`` residual evaluation. (Markus
+   Moll)
+
 #. Fixed varidic evaluation bug in ``AutoDiff``.
 
 #. Fixed ``SolverImpl`` tests.