Documentation update.
1. Renamed ceres.tex to ceres-solver.tex
2. Updated Solver::Options docs to reflect the recent changes.
3. Updated .gitignore to ignore ceres-solver.pdf
Change-Id: Iea19f8ff5fa1638a498422c8be5ed2e6da2950c9
diff --git a/docs/solving.tex b/docs/solving.tex
index c83103c..ddc7bae 100644
--- a/docs/solving.tex
+++ b/docs/solving.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
\chapter{Solving}
Effective use of Ceres requires some familiarity with the basic components of a nonlinear least squares solver.
@@ -59,7 +59,7 @@
\subsection{\texttt{DENSE\_QR}}
For small problems (a couple of hundred parameters and a few thousand residuals) with relatively dense Jacobians, \texttt{DENSE\_QR} is the method of choice~\cite{bjorck1996numerical}. Let $J = QR$ be the QR-decomposition of $J$, where $Q$ is an orthonormal matrix and $R$ is an upper triangular matrix~\cite{trefethen1997numerical}. Then it can be shown that the solution to~\eqref{eq:normal} is given by
\begin{align}
- \Delta x^* = -R^{-1}Q^\top f
+ \Delta x^* = -R^{-1}Q^\top f
\end{align}
Ceres uses \texttt{Eigen}'s dense QR decomposition routines.
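+
+For instance, a minimal sketch of selecting this solver (problem
+construction is elided; \texttt{Solver::Options} and \texttt{Solve} are
+part of the Ceres public API):
+\begin{verbatim}
+ceres::Solver::Options options;
+// Select the dense QR factorization based linear solver.
+options.linear_solver_type = ceres::DENSE_QR;
+ceres::Solver::Summary summary;
+ceres::Solve(options, &problem, &summary);
+\end{verbatim}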
@@ -67,7 +67,7 @@
\subsection{\texttt{SPARSE\_NORMAL\_CHOLESKY}}
Large non-linear least squares problems are usually sparse. In such cases, using a dense QR factorization is inefficient. Let $H = R^\top R$ be the Cholesky factorization of the normal equations, where $R$ is an upper triangular matrix, then the solution to~\eqref{eq:normal} is given by
\begin{equation}
- \Delta x^* = R^{-1} R^{-\top} g.
+ \Delta x^* = R^{-1} R^{-\top} g.
\end{equation}
The observant reader will note that the $R$ in the Cholesky factorization of $H$ is the same upper triangular matrix $R$ in the QR factorization of $J$. Since $Q$ is an orthonormal matrix, $J=QR$ implies that $J^\top J = R^\top Q^\top Q R = R^\top R$.
@@ -142,7 +142,7 @@
\subsection{\texttt{CGNR}}
For general sparse problems, if the problem is too large for \texttt{CHOLMOD} or a sparse linear algebra library is not linked into Ceres, another option is the \texttt{CGNR} solver. This solver uses the Conjugate Gradients solver on the {\em normal equations}, but without forming the normal equations explicitly. It exploits the relation
\begin{align}
- H x = J^\top J x = J^\top(J x)
+ H x = J^\top J x = J^\top(J x)
\end{align}
When the user chooses \texttt{ITERATIVE\_SCHUR} as the linear solver, Ceres automatically switches from the exact step algorithm to an inexact step algorithm.
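+
+A minimal sketch of selecting \texttt{CGNR}, assuming the enum values
+mirror the documented option names:
+\begin{verbatim}
+ceres::Solver::Options options;
+// Conjugate Gradients on the normal equations, without
+// forming the normal equations explicitly.
+options.linear_solver_type = ceres::CGNR;
+// Preconditioner for the iterative solver (see below).
+options.preconditioner_type = ceres::JACOBI;
+\end{verbatim}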
@@ -198,21 +198,30 @@
\texttt{Solver::Options} controls the overall behavior of the solver. We list the various settings and their default values below.
\begin{enumerate}
-\item{\texttt{minimizer\_type}}(\texttt{LEVENBERG\_MARQUARDT}) The minimization algorithm used by Ceres. \texttt{LEVENBERG\_MARQUARDT} is currently the only valid value.
+
+\item{\texttt{trust\_region\_strategy\_type}}(\texttt{LEVENBERG\_MARQUARDT}) The trust region step computation algorithm used by Ceres. \texttt{LEVENBERG\_MARQUARDT} is currently the only valid value.
\item{\texttt{max\_num\_iterations}}(\texttt{50}) Maximum number of iterations for Levenberg-Marquardt.
-\item{\texttt{max\_solver\_time\_sec}}(\texttt{1e9}) Maximum amount of time (in seconds) for which the solver should run.
+\item{\texttt{max\_solver\_time\_sec}} ($10^9$) Maximum amount of time (in seconds) for which the solver should run.
\item{\texttt{num\_threads}}(\texttt{1})
Number of threads used by Ceres to evaluate the Jacobian.
-\item{\texttt{tau}}(\texttt{1e-4}) Initial value of the regularization parameter $\mu$ used by the Levenberg-Marquardt algorithm. The size of this parameter indicate the user's guess of how far the initial solution is from the minimum. Large values indicates that the solution is far away.
+\item{\texttt{initial\_trust\_region\_radius} ($10^4$)} The size of the initial trust region. When the \texttt{LEVENBERG\_MARQUARDT} strategy is used, the reciprocal of this number is the initial regularization parameter.
-\item{\texttt{min\_relative\_decrease}}(\texttt{1e-3}) Lower threshold for relative decrease before a Levenberg-Marquardt step is acceped.
+\item{\texttt{max\_trust\_region\_radius} ($10^{16}$)} The trust region radius is not allowed to grow beyond this value.
+\item{\texttt{min\_trust\_region\_radius} ($10^{-32}$)} The solver terminates when the trust region radius becomes smaller than this value.
+\item{\texttt{min\_relative\_decrease}}($10^{-3}$) Lower threshold for relative decrease before a Levenberg-Marquardt step is accepted.
-\item{\texttt{function\_tolerance}}(\texttt{1e-6}) Solver terminates if
+\item{\texttt{lm\_min\_diagonal} ($10^6$)} The \texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to regularize the trust region step. This is the lower bound on the values of this diagonal matrix.
+
+\item{\texttt{lm\_max\_diagonal} ($10^{32}$)} The \texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to regularize the trust region step. This is the upper bound on the values of this diagonal matrix.
+
+\item{\texttt{max\_num\_consecutive\_invalid\_steps} (5)} The step returned by a trust region strategy can sometimes be numerically invalid, usually because of conditioning issues. Instead of crashing or stopping the optimization, the optimizer can retry with a smaller trust region, i.e., a better conditioned problem. This parameter sets the number of consecutive retries before the minimizer gives up; see the configuration sketch below.
+
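+A minimal configuration sketch for the trust region parameters above,
+using the documented default values:
+\begin{verbatim}
+ceres::Solver::Options options;
+options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;
+options.max_num_iterations = 50;
+options.initial_trust_region_radius = 1e4;
+options.max_trust_region_radius = 1e16;
+options.min_trust_region_radius = 1e-32;
+options.max_num_consecutive_invalid_steps = 5;
+\end{verbatim}
+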
+\item{\texttt{function\_tolerance}}($10^{-6}$) Solver terminates if
\begin{align}
\frac{|\Delta \text{cost}|}{\text{cost}} < \texttt{function\_tolerance}
\end{align}
@@ -224,9 +233,9 @@
\end{equation}
where $\|\cdot\|_\infty$ refers to the max norm, and $x_0$ is the vector of initial parameter values.
-\item{\texttt{parameter\_tolerance}}(\texttt{1e-8}) Solver terminates if
+\item{\texttt{parameter\_tolerance}}($10^{-8}$) Solver terminates if
\begin{equation}
- \frac{\|\Delta x\|}{\|x\| + \texttt{parameter\_tolerance}} < \texttt{parameter\_tolerance}
+ \frac{\|\Delta x\|}{\|x\| + \texttt{parameter\_tolerance}} < \texttt{parameter\_tolerance}
\end{equation}
where $\Delta x$ is the step computed by the linear solver in the current iteration of Levenberg-Marquardt.
@@ -236,6 +245,8 @@
\item{\texttt{preconditioner\_type}}(\texttt{JACOBI}) The preconditioner used by the iterative linear solver. The default is the block Jacobi preconditioner. Valid values are (in increasing order of complexity) \texttt{IDENTITY}, \texttt{JACOBI}, \texttt{SCHUR\_JACOBI}, \texttt{CLUSTER\_JACOBI} and \texttt{CLUSTER\_TRIDIAGONAL}.
+\item{\texttt{sparse\_linear\_algebra\_library} (\texttt{SUITE\_SPARSE})} Ceres supports the use of two sparse linear algebra libraries: \texttt{SuiteSparse}, which is enabled by setting this parameter to \texttt{SUITE\_SPARSE}, and \texttt{CXSparse}, which can be selected by setting this parameter to \texttt{CX\_SPARSE}. \texttt{SuiteSparse} is a sophisticated and complex sparse linear algebra library and should be used in general. If your needs/platforms prevent you from using \texttt{SuiteSparse}, consider using \texttt{CXSparse}, which is a much smaller library that is easier to build. As can be expected, its performance on large problems is not comparable to that of \texttt{SuiteSparse}. A selection sketch follows this item.
+
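+A sketch of selecting the sparse linear algebra library (the
+\texttt{CX\_SPARSE} alternative assumes \texttt{CXSparse} was built in):
+\begin{verbatim}
+ceres::Solver::Options options;
+options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
+// Use CX_SPARSE instead if SuiteSparse is unavailable.
+options.sparse_linear_algebra_library = ceres::SUITE_SPARSE;
+\end{verbatim}
+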
\item{\texttt{num\_linear\_solver\_threads}}(\texttt{1}) Number of threads used by the linear solver.
\item{\texttt{num\_eliminate\_blocks}}(\texttt{0})
@@ -258,12 +269,20 @@
to it if the \texttt{ordering\_type} is set to \texttt{USER} and the ordering vector is
non-empty.
+\item{\texttt{use\_block\_amd} (\texttt{true})} By virtue of the modeling layer in Ceres being block oriented,
+all the matrices used by Ceres are also block oriented.
+When doing sparse direct factorization of these matrices, the
+fill-reducing ordering algorithms can either be run on the
+block or the scalar form of these matrices. Running it on the
+block form exposes more of the super-nodal structure of the
+matrix to the Cholesky factorization routines. This leads to
+substantial gains in factorization performance. Setting this parameter to \texttt{true} enables the use of a block oriented Approximate Minimum Degree (AMD) ordering algorithm. Setting it to \texttt{false} uses a scalar AMD algorithm. This option only makes sense when using \texttt{sparse\_linear\_algebra\_library = SUITE\_SPARSE} as it uses the \texttt{AMD} package that is part of \texttt{SuiteSparse}.
\item{\texttt{linear\_solver\_min\_num\_iterations}}(\texttt{1}) Minimum number of iterations used by the linear solver. This only makes sense when the linear solver is an iterative solver, e.g., \texttt{ITERATIVE\_SCHUR}.
\item{\texttt{linear\_solver\_max\_num\_iterations}}(\texttt{500}) Maximum number of iterations used by the linear solver. This only makes sense when the linear solver is an iterative solver, e.g., \texttt{ITERATIVE\_SCHUR}.
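+
+A sketch of configuring the iterative linear solver with the
+parameters above (\texttt{eta} is described next):
+\begin{verbatim}
+ceres::Solver::Options options;
+options.linear_solver_type = ceres::ITERATIVE_SCHUR;
+options.preconditioner_type = ceres::SCHUR_JACOBI;
+options.linear_solver_min_num_iterations = 1;
+options.linear_solver_max_num_iterations = 500;
+options.eta = 1e-1;
+\end{verbatim}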
-\item{\texttt{eta}}(\texttt{1e-1})
+\item{\texttt{eta}} ($10^{-1}$)
Forcing sequence parameter. The truncated Newton solver uses
this number to control the relative accuracy with which the
Newton step is computed. This constant is passed to ConjugateGradientsSolver which uses
@@ -329,12 +348,12 @@
results are compared, and if they differ substantially, details
are printed to the log.
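+
+A sketch of enabling this check; the \texttt{check\_gradients} flag is
+an assumption about the option that triggers the comparison described
+above:
+\begin{verbatim}
+ceres::Solver::Options options;
+// Compare user supplied Jacobians with numeric ones (assumed flag).
+options.check_gradients = true;
+options.gradient_check_relative_precision = 1e-8;
+\end{verbatim}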
-\item{\texttt{gradient\_check\_relative\_precision}}(\texttt{1e-8})
+\item{\texttt{gradient\_check\_relative\_precision}} ($10^{-8}$)
Relative precision to check for in the gradient checker. If the
relative difference between an element in a Jacobian exceeds
this number, then the Jacobian for that cost term is dumped.
-\item{\texttt{numeric\_derivative\_relative\_step\_size}}(\texttt{1e-6})
+\item{\texttt{numeric\_derivative\_relative\_step\_size}} ($10^{-6}$)
Relative shift used for taking numeric derivatives. For finite
differencing, each dimension is evaluated at slightly shifted
values, \eg for forward differences, the numerical derivative is
@@ -348,8 +367,8 @@
The finite differencing is done along each dimension. The
reason to use a relative (rather than absolute) step size is
that this way, numeric differentiation works for functions where
- the arguments are typically large (e.g. 1e9) and when the
- values are small (e.g. 1e-5). It is possible to construct
+ the arguments are typically large (e.g. $10^9$) and when the
+ values are small (e.g. $10^{-5}$). It is possible to construct
"torture cases" which break this finite difference heuristic,
but they do not come up often in practice.
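+
+As an illustration, a minimal sketch of relative-step forward
+differencing for a scalar function; the helper and its zero guard are
+hypothetical and not part of the Ceres API:
+\begin{verbatim}
+#include <algorithm>
+#include <cmath>
+
+double ForwardDifference(double (*f)(double), double x,
+                         double relative_step_size) {
+  // Scale the step by |x| so it is meaningful for both large
+  // and small arguments; guard against x == 0 (assumed bound).
+  const double h = relative_step_size * std::max(std::abs(x), 1e-12);
+  return (f(x + h) - f(x)) / h;
+}
+\end{verbatim}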