Documentation update.
1. Renamed ceres.tex to ceres-solver.tex
2. Updated Solver::Options docs to reflect the recent changes.
3. Updated .gitignore to ignore ceres-solver.pdf
Change-Id: Iea19f8ff5fa1638a498422c8be5ed2e6da2950c9
diff --git a/docs/solving.tex b/docs/solving.tex
index c83103c..ddc7bae 100644
--- a/docs/solving.tex
+++ b/docs/solving.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
\chapter{Solving}
Effective use of Ceres requires some familiarity with the basic components of a nonlinear least squares solver.
@@ -59,7 +59,7 @@
\subsection{\texttt{DENSE\_QR}}
For small problems (a couple of hundred parameters and a few thousand residuals) with relatively dense Jacobians, \texttt{DENSE\_QR} is the method of choice~\cite{bjorck1996numerical}. Let $J = QR$ be the QR-decomposition of $J$, where $Q$ is an orthonormal matrix and $R$ is an upper triangular matrix~\cite{trefethen1997numerical}. Then it can be shown that the solution to~\eqref{eq:normal} is given by
\begin{align}
- \Delta x^* = -R^{-1}Q^\top f
+ \Delta x^* = -R^{-1}Q^\top f
\end{align}
Ceres uses \texttt{Eigen}'s dense QR decomposition routines.
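+
+For instance, a minimal sketch of selecting this solver (problem
+construction is elided; \texttt{Solver::Options} and \texttt{Solve} are
+part of the Ceres public API):
+\begin{verbatim}
+ceres::Solver::Options options;
+// Select the dense QR factorization based linear solver.
+options.linear_solver_type = ceres::DENSE_QR;
+ceres::Solver::Summary summary;
+ceres::Solve(options, &problem, &summary);
+\end{verbatim}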
@@ -67,7 +67,7 @@
\subsection{\texttt{SPARSE\_NORMAL\_CHOLESKY}}
Large non-linear least squares problems are usually sparse. In such cases, using a dense QR factorization is inefficient. Let $H = R^\top R$ be the Cholesky factorization of the normal equations, where $R$ is an upper triangular matrix, then the solution to~\eqref{eq:normal} is given by
\begin{equation}
- \Delta x^* = R^{-1} R^{-\top} g.
+ \Delta x^* = R^{-1} R^{-\top} g.
\end{equation}
The observant reader will note that the $R$ in the Cholesky factorization of $H$ is the same upper triangular matrix $R$ in the QR factorization of $J$. Since $Q$ is an orthonormal matrix, $J=QR$ implies that $J^\top J = R^\top Q^\top Q R = R^\top R$.
@@ -142,7 +142,7 @@
\subsection{\texttt{CGNR}}
For general sparse problems, if the problem is too large for \texttt{CHOLMOD} or a sparse linear algebra library is not linked into Ceres, another option is the \texttt{CGNR} solver. This solver uses the Conjugate Gradients solver on the {\em normal equations}, but without forming the normal equations explicitly. It exploits the relation
\begin{align}
- H x = J^\top J x = J^\top(J x)
+ H x = J^\top J x = J^\top(J x)
\end{align}
When the user chooses \texttt{ITERATIVE\_SCHUR} as the linear solver, Ceres automatically switches from the exact step algorithm to an inexact step algorithm.
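+
+A minimal sketch of selecting \texttt{CGNR}, assuming the enum values
+mirror the documented option names:
+\begin{verbatim}
+ceres::Solver::Options options;
+// Conjugate Gradients on the normal equations, without
+// forming the normal equations explicitly.
+options.linear_solver_type = ceres::CGNR;
+// Preconditioner for the iterative solver (see below).
+options.preconditioner_type = ceres::JACOBI;
+\end{verbatim}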
@@ -198,21 +198,30 @@
\texttt{Solver::Options} controls the overall behavior of the solver. We list the various settings and their default values below.
\begin{enumerate}
-\item{\texttt{minimizer\_type}}(\texttt{LEVENBERG\_MARQUARDT}) The minimization algorithm used by Ceres. \texttt{LEVENBERG\_MARQUARDT} is currently the only valid value.
+
+\item{\texttt{trust\_region\_strategy\_type}}(\texttt{LEVENBERG\_MARQUARDT}) The trust region step computation algorithm used by Ceres. \texttt{LEVENBERG\_MARQUARDT} is currently the only valid value.
\item{\texttt{max\_num\_iterations}}(\texttt{50}) Maximum number of iterations for Levenberg-Marquardt.
-\item{\texttt{max\_solver\_time\_sec}}(\texttt{1e9}) Maximum amount of time (in seconds) for which the solver should run.
+\item{\texttt{max\_solver\_time\_sec}} ($10^9$) Maximum amount of time (in seconds) for which the solver should run.
\item{\texttt{num\_threads}}(\texttt{1})
Number of threads used by Ceres to evaluate the Jacobian.
-\item{\texttt{tau}}(\texttt{1e-4}) Initial value of the regularization parameter $\mu$ used by the Levenberg-Marquardt algorithm. The size of this parameter indicate the user's guess of how far the initial solution is from the minimum. Large values indicates that the solution is far away.
+\item{\texttt{initial\_trust\_region\_radius} ($10^4$)} The size of the initial trust region. When the \texttt{LEVENBERG\_MARQUARDT} strategy is used, the reciprocal of this number is the initial regularization parameter.
-\item{\texttt{min\_relative\_decrease}}(\texttt{1e-3}) Lower threshold for relative decrease before a Levenberg-Marquardt step is acceped.
+\item{\texttt{max\_trust\_region\_radius} ($10^{16}$)} The trust region radius is not allowed to grow beyond this value.
+\item{\texttt{min\_trust\_region\_radius} ($10^{-32}$)} The solver terminates when the trust region radius becomes smaller than this value.
+\item{\texttt{min\_relative\_decrease}}($10^{-3}$) Lower threshold for relative decrease before a Levenberg-Marquardt step is accepted.
-\item{\texttt{function\_tolerance}}(\texttt{1e-6}) Solver terminates if
+\item{\texttt{lm\_min\_diagonal} ($10^6$)} The \texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to regularize the trust region step. This is the lower bound on the values of this diagonal matrix.
+
+\item{\texttt{lm\_max\_diagonal} ($10^{32}$)} The \texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to regularize the trust region step. This is the upper bound on the values of this diagonal matrix.
+
+\item{\texttt{max\_num\_consecutive\_invalid\_steps} (5)} The step returned by a trust region strategy can sometimes be numerically invalid, usually because of conditioning issues. Instead of crashing or stopping the optimization, the optimizer can retry with a smaller trust region, i.e., a better conditioned problem. This parameter sets the number of consecutive retries before the minimizer gives up; see the configuration sketch below.
+
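+A minimal configuration sketch for the trust region parameters above,
+using the documented default values:
+\begin{verbatim}
+ceres::Solver::Options options;
+options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;
+options.max_num_iterations = 50;
+options.initial_trust_region_radius = 1e4;
+options.max_trust_region_radius = 1e16;
+options.min_trust_region_radius = 1e-32;
+options.max_num_consecutive_invalid_steps = 5;
+\end{verbatim}
+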
+\item{\texttt{function\_tolerance}}($10^{-6}$) Solver terminates if
\begin{align}
\frac{|\Delta \text{cost}|}{\text{cost}} < \texttt{function\_tolerance}
\end{align}
@@ -224,9 +233,9 @@
\end{equation}
where $\|\cdot\|_\infty$ refers to the max norm, and $x_0$ is the vector of initial parameter values.
-\item{\texttt{parameter\_tolerance}}(\texttt{1e-8}) Solver terminates if
+\item{\texttt{parameter\_tolerance}}($10^{-8}$) Solver terminates if
\begin{equation}
- \frac{\|\Delta x\|}{\|x\| + \texttt{parameter\_tolerance}} < \texttt{parameter\_tolerance}
+ \frac{\|\Delta x\|}{\|x\| + \texttt{parameter\_tolerance}} < \texttt{parameter\_tolerance}
\end{equation}
where $\Delta x$ is the step computed by the linear solver in the current iteration of Levenberg-Marquardt.
@@ -236,6 +245,8 @@
\item{\texttt{preconditioner\_type}}(\texttt{JACOBI}) The preconditioner used by the iterative linear solver. The default is the block Jacobi preconditioner. Valid values are (in increasing order of complexity) \texttt{IDENTITY}, \texttt{JACOBI}, \texttt{SCHUR\_JACOBI}, \texttt{CLUSTER\_JACOBI} and \texttt{CLUSTER\_TRIDIAGONAL}.
+\item{\texttt{sparse\_linear\_algebra\_library} (\texttt{SUITE\_SPARSE})} Ceres supports the use of two sparse linear algebra libraries: \texttt{SuiteSparse}, which is enabled by setting this parameter to \texttt{SUITE\_SPARSE}, and \texttt{CXSparse}, which can be selected by setting this parameter to \texttt{CX\_SPARSE}. \texttt{SuiteSparse} is a sophisticated and complex sparse linear algebra library and should be used in general. If your needs/platforms prevent you from using \texttt{SuiteSparse}, consider using \texttt{CXSparse}, which is a much smaller library that is easier to build. As can be expected, its performance on large problems is not comparable to that of \texttt{SuiteSparse}. A selection sketch follows this item.
+
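+A sketch of selecting the sparse linear algebra library (the
+\texttt{CX\_SPARSE} alternative assumes \texttt{CXSparse} was built in):
+\begin{verbatim}
+ceres::Solver::Options options;
+options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
+// Use CX_SPARSE instead if SuiteSparse is unavailable.
+options.sparse_linear_algebra_library = ceres::SUITE_SPARSE;
+\end{verbatim}
+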
\item{\texttt{num\_linear\_solver\_threads}}(\texttt{1}) Number of threads used by the linear solver.
\item{\texttt{num\_eliminate\_blocks}}(\texttt{0})
@@ -258,12 +269,20 @@
to it if the \texttt{ordering\_type} is set to \texttt{USER} and the ordering vector is
non-empty.
+\item{\texttt{use\_block\_amd} (\texttt{true})} By virtue of the modeling layer in Ceres being block oriented,
+all the matrices used by Ceres are also block oriented.
+When doing sparse direct factorization of these matrices, the
+fill-reducing ordering algorithms can either be run on the
+block or the scalar form of these matrices. Running it on the
+block form exposes more of the super-nodal structure of the
+matrix to the Cholesky factorization routines. This leads to
+substantial gains in factorization performance. Setting this parameter to \texttt{true} enables the use of a block oriented Approximate Minimum Degree (AMD) ordering algorithm. Setting it to \texttt{false} uses a scalar AMD algorithm. This option only makes sense when using \texttt{sparse\_linear\_algebra\_library = SUITE\_SPARSE} as it uses the \texttt{AMD} package that is part of \texttt{SuiteSparse}.
\item{\texttt{linear\_solver\_min\_num\_iterations}}(\texttt{1}) Minimum number of iterations used by the linear solver. This only makes sense when the linear solver is an iterative solver, e.g., \texttt{ITERATIVE\_SCHUR}.
\item{\texttt{linear\_solver\_max\_num\_iterations}}(\texttt{500}) Maximum number of iterations used by the linear solver. This only makes sense when the linear solver is an iterative solver, e.g., \texttt{ITERATIVE\_SCHUR}.
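+
+A sketch of configuring the iterative linear solver with the
+parameters above (\texttt{eta} is described next):
+\begin{verbatim}
+ceres::Solver::Options options;
+options.linear_solver_type = ceres::ITERATIVE_SCHUR;
+options.preconditioner_type = ceres::SCHUR_JACOBI;
+options.linear_solver_min_num_iterations = 1;
+options.linear_solver_max_num_iterations = 500;
+options.eta = 1e-1;
+\end{verbatim}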
-\item{\texttt{eta}}(\texttt{1e-1})
+\item{\texttt{eta}} ($10^{-1}$)
Forcing sequence parameter. The truncated Newton solver uses
this number to control the relative accuracy with which the
Newton step is computed. This constant is passed to ConjugateGradientsSolver which uses
@@ -329,12 +348,12 @@
results are compared, and if they differ substantially, details
are printed to the log.
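+
+A sketch of enabling this check; the \texttt{check\_gradients} flag is
+an assumption about the option that triggers the comparison described
+above:
+\begin{verbatim}
+ceres::Solver::Options options;
+// Compare user supplied Jacobians with numeric ones (assumed flag).
+options.check_gradients = true;
+options.gradient_check_relative_precision = 1e-8;
+\end{verbatim}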
-\item{\texttt{gradient\_check\_relative\_precision}}(\texttt{1e-8})
+\item{\texttt{gradient\_check\_relative\_precision}} ($10^{-8}$)
Relative precision to check for in the gradient checker. If the
relative difference between an element in a Jacobian exceeds
this number, then the Jacobian for that cost term is dumped.
-\item{\texttt{numeric\_derivative\_relative\_step\_size}}(\texttt{1e-6})
+\item{\texttt{numeric\_derivative\_relative\_step\_size}} ($10^{-6}$)
Relative shift used for taking numeric derivatives. For finite
differencing, each dimension is evaluated at slightly shifted
values, \eg for forward differences, the numerical derivative is
@@ -348,8 +367,8 @@
The finite differencing is done along each dimension. The
reason to use a relative (rather than absolute) step size is
that this way, numeric differentiation works for functions where
- the arguments are typically large (e.g. 1e9) and when the
- values are small (e.g. 1e-5). It is possible to construct
+ the arguments are typically large (e.g. $10^9$) and when the
+ values are small (e.g. $10^{-5}$). It is possible to construct
"torture cases" which break this finite difference heuristic,
but they do not come up often in practice.
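+
+As an illustration, a minimal sketch of relative-step forward
+differencing for a scalar function; the helper and its zero guard are
+hypothetical and not part of the Ceres API:
+\begin{verbatim}
+#include <algorithm>
+#include <cmath>
+
+double ForwardDifference(double (*f)(double), double x,
+                         double relative_step_size) {
+  // Scale the step by |x| so it is meaningful for both large
+  // and small arguments; guard against x == 0 (assumed bound).
+  const double h = relative_step_size * std::max(std::abs(x), 1e-12);
+  return (f(x + h) - f(x)) / h;
+}
+\end{verbatim}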