Documentation update.

1. Renamed ceres.tex to ceres-solver.tex
2. Updated Solver::Options docs to reflect the recent changes.
3. Updated .gitignore to ignore ceres-solver.pdf

Change-Id: Iea19f8ff5fa1638a498422c8be5ed2e6da2950c9
diff --git a/.gitignore b/.gitignore
index 7935785..c1c6c08 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,3 +13,4 @@
 *.log
 *.synctex.gz
 ceres.pdf
+ceres-solver.pdf
diff --git a/docs/build.tex b/docs/build.tex
index 4cb50f3..f329050 100644
--- a/docs/build.tex
+++ b/docs/build.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Building Ceres}
 \label{chapter:build}
 Ceres source code and documentation are hosted at
diff --git a/docs/bundleadjustment.tex b/docs/bundleadjustment.tex
index 070e09a..5a1298c 100644
--- a/docs/bundleadjustment.tex
+++ b/docs/bundleadjustment.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Bundle Adjustment}
 \label{chapter:tutorial:bundleadjustment}
 One of the main reasons for writing Ceres was our need to solve large scale bundle adjustment problems~\cite{hartley-zisserman-book-2004,triggs-etal-1999}.
diff --git a/docs/ceres.tex b/docs/ceres-solver.tex
similarity index 99%
rename from docs/ceres.tex
rename to docs/ceres-solver.tex
index 870fd99..994fb29 100644
--- a/docs/ceres.tex
+++ b/docs/ceres-solver.tex
@@ -103,6 +103,7 @@
 
 Despite our best efforts, this manual remains a work in progress and the source code for Ceres Solver remains the ultimate reference.
 
+\input{changes}
 \input{introduction}
 \input{build}
 
diff --git a/docs/changes.tex b/docs/changes.tex
new file mode 100644
index 0000000..7fca3ee
--- /dev/null
+++ b/docs/changes.tex
@@ -0,0 +1,52 @@
+%!TEX root = ceres-solver.tex
+
+\chapter{Version History}
+\section*{1.2.0}
+\subsection{New Features}
+\begin{itemize}
+\item \texttt{CXSparse} support.
+\item Block-oriented fill-reducing orderings. This
+significantly reduces the factorization time for sparse
+Cholesky factorization using \texttt{CHOLMOD}.
+\item New trust region loop with support for multiple
+trust region step strategies. Currently only Levenberg-Marquardt is supported, but this refactoring opens the door for Dogleg, Steihaug and others.
+\item \texttt{CMake} file restructuring. Builds in \texttt{Release} mode by default, and now has platform-specific tuning flags.
+\item Re-organized documentation. No new content, but better organization.
+\end{itemize}
+
+\subsection{Bug Fixes}
+\begin{itemize}
+\item Fixed integer overflow bug in \texttt{block\_random\_access\_sparse\_matrix.cc}.
+\item Renamed some macros to prevent name conflicts.
+\item Fixed incorrect input to \texttt{StateUpdatingCallback}.
+\item Fixes to AutoDiff tests.
+\item Various internal cleanups.
+\end{itemize}
+
+\section*{1.1.1}
+\subsection{Bug Fixes}
+\begin{itemize}
+\item Fix a bug in the handling of constant blocks. Thanks to Louis Simard for reporting this.
+\item Add an optional lower bound to the Levenberg-Marquardt regularizer to prevent oscillating between well-posed and ill-posed linear problems.
+\item Some internal refactoring and test fixes.
+\end{itemize}
+\section*{1.1.0}
+\subsection{New Features}
+\begin{itemize}
+\item New iterative linear solver for general sparse problems, \texttt{CGNR}, and a block Jacobi preconditioner for it.
+\item Changed the semantics of how \texttt{SuiteSparse} dependencies are checked and used. \texttt{SuiteSparse} support is now built by default, but only if all of its dependencies are present.
+\item Automatic differentiation now supports dynamic number of residuals.
+\item Support for writing the linear least squares problems to disk in text format so that they can be loaded into \texttt{MATLAB}.
+\item Linear solver results are now checked for NaNs and infinities.
+\item Added \texttt{.gitignore} file.
+\item A better, more robust build system.
+\end{itemize}
+
+\subsection{Bug Fixes}
+\begin{itemize}
+\item Fixed a strict weak ordering bug in the Schur ordering.
+\item Fixed grammar and typos in the documentation and code comments.
+\item Fixed tests which depended on exact equality between floating point values.
+\end{itemize}
+\section*{1.0.0}
+Initial Release.
\ No newline at end of file
diff --git a/docs/curvefitting.tex b/docs/curvefitting.tex
index ebd9e94..07ccd48 100644
--- a/docs/curvefitting.tex
+++ b/docs/curvefitting.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Fitting a Curve to Data}
 \label{chapter:tutorial:curvefitting}
 The examples we have seen until now are simple optimization problems with no data. The original purpose of least squares and non-linear least squares analysis was fitting curves to data. It is only appropriate that we now consider an example of such a problem\footnote{The full code and data for this example can be found in
diff --git a/docs/faq.tex b/docs/faq.tex
index 401439f..162668a 100644
--- a/docs/faq.tex
+++ b/docs/faq.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Frequently Asked Questions}
 \label{chapter:faq}
 \newcomment{Question}
diff --git a/docs/further.tex b/docs/further.tex
index b832e90..67f8227 100644
--- a/docs/further.tex
+++ b/docs/further.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Further Reading}
 \label{chapter:further}
  For a short but informative introduction to the subject we recommend the booklet by Madsen et al.~\cite{madsen2004methods}. For a general introduction to non-linear optimization we recommend the text by Nocedal \& Wright~\cite{nocedal2000numerical}. Bj{\"o}rck's book remains the seminal reference on least squares problems~\cite{bjorck1996numerical}. Trefethen \& Bau's book is our favourite text on introductory numerical linear algebra~\cite{trefethen1997numerical}. Triggs et al.\ provide a thorough coverage of the bundle adjustment problem~\cite{triggs-etal-1999}.
diff --git a/docs/helloworld.tex b/docs/helloworld.tex
index 2caaf14..e0d5f1d 100644
--- a/docs/helloworld.tex
+++ b/docs/helloworld.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Hello World!}
 \label{chapter:tutorial:helloworld}
 To get started, let us consider the problem of finding the minimum of the function
diff --git a/docs/introduction.tex b/docs/introduction.tex
index a941a36..65afef3 100644
--- a/docs/introduction.tex
+++ b/docs/introduction.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Introduction}
 \label{chapter:introduction}
 Ceres Solver\footnote{For brevity, in the rest of this document we will just use the term Ceres.} is a non-linear least squares solver developed at Google. It is designed to solve small and large sparse problems accurately and efficiently\footnote{For a gentle but brief introduction to non-linear least squares problems, please start by reading the~\hyperref[part:tutorial]{Tutorial}.}. Amongst its various features is a simple but expressive API with support for automatic differentiation, robust norms, local parameterizations, automatic gradient checking, multithreading and automatic problem structure detection.
diff --git a/docs/modeling.tex b/docs/modeling.tex
index 88003c9..5cd199c 100644
--- a/docs/modeling.tex
+++ b/docs/modeling.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Modeling}
 \label{chapter:api}
 \section{\texttt{CostFunction}}
diff --git a/docs/nnlsq.tex b/docs/nnlsq.tex
index ac3b372..5de8131 100644
--- a/docs/nnlsq.tex
+++ b/docs/nnlsq.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Non-linear Least Squares}
 \label{chapter:tutorial:nonlinsq}
 Let $x \in \reals^n$ be an $n$-dimensional vector of variables, and
diff --git a/docs/powell.tex b/docs/powell.tex
index d8aa8c8..7fc94a3 100644
--- a/docs/powell.tex
+++ b/docs/powell.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Powell's Function}
 \label{chapter:tutorial:powell}
 Consider now a slightly more complicated example -- the minimization of Powell's function. Let $x = \left[x_1, x_2, x_3, x_4 \right]$ and
diff --git a/docs/reference-overview.tex b/docs/reference-overview.tex
index 3058460..dac0ef2 100644
--- a/docs/reference-overview.tex
+++ b/docs/reference-overview.tex
@@ -1,18 +1,18 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Overview}
 \label{chapter:overview}
 Ceres solves robustified non-linear least squares problems of the form 
 \begin{equation}
-	\frac{1}{2}\sum_{i=1} \rho_i\left(\left\|f_i\left(x_{i_1},\hdots,x_{i_k}\right)\right\|^2\right).
-	\label{eq:ceresproblem}
+\frac{1}{2}\sum_{i=1} \rho_i\left(\left\|f_i\left(x_{i_1},\hdots,x_{i_k}\right)\right\|^2\right).
+\label{eq:ceresproblem}
 \end{equation}
 where $f_i(\cdot)$ is a cost function that depends on the parameter blocks $\left[x_{i_1}, \hdots , x_{i_k}\right]$ and $\rho_i$ is a loss function. In most optimization problems, small groups of scalars occur together. For example, the three components of a translation vector and the four components of the quaternion that define the pose of a camera. We refer to such a group of small scalars as a Parameter Block. Of course, a parameter block can have just a single parameter. 
 The term $ \rho_i\left(\left\|f_i\left(x_{i_1},\hdots,x_{i_k}\right)\right\|^2\right)$ is known as a Residual Block. A Ceres problem is a collection of residual blocks, each of which depends on a subset of the parameter blocks.
 
 Solving problems using Ceres consists of two steps.
 \begin{enumerate}
-	\item{Modeling} Define parameter blocks and  residual blocks and build a \texttt{Problem} object containing them.
-	\item{Solving} Configure and run the solver.
+\item{Modeling} Define parameter blocks and residual blocks and build a \texttt{Problem} object containing them.
+\item{Solving} Configure and run the solver.
 \end{enumerate}
 
 These two steps are mostly independent of each other. This is by design. Modeling the optimization problem should not depend on how it is solved, and the user should be able to switch between various solver settings and strategies without changing the way the problem is modeled. In the next two chapters we will consider each of these steps in detail.
\ No newline at end of file
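
As a sketch of the two steps, in the style of the Ceres hello-world example; the MyResidual functor here is a made-up illustration, not part of the library:

    // Hypothetical residual functor, for illustration only.
    struct MyResidual {
      template <typename T>
      bool operator()(const T* const x, T* residual) const {
        residual[0] = T(10.0) - x[0];
        return true;
      }
    };

    // Step 1: Modeling -- define parameter/residual blocks in a Problem.
    double x = 0.5;
    ceres::Problem problem;
    problem.AddResidualBlock(
        new ceres::AutoDiffCostFunction<MyResidual, 1, 1>(new MyResidual),
        NULL,  // NULL loss function means plain squared loss.
        &x);

    // Step 2: Solving -- configure and run the solver.
    ceres::Solver::Options options;
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);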
diff --git a/docs/solving.tex b/docs/solving.tex
index c83103c..ddc7bae 100644
--- a/docs/solving.tex
+++ b/docs/solving.tex
@@ -1,4 +1,4 @@
-%!TEX root = ceres.tex
+%!TEX root = ceres-solver.tex
 \chapter{Solving}
 Effective use of Ceres requires some familiarity with the basic components of a nonlinear least squares solver. 
 
@@ -59,7 +59,7 @@
 \subsection{\texttt{DENSE\_QR}}
 For small problems (a couple of hundred parameters and a few thousand residuals) with relatively dense Jacobians, \texttt{DENSE\_QR} is the method of choice~\cite{bjorck1996numerical}. Let $J = QR$ be the QR-decomposition of $J$, where $Q$ is an orthonormal matrix and $R$ is an upper triangular matrix~\cite{trefethen1997numerical}. Then it can be shown that the solution to~\eqref{eq:normal} is given by
 \begin{align}
-	\Delta x^* = -R^{-1}Q^\top f
+    \Delta x^* = -R^{-1}Q^\top f
 \end{align}
 Ceres uses \texttt{Eigen}'s dense QR decomposition routines.
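
A minimal sketch of opting into this solver, assuming a ceres::Problem named problem has already been built:

    ceres::Solver::Options options;
    options.linear_solver_type = ceres::DENSE_QR;  // dense QR via Eigen
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);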
 
@@ -67,7 +67,7 @@
 \subsection{\texttt{SPARSE\_NORMAL\_CHOLESKY}}
 Large non-linear least squares problems are usually sparse. In such cases, using a dense QR factorization is inefficient. Let $H = R^\top R$ be the Cholesky factorization of the normal equations, where $R$ is an upper triangular matrix, then the solution to~\eqref{eq:normal} is given by
 \begin{equation}
-	\Delta x^* = R^{-1} R^{-\top} g.
+    \Delta x^* = R^{-1} R^{-\top} g.
 \end{equation}
 The observant reader will note that the $R$ in the Cholesky factorization of $H$ is the same upper triangular matrix $R$ in the QR factorization of $J$. Since $Q$ is an orthonormal matrix, $J=QR$ implies that $J^\top J = R^\top Q^\top Q R = R^\top R$.
 
@@ -142,7 +142,7 @@
 \subsection{\texttt{CGNR}}
 For general sparse problems, if the problem is too large for \texttt{CHOLMOD} or a sparse linear algebra library is not linked into Ceres, another option is the \texttt{CGNR} solver. It applies the Conjugate Gradients method to the {\em normal equations}, but without forming the normal equations explicitly. It exploits the relation
 \begin{align}
-	H x = J^\top J x = J^\top(J x)
+    H x = J^\top J x = J^\top(J x)
 \end{align}
 When the user chooses \texttt{ITERATIVE\_SCHUR} as the linear solver, Ceres automatically switches from the exact step algorithm to an inexact step algorithm.
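
A hedged sketch of selecting CGNR together with its block Jacobi preconditioner, again assuming an existing problem:

    ceres::Solver::Options options;
    options.linear_solver_type = ceres::CGNR;
    options.preconditioner_type = ceres::JACOBI;  // block Jacobi (the default)
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);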
 
@@ -198,21 +198,30 @@
 \texttt{Solver::Options} controls the overall behavior of the solver. We list the various settings and their default values below.
 
 \begin{enumerate}
-\item{\texttt{minimizer\_type}}(\texttt{LEVENBERG\_MARQUARDT}) The  minimization algorithm used by Ceres. \texttt{LEVENBERG\_MARQUARDT} is currently the only valid value.
+
+\item{\texttt{trust\_region\_strategy\_type}}(\texttt{LEVENBERG\_MARQUARDT}) The trust region step computation algorithm used by Ceres. \texttt{LEVENBERG\_MARQUARDT} is currently the only valid value.
 
 \item{\texttt{max\_num\_iterations}}(\texttt{50}) Maximum number of iterations for Levenberg-Marquardt.
 
-\item{\texttt{max\_solver\_time\_sec}}(\texttt{1e9}) Maximum amount of time (in seconds) for which the solver should run.
+\item{\texttt{max\_solver\_time\_sec}} ($10^9$) Maximum amount of time (in seconds) for which the solver should run.
 
 \item{\texttt{num\_threads}}(\texttt{1})
 Number of threads used by Ceres to evaluate the Jacobian.
 
-\item{\texttt{tau}}(\texttt{1e-4}) Initial value of the regularization parameter $\mu$ used by the Levenberg-Marquardt algorithm. The size of this parameter indicate the user's guess of how far the initial solution is from the minimum. Large values indicates that the solution is far away.
+\item{\texttt{initial\_trust\_region\_radius} ($10^4$)} The size of the initial trust region. When the \texttt{LEVENBERG\_MARQUARDT} strategy is used, the reciprocal of this number is the initial regularization parameter.
 
-\item{\texttt{min\_relative\_decrease}}(\texttt{1e-3}) Lower threshold for relative decrease before a Levenberg-Marquardt step is acceped.
+\item{\texttt{max\_trust\_region\_radius} ($10^{16}$)} The trust region radius is not allowed to grow beyond this value.
+\item{\texttt{min\_trust\_region\_radius} ($10^{-32}$)} The solver terminates when the trust region radius becomes smaller than this value.
 
+\item{\texttt{min\_relative\_decrease}}($10^{-3}$) Lower threshold for relative decrease before a Levenberg-Marquardt step is accepted.
 
-\item{\texttt{function\_tolerance}}(\texttt{1e-6}) Solver terminates if
+\item{\texttt{lm\_min\_diagonal} ($10^{-6}$)} The \texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to regularize the trust region step. This is the lower bound on the values of this diagonal matrix.
+
+\item{\texttt{lm\_max\_diagonal} ($10^{32}$)} The \texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to regularize the trust region step. This is the upper bound on the values of this diagonal matrix.
+
+\item{\texttt{max\_num\_consecutive\_invalid\_steps} (5)} The step returned by a trust region strategy can sometimes be numerically invalid, usually because of conditioning issues. Instead of crashing or stopping the optimization, the optimizer can retry with a smaller, better conditioned trust region problem. This parameter sets the number of consecutive retries before the minimizer gives up.
+
+\item{\texttt{function\_tolerance}}($10^{-6}$) Solver terminates if
 \begin{align}
 \frac{|\Delta \text{cost}|}{\text{cost}} < \texttt{function\_tolerance}
 \end{align}
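
Pulling the trust region settings above into one place, a configuration sketch; every member name and value below is taken from this list, so treat it as illustrative rather than exhaustive:

    ceres::Solver::Options options;
    options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;
    options.max_num_iterations = 50;            // documented default
    options.initial_trust_region_radius = 1e4;  // reciprocal = initial mu
    options.max_trust_region_radius = 1e16;
    options.min_trust_region_radius = 1e-32;    // termination threshold
    options.min_relative_decrease = 1e-3;
    options.function_tolerance = 1e-6;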
@@ -224,9 +233,9 @@
 \end{equation}
 where $\|\cdot\|_\infty$ refers to the max norm, and $x_0$ is the vector of initial parameter values.
 
-\item{\texttt{parameter\_tolerance}}(\texttt{1e-8}) Solver terminates if 
+\item{\texttt{parameter\_tolerance}}($10^{-8}$) Solver terminates if 
 \begin{equation}
-	\frac{\|\Delta x\|}{\|x\| + \texttt{parameter\_tolerance}} < \texttt{parameter\_tolerance}
+    \frac{\|\Delta x\|}{\|x\| + \texttt{parameter\_tolerance}} < \texttt{parameter\_tolerance}
 \end{equation}
 where $\Delta x$ is the step computed by the linear solver in the current iteration of Levenberg-Marquardt.
 
@@ -236,6 +245,8 @@
 
 \item{\texttt{preconditioner\_type}}(\texttt{JACOBI}) The preconditioner used by the iterative linear solver. The default is the block Jacobi preconditioner. Valid values are (in increasing order of complexity) \texttt{IDENTITY},\texttt{JACOBI}, \texttt{SCHUR\_JACOBI}, \texttt{CLUSTER\_JACOBI} and \texttt{CLUSTER\_TRIDIAGONAL}.
 
+\item{\texttt{sparse\_linear\_algebra\_library} (\texttt{SUITE\_SPARSE})} Ceres supports the use of two sparse linear algebra libraries: \texttt{SuiteSparse}, which is enabled by setting this parameter to \texttt{SUITE\_SPARSE}, and \texttt{CXSparse}, which can be selected by setting this parameter to \texttt{CX\_SPARSE}. \texttt{SuiteSparse} is a sophisticated and complex sparse linear algebra library and should be used in general. If your needs/platforms prevent you from using \texttt{SuiteSparse}, consider using \texttt{CXSparse}, which is a much smaller, easier to build library. As can be expected, its performance on large problems is not comparable to that of \texttt{SuiteSparse}.
+
 \item{\texttt{num\_linear\_solver\_threads}}(\texttt{1}) Number of threads used by the linear solver. 
 
 \item{\texttt{num\_eliminate\_blocks}}(\texttt{0}) 
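
As a sketch of how the two sparse backends are selected (enum and member names as documented in the item above):

    ceres::Solver::Options options;
    options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
    options.sparse_linear_algebra_library = ceres::SUITE_SPARSE;
    // On platforms where SuiteSparse is hard to build:
    // options.sparse_linear_algebra_library = ceres::CX_SPARSE;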
@@ -258,12 +269,20 @@
     to it if the \texttt{ordering\_type} is set to \texttt{USER} and the ordering vector is
     non-empty.
 
+\item{\texttt{use\_block\_amd} (\texttt{true})} By virtue of the modeling layer in Ceres being block oriented,
+all the matrices used by Ceres are also block oriented.
+When doing sparse direct factorization of these matrices, the
+fill-reducing ordering algorithms can either be run on the
+block or the scalar form of these matrices. Running it on the
+block form exposes more of the super-nodal structure of the
+matrix to the Cholesky factorization routines. This leads to
+substantial gains in factorization performance. Setting this parameter to \texttt{true} enables the use of a block-oriented Approximate Minimum Degree (AMD) ordering algorithm. Setting it to \texttt{false} uses a scalar AMD algorithm. This option only makes sense when using \texttt{sparse\_linear\_algebra\_library = SUITE\_SPARSE}, as it uses the \texttt{AMD} package that is part of \texttt{SuiteSparse}.
 
 \item{\texttt{linear\_solver\_min\_num\_iterations}}(\texttt{1}) Minimum number of iterations used by the linear solver. This only makes sense when the linear solver is an iterative solver, e.g., \texttt{ITERATIVE\_SCHUR}.
 
 \item{\texttt{linear\_solver\_max\_num\_iterations}}(\texttt{500}) Maximum number of iterations used by the linear solver. This only makes sense when the linear solver is an iterative solver, e.g., \texttt{ITERATIVE\_SCHUR}.
 
-\item{\texttt{eta}}(\texttt{1e-1})
+\item{\texttt{eta}} ($10^{-1}$)
  Forcing sequence parameter. The truncated Newton solver uses
     this number to control the relative accuracy with which the
      Newton step is computed. This constant is passed to ConjugateGradientsSolver which uses
@@ -329,12 +348,12 @@
      results are compared, and if they differ substantially, details
      are printed to the log.
 
-\item{\texttt{gradient\_check\_relative\_precision}}(\texttt{1e-8})
+\item{\texttt{gradient\_check\_relative\_precision}} ($10^{-8}$)
   Relative precision to check for in the gradient checker. If the
   relative difference between an element of a Jacobian and its finite
   difference estimate exceeds this number, then the Jacobian for that cost term is dumped.
 
-\item{\texttt{numeric\_derivative\_relative\_step\_size}}(\texttt{1e-6})
+\item{\texttt{numeric\_derivative\_relative\_step\_size}} ($10^{-6}$)
  Relative shift used for taking numeric derivatives. For finite
      differencing, each dimension is evaluated at slightly shifted
      values, \eg for forward differences, the numerical derivative is
@@ -348,8 +367,8 @@
      The finite differencing is done along each dimension. The
      reason to use a relative (rather than absolute) step size is
      that this way, numeric differentiation works for functions where
-     the arguments are typically large (e.g. 1e9) and when the
-     values are small (e.g. 1e-5). It is possible to construct
+     the arguments are typically large (e.g. $10^9$) and when the
+     values are small (e.g. $10^{-5}$). It is possible to construct
      "torture cases" which break this finite difference heuristic,
      but they do not come up often in practice.
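
To make the heuristic concrete, a small illustrative C++ sketch of a forward difference with a relative step size; this mirrors the idea described above and is not Ceres's internal implementation:

    #include <algorithm>
    #include <cmath>

    // Forward difference with a relative step, illustrative only.
    double ForwardDifference(double (*f)(double), double x) {
      const double kRelativeStep = 1e-6;  // cf. numeric_derivative_relative_step_size
      // Scale the shift to |x| so that both large (1e9) and small (1e-5)
      // arguments get a sensible step; floor at 1.0 so x == 0 still works.
      const double h = kRelativeStep * std::max(std::abs(x), 1.0);
      return (f(x + h) - f(x)) / h;
    }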