%!TEX root = ceres-solver.tex
\chapter{Solving}
Effective use of Ceres requires some familiarity with the basic components of a nonlinear least squares solver, so before we describe how to configure the solver, we will begin by taking a brief look at how some of the core optimization algorithms in Ceres work and the various linear solvers and preconditioners that power it.

\section{Trust Region Methods}
\label{sec:trust-region}
Let $x \in \mathbb{R}^{n}$ be an $n$-dimensional vector of variables, and
$F(x) = \left[f_1(x), \hdots, f_{m}(x) \right]^{\top}$ be an $m$-dimensional function of $x$. We are interested in solving the following optimization problem~\footnote{At the level of the non-linear solver, the block and residual structure is not relevant, therefore our discussion here is in terms of an optimization problem defined over a state vector of size $n$.},
\begin{equation}
\arg \min_x \frac{1}{2}\|F(x)\|^2\ .
\label{eq:nonlinsq}
\end{equation}
Here, the Jacobian $J(x)$ of $F(x)$ is an $m\times n$ matrix, where $J_{ij}(x) = \partial_j f_i(x)$, and the gradient vector is $g(x) = \nabla \frac{1}{2}\|F(x)\|^2 = J(x)^\top F(x)$. Since the efficient global optimization of~\eqref{eq:nonlinsq} for general $F(x)$ is an intractable problem, we will have to settle for finding a local minimum.

The general strategy when solving non-linear optimization problems is to solve a sequence of approximations to the original problem~\cite{nocedal2000numerical}. At each iteration, the approximation is solved to determine a correction $\Delta x$ to the vector $x$. For non-linear least squares, an approximation can be constructed by using the linearization $F(x+\Delta x) \approx F(x) + J(x)\Delta x$, which leads to the following linear least squares problem:
\begin{equation}
\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
\label{eq:linearapprox}
\end{equation}
Unfortunately, na\"ively solving a sequence of these problems and
updating $x \leftarrow x+ \Delta x$ leads to an algorithm that may not
converge. To get a convergent algorithm, we need to control the size
of the step $\Delta x$, and this is where the idea of a trust region
comes in. Algorithm~\ref{alg:trust-region} describes the basic trust-region loop for non-linear least squares problems.

\begin{algorithm}
\caption{The basic trust-region algorithm.\label{alg:trust-region}}
\begin{algorithmic}
\REQUIRE Initial point $x$ and a trust region radius $\mu$.
\LOOP
\STATE{Solve $\arg \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2$ s.t. $\|D(x)\Delta x\|^2 \le \mu$}
\STATE{$\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 - \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 - \|F(x)\|^2}$}
\IF {$\rho > \epsilon$}
\STATE{$x = x + \Delta x$}
\ENDIF
\IF {$\rho > \eta_1$}
\STATE{$\mu = 2 * \mu$}
\ELSE
\IF {$\rho < \eta_2$}
\STATE {$\mu = 0.5 * \mu$}
\ENDIF
\ENDIF
\ENDLOOP
\end{algorithmic}
\end{algorithm}

Here, $\mu$ is the trust region radius, $D(x)$ is some matrix used to define a metric on the domain of $F(x)$ and $\rho$ measures the quality of the step $\Delta x$, i.e., how well the linear model predicted the decrease in the value of the non-linear objective. The idea is to increase or decrease the radius of the trust region depending on how well the linearization predicts the behavior of the non-linear objective, which in turn is reflected in the value of $\rho$.

The key computational step in a trust-region algorithm is the solution of the constrained optimization problem
\begin{align}
\arg\min_{\Delta x}& \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 \\
\text{such that}&\quad \|D(x)\Delta x\|^2 \le \mu
\label{eq:trp}
\end{align}

There are a number of different ways of solving this problem, each giving rise to a different concrete trust-region algorithm. Currently, Ceres implements two trust-region algorithms: Levenberg-Marquardt and Dogleg.

\subsection{Levenberg-Marquardt}
The Levenberg-Marquardt algorithm~\cite{levenberg1944method, marquardt1963algorithm} is the most popular algorithm for solving non-linear least squares problems. It was also the first trust region algorithm to be developed~\cite{levenberg1944method,marquardt1963algorithm}. Ceres implements an exact step~\cite{madsen2004methods} and an inexact step variant of the Levenberg-Marquardt algorithm~\cite{wright1985inexact,nash1990assessing}.

It can be shown that the solution to~\eqref{eq:trp} can be obtained by solving an unconstrained optimization problem of the form
\begin{align}
\arg\min_{\Delta x}& \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 +\lambda \|D(x)\Delta x\|^2
\end{align}
where $\lambda$ is a Lagrange multiplier that is inversely related to $\mu$. In Ceres, we solve for
\begin{align}
\arg\min_{\Delta x}& \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
\label{eq:lsqr}
\end{align}
The matrix $D(x)$ is a non-negative diagonal matrix, typically the square root of the diagonal of the matrix $J(x)^\top J(x)$.

Before going further, let us make some notational simplifications. We will assume that the matrix $\frac{1}{\sqrt{\mu}} D$ has been concatenated at the bottom of the matrix $J$, and similarly a vector of zeros has been added to the bottom of the vector $f$, and the rest of our discussion will be in terms of $J$ and $f$, \ie the linear least squares problem
\begin{align}
\min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
\label{eq:simple}
\end{align}
For all but the smallest problems the solution of~\eqref{eq:simple} in each iteration of the Levenberg-Marquardt algorithm is the dominant computational cost in Ceres. Ceres provides a number of different options for solving~\eqref{eq:simple}. There are two major classes of methods: factorization methods and iterative methods.

The factorization methods are based on computing an exact solution of~\eqref{eq:lsqr} using a Cholesky or a QR factorization and lead to an exact step Levenberg-Marquardt algorithm. But it is not clear if an exact solution of~\eqref{eq:lsqr} is necessary at each step of the LM algorithm to solve~\eqref{eq:nonlinsq}. In fact, we have already seen evidence that this may not be the case, as~\eqref{eq:lsqr} is itself a regularized version of~\eqref{eq:linearapprox}. Indeed, it is possible to construct non-linear optimization algorithms in which the linearized problem is solved approximately. These algorithms are known as inexact Newton or truncated Newton methods~\cite{nocedal2000numerical}.

An inexact Newton method requires two ingredients. First, a cheap method for approximately solving systems of linear equations. Typically an iterative linear solver like the Conjugate Gradients method is used for this purpose~\cite{nocedal2000numerical}. Second, a termination rule for the iterative solver. A typical termination rule is of the form
\begin{equation}
\|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|. \label{eq:inexact}
\end{equation}
Here, $k$ indicates the Levenberg-Marquardt iteration number and $0 < \eta_k <1$ is known as the forcing sequence. Wright \& Holt~\cite{wright1985inexact} prove that a truncated Levenberg-Marquardt algorithm that uses an inexact Newton step based on~\eqref{eq:inexact} converges for any sequence $\eta_k \leq \eta_0 < 1$ and the rate of convergence depends on the choice of the forcing sequence $\eta_k$.

Ceres supports both exact and inexact step solution strategies. When the user chooses a factorization based linear solver, the exact step Levenberg-Marquardt algorithm is used. When the user chooses an iterative linear solver, the inexact step Levenberg-Marquardt algorithm is used.
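
To make this connection concrete, here is a minimal sketch of how the choice of linear solver selects between the two strategies. It assumes the usual \texttt{ceres/ceres.h} header and that the solver enums live in the \texttt{ceres} namespace; the option and enum names are the ones listed in the \texttt{Solver::Options} section later in this chapter.

\begin{verbatim}
#include "ceres/ceres.h"

void ConfigureStepType(ceres::Solver::Options* options,
                       bool use_exact_step) {
  if (use_exact_step) {
    // A factorization based solver implies the exact step
    // Levenberg-Marquardt algorithm.
    options->linear_solver_type = ceres::DENSE_QR;
  } else {
    // An iterative solver implies the inexact (truncated) step
    // algorithm; eta controls the forcing sequence.
    options->linear_solver_type = ceres::CGNR;
    options->eta = 1e-1;
  }
}
\end{verbatim}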

\subsection{Dogleg}
\label{sec:dogleg}
Another strategy for solving the trust region problem~\eqref{eq:trp} was introduced by M. J. D. Powell. The key idea there is to compute two vectors
\begin{align}
\Delta x^{\text{Gauss-Newton}} &= \arg \min_{\Delta x}\frac{1}{2} \|J(x)\Delta x + f(x)\|^2,\\
\Delta x^{\text{Cauchy}} &= -\frac{\|g(x)\|^2}{\|J(x)g(x)\|^2}g(x).
\end{align}
Note that the vector $\Delta x^{\text{Gauss-Newton}}$ is the solution
to~\eqref{eq:linearapprox} and $\Delta x^{\text{Cauchy}}$ is the
vector that minimizes the linear approximation if we restrict
ourselves to moving along the direction of the gradient. The Dogleg
method finds a vector $\Delta x$ defined by
$\Delta x^{\text{Gauss-Newton}}$ and $\Delta x^{\text{Cauchy}}$ that
solves the trust region problem. Ceres supports two variants.

\texttt{TRADITIONAL\_DOGLEG}, as described by Powell, constructs two
line segments using the Gauss-Newton and Cauchy vectors and finds the
point farthest along this dogleg shaped path (hence the name) that is
contained in the trust region. For more details on the exact reasoning
and computations, please see Madsen et al.~\cite{madsen2004methods}.

\texttt{SUBSPACE\_DOGLEG} is a more sophisticated method that
considers the entire two dimensional subspace spanned by these two
vectors and finds the point that minimizes the trust region problem in
this subspace~\cite{byrd1988approximate}.

The key advantage of Dogleg over Levenberg-Marquardt is that if the step computation for a particular choice of $\mu$ does not result in sufficient decrease in the value of the objective function, Levenberg-Marquardt solves the linear approximation from scratch with a smaller value of $\mu$. Dogleg, on the other hand, only needs to compute the interpolation between the Gauss-Newton and the Cauchy vectors, as neither of them depends on the value of $\mu$.

The Dogleg method can only be used with the exact factorization based linear solvers.
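
As an illustration, a sketch of the relevant \texttt{Solver::Options} settings for the subspace variant is shown below; the option and enum names are the ones documented in the \texttt{Solver::Options} section, and the snippet is meant as a sketch rather than a complete program.

\begin{verbatim}
#include "ceres/ceres.h"

void UseSubspaceDogleg(ceres::Solver::Options* options) {
  options->trust_region_strategy_type = ceres::DOGLEG;
  options->dogleg_type = ceres::SUBSPACE_DOGLEG;
  // Dogleg requires an exact, factorization based linear solver.
  options->linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
}
\end{verbatim}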

\subsection{Inner Iterations}
\label{sec:inner}
Some non-linear least squares problems have additional structure in
the way the parameter blocks interact, such that it is beneficial to
modify the way the trust region step is computed. For example,
consider the following regression problem

\begin{equation}
y = a_1 e^{b_1 x} + a_2 e^{b_2 x^2 + c_1}
\end{equation}

Given a set of pairs $\{(x_i, y_i)\}$, the user wishes to estimate
$a_1, a_2, b_1, b_2$, and $c_1$.

Notice that the right hand side of this equation is linear in $a_1$
and $a_2$, and given any value for $b_1$, $b_2$ and $c_1$, it is
possible to use linear regression to estimate the optimal values of
$a_1$ and $a_2$. Indeed, it is possible to analytically eliminate the
variables $a_1$ and $a_2$ from the problem entirely. Problems like
these are known as separable least squares problems and the most
famous algorithm for solving them is the Variable Projection algorithm
invented by Golub \& Pereyra~\cite{golub-pereyra-73}.

Similar structure can be found in the matrix factorization with
missing data problem. There the corresponding algorithm is
known as Wiberg's algorithm~\cite{wiberg}.

Ruhe \& Wedin present an analysis of various algorithms for solving
separable non-linear least squares problems and refer to {\em Variable
Projection} as Algorithm I in their paper~\cite{ruhe-wedin}.

Implementing Variable Projection is tedious and expensive. Ruhe \&
Wedin present a simpler algorithm with comparable convergence
properties, which they call Algorithm II. Algorithm II performs an
additional optimization step to estimate $a_1$ and $a_2$ exactly after
computing a successful Newton step.

This idea can be generalized to cases where the residual is not
linear in $a_1$ and $a_2$, i.e.,

\begin{equation}
y = f_1(a_1, e^{b_1 x}) + f_2(a_2, e^{b_2 x^2 + c_1})
\end{equation}

In this case, we solve for the trust region step for the full problem,
and then use it as the starting point to further optimize just $a_1$
and $a_2$. For the linear case, this amounts to doing a single linear
least squares solve. For non-linear problems, any method for solving
the $a_1$ and $a_2$ optimization problems will do. The only constraint
on $a_1$ and $a_2$ (if they are two different parameter blocks) is that
they do not co-occur in a residual block.

This idea can be further generalized, by not just optimizing $(a_1,
a_2)$, but by decomposing the graph corresponding to the Hessian
matrix's sparsity structure into a collection of non-overlapping
independent sets and optimizing each of them.

Setting \texttt{Solver::Options::use\_inner\_iterations} to true
enables the use of this non-linear generalization of Ruhe \& Wedin's
Algorithm II. This version of Ceres has a higher iteration complexity,
but also displays better convergence behavior per iteration.

Setting \texttt{Solver::Options::num\_threads} to the maximum number
possible is highly recommended.
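
For example, a minimal sketch of the relevant settings; the thread count below is an arbitrary placeholder and should be chosen to match the number of cores on your machine.

\begin{verbatim}
#include "ceres/ceres.h"

void EnableInnerIterations(ceres::Solver::Options* options) {
  // Refine each accepted trust region step by further optimizing
  // independent subsets of parameter blocks.
  options->use_inner_iterations = true;
  // Placeholder value; use as many threads as there are cores.
  options->num_threads = 8;
}
\end{verbatim}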

\subsection{Non-monotonic Steps}
\label{sec:non-monotonic}
Note that the basic trust-region algorithm described in
Algorithm~\ref{alg:trust-region} is a descent algorithm, in that it
only accepts a point if it strictly reduces the value of the objective
function.

Relaxing this requirement allows the algorithm to be more
efficient in the long term at the cost of some local increase
in the value of the objective function.

This is because allowing for non-decreasing objective function values
in a principled manner allows the algorithm to ``jump over boulders,''
as the method is not restricted to moving into narrow valleys, while
still preserving its convergence properties.

Setting \texttt{Solver::Options::use\_nonmonotonic\_steps} to \texttt{true}
enables the non-monotonic trust region algorithm as described by
Conn, Gould \& Toint in~\cite{conn2000trust}.

Even though the value of the objective function may be larger
than the minimum value encountered over the course of the
optimization, the final parameters returned to the user are the
ones corresponding to the minimum cost over all iterations.

The option to take non-monotonic steps is available for all trust
region strategies.
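
In terms of \texttt{Solver::Options}, a minimal sketch looks as follows; both options are described in the \texttt{Solver::Options} section below.

\begin{verbatim}
#include "ceres/ceres.h"

void EnableNonmonotonicSteps(ceres::Solver::Options* options) {
  // Allow the objective to increase temporarily; the returned
  // solution still corresponds to the lowest cost encountered.
  options->use_nonmonotonic_steps = true;
  // Window size used when accepting non-monotonic steps.
  options->max_consecutive_nonmonotonic_steps = 5;
}
\end{verbatim}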

\section{\texttt{LinearSolver}}
Recall that in both of the trust-region methods described above, the key computational cost is the solution of a linear least squares problem of the form
\begin{align}
\min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
\label{eq:simple2}
\end{align}

Let $H(x)= J(x)^\top J(x)$ and $g(x) = -J(x)^\top f(x)$. For notational convenience let us also drop the dependence on $x$. Then it is easy to see that solving~\eqref{eq:simple2} is equivalent to solving the {\em normal equations}
\begin{align}
H \Delta x &= g \label{eq:normal}
\end{align}

Ceres provides a number of different options for solving~\eqref{eq:normal}.

\subsection{\texttt{DENSE\_QR}}
For small problems (a couple of hundred parameters and a few thousand residuals) with relatively dense Jacobians, \texttt{DENSE\_QR} is the method of choice~\cite{bjorck1996numerical}. Let $J = QR$ be the QR-decomposition of $J$, where $Q$ is an orthonormal matrix and $R$ is an upper triangular matrix~\cite{trefethen1997numerical}. Then it can be shown that the solution to~\eqref{eq:normal} is given by
\begin{align}
\Delta x^* = -R^{-1}Q^\top f
\end{align}
Ceres uses \texttt{Eigen}'s dense QR factorization routines.
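
For completeness, substituting $J = QR$ into the normal equations and using $Q^\top Q = I$ (and assuming $J$ has full column rank, so that $R$ is invertible) gives
\begin{align}
J^\top J \Delta x &= -J^\top f \notag\\
R^\top Q^\top Q R \Delta x &= -R^\top Q^\top f \notag\\
R \Delta x &= -Q^\top f \notag\\
\Delta x^* &= -R^{-1}Q^\top f. \notag
\end{align}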

\subsection{\texttt{DENSE\_NORMAL\_CHOLESKY} \& \texttt{SPARSE\_NORMAL\_CHOLESKY}}
Large non-linear least squares problems are usually sparse. In such cases, using a dense QR factorization is inefficient. Let $H = R^\top R$ be the Cholesky factorization of the normal equations, where $R$ is an upper triangular matrix, then the solution to~\eqref{eq:normal} is given by
\begin{equation}
\Delta x^* = R^{-1} R^{-\top} g.
\end{equation}
The observant reader will note that the $R$ in the Cholesky
factorization of $H$ is the same upper triangular matrix $R$ in the QR
factorization of $J$. Since $Q$ is an orthonormal matrix, $J=QR$
implies that $J^\top J = R^\top Q^\top Q R = R^\top R$. There are two
variants of Cholesky factorization -- sparse and dense.

\texttt{DENSE\_NORMAL\_CHOLESKY}, as the name implies, performs a
dense Cholesky factorization of the normal equations. Ceres uses
\texttt{Eigen}'s dense LDLT factorization routines.

\texttt{SPARSE\_NORMAL\_CHOLESKY}, as the name implies, performs a
sparse Cholesky factorization of the normal equations. This leads to
substantial savings in time and memory for large sparse
problems. Ceres uses the sparse Cholesky factorization routines in
Professor Tim Davis' \texttt{SuiteSparse} or \texttt{CXSparse}
packages~\cite{chen2006acs}.
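
A sketch of the corresponding solver configuration; it assumes Ceres was built with one of these sparse libraries linked in, and uses the option names documented in the \texttt{Solver::Options} section below.

\begin{verbatim}
#include "ceres/ceres.h"

void UseSparseNormalCholesky(ceres::Solver::Options* options) {
  options->linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
  // Requires Ceres to have been built with SuiteSparse (or use
  // CX_SPARSE if CXSparse is the library that is linked in).
  options->sparse_linear_algebra_library = ceres::SUITE_SPARSE;
}
\end{verbatim}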

\subsection{\texttt{DENSE\_SCHUR} \& \texttt{SPARSE\_SCHUR}}
While it is possible to use \texttt{SPARSE\_NORMAL\_CHOLESKY} to solve bundle adjustment problems, bundle adjustment problems have a special structure, and a more efficient scheme for solving~\eqref{eq:normal} can be constructed.

Suppose that the SfM problem consists of $p$ cameras and $q$ points and the variable vector $x$ has the block structure $x = [y_{1},\hdots,y_{p},z_{1},\hdots,z_{q}]$, where $y$ and $z$ correspond to camera and point parameters, respectively. Further, let the camera blocks be of size $c$ and the point blocks be of size $s$ (for most problems $c$ = $6$--$9$ and $s = 3$). Ceres does not impose any constancy requirement on these block sizes, but choosing them to be constant simplifies the exposition.

A key characteristic of the bundle adjustment problem is that there is no term $f_{i}$ that includes two or more point blocks. This in turn implies that the matrix $H$ is of the form
\begin{equation}
H = \left[
\begin{matrix} B & E\\ E^\top & C
\end{matrix}
\right]\ ,
\label{eq:hblock}
\end{equation}
where $B \in \reals^{pc\times pc}$ is a block sparse matrix with $p$ blocks of size $c\times c$ and $C \in \reals^{qs\times qs}$ is a block diagonal matrix with $q$ blocks of size $s\times s$. $E \in \reals^{pc\times qs}$ is a general block sparse matrix, with a block of size $c\times s$ for each observation. Let us now block partition $\Delta x = [\Delta y,\Delta z]$ and $g=[v,w]$ to restate~\eqref{eq:normal} as the block structured linear system
\begin{equation}
\left[
\begin{matrix} B & E\\ E^\top & C
\end{matrix}
\right]\left[
\begin{matrix} \Delta y \\ \Delta z
\end{matrix}
\right]
=
\left[
\begin{matrix} v\\ w
\end{matrix}
\right]\ ,
\label{eq:linear2}
\end{equation}
and apply Gaussian elimination to it. As we noted above, $C$ is a block diagonal matrix, with small diagonal blocks of size $s\times s$.
Thus, calculating the inverse of $C$ by inverting each of these blocks is cheap. This allows us to eliminate $\Delta z$ by observing that $\Delta z = C^{-1}(w - E^\top \Delta y)$, giving us
\begin{equation}
\left[B - EC^{-1}E^\top\right] \Delta y = v - EC^{-1}w\ . \label{eq:schur}
\end{equation}
The matrix
\begin{equation}
S = B - EC^{-1}E^\top\ ,
\end{equation}
is the Schur complement of $C$ in $H$. It is also known as the {\em reduced camera matrix}, because the only variables participating in~\eqref{eq:schur} are the ones corresponding to the cameras. $S \in \reals^{pc\times pc}$ is a block structured symmetric positive definite matrix, with blocks of size $c\times c$. The block $S_{ij}$ corresponding to the pair of images $i$ and $j$ is non-zero if and only if the two images observe at least one common point.

Now, \eqref{eq:linear2}~can be solved by first forming $S$, solving for $\Delta y$, and then back-substituting $\Delta y$ to obtain the value of $\Delta z$.
Thus, the solution of what was an $n\times n$, $n=pc+qs$, linear system is reduced to the inversion of the block diagonal matrix $C$, a few matrix-matrix and matrix-vector multiplies, and the solution of the block sparse $pc\times pc$ linear system~\eqref{eq:schur}. For almost all problems, the number of cameras is much smaller than the number of points, $p \ll q$, thus solving~\eqref{eq:schur} is significantly cheaper than solving~\eqref{eq:linear2}. This is the {\em Schur complement trick}~\cite{brown-58}.

This still leaves open the question of solving~\eqref{eq:schur}. The
method of choice for solving symmetric positive definite systems
exactly is via the Cholesky
factorization~\cite{trefethen1997numerical} and depending upon the
structure of the matrix, there are, in general, two options. The first
is direct factorization, where we store and factor $S$ as a dense
matrix~\cite{trefethen1997numerical}. This method has $O(p^2)$ space
complexity and $O(p^3)$ time complexity and is only practical for
problems with up to a few hundred cameras. Ceres implements this
strategy as the \texttt{DENSE\_SCHUR} solver.

But, $S$ is typically a fairly sparse matrix, as most images
only see a small fraction of the scene. This leads us to the second
option: sparse direct methods. These methods store $S$ as a sparse
matrix, use row and column re-ordering algorithms to maximize the
sparsity of the Cholesky decomposition, and focus their compute effort
on the non-zero part of the factorization~\cite{chen2006acs}.
Sparse direct methods, depending on the exact sparsity structure of
the Schur complement, allow bundle adjustment algorithms to
significantly scale up over those based on dense factorization. Ceres
implements this strategy as the \texttt{SPARSE\_SCHUR} solver.
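
A sketch of how one might choose between the two; the camera count threshold below is an arbitrary stand-in for the ``few hundred cameras'' guideline above, not a recommendation from the library.

\begin{verbatim}
#include "ceres/ceres.h"

void UseSchurSolver(ceres::Solver::Options* options, int num_cameras) {
  // DENSE_SCHUR is practical up to a few hundred cameras; beyond
  // that the sparse variant is usually preferable.
  if (num_cameras < 300) {
    options->linear_solver_type = ceres::DENSE_SCHUR;
  } else {
    options->linear_solver_type = ceres::SPARSE_SCHUR;
  }
}
\end{verbatim}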

\subsection{\texttt{CGNR}}
For general sparse problems, if the problem is too large for \texttt{CHOLMOD} or a sparse linear algebra library is not linked into Ceres, another option is the \texttt{CGNR} solver. This solver uses the Conjugate Gradients solver on the {\em normal equations}, but without forming the normal equations explicitly. It exploits the relation
\begin{align}
H x = J^\top J x = J^\top(J x)
\end{align}
When the user chooses \texttt{CGNR} as the linear solver, Ceres automatically switches from the exact step algorithm to an inexact step algorithm.

\subsection{\texttt{ITERATIVE\_SCHUR}}
Another option for bundle adjustment problems is to apply PCG to the reduced camera matrix $S$ instead of $H$. One reason to do this is that $S$ is a much smaller matrix than $H$, but more importantly, it can be shown that $\kappa(S)\leq \kappa(H)$. Ceres implements PCG on $S$ as the \texttt{ITERATIVE\_SCHUR} solver. When the user chooses \texttt{ITERATIVE\_SCHUR} as the linear solver, Ceres automatically switches from the exact step algorithm to an inexact step algorithm.

The cost of forming and storing the Schur complement $S$ can be prohibitive for large problems. Indeed, for an inexact Newton solver that computes $S$ and runs PCG on it, almost all of its time is spent in constructing $S$; the time spent inside the PCG algorithm is negligible in comparison. Because PCG only needs access to $S$ via its product with a vector, one way to evaluate $Sx$ is to observe that
\begin{align}
x_1 &= E^\top x \notag \\
x_2 &= C^{-1} x_1 \notag\\
x_3 &= Ex_2 \notag\\
x_4 &= Bx \notag\\
Sx &= x_4 - x_3\ .\label{eq:schurtrick1}
\end{align}
Thus, we can run PCG on $S$ with the same computational effort per iteration as PCG on $H$, while reaping the benefits of a more powerful preconditioner. In fact, we do not even need to compute $H$; \eqref{eq:schurtrick1} can be implemented using just the columns of $J$.
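
To make~\eqref{eq:schurtrick1} concrete, here is an illustrative sketch using dense \texttt{Eigen} matrices. Ceres' actual implementation operates block-wise on sparse matrices, and the factorization based solve below stands in for the cheap block-by-block inversion of $C$.

\begin{verbatim}
#include <Eigen/Dense>

// Evaluate S * x = (B - E C^{-1} E^T) x without ever forming S.
Eigen::VectorXd SchurProduct(const Eigen::MatrixXd& B,
                             const Eigen::MatrixXd& E,
                             const Eigen::MatrixXd& C,
                             const Eigen::VectorXd& x) {
  Eigen::VectorXd x1 = E.transpose() * x;   // x1 = E^T x
  Eigen::VectorXd x2 = C.ldlt().solve(x1);  // x2 = C^{-1} x1
  Eigen::VectorXd x3 = E * x2;              // x3 = E x2
  Eigen::VectorXd x4 = B * x;               // x4 = B x
  return x4 - x3;                           // S x
}
\end{verbatim}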

Equation~\eqref{eq:schurtrick1} is closely related to {\em Domain Decomposition methods} for solving large linear systems that arise in structural engineering and partial differential equations. In the language of Domain Decomposition, each point in a bundle adjustment problem is a domain, and the cameras form the interface between these domains. The iterative solution of the Schur complement then falls within the sub-category of techniques known as Iterative Sub-structuring~\cite{saad2003iterative,mathew2008domain}.

\section{Preconditioner}
The convergence rate of Conjugate Gradients for solving~\eqref{eq:normal} depends on the distribution of eigenvalues of $H$~\cite{saad2003iterative}. A useful upper bound is $\sqrt{\kappa(H)}$, where $\kappa(H)$ is the condition number of the matrix $H$. For most bundle adjustment problems, $\kappa(H)$ is high and a direct application of Conjugate Gradients to~\eqref{eq:normal} results in extremely poor performance.

The solution to this problem is to replace~\eqref{eq:normal} with a {\em preconditioned} system. Given a linear system $Ax = b$ and a preconditioner $M$, the preconditioned system is given by $M^{-1}Ax = M^{-1}b$. The resulting algorithm is known as the Preconditioned Conjugate Gradients algorithm (PCG) and its worst case complexity now depends on the condition number of the {\em preconditioned} matrix $\kappa(M^{-1}A)$.

The computational cost of using a preconditioner $M$ is the cost of computing $M$ and evaluating the product $M^{-1}y$ for arbitrary vectors $y$. Thus, there are two competing factors to consider: how much of $H$'s structure is captured by $M$ so that the condition number $\kappa(M^{-1}H)$ is low, and the computational cost of constructing and using $M$. The ideal preconditioner would be one for which $\kappa(M^{-1}A) =1$. $M=A$ achieves this, but it is not a practical choice, as applying this preconditioner would require solving a linear system equivalent to the unpreconditioned problem. It is usually the case that the more information $M$ has about $H$, the more expensive it is to use. For example, Incomplete Cholesky factorization based preconditioners have much better convergence behavior than the Jacobi preconditioner, but are also much more expensive.

The simplest of all preconditioners is the diagonal or Jacobi preconditioner, \ie, $M=\operatorname{diag}(A)$, which for block structured matrices like $H$ can be generalized to the block Jacobi preconditioner.

For \texttt{ITERATIVE\_SCHUR} there are two obvious choices for block diagonal preconditioners for $S$: the block diagonal of the matrix $B$~\cite{mandel1990block} and the block diagonal of $S$, \ie, the block Jacobi preconditioner for $S$. Ceres implements both of these preconditioners and refers to them as \texttt{JACOBI} and \texttt{SCHUR\_JACOBI} respectively.

For bundle adjustment problems arising in reconstruction from community photo collections, more effective preconditioners can be constructed by analyzing and exploiting the camera-point visibility structure of the scene~\cite{kushal2012}. Ceres implements the two visibility based preconditioners described by Kushal \& Agarwal as \texttt{CLUSTER\_JACOBI} and \texttt{CLUSTER\_TRIDIAGONAL}. These are fairly new preconditioners and Ceres' implementation of them is in its early stages and is not as mature as the other preconditioners described above.
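
A sketch of a typical configuration pairing \texttt{ITERATIVE\_SCHUR} with one of these preconditioners; the option and enum names are the ones documented in the \texttt{Solver::Options} section below.

\begin{verbatim}
#include "ceres/ceres.h"

void UseIterativeSchur(ceres::Solver::Options* options) {
  options->linear_solver_type = ceres::ITERATIVE_SCHUR;
  // SCHUR_JACOBI is the block Jacobi preconditioner for the reduced
  // camera matrix S; JACOBI, CLUSTER_JACOBI and CLUSTER_TRIDIAGONAL
  // are the other documented choices.
  options->preconditioner_type = ceres::SCHUR_JACOBI;
}
\end{verbatim}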

\section{Ordering}
\label{sec:ordering}
The order in which variables are eliminated in a linear solver can
have a significant impact on the efficiency and accuracy of the
method. For example, when doing sparse Cholesky factorization, there
are matrices for which a good ordering will give a Cholesky factor
with $O(n)$ storage, whereas a bad ordering will result in a
completely dense factor.

Ceres allows the user to provide varying amounts of hints to the
solver about the variable elimination ordering to use. This can range
from no hints, where the solver is free to decide the best ordering
based on the user's choices like the linear solver being used, to an
exact order in which the variables should be eliminated, and a variety
of possibilities in between.

Instances of the \texttt{Ordering} class are used to communicate this
information to Ceres.

Formally, an ordering is an ordered partitioning of the parameter
blocks. Each parameter block belongs to exactly one group, and
each group has a unique integer associated with it that determines
its order in the set of groups. We call these groups {\em elimination
groups}.

Given such an ordering, Ceres ensures that the parameter blocks in the
lowest numbered elimination group are eliminated first, and then the
parameter blocks in the next lowest numbered elimination group and so
on. Within each elimination group, Ceres is free to order the
parameter blocks as it chooses. For example, consider the linear system

\begin{align}
x + y &= 3\\
2x + 3y &= 7
\end{align}

There are two ways in which it can be solved. First, eliminating $x$
from the two equations, solving for $y$ and then back substituting
for $x$; or first eliminating $y$, solving for $x$ and back
substituting for $y$. The user can construct three orderings here.

\begin{enumerate}
\item $\{0: x\}, \{1: y\}$: Eliminate $x$ first.
\item $\{0: y\}, \{1: x\}$: Eliminate $y$ first.
\item $\{0: x, y\}$: Solver gets to decide the elimination order.
\end{enumerate}

Thus, to have Ceres determine the ordering automatically using
heuristics, put all the variables in the same elimination group. The
identity of the group does not matter. This is the same as not
specifying an ordering at all. To control the ordering for every
variable, create an elimination group per variable, ordering them in
the desired order.

If the user is using one of the Schur solvers (\texttt{DENSE\_SCHUR},
\texttt{SPARSE\_SCHUR},\ \texttt{ITERATIVE\_SCHUR}) and chooses to
specify an ordering, it must have one important property. The lowest
numbered elimination group must form an independent set in the graph
corresponding to the Hessian, or in other words, no two parameter
blocks in the first elimination group should co-occur in the same
residual block. For the best performance, this elimination group
should be as large as possible. For standard bundle adjustment
problems, this corresponds to the first elimination group containing
all the 3D points, and the second containing all the camera parameter
blocks.

If the user leaves the choice to Ceres, then the solver uses an
approximate maximum independent set algorithm to identify the first
elimination group~\cite{li2007miqr}.
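
The following hypothetical sketch shows how such an ordering could be constructed for a bundle adjustment problem. The member function name \texttt{AddParameterBlockToGroup} is an assumption made purely for illustration; consult the declaration of the \texttt{Ordering} class for the exact interface.

\begin{verbatim}
#include <vector>
#include "ceres/ceres.h"

// Hypothetical sketch; AddParameterBlockToGroup is an assumed name.
void SetBundleAdjustmentOrdering(
    const std::vector<double*>& point_blocks,
    const std::vector<double*>& camera_blocks,
    ceres::Solver::Options* options) {
  ceres::Ordering* ordering = new ceres::Ordering;
  // Points form an independent set; eliminate them first (group 0).
  for (size_t i = 0; i < point_blocks.size(); ++i) {
    ordering->AddParameterBlockToGroup(point_blocks[i], 0);
  }
  // Cameras go in the next group (group 1).
  for (size_t i = 0; i < camera_blocks.size(); ++i) {
    ordering->AddParameterBlockToGroup(camera_blocks[i], 1);
  }
  options->ordering = ordering;
}
\end{verbatim}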
\section{\texttt{Solver::Options}}

\texttt{Solver::Options} controls the overall behavior of the
solver. We list the various settings and their default values below.

\begin{enumerate}

\item{\texttt{trust\_region\_strategy\_type }}
(\texttt{LEVENBERG\_MARQUARDT}) The trust region step computation
algorithm used by Ceres. Currently \texttt{LEVENBERG\_MARQUARDT}
and \texttt{DOGLEG} are the two valid choices.

\item{\texttt{dogleg\_type}} (\texttt{TRADITIONAL\_DOGLEG}) Ceres
supports two different dogleg strategies: the
\texttt{TRADITIONAL\_DOGLEG} method by Powell and the
\texttt{SUBSPACE\_DOGLEG} method described by Byrd et
al.~\cite{byrd1988approximate}. See Section~\ref{sec:dogleg} for more details.

\item{\texttt{use\_nonmonotonic\_steps}} (\texttt{false})
Relax the requirement that the trust-region algorithm take strictly
decreasing steps. See Section~\ref{sec:non-monotonic} for more details.

\item{\texttt{max\_consecutive\_nonmonotonic\_steps}} (5)
The window size used by the step selection algorithm to accept
non-monotonic steps.

\item{\texttt{max\_num\_iterations }}(\texttt{50}) Maximum number of
iterations for Levenberg-Marquardt.

\item{\texttt{max\_solver\_time\_in\_seconds }} ($10^9$) Maximum
amount of time for which the solver should run.

\item{\texttt{num\_threads }}(\texttt{1}) Number of threads used by
Ceres to evaluate the Jacobian.

\item{\texttt{initial\_trust\_region\_radius } ($10^4$)} The size of
the initial trust region. When the \texttt{LEVENBERG\_MARQUARDT}
strategy is used, the reciprocal of this number is the initial
regularization parameter.

\item{\texttt{max\_trust\_region\_radius } ($10^{16}$)} The trust
region radius is not allowed to grow beyond this value.

\item{\texttt{min\_trust\_region\_radius } ($10^{-32}$)} The solver
terminates when the trust region becomes smaller than this value.

\item{\texttt{min\_relative\_decrease }}($10^{-3}$) Lower threshold
for relative decrease before a Levenberg-Marquardt step is accepted.

\item{\texttt{lm\_min\_diagonal } ($10^{-6}$)} The
\texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to
regularize the trust region step. This is the lower bound on the
values of this diagonal matrix.

\item{\texttt{lm\_max\_diagonal } ($10^{32}$)} The
\texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to
regularize the trust region step. This is the upper bound on the
values of this diagonal matrix.

\item{\texttt{max\_num\_consecutive\_invalid\_steps } (5)} The step
returned by a trust region strategy can sometimes be numerically
invalid, usually because of conditioning issues. Instead of crashing
or stopping the optimization, the optimizer can go ahead and try
solving with a smaller trust region/better conditioned problem. This
parameter sets the number of consecutive retries before the
minimizer gives up.

\item{\texttt{function\_tolerance }}($10^{-6}$) Solver terminates if
\begin{align}
\frac{|\Delta \text{cost}|}{\text{cost}} < \texttt{function\_tolerance}
\end{align}
where $\Delta \text{cost}$ is the change in objective function value
(up or down) in the current iteration of Levenberg-Marquardt.

\item{\texttt{gradient\_tolerance }} Solver terminates if
\begin{equation}
\frac{\|g(x)\|_\infty}{\|g(x_0)\|_\infty} < \texttt{gradient\_tolerance}
\end{equation}
where $\|\cdot\|_\infty$ refers to the max norm, and $x_0$ is the vector of initial parameter values.

\item{\texttt{parameter\_tolerance }}($10^{-8}$) Solver terminates if
\begin{equation}
\frac{\|\Delta x\|}{\|x\| + \texttt{parameter\_tolerance}} < \texttt{parameter\_tolerance}
\end{equation}
where $\Delta x$ is the step computed by the linear solver in the current iteration of Levenberg-Marquardt.

\item{\texttt{linear\_solver\_type }}
(\texttt{SPARSE\_NORMAL\_CHOLESKY}/\texttt{DENSE\_QR}) Type of
linear solver used to compute the solution to the linear least
squares problem in each iteration of the Levenberg-Marquardt
algorithm. If Ceres is built with \suitesparse linked in then the
default is \texttt{SPARSE\_NORMAL\_CHOLESKY}; otherwise it is
\texttt{DENSE\_QR}.

\item{\texttt{preconditioner\_type }}(\texttt{JACOBI}) The
preconditioner used by the iterative linear solver. The default is
the block Jacobi preconditioner. Valid values are (in increasing
order of complexity) \texttt{IDENTITY}, \texttt{JACOBI},
\texttt{SCHUR\_JACOBI}, \texttt{CLUSTER\_JACOBI} and
\texttt{CLUSTER\_TRIDIAGONAL}.

\item{\texttt{sparse\_linear\_algebra\_library }
(\texttt{SUITE\_SPARSE})} Ceres supports the use of two sparse
linear algebra libraries: \texttt{SuiteSparse}, which is enabled by
setting this parameter to \texttt{SUITE\_SPARSE}, and
\texttt{CXSparse}, which can be selected by setting this parameter
to \texttt{CX\_SPARSE}. \texttt{SuiteSparse} is a sophisticated
and complex sparse linear algebra library and should be used in
general. If your needs/platforms prevent you from using
\texttt{SuiteSparse}, consider using \texttt{CXSparse}, which is a
much smaller, easier to build library. As can be expected, its
performance on large problems is not comparable to that of
\texttt{SuiteSparse}.

\item{\texttt{num\_linear\_solver\_threads }}(\texttt{1}) Number of
threads used by the linear solver.

\item{\texttt{use\_inner\_iterations} (\texttt{false})} Use a
non-linear version of a simplified variable projection
algorithm. Essentially this amounts to doing a further optimization
on each Newton/Trust region step using a coordinate descent
algorithm. For more details, see the discussion in
Section~\ref{sec:inner}.

\item{\texttt{inner\_iteration\_ordering} (\texttt{NULL})} If
\texttt{Solver::Options::use\_inner\_iterations} is true, then the
user has two choices.

\begin{enumerate}
\item Let the solver heuristically decide which parameter blocks to
optimize in each inner iteration. To do this, set
\texttt{inner\_iteration\_ordering} to \texttt{NULL}.

\item Specify a collection of ordered independent sets. The lower
numbered groups are optimized before the higher numbered groups during
the inner optimization phase. Each group must be an independent set.
\end{enumerate}

\item{\texttt{ordering} (\texttt{NULL})} An instance of the ordering
object informs the solver about the desired order in which parameter
blocks should be eliminated by the linear solvers. See
Section~\ref{sec:ordering} for more details.

If \texttt{NULL}, the solver is free to choose an ordering that it
thinks is best. Note: currently, this option only has an effect on
the Schur type solvers; support for the
\texttt{SPARSE\_NORMAL\_CHOLESKY} solver is forthcoming.

\item{\texttt{use\_block\_amd } (\texttt{true})} By virtue of the
modeling layer in Ceres being block oriented, all the matrices used
by Ceres are also block oriented. When doing sparse direct
factorization of these matrices, the fill-reducing ordering
algorithms can either be run on the block or the scalar form of
these matrices. Running it on the block form exposes more of the
super-nodal structure of the matrix to the Cholesky factorization
routines. This leads to substantial gains in factorization
performance. Setting this parameter to \texttt{true} enables the use
of a block oriented Approximate Minimum Degree ordering
algorithm. Setting it to \texttt{false} uses a scalar AMD
algorithm. This option only makes sense when using
\texttt{sparse\_linear\_algebra\_library = SUITE\_SPARSE} as it uses
the \texttt{AMD} package that is part of \texttt{SuiteSparse}.

\item{\texttt{linear\_solver\_min\_num\_iterations }}(\texttt{1})
Minimum number of iterations used by the linear solver. This only
makes sense when the linear solver is an iterative solver, e.g.,
\texttt{ITERATIVE\_SCHUR}.

\item{\texttt{linear\_solver\_max\_num\_iterations }}(\texttt{500})
Maximum number of iterations used by the linear solver. This only
makes sense when the linear solver is an iterative solver, e.g.,
\texttt{ITERATIVE\_SCHUR}.

\item{\texttt{eta }} ($10^{-1}$)
Forcing sequence parameter. The truncated Newton solver uses this
number to control the relative accuracy with which the Newton step is
computed. This constant is passed to \texttt{ConjugateGradientsSolver}
which uses it to terminate the iterations when
\begin{equation}
\frac{Q_i - Q_{i-1}}{Q_i} < \frac{\eta}{i}
\end{equation}

\item{\texttt{jacobi\_scaling }}(\texttt{true}) \texttt{true} means
that the Jacobian is scaled by the norm of its columns before being
passed to the linear solver. This improves the numerical
conditioning of the normal equations.

\item{\texttt{logging\_type }}(\texttt{PER\_MINIMIZER\_ITERATION})

\item{\texttt{minimizer\_progress\_to\_stdout }}(\texttt{false})
By default the Minimizer progress is logged to \texttt{STDERR}
depending on the \texttt{vlog} level. If this flag is
set to true, and \texttt{logging\_type } is not \texttt{SILENT}, the
logging output is sent to \texttt{STDOUT}.

\item{\texttt{return\_initial\_residuals }}(\texttt{false})
\item{\texttt{return\_final\_residuals }}(\texttt{false})
If true, the vectors \texttt{Solver::Summary::initial\_residuals } and
\texttt{Solver::Summary::final\_residuals } are filled with the
residuals before and after the optimization. The entries of these
vectors are in the order in which ResidualBlocks were added to the
Problem object.

\item{\texttt{return\_initial\_gradient }}(\texttt{false})
\item{\texttt{return\_final\_gradient }}(\texttt{false})
If true, the vectors \texttt{Solver::Summary::initial\_gradient } and
\texttt{Solver::Summary::final\_gradient } are filled with the
gradient before and after the optimization. The entries of these
vectors are in the order in which ParameterBlocks were added to the
Problem object.

Since \texttt{AddResidualBlock } adds ParameterBlocks to the
\texttt{Problem } automatically if they do not already exist, if you
wish to have explicit control over the ordering of the vectors, then
use \texttt{Problem::AddParameterBlock } to explicitly add the
ParameterBlocks in the order desired.

\item{\texttt{return\_initial\_jacobian }}(\texttt{false})
\item{\texttt{return\_final\_jacobian }}(\texttt{false})
If true, the Jacobian matrices before and after the optimization are
returned in \texttt{Solver::Summary::initial\_jacobian } and
\texttt{Solver::Summary::final\_jacobian } respectively.

The rows of these matrices are in the same order in which the
ResidualBlocks were added to the Problem object. The columns are in
the same order in which the ParameterBlocks were added to the Problem
object.

Since \texttt{AddResidualBlock } adds ParameterBlocks to the
\texttt{Problem } automatically if they do not already exist, if you
wish to have explicit control over the column ordering of the matrix,
then use \texttt{Problem::AddParameterBlock } to explicitly add the
ParameterBlocks in the order desired.

The Jacobian matrices are stored as compressed row sparse
matrices. Please see \texttt{ceres/crs\_matrix.h } for more details of
the format.
| 668 | |
\item{\texttt{lsqp\_iterations\_to\_dump }} List of iterations at
which the optimizer should dump the linear least squares problem to
disk. Useful for testing and benchmarking. If empty (default), no
problems are dumped.

\item{\texttt{lsqp\_dump\_directory }} (\texttt{/tmp})
If \texttt{lsqp\_iterations\_to\_dump} is non-empty, then this
setting determines the directory to which the files containing the
linear least squares problems are written.

\item{\texttt{lsqp\_dump\_format }}(\texttt{TEXTFILE}) The format in
which linear least squares problems should be logged when
\texttt{lsqp\_iterations\_to\_dump} is non-empty. There are three
options:
\begin{itemize}
\item{\texttt{CONSOLE }} prints the linear least squares problem in a
  human readable format to \texttt{stderr}. The Jacobian is printed
  as a dense matrix. The vectors $D$, $x$ and $f$ are printed as
  dense vectors. This should only be used for small problems.
\item{\texttt{PROTOBUF }} writes out the linear least squares problem
  to the directory pointed to by \texttt{lsqp\_dump\_directory} as a
  protocol buffer. \texttt{linear\_least\_squares\_problems.h/cc}
  contains routines for loading these problems. For details on the on
  disk format used, see \texttt{matrix.proto}. The files are named
  \texttt{lm\_iteration\_???.lsqp}. This requires that
  \texttt{protobuf} be linked into Ceres Solver.
\item{\texttt{TEXTFILE }} writes out the linear least squares problem
  to the directory pointed to by \texttt{lsqp\_dump\_directory} as
  text files which can be read into \texttt{MATLAB/Octave}. The
  Jacobian is dumped as a text file containing $(i,j,s)$ triplets,
  and the vectors $D$, $x$ and $f$ are dumped as text files
  containing a list of their values.

  A \texttt{MATLAB/Octave} script called
  \texttt{lm\_iteration\_???.m} is also output, which can be used to
  parse and load the problem into memory.
\end{itemize}

\item{\texttt{check\_gradients }}(\texttt{false})
Check all Jacobians computed by each residual block with finite
differences. This is expensive since it involves computing the
derivative by normal means (e.g. user specified, autodiff,
etc.), then also computing it using finite differences. The
results are compared, and if they differ substantially, details
are printed to the log.

\item{\texttt{gradient\_check\_relative\_precision }} ($10^{-8}$)
Relative precision used by the gradient checker. If the relative
difference between a Jacobian entry computed by the cost function
and the corresponding finite difference estimate exceeds this
number, then the Jacobian for that cost term is dumped. (The
configuration sketch following this list shows typical
gradient-checking settings.)

\item{\texttt{numeric\_derivative\_relative\_step\_size }} ($10^{-6}$)
Relative shift used for taking numeric derivatives. For finite
differencing, each dimension is evaluated at slightly shifted
values, \eg for forward differences, the numerical derivative is

\begin{align}
\delta &= \texttt{numeric\_derivative\_relative\_step\_size}\\
\Delta f &= \frac{f((1 + \delta) x) - f(x)}{\delta x}
\end{align}

The finite differencing is done along each dimension. The
reason to use a relative (rather than absolute) step size is
that this way, numeric differentiation works for functions where
the arguments are typically large (e.g. $10^9$) and when the
values are small (e.g. $10^{-5}$). It is possible to construct
``torture cases'' which break this finite difference heuristic,
but they do not come up often in practice.

\item{\texttt{callbacks }}
Callbacks that are executed at the end of each iteration of the
\texttt{Minimizer}. They are executed in the order that they are
specified in this vector. By default, parameter blocks are
updated only at the end of the optimization, i.e.\ when the
\texttt{Minimizer} terminates. This behavior is controlled by
\texttt{update\_state\_every\_iteration}. If the user wishes to have
access to the updated parameter blocks when his/her callbacks are
executed, then set \texttt{update\_state\_every\_iteration} to true
(see the callback sketch following this list).

The solver does NOT take ownership of these pointers.

\item{\texttt{update\_state\_every\_iteration }}(\texttt{false})
Normally the parameter blocks are only updated when the solver
terminates. Setting this to true updates them in every
iteration. This setting is useful when building an interactive
application using Ceres together with an \texttt{IterationCallback}.

\item{\texttt{solver\_log}} If non-empty, a summary of the execution
of the solver is recorded to this file. This file is used for
recording and comparing Ceres' performance. Currently, only the
iteration number, total time and the objective function value are
logged. The format of this file is expected to change over time as
the performance evaluation framework is fleshed out.
\end{enumerate}
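
To make the options above concrete, the following is a minimal
configuration sketch. It is illustrative rather than normative: it
assumes a fully constructed \texttt{ceres::Problem} named
\texttt{problem}, and the helper function \texttt{ConfigureAndSolve}
is purely hypothetical; the field names follow the list above, and the
enumeration value \texttt{ceres::PER\_MINIMIZER\_ITERATION} is assumed
to be visible in the \texttt{ceres} namespace.

\begin{verbatim}
#include <iostream>
#include "ceres/ceres.h"

// Sketch only: assumes `problem` is a fully built ceres::Problem.
void ConfigureAndSolve(ceres::Problem* problem) {
  ceres::Solver::Options options;

  // Log per-iteration progress, and send it to STDOUT instead of STDERR.
  options.logging_type = ceres::PER_MINIMIZER_ITERATION;
  options.minimizer_progress_to_stdout = true;

  // Verify the user supplied Jacobians against finite differences.
  options.check_gradients = true;
  options.gradient_check_relative_precision = 1e-8;
  options.numeric_derivative_relative_step_size = 1e-6;

  // Dump the linear least squares problems at iterations 1 and 5 to
  // /tmp; lsqp_dump_format defaults to TEXTFILE as described above.
  options.lsqp_iterations_to_dump.push_back(1);
  options.lsqp_iterations_to_dump.push_back(5);
  options.lsqp_dump_directory = "/tmp";

  // Request the residual vectors before and after the solve.
  options.return_initial_residuals = true;
  options.return_final_residuals = true;

  ceres::Solver::Summary summary;
  ceres::Solve(options, problem, &summary);
  std::cout << summary.BriefReport() << "\n";
}
\end{verbatim}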
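
For interactive applications, \texttt{update\_state\_every\_iteration}
is typically paired with an \texttt{IterationCallback} (declared in
\texttt{ceres/iteration\_callback.h}). The sketch below shows one
possible wiring; the class name \texttt{LoggingCallback}, the helper
\texttt{SolveWithCallback} and the watched parameter pointer are
illustrative and not part of the Ceres API.

\begin{verbatim}
#include <iostream>
#include "ceres/ceres.h"
#include "ceres/iteration_callback.h"

// Illustrative callback that prints the current cost and a watched
// parameter value after every iteration.
class LoggingCallback : public ceres::IterationCallback {
 public:
  explicit LoggingCallback(const double* x) : x_(x) {}
  virtual ceres::CallbackReturnType operator()(
      const ceres::IterationSummary& summary) {
    std::cout << "iteration: " << summary.iteration
              << " cost: " << summary.cost
              << " x: " << *x_ << "\n";
    return ceres::SOLVER_CONTINUE;
  }
 private:
  const double* x_;  // Parameter block owned by the caller.
};

// Usage sketch: assumes `problem` contains the parameter block `x`.
void SolveWithCallback(ceres::Problem* problem, double* x) {
  ceres::Solver::Options options;
  LoggingCallback callback(x);
  options.callbacks.push_back(&callback);       // Solver does not take ownership.
  options.update_state_every_iteration = true;  // Keep *x fresh each iteration.
  ceres::Solver::Summary summary;
  ceres::Solve(options, problem, &summary);
}
\end{verbatim}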
\section{\texttt{Solver::Summary}}
TBD