More tips and tricks.

1. Add a tip about glog.
2. Add a tip about using Summary::FullReport to optimize performance.
3. Add a tip about using the Inverse Function Theorem.

Change-Id: I949ec6843ff796672edbad8bc801230dd0ab5345
diff --git a/docs/source/tricks.rst b/docs/source/tricks.rst
index 2622e23..32d0499 100644
--- a/docs/source/tricks.rst
+++ b/docs/source/tricks.rst
@@ -37,10 +37,182 @@
    :class:`NumericDiffFunctor` and :class:`CostFunctionToFunctor`.
 
-2. Diagnosing convergence issues.
+2. Use `google-glog <http://code.google.com/p/google-glog>`_.
 
-   TBD
+   Ceres has extensive support for logging various stages of the
+   solve. This includes detailed information about memory allocations
+   and time consumed in various parts of the solve, internal error
+   conditions, etc. This logging structure is built on top of the
+   `google-glog <http://code.google.com/p/google-glog>`_ library and
+   can easily be controlled from the command line.
 
-3. Diagnoising performance issues.
+   We use it extensively to observe and analyze Ceres's
+   performance. Starting with ``-logtostderr`` you can add ``-v=N``
+   for increasing values of N to get more and more verbose and
+   detailed information about Ceres internals.
 
-   TBD
+   Building Ceres like this introduces an external dependency, and it
+   is tempting to use the ``miniglog`` implementation that ships
+   inside Ceres instead. This is a bad idea.
+
+   ``miniglog`` was written primarily for building and using Ceres on
+   Android because the current version of `google-glog
+   <http://code.google.com/p/google-glog>`_ does not build using the
+   NDK. It has worse performance than the full-fledged glog library
+   and is much harder to control and use.
+
+3. ``Solver::Summary::FullReport`` is your friend.
+
+   When diagnosing Ceres performance issues - runtime and convergence -
+   the first place to start is by looking at the output of
+   ``Solver::Summary::FullReport``. Here is an example:
+
+   .. code-block:: bash
+
+      ./bin/bundle_adjuster --input ../data/problem-16-22106-pre.txt
+
+      0: f: 4.185660e+06 d: 0.00e+00 g: 2.16e+07 h: 0.00e+00 rho: 0.00e+00 mu: 1.00e+04 li: 0 it: 9.20e-02 tt: 3.35e-01
+      1: f: 1.980525e+05 d: 3.99e+06 g: 5.34e+06 h: 2.40e+03 rho: 9.60e-01 mu: 3.00e+04 li: 1 it: 1.99e-01 tt: 5.34e-01
+      2: f: 5.086543e+04 d: 1.47e+05 g: 2.11e+06 h: 1.01e+03 rho: 8.22e-01 mu: 4.09e+04 li: 1 it: 1.61e-01 tt: 6.95e-01
+      3: f: 1.859667e+04 d: 3.23e+04 g: 2.87e+05 h: 2.64e+02 rho: 9.85e-01 mu: 1.23e+05 li: 1 it: 1.63e-01 tt: 8.58e-01
+      4: f: 1.803857e+04 d: 5.58e+02 g: 2.69e+04 h: 8.66e+01 rho: 9.93e-01 mu: 3.69e+05 li: 1 it: 1.62e-01 tt: 1.02e+00
+      5: f: 1.803391e+04 d: 4.66e+00 g: 3.11e+02 h: 1.02e+01 rho: 1.00e+00 mu: 1.11e+06 li: 1 it: 1.61e-01 tt: 1.18e+00
+
+      Ceres Solver Report
+      -------------------
+                                          Original        Reduced
+      Parameter blocks                       22122          22122
+      Parameters                             66462          66462
+      Residual blocks                        83718          83718
+      Residual                              167436         167436
+
+      Minimizer                       TRUST_REGION
+
+      Sparse linear algebra library   SUITE_SPARSE
+      Trust region strategy      LEVENBERG_MARQUARDT
+
+                                             Given           Used
+      Linear solver                   SPARSE_SCHUR   SPARSE_SCHUR
+      Threads                                    1              1
+      Linear solver threads                      1              1
+      Linear solver ordering             AUTOMATIC      22106, 16
+
+      Cost:
+      Initial                         4.185660e+06
+      Final                           1.803391e+04
+      Change                          4.167626e+06
+
+      Minimizer iterations                       5
+      Successful steps                           5
+      Unsuccessful steps                         0
+
+      Time (in seconds):
+      Preprocessor                           0.243
+
+      Residual evaluation                    0.053
+      Jacobian evaluation                    0.435
+      Linear solver                          0.371
+      Minimizer                              0.940
+
+      Postprocessor                          0.002
+      Total                                  1.221
+
+      Termination: NO_CONVERGENCE (Maximum number of iterations reached.)
+
+   Let us focus on run-time performance. The relevant lines to look at
+   are
+
+   .. code-block:: bash
+
+      Time (in seconds):
+      Preprocessor                           0.243
+
+      Residual evaluation                    0.053
+      Jacobian evaluation                    0.435
+      Linear solver                          0.371
+      Minimizer                              0.940
+
+      Postprocessor                          0.002
+      Total                                  1.221
+
+   which tell us that of the total 1.2 seconds, about 0.4 seconds were
+   spent in the linear solver and the rest was mostly spent in
+   preprocessing and Jacobian evaluation.
+
+   The preprocessing seems particularly expensive. Looking back at the
+   report, we observe
+
+   .. code-block:: bash
+
+      Linear solver ordering             AUTOMATIC      22106, 16
+
+   which indicates that we are using automatic ordering for the
+   ``SPARSE_SCHUR`` solver. This can be expensive at times. A
+   straightforward way to deal with this is to supply the ordering
+   manually. For ``bundle_adjuster`` this can be done by passing the
+   flag ``-ordering=user``. Doing so and looking at the timing block of
+   the full report gives us
+
+   .. code-block:: bash
+
+      Time (in seconds):
+      Preprocessor                           0.058
+
+      Residual evaluation                    0.050
+      Jacobian evaluation                    0.416
+      Linear solver                          0.360
+      Minimizer                              0.903
+
+      Postprocessor                          0.002
+      Total                                  0.998
+
+   The preprocessor time has gone down by more than 4x!
+
+
+4. Putting `Inverse Function Theorem
+   <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ to use.
+
+   Every now and then we have to deal with functions which cannot be
+   evaluated analytically. Computing the Jacobian in such cases is
+   tricky. A particularly interesting case is where the inverse of the
+   function is easy to compute analytically. An example of such a
+   function is the coordinate transformation between the `ECEF
+   <http://en.wikipedia.org/wiki/ECEF>`_ and the `WGS84
+   <http://en.wikipedia.org/wiki/World_Geodetic_System>`_ systems,
+   where the conversion from WGS84 to ECEF is analytic, but the
+   conversion back to WGS84 uses an iterative algorithm. So how do you
+   compute the derivative of the ECEF to WGS84 transformation?
+
+   One obvious approach would be to numerically differentiate the
+   conversion function. This is not a good idea. For one, it will be
+   slow, but it will also be numerically quite bad.
+
+   It turns out you can use the `Inverse Function Theorem
+   <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ in this
+   case to compute the derivatives more or less analytically.
+
+   The key result here is: if :math:`x = f^{-1}(y)` and :math:`Df(x)`
+   is the invertible Jacobian of :math:`f` at :math:`x`, then the
+   Jacobian :math:`Df^{-1}(y) = [Df(x)]^{-1}`, i.e., the Jacobian of
+   :math:`f^{-1}` is the inverse of the Jacobian of :math:`f`.
+
+   Algorithmically this means that given :math:`y`, compute :math:`x =
+   f^{-1}(y)` by whatever means you can. Evaluate the Jacobian of
+   :math:`f` at :math:`x`. If the Jacobian matrix is invertible, then
+   its inverse is the Jacobian of :math:`f^{-1}` at :math:`y`.
+
+   One can put this into practice with the following code fragment.
+
+   .. code-block:: c++
+
+      Eigen::Vector3d ecef; // Fill some values
+      // Iterative computation.
+      Eigen::Vector3d lla = ECEFToLLA(ecef);
+      // Analytic derivatives
+      Eigen::Matrix3d lla_to_ecef_jacobian = LLAToECEFJacobian(lla);
+      bool invertible;
+      Eigen::Matrix3d ecef_to_lla_jacobian;
+      lla_to_ecef_jacobian.computeInverseWithCheck(ecef_to_lla_jacobian, invertible);
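+
+   If the resulting derivative is to be used inside Ceres, one way is
+   to wrap it in an analytic cost function. The sketch below assumes
+   the ``ECEFToLLA`` and ``LLAToECEFJacobian`` helpers from the
+   fragment above, and a hypothetical residual that compares the
+   converted position against a measured WGS84 position:
+
+   .. code-block:: c++
+
+      class ECEFToLLAResidual : public ceres::SizedCostFunction<3, 3> {
+       public:
+        explicit ECEFToLLAResidual(const Eigen::Vector3d& measured_lla)
+            : measured_lla_(measured_lla) {}
+
+        virtual bool Evaluate(double const* const* parameters,
+                              double* residuals,
+                              double** jacobians) const {
+          const Eigen::Vector3d ecef(parameters[0]);
+          // Iterative computation of the forward map (helper assumed
+          // to be available from the fragment above).
+          const Eigen::Vector3d lla = ECEFToLLA(ecef);
+          Eigen::Map<Eigen::Vector3d>(residuals) = lla - measured_lla_;
+
+          if (jacobians != NULL && jacobians[0] != NULL) {
+            // Jacobian of ECEF -> LLA via the Inverse Function Theorem.
+            const Eigen::Matrix3d lla_to_ecef_jacobian =
+                LLAToECEFJacobian(lla);
+            bool invertible;
+            Eigen::Matrix3d ecef_to_lla_jacobian;
+            lla_to_ecef_jacobian.computeInverseWithCheck(
+                ecef_to_lla_jacobian, invertible);
+            if (!invertible) {
+              return false;
+            }
+            // Ceres stores Jacobian blocks in row-major order.
+            Eigen::Map<Eigen::Matrix<double, 3, 3, Eigen::RowMajor> >(
+                jacobians[0]) = ecef_to_lla_jacobian;
+          }
+          return true;
+        }
+
+       private:
+        const Eigen::Vector3d measured_lla_;
+      };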