ceres-solver / ceres-solver / e91995cce456d7edf404103bd3dc40794e13886e / . / docs / source / analytical_derivatives.rst

.. default-domain:: cpp | |

.. cpp:namespace:: ceres | |

.. _chapter-analytical_derivatives: | |

==================== | |

Analytic Derivatives | |

==================== | |

Consider the problem of fitting the following curve (`Rat43 | |

<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to | |

data: | |

.. math:: | |

y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}} | |

That is, given some data :math:`\{x_i, y_i\},\ \forall i=1,... ,n`, | |

determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best | |

fit this data. | |

Which can be stated as the problem of finding the | |

values of :math:`b_1, b_2, b_3` and :math:`b_4` are the ones that | |

minimize the following objective function [#f1]_: | |

.. math:: | |

\begin{align} | |

E(b_1, b_2, b_3, b_4) | |

&= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\ | |

&= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\ | |

\end{align} | |

To solve this problem using Ceres Solver, we need to define a | |

:class:`CostFunction` that computes the residual :math:`f` for a given | |

:math:`x` and :math:`y` and its derivatives with respect to | |

:math:`b_1, b_2, b_3` and :math:`b_4`. | |

Using elementary differential calculus, we can see that: | |

.. math:: | |

\begin{align} | |

D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\ | |

D_2 f(b_1, b_2, b_3, b_4; x,y) &= | |

\frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\ | |

D_3 f(b_1, b_2, b_3, b_4; x,y) &= | |

\frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\ | |

D_4 f(b_1, b_2, b_3, b_4; x,y) & = \frac{b_1 \log\left(1+e^{b_2-b_3x}\right) }{b_4^2(1+e^{b_2-b_3x})^{1/b_4}} | |

\end{align} | |

With these derivatives in hand, we can now implement the | |

:class:`CostFunction` as: | |

.. code-block:: c++ | |

class Rat43Analytic : public SizedCostFunction<1,4> { | |

public: | |

Rat43Analytic(const double x, const double y) : x_(x), y_(y) {} | |

virtual ~Rat43Analytic() {} | |

virtual bool Evaluate(double const* const* parameters, | |

double* residuals, | |

double** jacobians) const { | |

const double b1 = parameters[0][0]; | |

const double b2 = parameters[0][1]; | |

const double b3 = parameters[0][2]; | |

const double b4 = parameters[0][3]; | |

residuals[0] = b1 * pow(1 + exp(b2 - b3 * x_), -1.0 / b4) - y_; | |

if (!jacobians) return true; | |

double* jacobian = jacobians[0]; | |

if (!jacobian) return true; | |

jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4); | |

jacobian[1] = -b1 * exp(b2 - b3 * x_) * | |

pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4; | |

jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) * | |

pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4; | |

jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) * | |

pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4); | |

return true; | |

} | |

private: | |

const double x_; | |

const double y_; | |

}; | |

This is tedious code, hard to read and with a lot of | |

redundancy. So in practice we will cache some sub-expressions to | |

improve its efficiency, which would give us something like: | |

.. code-block:: c++ | |

class Rat43AnalyticOptimized : public SizedCostFunction<1,4> { | |

public: | |

Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {} | |

virtual ~Rat43AnalyticOptimized() {} | |

virtual bool Evaluate(double const* const* parameters, | |

double* residuals, | |

double** jacobians) const { | |

const double b1 = parameters[0][0]; | |

const double b2 = parameters[0][1]; | |

const double b3 = parameters[0][2]; | |

const double b4 = parameters[0][3]; | |

const double t1 = exp(b2 - b3 * x_); | |

const double t2 = 1 + t1; | |

const double t3 = pow(t2, -1.0 / b4); | |

residuals[0] = b1 * t3 - y_; | |

if (!jacobians) return true; | |

double* jacobian = jacobians[0]; | |

if (!jacobian) return true; | |

const double t4 = pow(t2, -1.0 / b4 - 1); | |

jacobian[0] = t3; | |

jacobian[1] = -b1 * t1 * t4 / b4; | |

jacobian[2] = -x_ * jacobian[1]; | |

jacobian[3] = b1 * log(t2) * t3 / (b4 * b4); | |

return true; | |

} | |

private: | |

const double x_; | |

const double y_; | |

}; | |

What is the difference in performance of these two implementations? | |

========================== ========= | |

CostFunction Time (ns) | |

========================== ========= | |

Rat43Analytic 255 | |

Rat43AnalyticOptimized 92 | |

========================== ========= | |

``Rat43AnalyticOptimized`` is :math:`2.8` times faster than | |

``Rat43Analytic``. This difference in run-time is not uncommon. To | |

get the best performance out of analytically computed derivatives, one | |

usually needs to optimize the code to account for common | |

sub-expressions. | |

When should you use analytical derivatives? | |

=========================================== | |

#. The expressions are simple, e.g. mostly linear. | |

#. A computer algebra system like `Maple | |

<https://www.maplesoft.com/products/maple/>`_ , `Mathematica | |

<https://www.wolfram.com/mathematica/>`_, or `SymPy | |

<http://www.sympy.org/en/index.html>`_ can be used to symbolically | |

differentiate the objective function and generate the C++ to | |

evaluate them. | |

#. Performance is of utmost concern and there is algebraic structure | |

in the terms that you can exploit to get better performance than | |

automatic differentiation. | |

That said, getting the best performance out of analytical | |

derivatives requires a non-trivial amount of work. Before going | |

down this path, it is useful to measure the amount of time being | |

spent evaluating the Jacobian as a fraction of the total solve time | |

and remember `Amdahl's Law | |

<https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend. | |

#. There is no other way to compute the derivatives, e.g. you | |

wish to compute the derivative of the root of a polynomial: | |

.. math:: | |

a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0 | |

with respect to :math:`x` and :math:`y`. This requires the use of | |

the `Inverse Function Theorem | |

<https://en.wikipedia.org/wiki/Inverse_function_theorem>`_ | |

#. You love the chain rule and actually enjoy doing all the algebra by | |

hand. | |

.. rubric:: Footnotes | |

.. [#f1] The notion of best fit depends on the choice of the objective | |

function used to measure the quality of fit, which in turn | |

depends on the underlying noise process which generated the | |

observations. Minimizing the sum of squared differences is | |

the right thing to do when the noise is `Gaussian | |

<https://en.wikipedia.org/wiki/Normal_distribution>`_. In | |

that case the optimal value of the parameters is the `Maximum | |

Likelihood Estimate | |

<https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_. |