docs/source/analytical_derivatives.rst - ceres-solver - Git at Google

 .. default-domain:: cpp

 .. cpp:namespace:: ceres

 .. _chapter-analytical_derivatives:

 ====================
 Analytic Derivatives
 ====================

 Consider the problem of fitting the following curve (`Rat43
 <http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to
 data:

 .. math::
   y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}

 That is, given some data :math:`\{x_i, y_i\},\ \forall i=1,... ,n`,
 determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best
 fit this data.

 Which can be stated as the problem of finding the
 values of :math:`b_1, b_2, b_3` and :math:`b_4` are the ones that
 minimize the following objective function [#f1]_:

 .. math::
    \begin{align}
    E(b_1, b_2, b_3, b_4)
    &= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\
    &= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\
    \end{align}

 To solve this problem using Ceres Solver, we need to define a
 :class:`CostFunction` that computes the residual :math:`f` for a given
 :math:`x` and :math:`y` and its derivatives with respect to
 :math:`b_1, b_2, b_3` and :math:`b_4`.

 Using elementary differential calculus, we can see that:

 .. math::
   \begin{align}
   D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
   D_2 f(b_1, b_2, b_3, b_4; x,y) &=
   \frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
   D_3 f(b_1, b_2, b_3, b_4; x,y) &=
   \frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
   D_4 f(b_1, b_2, b_3, b_4; x,y) & = \frac{b_1  \log\left(1+e^{b_2-b_3x}\right) }{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
   \end{align}

 With these derivatives in hand, we can now implement the
 :class:`CostFunction` as:

 .. code-block:: c++

   class Rat43Analytic : public SizedCostFunction<1,4> {
      public:
        Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
        virtual ~Rat43Analytic() {}
        virtual bool Evaluate(double const* const* parameters,
                              double* residuals,
                              double** jacobians) const {
          const double b1 = parameters[0][0];
          const double b2 = parameters[0][1];
          const double b3 = parameters[0][2];
          const double b4 = parameters[0][3];

          residuals[0] = b1 *  pow(1 + exp(b2 -  b3 * x_), -1.0 / b4) - y_;

          if (!jacobians) return true;
          double* jacobian = jacobians[0];
          if (!jacobian) return true;

          jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
          jacobian[1] = -b1 * exp(b2 - b3 * x_) *
                        pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
          jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
                        pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
          jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
                        pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
          return true;
        }

       private:
        const double x_;
        const double y_;
    };

 This is tedious code, hard to read and with a lot of
 redundancy. So in practice we will cache some sub-expressions to
 improve its efficiency, which would give us something like:

 .. code-block:: c++

   class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
      public:
        Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
        virtual ~Rat43AnalyticOptimized() {}
        virtual bool Evaluate(double const* const* parameters,
                              double* residuals,
                              double** jacobians) const {
          const double b1 = parameters[0][0];
          const double b2 = parameters[0][1];
          const double b3 = parameters[0][2];
          const double b4 = parameters[0][3];

          const double t1 = exp(b2 -  b3 * x_);
          const double t2 = 1 + t1;
          const double t3 = pow(t2, -1.0 / b4);
          residuals[0] = b1 * t3 - y_;

          if (!jacobians) return true;
          double* jacobian = jacobians[0];
          if (!jacobian) return true;

          const double t4 = pow(t2, -1.0 / b4 - 1);
          jacobian[0] = t3;
          jacobian[1] = -b1 * t1 * t4 / b4;
          jacobian[2] = -x_ * jacobian[1];
          jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
          return true;
        }

      private:
        const double x_;
        const double y_;
    };

 What is the difference in performance of these two implementations?

 ==========================   =========
 CostFunction                 Time (ns)
 ==========================   =========
 Rat43Analytic                      255
 Rat43AnalyticOptimized              92
 ==========================   =========

 ``Rat43AnalyticOptimized`` is :math:`2.8` times faster than
 ``Rat43Analytic``.  This difference in run-time is not uncommon. To
 get the best performance out of analytically computed derivatives, one
 usually needs to optimize the code to account for common
 sub-expressions.


 When should you use analytical derivatives?
 ===========================================

 #. The expressions are simple, e.g. mostly linear.

 #. A computer algebra system like `Maple
    <https://www.maplesoft.com/products/maple/>`_ , `Mathematica
    <https://www.wolfram.com/mathematica/>`_, or `SymPy
    <http://www.sympy.org/en/index.html>`_ can be used to symbolically
    differentiate the objective function and generate the C++ to
    evaluate them.

 #. Performance is of utmost concern and there is algebraic structure
    in the terms that you can exploit to get better performance than
    automatic differentiation.

    That said, getting the best performance out of analytical
    derivatives requires a non-trivial amount of work.  Before going
    down this path, it is useful to measure the amount of time being
    spent evaluating the Jacobian as a fraction of the total solve time
    and remember `Amdahl's Law
    <https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend.

 #. There is no other way to compute the derivatives, e.g. you
    wish to compute the derivative of the root of a polynomial:

    .. math::
      a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0


    with respect to :math:`x` and :math:`y`. This requires the use of
    the `Inverse Function Theorem
    <https://en.wikipedia.org/wiki/Inverse_function_theorem>`_

 #. You love the chain rule and actually enjoy doing all the algebra by
    hand.


 .. rubric:: Footnotes

 .. [#f1] The notion of best fit depends on the choice of the objective
          function used to measure the quality of fit, which in turn
          depends on the underlying noise process which generated the
          observations. Minimizing the sum of squared differences is
          the right thing to do when the noise is `Gaussian
          <https://en.wikipedia.org/wiki/Normal_distribution>`_. In
          that case the optimal value of the parameters is the `Maximum
          Likelihood Estimate
          <https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_.
	.. default-domain:: cpp

	.. cpp:namespace:: ceres

	.. _chapter-analytical_derivatives:

	====================
	Analytic Derivatives
	====================

	Consider the problem of fitting the following curve (`Rat43
	<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to
	data:

	.. math::
	y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}

	That is, given some data :math:`\{x_i, y_i\},\ \forall i=1,... ,n`,
	determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best
	fit this data.

	Which can be stated as the problem of finding the
	values of :math:`b_1, b_2, b_3` and :math:`b_4` are the ones that
	minimize the following objective function [#f1]_:

	.. math::
	\begin{align}
	E(b_1, b_2, b_3, b_4)
	&= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\
	&= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\
	\end{align}

	To solve this problem using Ceres Solver, we need to define a
	:class:`CostFunction` that computes the residual :math:`f` for a given
	:math:`x` and :math:`y` and its derivatives with respect to
	:math:`b_1, b_2, b_3` and :math:`b_4`.

	Using elementary differential calculus, we can see that:

	.. math::
	\begin{align}
	D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
	D_2 f(b_1, b_2, b_3, b_4; x,y) &=
	\frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
	D_3 f(b_1, b_2, b_3, b_4; x,y) &=
	\frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
	D_4 f(b_1, b_2, b_3, b_4; x,y) & = \frac{b_1 \log\left(1+e^{b_2-b_3x}\right) }{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
	\end{align}

	With these derivatives in hand, we can now implement the
	:class:`CostFunction` as:

	.. code-block:: c++

	class Rat43Analytic : public SizedCostFunction<1,4> {
	public:
	Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
	virtual ~Rat43Analytic() {}
	virtual bool Evaluate(double const* const* parameters,
	double* residuals,
	double** jacobians) const {
	const double b1 = parameters[0][0];
	const double b2 = parameters[0][1];
	const double b3 = parameters[0][2];
	const double b4 = parameters[0][3];

	residuals[0] = b1 * pow(1 + exp(b2 - b3 * x_), -1.0 / b4) - y_;

	if (!jacobians) return true;
	double* jacobian = jacobians[0];
	if (!jacobian) return true;

	jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
	jacobian[1] = -b1 * exp(b2 - b3 * x_) *
	pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
	jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
	pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
	jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
	pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
	return true;
	}

	private:
	const double x_;
	const double y_;
	};

	This is tedious code, hard to read and with a lot of
	redundancy. So in practice we will cache some sub-expressions to
	improve its efficiency, which would give us something like:

	.. code-block:: c++

	class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
	public:
	Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
	virtual ~Rat43AnalyticOptimized() {}
	virtual bool Evaluate(double const* const* parameters,
	double* residuals,
	double** jacobians) const {
	const double b1 = parameters[0][0];
	const double b2 = parameters[0][1];
	const double b3 = parameters[0][2];
	const double b4 = parameters[0][3];

	const double t1 = exp(b2 - b3 * x_);
	const double t2 = 1 + t1;
	const double t3 = pow(t2, -1.0 / b4);
	residuals[0] = b1 * t3 - y_;

	if (!jacobians) return true;
	double* jacobian = jacobians[0];
	if (!jacobian) return true;

	const double t4 = pow(t2, -1.0 / b4 - 1);
	jacobian[0] = t3;
	jacobian[1] = -b1 * t1 * t4 / b4;
	jacobian[2] = -x_ * jacobian[1];
	jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
	return true;
	}

	private:
	const double x_;
	const double y_;
	};

	What is the difference in performance of these two implementations?

	========================== =========
	CostFunction Time (ns)
	========================== =========
	Rat43Analytic 255
	Rat43AnalyticOptimized 92
	========================== =========

	``Rat43AnalyticOptimized`` is :math:`2.8` times faster than
	``Rat43Analytic``. This difference in run-time is not uncommon. To
	get the best performance out of analytically computed derivatives, one
	usually needs to optimize the code to account for common
	sub-expressions.


	When should you use analytical derivatives?
	===========================================

	#. The expressions are simple, e.g. mostly linear.

	#. A computer algebra system like `Maple
	<https://www.maplesoft.com/products/maple/>`_ , `Mathematica
	<https://www.wolfram.com/mathematica/>`_, or `SymPy
	<http://www.sympy.org/en/index.html>`_ can be used to symbolically
	differentiate the objective function and generate the C++ to
	evaluate them.

	#. Performance is of utmost concern and there is algebraic structure
	in the terms that you can exploit to get better performance than
	automatic differentiation.

	That said, getting the best performance out of analytical
	derivatives requires a non-trivial amount of work. Before going
	down this path, it is useful to measure the amount of time being
	spent evaluating the Jacobian as a fraction of the total solve time
	and remember `Amdahl's Law
	<https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend.

	#. There is no other way to compute the derivatives, e.g. you
	wish to compute the derivative of the root of a polynomial:

	.. math::
	a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0


	with respect to :math:`x` and :math:`y`. This requires the use of
	the `Inverse Function Theorem
	<https://en.wikipedia.org/wiki/Inverse_function_theorem>`_

	#. You love the chain rule and actually enjoy doing all the algebra by
	hand.


	.. rubric:: Footnotes

	.. [#f1] The notion of best fit depends on the choice of the objective
	function used to measure the quality of fit, which in turn
	depends on the underlying noise process which generated the
	observations. Minimizing the sum of squared differences is
	the right thing to do when the noise is `Gaussian
	<https://en.wikipedia.org/wiki/Normal_distribution>`_. In
	that case the optimal value of the parameters is the `Maximum
	Likelihood Estimate
	<https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_.