ceres-solver / ceres-solver / 84436f791c2f76a82f56190ba9d744c904c07551 / . / docs / source / numerical_derivatives.rst

.. default-domain:: cpp | |

.. cpp:namespace:: ceres | |

.. _chapter-numerical_derivatives: | |

=================== | |

Numeric derivatives | |

=================== | |

The other extreme from using analytic derivatives is to use numeric | |

derivatives. The key observation here is that the process of | |

differentiating a function :math:`f(x)` w.r.t :math:`x` can be written | |

as the limiting process: | |

.. math:: | |

Df(x) = \lim_{h \rightarrow 0} \frac{f(x + h) - f(x)}{h} | |

Forward Differences | |

=================== | |

Now of course one cannot perform the limiting operation numerically on | |

a computer so we do the next best thing, which is to choose a small | |

value of :math:`h` and approximate the derivative as | |

.. math:: | |

Df(x) \approx \frac{f(x + h) - f(x)}{h} | |

The above formula is the simplest most basic form of numeric | |

differentiation. It is known as the *Forward Difference* formula. | |

So how would one go about constructing a numerically differentiated | |

version of ``Rat43Analytic`` (`Rat43 | |

<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) in | |

Ceres Solver. This is done in two steps: | |

1. Define *Functor* that given the parameter values will evaluate the | |

residual for a given :math:`(x,y)`. | |

2. Construct a :class:`CostFunction` by using | |

:class:`NumericDiffCostFunction` to wrap an instance of | |

``Rat43CostFunctor``. | |

.. code-block:: c++ | |

struct Rat43CostFunctor { | |

Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {} | |

bool operator()(const double* parameters, double* residuals) const { | |

const double b1 = parameters[0]; | |

const double b2 = parameters[1]; | |

const double b3 = parameters[2]; | |

const double b4 = parameters[3]; | |

residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_; | |

return true; | |

} | |

const double x_; | |

const double y_; | |

} | |

CostFunction* cost_function = | |

new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>( | |

new Rat43CostFunctor(x, y)); | |

This is about the minimum amount of work one can expect to do to | |

define the cost function. The only thing that the user needs to do is | |

to make sure that the evaluation of the residual is implemented | |

correctly and efficiently. | |

Before going further, it is instructive to get an estimate of the | |

error in the forward difference formula. We do this by considering the | |

`Taylor expansion <https://en.wikipedia.org/wiki/Taylor_series>`_ of | |

:math:`f` near :math:`x`. | |

.. math:: | |

\begin{align} | |

f(x+h) &= f(x) + h Df(x) + \frac{h^2}{2!} D^2f(x) + | |

\frac{h^3}{3!}D^3f(x) + \cdots \\ | |

Df(x) &= \frac{f(x + h) - f(x)}{h} - \left [\frac{h}{2!}D^2f(x) + | |

\frac{h^2}{3!}D^3f(x) + \cdots \right]\\ | |

Df(x) &= \frac{f(x + h) - f(x)}{h} + O(h) | |

\end{align} | |

i.e., the error in the forward difference formula is | |

:math:`O(h)` [#f4]_. | |

Implementation Details | |

---------------------- | |

:class:`NumericDiffCostFunction` implements a generic algorithm to | |

numerically differentiate a given functor. While the actual | |

implementation of :class:`NumericDiffCostFunction` is complicated, the | |

net result is a :class:`CostFunction` that roughly looks something | |

like the following: | |

.. code-block:: c++ | |

class Rat43NumericDiffForward : public SizedCostFunction<1,4> { | |

public: | |

Rat43NumericDiffForward(const Rat43Functor* functor) : functor_(functor) {} | |

virtual ~Rat43NumericDiffForward() {} | |

virtual bool Evaluate(double const* const* parameters, | |

double* residuals, | |

double** jacobians) const { | |

functor_(parameters[0], residuals); | |

if (!jacobians) return true; | |

double* jacobian = jacobians[0]; | |

if (!jacobian) return true; | |

const double f = residuals[0]; | |

double parameters_plus_h[4]; | |

for (int i = 0; i < 4; ++i) { | |

std::copy(parameters, parameters + 4, parameters_plus_h); | |

const double kRelativeStepSize = 1e-6; | |

const double h = std::abs(parameters[i]) * kRelativeStepSize; | |

parameters_plus_h[i] += h; | |

double f_plus; | |

functor_(parameters_plus_h, &f_plus); | |

jacobian[i] = (f_plus - f) / h; | |

} | |

return true; | |

} | |

private: | |

std::unique_ptr<Rat43Functor> functor_; | |

}; | |

Note the choice of step size :math:`h` in the above code, instead of | |

an absolute step size which is the same for all parameters, we use a | |

relative step size of :math:`\text{kRelativeStepSize} = 10^{-6}`. This | |

gives better derivative estimates than an absolute step size [#f2]_ | |

[#f3]_. This choice of step size only works for parameter values that | |

are not close to zero. So the actual implementation of | |

:class:`NumericDiffCostFunction`, uses a more complex step size | |

selection logic, where close to zero, it switches to a fixed step | |

size. | |

Central Differences | |

=================== | |

:math:`O(h)` error in the Forward Difference formula is okay but not | |

great. A better method is to use the *Central Difference* formula: | |

.. math:: | |

Df(x) \approx \frac{f(x + h) - f(x - h)}{2h} | |

Notice that if the value of :math:`f(x)` is known, the Forward | |

Difference formula only requires one extra evaluation, but the Central | |

Difference formula requires two evaluations, making it twice as | |

expensive. So is the extra evaluation worth it? | |

To answer this question, we again compute the error of approximation | |

in the central difference formula: | |

.. math:: | |

\begin{align} | |

f(x + h) &= f(x) + h Df(x) + \frac{h^2}{2!} | |

D^2f(x) + \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) + \cdots\\ | |

f(x - h) &= f(x) - h Df(x) + \frac{h^2}{2!} | |

D^2f(x) - \frac{h^3}{3!} D^3f(c_2) + \frac{h^4}{4!} D^4f(x) + | |

\cdots\\ | |

Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!} | |

D^3f(x) + \frac{h^4}{5!} | |

D^5f(x) + \cdots \\ | |

Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + O(h^2) | |

\end{align} | |

The error of the Central Difference formula is :math:`O(h^2)`, i.e., | |

the error goes down quadratically whereas the error in the Forward | |

Difference formula only goes down linearly. | |

Using central differences instead of forward differences in Ceres | |

Solver is a simple matter of changing a template argument to | |

:class:`NumericDiffCostFunction` as follows: | |

.. code-block:: c++ | |

CostFunction* cost_function = | |

new NumericDiffCostFunction<Rat43CostFunctor, CENTRAL, 1, 4>( | |

new Rat43CostFunctor(x, y)); | |

But what do these differences in the error mean in practice? To see | |

this, consider the problem of evaluating the derivative of the | |

univariate function | |

.. math:: | |

f(x) = \frac{e^x}{\sin x - x^2}, | |

at :math:`x = 1.0`. | |

It is easy to determine that :math:`Df(1.0) = | |

140.73773557129658`. Using this value as reference, we can now compute | |

the relative error in the forward and central difference formulae as a | |

function of the absolute step size and plot them. | |

.. figure:: forward_central_error.png | |

:figwidth: 100% | |

:align: center | |

Reading the graph from right to left, a number of things stand out in | |

the above graph: | |

1. The graph for both formulae have two distinct regions. At first, | |

starting from a large value of :math:`h` the error goes down as | |

the effect of truncating the Taylor series dominates, but as the | |

value of :math:`h` continues to decrease, the error starts | |

increasing again as roundoff error starts to dominate the | |

computation. So we cannot just keep on reducing the value of | |

:math:`h` to get better estimates of :math:`Df`. The fact that we | |

are using finite precision arithmetic becomes a limiting factor. | |

2. Forward Difference formula is not a great method for evaluating | |

derivatives. Central Difference formula converges much more | |

quickly to a more accurate estimate of the derivative with | |

decreasing step size. So unless the evaluation of :math:`f(x)` is | |

so expensive that you absolutely cannot afford the extra | |

evaluation required by central differences, **do not use the | |

Forward Difference formula**. | |

3. Neither formula works well for a poorly chosen value of :math:`h`. | |

Ridders' Method | |

=============== | |

So, can we get better estimates of :math:`Df` without requiring such | |

small values of :math:`h` that we start hitting floating point | |

roundoff errors? | |

One possible approach is to find a method whose error goes down faster | |

than :math:`O(h^2)`. This can be done by applying `Richardson | |

Extrapolation | |

<https://en.wikipedia.org/wiki/Richardson_extrapolation>`_ to the | |

problem of differentiation. This is also known as *Ridders' Method* | |

[Ridders]_. | |

Let us recall, the error in the central differences formula. | |

.. math:: | |

\begin{align} | |

Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!} | |

D^3f(x) + \frac{h^4}{5!} | |

D^5f(x) + \cdots\\ | |

& = \frac{f(x + h) - f(x - h)}{2h} + K_2 h^2 + K_4 h^4 + \cdots | |

\end{align} | |

The key thing to note here is that the terms :math:`K_2, K_4, ...` | |

are independent of :math:`h` and only depend on :math:`x`. | |

Let us now define: | |

.. math:: | |

A(1, m) = \frac{f(x + h/2^{m-1}) - f(x - h/2^{m-1})}{2h/2^{m-1}}. | |

Then observe that | |

.. math:: | |

Df(x) = A(1,1) + K_2 h^2 + K_4 h^4 + \cdots | |

and | |

.. math:: | |

Df(x) = A(1, 2) + K_2 (h/2)^2 + K_4 (h/2)^4 + \cdots | |

Here we have halved the step size to obtain a second central | |

differences estimate of :math:`Df(x)`. Combining these two estimates, | |

we get: | |

.. math:: | |

Df(x) = \frac{4 A(1, 2) - A(1,1)}{4 - 1} + O(h^4) | |

which is an approximation of :math:`Df(x)` with truncation error that | |

goes down as :math:`O(h^4)`. But we do not have to stop here. We can | |

iterate this process to obtain even more accurate estimates as | |

follows: | |

.. math:: | |

A(n, m) = \begin{cases} | |

\frac{\displaystyle f(x + h/2^{m-1}) - f(x - | |

h/2^{m-1})}{\displaystyle 2h/2^{m-1}} & n = 1 \\ | |

\frac{\displaystyle 4^{n-1} A(n - 1, m + 1) - A(n - 1, m)}{\displaystyle 4^{n-1} - 1} & n > 1 | |

\end{cases} | |

It is straightforward to show that the approximation error in | |

:math:`A(n, 1)` is :math:`O(h^{2n})`. To see how the above formula can | |

be implemented in practice to compute :math:`A(n,1)` it is helpful to | |

structure the computation as the following tableau: | |

.. math:: | |

\begin{array}{ccccc} | |

A(1,1) & A(1, 2) & A(1, 3) & A(1, 4) & \cdots\\ | |

& A(2, 1) & A(2, 2) & A(2, 3) & \cdots\\ | |

& & A(3, 1) & A(3, 2) & \cdots\\ | |

& & & A(4, 1) & \cdots \\ | |

& & & & \ddots | |

\end{array} | |

So, to compute :math:`A(n, 1)` for increasing values of :math:`n` we | |

move from the left to the right, computing one column at a | |

time. Assuming that the primary cost here is the evaluation of the | |

function :math:`f(x)`, the cost of computing a new column of the above | |

tableau is two function evaluations. Since the cost of evaluating | |

:math:`A(1, n)`, requires evaluating the central difference formula | |

for step size of :math:`2^{1-n}h` | |

Applying this method to :math:`f(x) = \frac{e^x}{\sin x - x^2}` | |

starting with a fairly large step size :math:`h = 0.01`, we get: | |

.. math:: | |

\begin{array}{rrrrr} | |

141.678097131 &140.971663667 &140.796145400 &140.752333523 &140.741384778\\ | |

&140.736185846 &140.737639311 &140.737729564 &140.737735196\\ | |

& &140.737736209 &140.737735581 &140.737735571\\ | |

& & &140.737735571 &140.737735571\\ | |

& & & &140.737735571\\ | |

\end{array} | |

Compared to the *correct* value :math:`Df(1.0) = 140.73773557129658`, | |

:math:`A(5, 1)` has a relative error of :math:`10^{-13}`. For | |

comparison, the relative error for the central difference formula with | |

the same step size (:math:`0.01/2^4 = 0.000625`) is :math:`10^{-5}`. | |

The above tableau is the basis of Ridders' method for numeric | |

differentiation. The full implementation is an adaptive scheme that | |

tracks its own estimation error and stops automatically when the | |

desired precision is reached. Of course it is more expensive than the | |

forward and central difference formulae, but is also significantly | |

more robust and accurate. | |

Using Ridder's method instead of forward or central differences in | |

Ceres is again a simple matter of changing a template argument to | |

:class:`NumericDiffCostFunction` as follows: | |

.. code-block:: c++ | |

CostFunction* cost_function = | |

new NumericDiffCostFunction<Rat43CostFunctor, RIDDERS, 1, 4>( | |

new Rat43CostFunctor(x, y)); | |

The following graph shows the relative error of the three methods as a | |

function of the absolute step size. For Ridders's method we assume | |

that the step size for evaluating :math:`A(n,1)` is :math:`2^{1-n}h`. | |

.. figure:: forward_central_ridders_error.png | |

:figwidth: 100% | |

:align: center | |

Using the 10 function evaluations that are needed to compute | |

:math:`A(5,1)` we are able to approximate :math:`Df(1.0)` about a 1000 | |

times better than the best central differences estimate. To put these | |

numbers in perspective, machine epsilon for double precision | |

arithmetic is :math:`\approx 2.22 \times 10^{-16}`. | |

Going back to ``Rat43``, let us also look at the runtime cost of the | |

various methods for computing numeric derivatives. | |

========================== ========= | |

CostFunction Time (ns) | |

========================== ========= | |

Rat43Analytic 255 | |

Rat43AnalyticOptimized 92 | |

Rat43NumericDiffForward 262 | |

Rat43NumericDiffCentral 517 | |

Rat43NumericDiffRidders 3760 | |

========================== ========= | |

As expected, Central Differences is about twice as expensive as | |

Forward Differences and the remarkable accuracy improvements of | |

Ridders' method cost an order of magnitude more runtime. | |

Recommendations | |

=============== | |

Numeric differentiation should be used when you cannot compute the | |

derivatives either analytically or using automatic differentiation. This | |

is usually the case when you are calling an external library or | |

function whose analytic form you do not know or even if you do, you | |

are not in a position to re-write it in a manner required to use | |

:ref:`chapter-automatic_derivatives`. | |

When using numeric differentiation, use at least Central Differences, | |

and if execution time is not a concern or the objective function is | |

such that determining a good static relative step size is hard, | |

Ridders' method is recommended. | |

.. rubric:: Footnotes | |

.. [#f2] `Numerical Differentiation | |

<https://en.wikipedia.org/wiki/Numerical_differentiation#Practical_considerations_using_floating_point_arithmetic>`_ | |

.. [#f3] [Press]_ Numerical Recipes, Section 5.7 | |

.. [#f4] In asymptotic error analysis, an error of :math:`O(h^k)` | |

means that the absolute-value of the error is at most some | |

constant times :math:`h^k` when :math:`h` is close enough to | |

:math:`0`. |