.. default-domain:: cpp
.. cpp:namespace:: ceres
.. _chapter-inverse_function_theorem:
==========================================
Using Inverse & Implicit Function Theorems
==========================================
Until now we have considered methods for computing derivatives that
work directly on the function being differentiated. However, this is
not always possible, for example, when the function can only be
computed via an iterative algorithm, or when no explicit definition of
the function is available. In this section we will see how two basic
results from calculus can be used to get around these difficulties.
Inverse Function Theorem
========================
Suppose we wish to evaluate the derivative of a function :math:`f(x)`,
but evaluating :math:`f(x)` is not easy. Say it involves running an
iterative algorithm. You could try automatically differentiating the
iterative algorithm, but even if that is possible, it can become quite
expensive.
In some cases we get lucky, and computing the inverse of :math:`f(x)`
is an easy operation. In these cases, we can use the `Inverse Function
Theorem <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ to
compute the derivative exactly. Here is the key idea:
Assuming that :math:`y=f(x)` is continuously differentiable in a
neighborhood of a point :math:`x` and :math:`Df(x)` is the invertible
Jacobian of :math:`f` at :math:`x`, then by applying the chain rule to
the identity :math:`f^{-1}(f(x)) = x`, we have
:math:`Df^{-1}(f(x))Df(x) = I`, or :math:`Df^{-1}(y) = (Df(x))^{-1}`,
i.e., the Jacobian of :math:`f^{-1}` is the inverse of the Jacobian of
:math:`f`, or :math:`Df(x) = (Df^{-1}(y))^{-1}`.
For example, let :math:`f(x) = e^x`. Now of course we know that
:math:`Df(x) = e^x`, but let's try and compute it via the Inverse
Function Theorem. Since :math:`y = e^x > 0`, we have :math:`f^{-1}(y)
= \log y`, so :math:`Df^{-1}(y) = \frac{1}{y}`, and therefore
:math:`Df(x) = (Df^{-1}(y))^{-1} = y = e^x`.
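To see the pattern in code, here is a minimal sketch of the scalar
case. It is purely illustrative: :math:`f(x) = e^x` stands in for a
function that is expensive to differentiate directly, and
:math:`f^{-1}(y) = \log y` is the cheap inverse.

.. code-block:: c++

   #include <cmath>

   // Recover Df(x) from the derivative of the inverse. exp() stands in
   // for an expensive (e.g., iterative) function, log() for its cheap
   // inverse.
   double DfViaInverseFunctionTheorem(double x) {
     const double y = std::exp(x);  // y = f(x).
     const double dinv = 1.0 / y;   // Df^{-1}(y) = d log(y)/dy = 1/y.
     return 1.0 / dinv;             // Df(x) = (Df^{-1}(y))^{-1} = e^x.
   }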
You may be wondering why the above is true. A continuously
differentiable function in a small neighborhood is well approximated
by a linear function. Indeed this is a good way to think about the
Jacobian: it is the matrix that best approximates the function
linearly. Once you do that, it is straightforward to see that
*locally* :math:`f^{-1}(y)` is best approximated linearly by the
inverse of the Jacobian of :math:`f(x)`.
Let us now consider a more practical example.
Geodetic Coordinate System Conversion
-------------------------------------
When working with data related to the Earth, one can use two different
coordinate systems: the familiar (latitude, longitude, height)
Latitude-Longitude-Altitude (LLA) coordinate system, or the `ECEF
<http://en.wikipedia.org/wiki/ECEF>`_ coordinate system. The former is
familiar but is not terribly convenient analytically. The latter is a
Cartesian system but not particularly intuitive. So systems that
process Earth-related data have to go back and forth between these
coordinate systems.
The conversion between the LLA and the ECEF coordinate system requires
a model of the Earth, the most commonly used one being `WGS84
<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.
Going from the geodetic :math:`(\phi,\lambda,h)` to the ECEF
:math:`(x,y,z)` coordinates is easy.
.. math::

   \chi &= \sqrt{1 - e^2 \sin^2 \phi} \\
   X &= \left( \frac{a}{\chi} + h \right) \cos \phi \cos \lambda \\
   Y &= \left( \frac{a}{\chi} + h \right) \cos \phi \sin \lambda \\
   Z &= \left( \frac{a(1-e^2)}{\chi} + h \right) \sin \phi
Here :math:`a` and :math:`e^2` are constants defined by `WGS84
<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.
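For reference, here is a minimal sketch of this forward transform (the
function name is ours, not from any library), with the WGS84 values of
:math:`a` and :math:`e^2` hard coded; :math:`\phi` and :math:`\lambda`
are in radians and :math:`h` in meters:

.. code-block:: c++

   #include <cmath>
   #include "Eigen/Core"

   Eigen::Vector3d LLAToECEF(double phi, double lambda, double h) {
     const double a = 6378137.0;          // WGS84 semi-major axis (m).
     const double e2 = 6.69437999014e-3;  // WGS84 eccentricity squared.
     const double chi = std::sqrt(1.0 - e2 * std::sin(phi) * std::sin(phi));
     return Eigen::Vector3d(
         (a / chi + h) * std::cos(phi) * std::cos(lambda),
         (a / chi + h) * std::cos(phi) * std::sin(lambda),
         (a * (1.0 - e2) / chi + h) * std::sin(phi));
   }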
Going from ECEF to LLA coordinates requires an iterative algorithm, so
to compute the derivative of this transformation we invoke the Inverse
Function Theorem as follows:
.. code-block:: c++

   Eigen::Vector3d ecef;  // Fill some values
   // Iterative computation.
   Eigen::Vector3d lla = ECEFToLLA(ecef);
   // Analytic derivatives of the forward (LLA -> ECEF) transform.
   Eigen::Matrix3d lla_to_ecef_jacobian = LLAToECEFJacobian(lla);
   bool invertible;
   Eigen::Matrix3d ecef_to_lla_jacobian;
   // By the Inverse Function Theorem, the ECEF -> LLA Jacobian is the
   // inverse of the LLA -> ECEF Jacobian.
   lla_to_ecef_jacobian.computeInverseWithCheck(ecef_to_lla_jacobian, invertible);
Implicit Function Theorem
=========================
Consider now the problem where we have two variables :math:`x \in
\mathbb{R}^m` and :math:`y \in \mathbb{R}^n` and a function
:math:`F:\mathbb{R}^m \times \mathbb{R}^n \rightarrow \mathbb{R}^n`
such that :math:`F(x,y) = 0` and we wish to calculate the Jacobian of
:math:`y` with respect to :math:`x`. How do we do this?
If for a given value of :math:`(x,y)`, the partial Jacobian
:math:`D_2F(x,y)` is full rank, then the `Implicit Function Theorem
<https://en.wikipedia.org/wiki/Implicit_function_theorem>`_ tells us
that there exists a neighborhood of :math:`x` and a function :math:`G`
such that :math:`y = G(x)` in this neighborhood. Differentiating
:math:`F(x,G(x)) = 0` gives us
.. math::

   D_1F(x,y) + D_2F(x,y)DG(x) &= 0 \\
   DG(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y) \\
   Dy(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)
This means that we can compute the derivative of :math:`y` with
respect to :math:`x` by multiplying the Jacobian of :math:`F` w.r.t
:math:`x` by the negative of the inverse of the Jacobian of :math:`F`
w.r.t :math:`y`.
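As a quick sanity check, take the unit circle :math:`F(x, y) = x^2 +
y^2 - 1 = 0` with :math:`m = n = 1`. Then

.. math::

   D_1F(x,y) = 2x, \quad D_2F(x,y) = 2y, \quad Dy(x) = -\frac{x}{y},

which, wherever :math:`y \neq 0`, matches the derivative of :math:`y =
\pm\sqrt{1 - x^2}`.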
Let's consider two examples.
Roots of a Polynomial
---------------------
The first example we consider is a classic. Let :math:`p(x) = a_0 +
a_1 x + \dots + a_n x^n` be a degree :math:`n` polynomial, and we wish
to compute the derivative of its roots with respect to its
coefficients. There is no closed form formula for computing the roots
of a general polynomial of degree :math:`n \ge 5`; `Galois
<https://en.wikipedia.org/wiki/%C3%89variste_Galois>`_ and `Abel
<https://en.wikipedia.org/wiki/Niels_Henrik_Abel>`_ proved that. There
are numerical algorithms like computing the eigenvalues of the
`Companion Matrix
<https://nhigham.com/2021/03/23/what-is-a-companion-matrix/>`_, but
differentiating an eigenvalue solver does not seem like fun. But the
Implicit Function Theorem offers us a simple path.
If :math:`x` is a root of :math:`p(x)`, then :math:`F(\mathbf{a}, x) =
a_0 + a_1 x + \dots + a_n x^n = 0`. So,
.. math::

   D_1 F(\mathbf{a}, x) &= [1, x, x^2, \dots, x^n] \\
   D_2 F(\mathbf{a}, x) &= \sum_{k=1}^n k a_k x^{k-1} = Dp(x) \\
   Dx(\mathbf{a}) &= \frac{-1}{Dp(x)} [1, x, x^2, \dots, x^n]
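Here is a minimal sketch (the function name is hypothetical) that
evaluates this Jacobian, assuming :math:`x` is a simple root so that
:math:`Dp(x) \neq 0`:

.. code-block:: c++

   #include "Eigen/Core"

   // Jacobian of a simple root x w.r.t the coefficients a = [a_0, ..., a_n].
   Eigen::RowVectorXd RootJacobian(const Eigen::VectorXd& a, double x) {
     const int n = static_cast<int>(a.size()) - 1;
     Eigen::RowVectorXd powers(n + 1);  // D_1 F = [1, x, x^2, ..., x^n].
     powers(0) = 1.0;
     for (int k = 1; k <= n; ++k) {
       powers(k) = powers(k - 1) * x;
     }
     double dp = 0.0;  // D_2 F = Dp(x) = sum_k k a_k x^{k-1}.
     for (int k = 1; k <= n; ++k) {
       dp += k * a(k) * powers(k - 1);
     }
     return -powers / dp;  // Dx(a) = -[1, x, ..., x^n] / Dp(x).
   }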
Differentiating the Solution to an Optimization Problem
-------------------------------------------------------
Sometimes we are required to solve optimization problems inside
optimization problems, and this requires computing the derivative of
the optimal solution (or a fixed point) of an optimization problem
w.r.t its parameters.
Let :math:`\theta \in \mathbb{R}^m` be a vector, :math:`A(\theta) \in
\mathbb{R}^{k\times n}` be a matrix whose entries are a function of
:math:`\theta` with :math:`k \ge n`, and let :math:`b \in
\mathbb{R}^k` be a constant vector. Consider the linear least squares
problem:
.. math::

   x^* = \arg \min_x \|A(\theta) x - b\|_2^2
How do we compute :math:`D_\theta x^*(\theta)`?
One approach would be to observe that :math:`x^*(\theta) =
(A^\top(\theta)A(\theta))^{-1}A^\top(\theta)b` and then differentiate
this w.r.t :math:`\theta`. But this would require differentiating
through the inverse of the matrix
:math:`(A^\top(\theta)A(\theta))^{-1}`. Not exactly easy. Let's use
the Implicit Function Theorem instead.
The first step is to observe that :math:`x^*` satisfies the so-called
*normal equations*:
.. math::

   A^\top(\theta)A(\theta)x^* - A^\top(\theta)b = 0
We will compute :math:`D_\theta x^*` column-wise, treating
:math:`A(\theta)` as a function of one coordinate (:math:`\theta_i`)
of :math:`\theta` at a time. So using the normal equations, let's
define :math:`F(\theta_i, x^*) = A^\top(\theta_i)A(\theta_i)x^* -
A^\top(\theta_i)b = 0`, using which we can now compute:
.. math::

   D_1F(\theta_i, x^*) &= D_{\theta_i}A^\top A x^* + A^\top
   D_{\theta_i}A x^* - D_{\theta_i} A^\top b = g_i \\
   D_2F(\theta_i, x^*) &= A^\top A \\
   Dx^*(\theta_i) &= -(A^\top A)^{-1} g_i \\
   Dx^*(\theta) &= -(A^\top A)^{-1} \left[g_1, \dots, g_m\right]
Observe that to compute :math:`Dx^*(\theta)` we only need the inverse
(or a factorization) of :math:`A^\top A`, which we needed anyway to
compute :math:`x^*`.
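Putting this together, here is a minimal sketch (all names are
hypothetical) that assembles :math:`Dx^*(\theta)` from :math:`A`,
:math:`b`, :math:`x^*` and the per-coordinate derivatives
:math:`D_{\theta_i}A`:

.. code-block:: c++

   #include <vector>
   #include "Eigen/Dense"

   // dA[i] holds the k x n matrix of entrywise derivatives of A w.r.t
   // theta_i.
   Eigen::MatrixXd SolutionJacobian(const Eigen::MatrixXd& A,
                                    const Eigen::VectorXd& b,
                                    const Eigen::VectorXd& x_star,
                                    const std::vector<Eigen::MatrixXd>& dA) {
     const int m = static_cast<int>(dA.size());
     Eigen::MatrixXd G(A.cols(), m);
     for (int i = 0; i < m; ++i) {
       // g_i = D_{theta_i}A^T A x^* + A^T D_{theta_i}A x^* - D_{theta_i}A^T b.
       G.col(i) = dA[i].transpose() * (A * x_star) +
                  A.transpose() * (dA[i] * x_star) -
                  dA[i].transpose() * b;
     }
     // Dx^*(theta) = -(A^T A)^{-1} [g_1, ..., g_m]; reuse the normal
     // equations factorization used to compute x^* itself.
     Eigen::MatrixXd dx = (A.transpose() * A).ldlt().solve(G);
     return -dx;
   }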