docs/source/inverse_and_implicit_function_theorems.rst - ceres-solver - Git at Google

 .. default-domain:: cpp

 .. cpp:namespace:: ceres

 .. _chapter-inverse_function_theorem:

 ==========================================
 Using Inverse & Implicit Function Theorems
 ==========================================

 Until now we have considered methods for computing derivatives that
 work directly on the function being differentiated. However, this is
 not always possible. For example, if the function can only be computed
 via an iterative algorithm, or there is no explicit definition of the
 function available.  In this section we will see how we can use two
 basic results from calculus to get around these difficulties.


 Inverse Function Theorem
 ========================

 Suppose we wish to evaluate the derivative of a function :math:`f(x)`,
 but evaluating :math:`f(x)` is not easy. Say it involves running an
 iterative algorithm. You could try automatically differentiating the
 iterative algorithm, but even if that is possible, it can become quite
 expensive.

 In some cases we get lucky, and computing the inverse of :math:`f(x)`
 is an easy operation. In these cases, we can use the `Inverse Function
 Theorem <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ to
 compute the derivative exactly. Here is the key idea:

 Assuming that :math:`y=f(x)` is continuously differentiable in a
 neighborhood of a point :math:`x` and :math:`Df(x)` is the invertible
 Jacobian of :math:`f` at :math:`x`, then by applying the chain rule to
 the identity :math:`f^{-1}(f(x)) = x`, we have
 :math:`Df^{-1}(f(x))Df(x) = I`, or :math:`Df^{-1}(y) = (Df(x))^{-1}`,
 i.e., the Jacobian of :math:`f^{-1}` is the inverse of the Jacobian of
 :math:`f`, or :math:`Df(x) = (Df^{-1}(y))^{-1}`.

 For example, let :math:`f(x) = e^x`. Now of course we know that
 :math:`Df(x) = e^x`, but let's try and compute it via the Inverse
 Function Theorem. For :math:`x > 0`, we have :math:`f^{-1}(y) = \log
 y`, so :math:`Df^{-1}(y) = \frac{1}{y}`, so :math:`Df(x) =
 (Df^{-1}(y))^{-1} = y = e^x`.

 You maybe wondering why the above is true. A smoothly differentiable
 function in a small neighborhood is well approximated by a linear
 function. Indeed this is a good way to think about the Jacobian, it is
 the matrix that best approximates the function linearly. Once you do
 that, it is straightforward to see that *locally* :math:`f^{-1}(y)` is
 best approximated linearly by the inverse of the Jacobian of
 :math:`f(x)`.

 Let us now consider a more practical example.

 Geodetic Coordinate System Conversion
 -------------------------------------

 When working with data related to the Earth, one can use two different
 coordinate systems. The familiar (latitude, longitude, height)
 Latitude-Longitude-Altitude coordinate system or the `ECEF
 <http://en.wikipedia.org/wiki/ECEF>`_ coordinate systems. The former
 is familiar but is not terribly convenient analytically. The latter is
 a Cartesian system but not particularly intuitive. So systems that
 process earth related data have to go back and forth between these
 coordinate systems.

 The conversion between the LLA and the ECEF coordinate system requires
 a model of the Earth, the most commonly used one being `WGS84
 <https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.

 Going from the spherical :math:`(\phi,\lambda,h)` to the ECEF
 :math:`(x,y,z)` coordinates is easy.

 .. math::

    \chi &= \sqrt{1 - e^2 \sin^2 \phi}

    X &= \left( \frac{a}{\chi} + h \right) \cos \phi \cos \lambda

    Y &= \left( \frac{a}{\chi} + h \right) \cos \phi \sin \lambda

    Z &= \left(\frac{a(1-e^2)}{\chi}  +h \right) \sin \phi

 Here :math:`a` and :math:`e^2` are constants defined by `WGS84
 <https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.

 Going from ECEF to LLA coordinates requires an iterative algorithm. So
 to compute the derivative of the this transformation we invoke the
 Inverse Function Theorem as follows:

 .. code-block:: c++

    Eigen::Vector3d ecef; // Fill some values
    // Iterative computation.
    Eigen::Vector3d lla = ECEFToLLA(ecef);
    // Analytic derivatives
    Eigen::Matrix3d lla_to_ecef_jacobian = LLAToECEFJacobian(lla);
    bool invertible;
    Eigen::Matrix3d ecef_to_lla_jacobian;
    lla_to_ecef_jacobian.computeInverseWithCheck(ecef_to_lla_jacobian, invertible);


 Implicit Function Theorem
 =========================

 Consider now the problem where we have two variables :math:`x \in
 \mathbb{R}^m` and :math:`y \in \mathbb{R}^n` and a function
 :math:`F:\mathbb{R}^m \times \mathbb{R}^n \rightarrow \mathbb{R}^n`
 such that :math:`F(x,y) = 0` and we wish to calculate the Jacobian of
 :math:`y` with respect to `x`. How do we do this?

 If for a given value of :math:`(x,y)`, the partial Jacobian
 :math:`D_2F(x,y)` is full rank, then the `Implicit Function Theorem
 <https://en.wikipedia.org/wiki/Implicit_function_theorem>`_ tells us
 that there exists a neighborhood of :math:`x` and a function :math:`G`
 such :math:`y = G(x)` in this neighborhood. Differentiating
 :math:`F(x,G(x)) = 0` gives us

 .. math::

    D_1F(x,y) + D_2F(x,y)DG(x) &= 0

                         DG(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)

                         D y(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)

 This means that we can compute the derivative of :math:`y` with
 respect to :math:`x` by multiplying the Jacobian of :math:`F` w.r.t
 :math:`x` by the inverse of the Jacobian of :math:`F` w.r.t :math:`y`.

 Let's consider two examples.

 Roots of a Polynomial
 ---------------------

 The first example we consider is a classic. Let :math:`p(x) = a_0 +
 a_1 x + \dots + a_n x^n` be a degree :math:`n` polynomial, and we wish
 to compute the derivative of its roots with respect to its
 coefficients. There is no closed form formula for computing the roots
 of a general degree :math:`n` polynomial. `Galois
 <https://en.wikipedia.org/wiki/%C3%89variste_Galois>`_ and `Abel
 <https://en.wikipedia.org/wiki/Niels_Henrik_Abel>`_ proved that. There
 are numerical algorithms like computing the eigenvalues of the
 `Companion Matrix
 <https://nhigham.com/2021/03/23/what-is-a-companion-matrix/>`_, but
 differentiating an eigenvalue solver does not seem like fun. But the
 Implicit Function Theorem offers us a simple path.

 If :math:`x` is a root of :math:`p(x)`, then :math:`F(\mathbf{a}, x) =
 a_0 + a_1 x + \dots + a_n x^n = 0`. So,

 .. math::

    D_1 F(\mathbf{a}, x) &= [1, x, x^2, \dots, x^n]

    D_2 F(\mathbf{a}, x) &= \sum_{k=1}^n k a_k x^{k-1} = Dp(x)

         Dx(a) &= \frac{-1}{Dp(x)} [1, x, x^2, \dots, x^n]

 Differentiating the Solution to an Optimization Problem
 -------------------------------------------------------

 Sometimes we are required to solve optimization problems inside
 optimization problems, and this requires computing the derivative of
 the optimal solution (or a fixed point) of an optimization problem
 w.r.t its parameters.

 Let :math:`\theta \in \mathbb{R}^m` be a vector, :math:`A(\theta) \in
 \mathbb{R}^{k\times n}` be a matrix whose entries are a function of
 :math:`\theta` with :math:`k \ge n` and let :math:`b \in \mathbb{R}^k`
 be a constant vector, then consider the linear least squares problem:

 .. math::

    x^* = \arg \min_x \|A(\theta) x - b\|_2^2

 How do we compute :math:`D_\theta x^*(\theta)`?

 One approach would be to observe that :math:`x^*(\theta) =
 (A^\top(\theta)A(\theta))^{-1}A^\top(\theta)b` and then differentiate
 this w.r.t :math:`\theta`. But this would require differentiating
 through the inverse of the matrix
 :math:`(A^\top(\theta)A(\theta))^{-1}`. Not exactly easy. Let's use
 the Implicit Function Theorem instead.

 The first step is to observe that :math:`x^*` satisfies the so called
 *normal equations*.

 .. math::

    A^\top(\theta)A(\theta)x^* - A^\top(\theta)b = 0

 We will compute :math:`D_\theta x^*` column-wise, treating
 :math:`A(\theta)` as a function of one coordinate (:math:`\theta_i`)
 of :math:`\theta` at a time. So using the normal equations, let's
 define :math:`F(\theta_i, x^*) = A^\top(\theta_i)A(\theta_i)x^* -
 A^\top(\theta_i)b = 0`. Using which can now compute:

 .. math::

    D_1F(\theta_i, x^*) &= D_{\theta_i}A^\top A + A^\top
    D_{\theta_i}Ax^* - D_{\theta_i} A^\top b = g_i

    D_2F(\theta_i, x^*) &= A^\top A

    Dx^*(\theta_i) & = -(A^\top A)^{-1} g_i

    Dx^*(\theta) & = -(A^\top A )^{-1} \left[g_1, \dots, g_m\right]

 Observe that we only need to compute the inverse of :math:`A^\top A`,
 to compute :math:`D x^*(\theta)`, which we needed anyways to compute
 :math:`x^*`.
	.. default-domain:: cpp

	.. cpp:namespace:: ceres

	.. _chapter-inverse_function_theorem:

	==========================================
	Using Inverse & Implicit Function Theorems
	==========================================

	Until now we have considered methods for computing derivatives that
	work directly on the function being differentiated. However, this is
	not always possible. For example, if the function can only be computed
	via an iterative algorithm, or there is no explicit definition of the
	function available. In this section we will see how we can use two
	basic results from calculus to get around these difficulties.


	Inverse Function Theorem
	========================

	Suppose we wish to evaluate the derivative of a function :math:`f(x)`,
	but evaluating :math:`f(x)` is not easy. Say it involves running an
	iterative algorithm. You could try automatically differentiating the
	iterative algorithm, but even if that is possible, it can become quite
	expensive.

	In some cases we get lucky, and computing the inverse of :math:`f(x)`
	is an easy operation. In these cases, we can use the `Inverse Function
	Theorem <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ to
	compute the derivative exactly. Here is the key idea:

	Assuming that :math:`y=f(x)` is continuously differentiable in a
	neighborhood of a point :math:`x` and :math:`Df(x)` is the invertible
	Jacobian of :math:`f` at :math:`x`, then by applying the chain rule to
	the identity :math:`f^{-1}(f(x)) = x`, we have
	:math:`Df^{-1}(f(x))Df(x) = I`, or :math:`Df^{-1}(y) = (Df(x))^{-1}`,
	i.e., the Jacobian of :math:`f^{-1}` is the inverse of the Jacobian of
	:math:`f`, or :math:`Df(x) = (Df^{-1}(y))^{-1}`.

	For example, let :math:`f(x) = e^x`. Now of course we know that
	:math:`Df(x) = e^x`, but let's try and compute it via the Inverse
	Function Theorem. For :math:`x > 0`, we have :math:`f^{-1}(y) = \log
	y`, so :math:`Df^{-1}(y) = \frac{1}{y}`, so :math:`Df(x) =
	(Df^{-1}(y))^{-1} = y = e^x`.

	You maybe wondering why the above is true. A smoothly differentiable
	function in a small neighborhood is well approximated by a linear
	function. Indeed this is a good way to think about the Jacobian, it is
	the matrix that best approximates the function linearly. Once you do
	that, it is straightforward to see that locally :math:`f^{-1}(y)` is
	best approximated linearly by the inverse of the Jacobian of
	:math:`f(x)`.

	Let us now consider a more practical example.

	Geodetic Coordinate System Conversion
	-------------------------------------

	When working with data related to the Earth, one can use two different
	coordinate systems. The familiar (latitude, longitude, height)
	Latitude-Longitude-Altitude coordinate system or the `ECEF
	<http://en.wikipedia.org/wiki/ECEF>`_ coordinate systems. The former
	is familiar but is not terribly convenient analytically. The latter is
	a Cartesian system but not particularly intuitive. So systems that
	process earth related data have to go back and forth between these
	coordinate systems.

	The conversion between the LLA and the ECEF coordinate system requires
	a model of the Earth, the most commonly used one being `WGS84
	<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.

	Going from the spherical :math:`(\phi,\lambda,h)` to the ECEF
	:math:`(x,y,z)` coordinates is easy.

	.. math::

	\chi &= \sqrt{1 - e^2 \sin^2 \phi}

	X &= \left( \frac{a}{\chi} + h \right) \cos \phi \cos \lambda

	Y &= \left( \frac{a}{\chi} + h \right) \cos \phi \sin \lambda

	Z &= \left(\frac{a(1-e^2)}{\chi} +h \right) \sin \phi

	Here :math:`a` and :math:`e^2` are constants defined by `WGS84
	<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.

	Going from ECEF to LLA coordinates requires an iterative algorithm. So
	to compute the derivative of the this transformation we invoke the
	Inverse Function Theorem as follows:

	.. code-block:: c++

	Eigen::Vector3d ecef; // Fill some values
	// Iterative computation.
	Eigen::Vector3d lla = ECEFToLLA(ecef);
	// Analytic derivatives
	Eigen::Matrix3d lla_to_ecef_jacobian = LLAToECEFJacobian(lla);
	bool invertible;
	Eigen::Matrix3d ecef_to_lla_jacobian;
	lla_to_ecef_jacobian.computeInverseWithCheck(ecef_to_lla_jacobian, invertible);


	Implicit Function Theorem
	=========================

	Consider now the problem where we have two variables :math:`x \in
	\mathbb{R}^m` and :math:`y \in \mathbb{R}^n` and a function
	:math:`F:\mathbb{R}^m \times \mathbb{R}^n \rightarrow \mathbb{R}^n`
	such that :math:`F(x,y) = 0` and we wish to calculate the Jacobian of
	:math:`y` with respect to `x`. How do we do this?

	If for a given value of :math:`(x,y)`, the partial Jacobian
	:math:`D_2F(x,y)` is full rank, then the `Implicit Function Theorem
	<https://en.wikipedia.org/wiki/Implicit_function_theorem>`_ tells us
	that there exists a neighborhood of :math:`x` and a function :math:`G`
	such :math:`y = G(x)` in this neighborhood. Differentiating
	:math:`F(x,G(x)) = 0` gives us

	.. math::

	D_1F(x,y) + D_2F(x,y)DG(x) &= 0

	DG(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)

	D y(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)

	This means that we can compute the derivative of :math:`y` with
	respect to :math:`x` by multiplying the Jacobian of :math:`F` w.r.t
	:math:`x` by the inverse of the Jacobian of :math:`F` w.r.t :math:`y`.

	Let's consider two examples.

	Roots of a Polynomial
	---------------------

	The first example we consider is a classic. Let :math:`p(x) = a_0 +
	a_1 x + \dots + a_n x^n` be a degree :math:`n` polynomial, and we wish
	to compute the derivative of its roots with respect to its
	coefficients. There is no closed form formula for computing the roots
	of a general degree :math:`n` polynomial. `Galois
	<https://en.wikipedia.org/wiki/%C3%89variste_Galois>`_ and `Abel
	<https://en.wikipedia.org/wiki/Niels_Henrik_Abel>`_ proved that. There
	are numerical algorithms like computing the eigenvalues of the
	`Companion Matrix
	<https://nhigham.com/2021/03/23/what-is-a-companion-matrix/>`_, but
	differentiating an eigenvalue solver does not seem like fun. But the
	Implicit Function Theorem offers us a simple path.

	If :math:`x` is a root of :math:`p(x)`, then :math:`F(\mathbf{a}, x) =
	a_0 + a_1 x + \dots + a_n x^n = 0`. So,

	.. math::

	D_1 F(\mathbf{a}, x) &= [1, x, x^2, \dots, x^n]

	D_2 F(\mathbf{a}, x) &= \sum_{k=1}^n k a_k x^{k-1} = Dp(x)

	Dx(a) &= \frac{-1}{Dp(x)} [1, x, x^2, \dots, x^n]

	Differentiating the Solution to an Optimization Problem
	-------------------------------------------------------

	Sometimes we are required to solve optimization problems inside
	optimization problems, and this requires computing the derivative of
	the optimal solution (or a fixed point) of an optimization problem
	w.r.t its parameters.

	Let :math:`\theta \in \mathbb{R}^m` be a vector, :math:`A(\theta) \in
	\mathbb{R}^{k\times n}` be a matrix whose entries are a function of
	:math:`\theta` with :math:`k \ge n` and let :math:`b \in \mathbb{R}^k`
	be a constant vector, then consider the linear least squares problem:

	.. math::

	x^* = \arg \min_x \\|A(\theta) x - b\\|_2^2

	How do we compute :math:`D_\theta x^*(\theta)`?

	One approach would be to observe that :math:`x^*(\theta) =
	(A^\top(\theta)A(\theta))^{-1}A^\top(\theta)b` and then differentiate
	this w.r.t :math:`\theta`. But this would require differentiating
	through the inverse of the matrix
	:math:`(A^\top(\theta)A(\theta))^{-1}`. Not exactly easy. Let's use
	the Implicit Function Theorem instead.

	The first step is to observe that :math:`x^*` satisfies the so called
	normal equations.

	.. math::

	A^\top(\theta)A(\theta)x^* - A^\top(\theta)b = 0

	We will compute :math:`D_\theta x^*` column-wise, treating
	:math:`A(\theta)` as a function of one coordinate (:math:`\theta_i`)
	of :math:`\theta` at a time. So using the normal equations, let's
	define :math:`F(\theta_i, x^) = A^\top(\theta_i)A(\theta_i)x^ -
	A^\top(\theta_i)b = 0`. Using which can now compute:

	.. math::

	D_1F(\theta_i, x^*) &= D_{\theta_i}A^\top A + A^\top
	D_{\theta_i}Ax^* - D_{\theta_i} A^\top b = g_i

	D_2F(\theta_i, x^*) &= A^\top A

	Dx^*(\theta_i) & = -(A^\top A)^{-1} g_i

	Dx^*(\theta) & = -(A^\top A )^{-1} \left[g_1, \dots, g_m\right]

	Observe that we only need to compute the inverse of :math:`A^\top A`,
	to compute :math:`D x^*(\theta)`, which we needed anyways to compute
	:math:`x^*`.