Miscellaneous CUDA related changes.
1. Fix an error in types.cc: the CUDA availability check returned false even when Ceres was built with CUDA support.
2. Update documentation for Solver::Options::dense_linear_algebra_library_type
3. Add a TODO to installation.rst to document Intel MKL and CUDA in more detail.
4. Mention GPU acceleration in features.rst
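
For reviewers, the intent of items 1 and 2 can be sketched as follows. This is a minimal, self-contained illustration and not the code in types.cc: the `sketch` namespace, `IsCudaAvailable` and `ChooseDenseBackend` are hypothetical names introduced here, and the enum is a local stand-in for the Ceres type. Only the `CERES_NO_CUDA` guard and the `EIGEN`, `LAPACK` and `CUDA` values come from this change.

```cpp
#include <cassert>

namespace sketch {

// Local stand-in for the Ceres enum of the same name (assumption: only the
// three values mentioned in this change are modelled here).
enum DenseLinearAlgebraLibraryType { EIGEN, LAPACK, CUDA };

// Same shape as the fixed branch in types.cc: CUDA is reported as available
// unless the build defined CERES_NO_CUDA. Before the fix, both branches
// returned false.
inline bool IsCudaAvailable() {
#ifdef CERES_NO_CUDA
  return false;
#else
  return true;
#endif
}

// Hypothetical helper: keep the requested backend when it is available and
// fall back to EIGEN otherwise, since EIGEN is always available.
inline DenseLinearAlgebraLibraryType ChooseDenseBackend(
    DenseLinearAlgebraLibraryType requested, bool lapack_available) {
  if (requested == CUDA && !IsCudaAvailable()) return EIGEN;
  if (requested == LAPACK && !lapack_available) return EIGEN;
  return requested;
}

}  // namespace sketch
```

In a real program the equivalent choice would be made by assigning to Solver::Options::dense_linear_algebra_library_type, which only affects the DENSE_QR, DENSE_NORMAL_CHOLESKY and DENSE_SCHUR solvers.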
Change-Id: Id63202ff090e23bbb211d2ee458559fb8046281d
diff --git a/docs/source/features.rst b/docs/source/features.rst
index 634be9d..956eb00 100644
--- a/docs/source/features.rst
+++ b/docs/source/features.rst
@@ -46,10 +46,11 @@
computational cost in all of these methods is the solution of a
linear system. To this end Ceres ships with a variety of linear
solvers - dense QR and dense Cholesky factorization (using
- `Eigen`_ or `LAPACK`_) for dense problems, sparse Cholesky
- factorization (`SuiteSparse`_, `CXSparse`_ or `Eigen`_) for large
- sparse problems, custom Schur complement based dense, sparse, and
- iterative linear solvers for `bundle adjustment`_ problems.
+ `Eigen`_, `LAPACK`_ or `CUDA`_) for dense problems, sparse
+ Cholesky factorization (`SuiteSparse`_, `Apple's Accelerate`_,
+ `CXSparse`_ or `Eigen`_) for large sparse problems, custom Schur
+ complement based dense, sparse, and iterative linear solvers for
+ `bundle adjustment`_ problems.
- **Line Search Solvers** - When the problem size is so large that
storing and factoring the Jacobian is not feasible or a low
@@ -62,6 +63,9 @@
modern C++ threads based multithreading of the Jacobian evaluation
and the linear solvers.
+* **GPU Acceleration** If your system supports `CUDA`_ then Ceres
+ Solver can use the Nvidia GPU on your system to speed up the solver.
+
* **Solution Quality** Ceres is the `best performing`_ solver on the NIST
problem set used by Mondragon and Borchers for benchmarking
non-linear least squares solvers.
@@ -89,3 +93,5 @@
.. _CXSparse: https://www.cise.ufl.edu/research/sparse/CXSparse/
.. _automatic: http://en.wikipedia.org/wiki/Automatic_differentiation
.. _numeric: http://en.wikipedia.org/wiki/Numerical_differentiation
+.. _CUDA: https://developer.nvidia.com/cuda-toolkit
+.. _Apple's Accelerate: https://developer.apple.com/documentation/accelerate/sparse_solvers
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
index a924d95..3591901 100644
--- a/docs/source/installation.rst
+++ b/docs/source/installation.rst
@@ -104,6 +104,11 @@
``SuiteSparse``, and optionally used by Ceres directly for some
operations.
+ TODO::
+
+ 1. Add a more detailed note about Intel MKL.
+ 2. Add detailed instructions about CUDA
+
On ``UNIX`` OSes other than macOS we recommend `ATLAS
<http://math-atlas.sourceforge.net/>`_, which includes ``BLAS`` and
``LAPACK`` routines. It is also possible to use `OpenBLAS
diff --git a/docs/source/nnls_solving.rst b/docs/source/nnls_solving.rst
index 9946645..a3225db 100644
--- a/docs/source/nnls_solving.rst
+++ b/docs/source/nnls_solving.rst
@@ -1325,16 +1325,17 @@
Default:``EIGEN``
Ceres supports using multiple dense linear algebra libraries for
- dense matrix factorizations. Currently ``EIGEN`` and ``LAPACK`` are
- the valid choices. ``EIGEN`` is always available, ``LAPACK`` refers
- to the system ``BLAS + LAPACK`` library which may or may not be
- available.
+ dense matrix factorizations. Currently ``EIGEN``, ``LAPACK`` and
+ ``CUDA`` are the valid choices. ``EIGEN`` is always available,
+ ``LAPACK`` refers to the system ``BLAS + LAPACK`` library which may
+ or may not be available. ``CUDA`` refers to Nvidia's GPU based
+ dense linear algebra library which may or may not be available.
This setting affects the ``DENSE_QR``, ``DENSE_NORMAL_CHOLESKY``
and ``DENSE_SCHUR`` solvers. For small to moderate sized probem
``EIGEN`` is a fine choice but for large problems, an optimized
- ``LAPACK + BLAS`` implementation can make a substantial difference
- in performance.
+ ``LAPACK + BLAS`` or ``CUDA`` implementation can make a substantial
+ difference in performance.
.. member:: SparseLinearAlgebraLibrary Solver::Options::sparse_linear_algebra_library_type
diff --git a/include/ceres/solver.h b/include/ceres/solver.h
index 35644c4..026fc1c0 100644
--- a/include/ceres/solver.h
+++ b/include/ceres/solver.h
@@ -364,23 +364,23 @@
std::unordered_set<ResidualBlockId>
residual_blocks_for_subset_preconditioner;
- // Ceres supports using multiple dense linear algebra libraries
- // for dense matrix factorizations. Currently EIGEN and LAPACK are
- // the valid choices. EIGEN is always available, LAPACK refers to
- // the system BLAS + LAPACK library which may or may not be
+ // Ceres supports using multiple dense linear algebra libraries for dense
+ // matrix factorizations. Currently EIGEN, LAPACK and CUDA are the valid
+ // choices. EIGEN is always available, LAPACK refers to the system BLAS +
+ // LAPACK library which may or may not be available. CUDA refers to Nvidia's
+ // GPU based dense linear algebra library, which may or may not be
// available.
//
- // This setting affects the DENSE_QR, DENSE_NORMAL_CHOLESKY and
- // DENSE_SCHUR solvers. For small to moderate sized problem EIGEN
- // is a fine choice but for large problems, an optimized LAPACK +
- // BLAS implementation can make a substantial difference in
- // performance.
+ // This setting affects the DENSE_QR, DENSE_NORMAL_CHOLESKY and DENSE_SCHUR
+ // solvers. For small to moderate sized problems EIGEN is a fine choice but
+ // for large problems, an optimized LAPACK + BLAS or CUDA implementation can
+ // make a substantial difference in performance.
DenseLinearAlgebraLibraryType dense_linear_algebra_library_type = EIGEN;
- // Ceres supports using multiple sparse linear algebra libraries
- // for sparse matrix ordering and factorizations. Currently,
- // SUITE_SPARSE and CX_SPARSE are the valid choices, depending on
- // whether they are linked into Ceres at build time.
+ // Ceres supports using multiple sparse linear algebra libraries for sparse
+ // matrix ordering and factorizations. Currently, SUITE_SPARSE and CX_SPARSE
+ // are the valid choices, depending on whether they are linked into Ceres at
+ // build time.
SparseLinearAlgebraLibraryType sparse_linear_algebra_library_type =
#if !defined(CERES_NO_SUITESPARSE)
SUITE_SPARSE;
diff --git a/internal/ceres/types.cc b/internal/ceres/types.cc
index 8cfbc77..4824267 100644
--- a/internal/ceres/types.cc
+++ b/internal/ceres/types.cc
@@ -433,7 +433,7 @@
#ifdef CERES_NO_CUDA
return false;
#else
- return false;
+ return true;
#endif
}