Mixed-precision Iterative Refinement Cholesky With CUDA

* Created a new class CUDADenseCholeskyMixedPrecision, which performs
  Cholesky factorization and solving in single (fp32) precision, and
  optionally performs iterative refinement.
* Added CUDA kernels for mixed-precision solve operations.
* Added more detailed timing information to the FullReport about Schur
  elimination, reduced system solves, and back-substitution.
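
The idea behind the first bullet can be sketched on the CPU as follows.
This is only an illustrative sketch, not the code added by this change:
the function names (CholeskyFactorFp32, CholeskySolveFp32,
SolveMixedPrecision) and the max_refinement_steps parameter are
hypothetical, matrices are assumed dense and row-major, and everything
runs on the host rather than through CUDA. The factorization and the
triangular solves use fp32, while the residual and the accumulated
solution are kept in fp64.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Ceres: in-place lower Cholesky
// factorization A = L * L^T in fp32 on a row-major n x n matrix.
// Returns false if a non-positive pivot is encountered.
bool CholeskyFactorFp32(std::vector<float>& a, std::size_t n) {
  assert(a.size() == n * n);
  for (std::size_t j = 0; j < n; ++j) {
    float d = a[j * n + j];
    for (std::size_t k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
    if (!(d > 0.0f)) return false;
    d = std::sqrt(d);
    a[j * n + j] = d;
    for (std::size_t i = j + 1; i < n; ++i) {
      float s = a[i * n + j];
      for (std::size_t k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
      a[i * n + j] = s / d;
    }
  }
  return true;
}

// Hypothetical helper: solves L * L^T * x = b in fp32 using the lower
// triangle of l. Takes b by value and returns the solution.
std::vector<float> CholeskySolveFp32(const std::vector<float>& l,
                                     std::size_t n, std::vector<float> b) {
  for (std::size_t i = 0; i < n; ++i) {  // Forward substitution: L y = b.
    float s = b[i];
    for (std::size_t k = 0; k < i; ++k) s -= l[i * n + k] * b[k];
    b[i] = s / l[i * n + i];
  }
  for (std::size_t i = n; i-- > 0;) {  // Back substitution: L^T x = y.
    float s = b[i];
    for (std::size_t k = i + 1; k < n; ++k) s -= l[k * n + i] * b[k];
    b[i] = s / l[i * n + i];
  }
  return b;
}

// Hypothetical driver: factorizes once in fp32, then performs one initial
// solve plus max_refinement_steps correction solves. Residuals and the
// running solution are accumulated in fp64.
bool SolveMixedPrecision(const std::vector<double>& a,
                         const std::vector<double>& b, std::size_t n,
                         int max_refinement_steps, std::vector<double>* x) {
  std::vector<float> l(n * n);
  for (std::size_t i = 0; i < n * n; ++i) l[i] = static_cast<float>(a[i]);
  if (!CholeskyFactorFp32(l, n)) return false;

  x->assign(n, 0.0);
  std::vector<double> r = b;  // Residual b - A * x for x = 0.
  std::vector<float> r32(n);
  for (int step = 0; step <= max_refinement_steps; ++step) {
    for (std::size_t i = 0; i < n; ++i) r32[i] = static_cast<float>(r[i]);
    const std::vector<float> c = CholeskySolveFp32(l, n, r32);
    for (std::size_t i = 0; i < n; ++i) (*x)[i] += c[i];
    for (std::size_t i = 0; i < n; ++i) {  // fp64 residual r = b - A * x.
      double s = b[i];
      for (std::size_t j = 0; j < n; ++j) s -= a[i * n + j] * (*x)[j];
      r[i] = s;
    }
  }
  return true;
}
```

With max_refinement_steps set to 0 this reduces to a plain fp32 solve;
each additional step reuses the same fp32 factor, so the extra cost is
one residual computation and one pair of triangular solves per step.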

Some test performance numbers follow.
All tests were performed on an Ubuntu 20.04 desktop with an
Intel Core i9-9940X CPU and Nvidia Quadro RTX 6000 GPU.

Tests were launched as:
./bin/bundle_adjuster --input (problem_file) \
    --num_iterations 20 \
    --num_threads 28 \
    --linear_solver dense_schur \
    --dense_linear_algebra_library (cuda|lapack) \
    [--mixed_precision_solves]

==================================================
problem-21-11315-pre.txt
==================================================

--------------------------------------------------
Cuda Mixed Precision
--------------------------------------------------
Cost:
Initial                          4.413239e+06
Final                            3.037864e+04
Change                           4.382861e+06
  Linear solver                      0.250703 (14)
  ├ Schur eliminate                  0.234025 (14)
  ├ Reduced solve                    0.006643 (14)
  └ Backsubstitute                   0.006598 (12)

--------------------------------------------------
Cuda
--------------------------------------------------
Cost:
Initial                          4.413239e+06
Final                            3.037864e+04
Change                           4.382861e+06
  Linear solver                      0.257517 (12)
  ├ Schur eliminate                  0.233518 (12)
  ├ Reduced solve                    0.010621 (12)
  └ Backsubstitute                   0.007124 (12)

--------------------------------------------------
Lapack (OpenBLAS)
--------------------------------------------------
Cost:
Initial                          4.413239e+06
Final                            3.037864e+04
Change                           4.382861e+06
  Linear solver                      0.332349 (12)
  ├ Schur eliminate                  0.274748 (12)
  ├ Reduced solve                    0.015966 (12)
  └ Backsubstitute                   0.034192 (12)

==================================================
problem-257-65132-pre.txt
==================================================

--------------------------------------------------
Cuda Mixed Precision
--------------------------------------------------
Cost:
Initial                          2.456242e+07
Final                            9.677593e+04
Change                           2.446565e+07
  Linear solver                      1.332367 (20)
  ├ Schur eliminate                  1.021365 (20)
  ├ Reduced solve                    0.195472 (20)
  └ Backsubstitute                   0.075582 (20)

--------------------------------------------------
Cuda
--------------------------------------------------
Cost:
Initial                          2.456242e+07
Final                            9.677547e+04
Change                           2.446565e+07
  Linear solver                      1.810176 (20)
  ├ Schur eliminate                  1.012862 (20)
  ├ Reduced solve                    0.678704 (20)
  └ Backsubstitute                   0.083925 (20)

--------------------------------------------------
Lapack (OpenBLAS)
--------------------------------------------------
Cost:
Initial                          2.456242e+07
Final                            9.677547e+04
Change                           2.446565e+07
  Linear solver                      2.376273 (20)
  ├ Schur eliminate                  0.987613 (20)
  ├ Reduced solve                    1.043873 (20)
  └ Backsubstitute                   0.310402 (20)

==================================================
problem-744-543562-pre.txt
==================================================

--------------------------------------------------
Cuda Mixed Precision
--------------------------------------------------
Cost:
Initial                          1.434881e+08
Final                            1.546895e+06
Change                           1.419412e+08
  Linear solver                     27.010088 (20)
  ├ Schur eliminate                 24.362433 (20)
  ├ Reduced solve                    1.428542 (20)
  └ Backsubstitute                   0.814266 (20)

--------------------------------------------------
Cuda
--------------------------------------------------
Cost:
Initial                          1.434881e+08
Final                            1.546895e+06
Change                           1.419412e+08
  Linear solver                     32.342513 (20)
  ├ Schur eliminate                 24.638819 (20)
  ├ Reduced solve                    6.492090 (20)
  └ Backsubstitute                   0.802184 (20)

--------------------------------------------------
Lapack (OpenBLAS)
--------------------------------------------------
Cost:
Initial                          1.434881e+08
Final                            1.546895e+06
Change                           1.419412e+08
  Linear solver                     34.152224 (20)
  ├ Schur eliminate                 24.183723 (20)
  ├ Reduced solve                    8.784413 (20)
  └ Backsubstitute                   0.795044 (20)
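
Each timing line above reports a total in seconds followed by a call
count in parentheses. The call counts are not always equal between runs
(for problem-21 the mixed-precision run reports 14 linear solver calls
against 12 for the others), so comparing per-call averages may be fairer
than comparing totals. A minimal sketch of that arithmetic, with
hypothetical helper names that are not part of Ceres:

```cpp
#include <cassert>

// Hypothetical helper: average seconds per call for a timing line such
// as "0.006643 (14)".
double SecondsPerCall(double total_seconds, int calls) {
  assert(calls > 0);
  return total_seconds / calls;
}

// Hypothetical helper: how many times faster the candidate is than the
// baseline when both are normalized by their call counts.
double PerCallSpeedup(double baseline_total, int baseline_calls,
                      double candidate_total, int candidate_calls) {
  return SecondsPerCall(baseline_total, baseline_calls) /
         SecondsPerCall(candidate_total, candidate_calls);
}
```

For example, PerCallSpeedup(6.492090, 20, 1.428542, 20) compares the
"Reduced solve" lines of the Cuda and Cuda Mixed Precision runs for
problem-744-543562-pre.txt.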

Change-Id: I178887e776d8f4a1e8abb99bbc205bf8c278bf79
12 files changed
tree: 449a579bdc7e34f4fd2a306578878c44b9aeb811

Ceres Solver

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature, and performant library that has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.

  1. Non-linear Least Squares problems with bounds constraints.
  2. General unconstrained optimization problems.

Please see ceres-solver.org for more information.