| commit | 88e08cfe7158b59b848df70721d6fa29592af30d | |
|---|---|---|
| author | Joydeep Biswas <joydeepb@cs.utexas.edu> | Sat Jun 04 20:17:06 2022 -0500 |
| committer | Joydeep Biswas <joydeepb@cs.utexas.edu> | Wed Jul 13 06:55:31 2022 -0500 |
| tree | 449a579bdc7e34f4fd2a306578878c44b9aeb811 | |
| parent | 290b34ef058eb83aae64236b26742867a7a9431d | |
Mixed-precision Iterative Refinement Cholesky With CUDA
* Created a new class CUDADenseCholeskyMixedPrecision, which performs
  Cholesky factorization and solving in single (fp32) precision, and
  optionally performs iterative refinement.
* Added CUDA kernels for mixed-precision solve operations.
* Added more detailed timing information to the FullReport about Schur
  elimination, reduced system solves, and back-substitution.
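
For readers unfamiliar with the technique, here is a minimal CPU-only sketch of mixed-precision iterative refinement. It is not the code in CUDADenseCholeskyMixedPrecision: the function names (CholeskyFactorFp32, CholeskySolveFp32, SolveMixedPrecision) and the dense row-major layout are assumptions made for illustration. The matrix is factored once in fp32, and each refinement step recomputes the residual in fp64 and solves for a correction with the same fp32 factor.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Factors a symmetric positive definite n x n matrix (row-major) in place as
// A = L * L^T, storing L in the lower triangle. Returns false if a
// non-positive pivot is encountered.
bool CholeskyFactorFp32(std::vector<float>& a, std::size_t n) {
  for (std::size_t j = 0; j < n; ++j) {
    float d = a[j * n + j];
    for (std::size_t k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
    if (!(d > 0.0f)) return false;
    d = std::sqrt(d);
    a[j * n + j] = d;
    for (std::size_t i = j + 1; i < n; ++i) {
      float s = a[i * n + j];
      for (std::size_t k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
      a[i * n + j] = s / d;
    }
  }
  return true;
}

// Solves L * L^T * x = rhs in fp32, given the factor L from above.
std::vector<float> CholeskySolveFp32(const std::vector<float>& l,
                                     std::size_t n, std::vector<float> x) {
  for (std::size_t i = 0; i < n; ++i) {  // Forward substitution: L y = rhs.
    for (std::size_t k = 0; k < i; ++k) x[i] -= l[i * n + k] * x[k];
    x[i] /= l[i * n + i];
  }
  for (std::size_t i = n; i-- > 0;) {  // Back substitution: L^T x = y.
    for (std::size_t k = i + 1; k < n; ++k) x[i] -= l[k * n + i] * x[k];
    x[i] /= l[i * n + i];
  }
  return x;
}

// Solves A x = b with an fp32 Cholesky factorization followed by
// max_refinements fp64 residual corrections. With max_refinements == 0 this
// is a plain fp32 solve. Returns false if the fp32 factorization fails.
bool SolveMixedPrecision(const std::vector<double>& a,
                         const std::vector<double>& b, int max_refinements,
                         std::vector<double>* x) {
  const std::size_t n = b.size();
  assert(a.size() == n * n);
  std::vector<float> l(a.begin(), a.end());  // Demote A to fp32.
  if (!CholeskyFactorFp32(l, n)) return false;
  x->assign(n, 0.0);
  std::vector<double> r = b;  // Residual r = b - A x, with x = 0 initially.
  for (int it = 0; it <= max_refinements; ++it) {
    // Solve A c = r for the correction c using the fp32 factor.
    const std::vector<float> c =
        CholeskySolveFp32(l, n, std::vector<float>(r.begin(), r.end()));
    for (std::size_t i = 0; i < n; ++i) (*x)[i] += c[i];
    // Recompute the residual in fp64 against the original fp64 matrix.
    for (std::size_t i = 0; i < n; ++i) {
      double s = b[i];
      for (std::size_t j = 0; j < n; ++j) s -= a[i * n + j] * (*x)[j];
      r[i] = s;
    }
  }
  return true;
}
```

The expensive O(n^3) factorization runs only once, in the cheaper precision, while each refinement step costs O(n^2). A real implementation would typically stop early once the residual norm is small enough rather than always running a fixed number of steps.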
Some test performance numbers follow.
All tests were performed on an Ubuntu 20.04 desktop with an
Intel Core i9-9940X CPU and Nvidia Quadro RTX 6000 GPU.
Tests were launched as:
./bin/bundle_adjuster --input (problem_file) \
  --num_iterations 20 \
  --num_threads 28 \
  --linear_solver dense_schur \
  --dense_linear_algebra_library (cuda|lapack) \
  [--mixed_precision_solves]
==================================================
problem-21-11315-pre.txt
==================================================
--------------------------------------------------
Cuda Mixed Precision
--------------------------------------------------
Cost:
Initial 4.413239e+06
Final 3.037864e+04
Change 4.382861e+06
Linear solver 0.250703 (14)
├ Schur eliminate 0.234025 (14)
├ Reduced solve 0.006643 (14)
└ Backsubstitute 0.006598 (12)
--------------------------------------------------
Cuda
--------------------------------------------------
Cost:
Initial 4.413239e+06
Final 3.037864e+04
Change 4.382861e+06
Linear solver 0.257517 (12)
├ Schur eliminate 0.233518 (12)
├ Reduced solve 0.010621 (12)
└ Backsubstitute 0.007124 (12)
--------------------------------------------------
Lapack (OpenBLAS)
--------------------------------------------------
Cost:
Initial 4.413239e+06
Final 3.037864e+04
Change 4.382861e+06
Linear solver 0.332349 (12)
├ Schur eliminate 0.274748 (12)
├ Reduced solve 0.015966 (12)
└ Backsubstitute 0.034192 (12)
==================================================
problem-257-65132-pre.txt
==================================================
--------------------------------------------------
Cuda Mixed Precision
--------------------------------------------------
Cost:
Initial 2.456242e+07
Final 9.677593e+04
Change 2.446565e+07
Linear solver 1.332367 (20)
├ Schur eliminate 1.021365 (20)
├ Reduced solve 0.195472 (20)
└ Backsubstitute 0.075582 (20)
--------------------------------------------------
Cuda
--------------------------------------------------
Cost:
Initial 2.456242e+07
Final 9.677547e+04
Change 2.446565e+07
Linear solver 1.810176 (20)
├ Schur eliminate 1.012862 (20)
├ Reduced solve 0.678704 (20)
└ Backsubstitute 0.083925 (20)
--------------------------------------------------
Lapack (OpenBLAS)
--------------------------------------------------
Cost:
Initial 2.456242e+07
Final 9.677547e+04
Change 2.446565e+07
Linear solver 2.376273 (20)
├ Schur eliminate 0.987613 (20)
├ Reduced solve 1.043873 (20)
└ Backsubstitute 0.310402 (20)
==================================================
problem-744-543562-pre.txt
==================================================
--------------------------------------------------
Cuda Mixed Precision
--------------------------------------------------
Cost:
Initial 1.434881e+08
Final 1.546895e+06
Change 1.419412e+08
Linear solver 27.010088 (20)
├ Schur eliminate 24.362433 (20)
├ Reduced solve 1.428542 (20)
└ Backsubstitute 0.814266 (20)
--------------------------------------------------
Cuda
--------------------------------------------------
Cost:
Initial 1.434881e+08
Final 1.546895e+06
Change 1.419412e+08
Linear solver 32.342513 (20)
├ Schur eliminate 24.638819 (20)
├ Reduced solve 6.492090 (20)
└ Backsubstitute 0.802184 (20)
--------------------------------------------------
Lapack (OpenBLAS)
--------------------------------------------------
Cost:
Initial 1.434881e+08
Final 1.546895e+06
Change 1.419412e+08
Linear solver 34.152224 (20)
├ Schur eliminate 24.183723 (20)
├ Reduced solve 8.784413 (20)
└ Backsubstitute 0.795044 (20)
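
To compare the backends, the "Reduced solve" times listed above can be turned into speedup ratios. The sketch below copies those times verbatim from the reports. The struct, array, and function names are hypothetical and are not part of Ceres Solver.

```cpp
#include <cassert>

// "Reduced solve" times in seconds, copied from the reports above.
struct ReducedSolveTimes {
  const char* problem;
  double cuda_mixed;  // Cuda Mixed Precision
  double cuda;        // Cuda
  double lapack;      // Lapack (OpenBLAS)
};

constexpr ReducedSolveTimes kReducedSolve[] = {
    {"problem-21-11315-pre.txt", 0.006643, 0.010621, 0.015966},
    {"problem-257-65132-pre.txt", 0.195472, 0.678704, 1.043873},
    {"problem-744-543562-pre.txt", 1.428542, 6.492090, 8.784413},
};

// Returns how many times faster the candidate is than the baseline.
double Speedup(double baseline_seconds, double candidate_seconds) {
  assert(candidate_seconds > 0.0);
  return baseline_seconds / candidate_seconds;
}
```

Calling Speedup(t.cuda, t.cuda_mixed) for each entry gives roughly 1.6x, 3.5x, and 4.5x for the three problems, so the benefit of mixed precision in the reduced solve grows with problem size in these runs. The Schur elimination time, which dominates the total, is essentially unchanged.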
Change-Id: I178887e776d8f4a1e8abb99bbc205bf8c278bf79
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature, and performant library that has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems: non-linear least squares problems with bounds constraints, and general unconstrained optimization problems.
Please see ceres-solver.org for more information.