commit | 88e08cfe7158b59b848df70721d6fa29592af30d | |
---|---|---|
author | Joydeep Biswas <joydeepb@cs.utexas.edu> | Sat Jun 04 20:17:06 2022 -0500 |
committer | Joydeep Biswas <joydeepb@cs.utexas.edu> | Wed Jul 13 06:55:31 2022 -0500 |
tree | 449a579bdc7e34f4fd2a306578878c44b9aeb811 | |
parent | 290b34ef058eb83aae64236b26742867a7a9431d | |
Mixed-precision Iterative Refinement Cholesky With CUDA

* Created a new class CUDADenseCholeskyMixedPrecision, which performs
  Cholesky factorization and solving in single (fp32) precision, and
  optionally performs iterative refinement.
* Added CUDA kernels for mixed-precision solve operations
* Added more detailed timing information to the FullReport about
  Schur elimination, reduced system solves, and back-substitution.

Some test performance numbers follow. All tests were performed on an
Ubuntu 20.04 desktop with an Intel Core i9-9940X CPU and Nvidia Quadro
RTX 6000 GPU. Tests were launched as:

./bin/bundle_adjuster --input (problem_file) \
  --num_iterations 20 --num_threads 28 --linear_solver dense_schur --dense_linear_algebra_library (cuda|lapack) [--mixed_precision_solves]

==================================================
problem-21-11315-pre.txt
==================================================

--------------------------------------------------
Cuda Mixed Precision
--------------------------------------------------

Cost:
Initial              4.413239e+06
Final                3.037864e+04
Change               4.382861e+06

Linear solver        0.250703 (14)
├ Schur eliminate    0.234025 (14)
├ Reduced solve      0.006643 (14)
└ Backsubstitute     0.006598 (12)

--------------------------------------------------
Cuda
--------------------------------------------------

Cost:
Initial              4.413239e+06
Final                3.037864e+04
Change               4.382861e+06

Linear solver        0.257517 (12)
├ Schur eliminate    0.233518 (12)
├ Reduced solve      0.010621 (12)
└ Backsubstitute     0.007124 (12)

--------------------------------------------------
Lapack (OpenBLAS)
--------------------------------------------------

Cost:
Initial              4.413239e+06
Final                3.037864e+04
Change               4.382861e+06

Linear solver        0.332349 (12)
├ Schur eliminate    0.274748 (12)
├ Reduced solve      0.015966 (12)
└ Backsubstitute     0.034192 (12)

==================================================
problem-257-65132-pre.txt
==================================================

--------------------------------------------------
Cuda Mixed Precision
--------------------------------------------------

Cost:
Initial              2.456242e+07
Final                9.677593e+04
Change               2.446565e+07

Linear solver        1.332367 (20)
├ Schur eliminate    1.021365 (20)
├ Reduced solve      0.195472 (20)
└ Backsubstitute     0.075582 (20)

--------------------------------------------------
Cuda
--------------------------------------------------

Cost:
Initial              2.456242e+07
Final                9.677547e+04
Change               2.446565e+07

Linear solver        1.810176 (20)
├ Schur eliminate    1.012862 (20)
├ Reduced solve      0.678704 (20)
└ Backsubstitute     0.083925 (20)

--------------------------------------------------
Lapack (OpenBLAS)
--------------------------------------------------

Cost:
Initial              2.456242e+07
Final                9.677547e+04
Change               2.446565e+07

Linear solver        2.376273 (20)
├ Schur eliminate    0.987613 (20)
├ Reduced solve      1.043873 (20)
└ Backsubstitute     0.310402 (20)

==================================================
problem-744-543562-pre.txt
==================================================

--------------------------------------------------
Cuda Mixed Precision
--------------------------------------------------

Cost:
Initial              1.434881e+08
Final                1.546895e+06
Change               1.419412e+08

Linear solver        27.010088 (20)
├ Schur eliminate    24.362433 (20)
├ Reduced solve      1.428542 (20)
└ Backsubstitute     0.814266 (20)

--------------------------------------------------
Cuda
--------------------------------------------------

Cost:
Initial              1.434881e+08
Final                1.546895e+06
Change               1.419412e+08

Linear solver        32.342513 (20)
├ Schur eliminate    24.638819 (20)
├ Reduced solve      6.492090 (20)
└ Backsubstitute     0.802184 (20)

--------------------------------------------------
Lapack (OpenBLAS)
--------------------------------------------------

Cost:
Initial              1.434881e+08
Final                1.546895e+06
Change               1.419412e+08

Linear solver        34.152224 (20)
├ Schur eliminate    24.183723 (20)
├ Reduced solve      8.784413 (20)
└ Backsubstitute     0.795044 (20)

Change-Id: I178887e776d8f4a1e8abb99bbc205bf8c278bf79
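The commit message describes factorizing and solving in fp32 with optional iterative refinement. The following is a minimal CPU-only sketch of that general technique, not the code of CUDADenseCholeskyMixedPrecision (which runs on the GPU). All function names below are hypothetical, and the sketch assumes a dense, row-major, symmetric positive definite matrix. The factorization and triangular solves run in fp32, while the residual and the accumulated solution are kept in fp64.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: in-place lower Cholesky factorization A = L * L^T in
// fp32 on a row-major n x n matrix. Only the lower triangle is read/written.
// Returns false if a non-positive pivot is encountered.
bool CholeskyFactorFp32(std::vector<float>& a, std::size_t n) {
  for (std::size_t j = 0; j < n; ++j) {
    float d = a[j * n + j];
    for (std::size_t k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
    if (!(d > 0.0f)) return false;
    const float ljj = std::sqrt(d);
    a[j * n + j] = ljj;
    for (std::size_t i = j + 1; i < n; ++i) {
      float s = a[i * n + j];
      for (std::size_t k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
      a[i * n + j] = s / ljj;
    }
  }
  return true;
}

// Hypothetical helper: solves L * L^T * x = b in fp32, overwriting b with x.
void CholeskySolveFp32(const std::vector<float>& l, std::size_t n,
                       std::vector<float>& b) {
  // Forward substitution: L * y = b.
  for (std::size_t i = 0; i < n; ++i) {
    float s = b[i];
    for (std::size_t k = 0; k < i; ++k) s -= l[i * n + k] * b[k];
    b[i] = s / l[i * n + i];
  }
  // Backward substitution: L^T * x = y.
  for (std::size_t i = n; i-- > 0;) {
    float s = b[i];
    for (std::size_t k = i + 1; k < n; ++k) s -= l[k * n + i] * b[k];
    b[i] = s / l[i * n + i];
  }
}

// Hypothetical driver: solves A * x = b with an fp32 factorization and
// num_refinement_steps rounds of fp64 iterative refinement
// (1 + num_refinement_steps fp32 solves in total).
bool SolveMixedPrecision(const std::vector<double>& a,
                         const std::vector<double>& b, std::size_t n,
                         int num_refinement_steps, std::vector<double>& x) {
  std::vector<float> l(n * n);
  for (std::size_t i = 0; i < n * n; ++i) l[i] = static_cast<float>(a[i]);
  if (!CholeskyFactorFp32(l, n)) return false;

  x.assign(n, 0.0);
  std::vector<double> r(b);  // With x = 0 the residual b - A * x equals b.
  std::vector<float> c(n);
  for (int step = 0; step <= num_refinement_steps; ++step) {
    // Solve for the correction in fp32 using the fp32 factor.
    for (std::size_t i = 0; i < n; ++i) c[i] = static_cast<float>(r[i]);
    CholeskySolveFp32(l, n, c);
    // Accumulate the correction in fp64.
    for (std::size_t i = 0; i < n; ++i) x[i] += static_cast<double>(c[i]);
    // Recompute the residual r = b - A * x in fp64.
    for (std::size_t i = 0; i < n; ++i) {
      double s = b[i];
      for (std::size_t k = 0; k < n; ++k) s -= a[i * n + k] * x[k];
      r[i] = s;
    }
  }
  return true;
}
```

With `num_refinement_steps = 0` this reduces to a plain fp32 solve. Each additional step costs one fp64 matrix-vector product and one fp32 triangular solve pair, which is cheap compared with the factorization, so a well-conditioned system can recover close to fp64 accuracy from the fp32 factor.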
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature, and performant library that has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems:

1. Non-linear least squares problems with bounds constraints.
2. General unconstrained optimization problems.
Please see ceres-solver.org for more information.