tree 52619fc47d1d278f9712beb522a50ea1db9a73b3
parent f802a09ff12f79a3daaded70eaea2b033c0ce01f
author Joydeep Biswas <joydeepb@cs.utexas.edu> 1660859150 -0500
committer Joydeep Biswas <joydeepb@cs.utexas.edu> 1661522227 -0500

Added CUDA Jacobi Preconditioner

* Added CudaCpuPreconditionerWrapper, a templated wrapper to leverage
  existing CPU preconditioners that provide CompressedRowSparse
  representation, for use with CudaCgnrSolver.
* Added CudaJacobiPreconditioner as a specific instantiation of
  CudaCpuPreconditionerWrapper over BlockCRSJacobiPreconditioner.

The resulting Jacobi-preconditioned CudaCgnrSolver exhibits moderate
increase in speed with a healthy increase in numerical stability.
Examples of running CudaCgnrSolver with Jacobi vs. Identity
preconditioner for bundle adjustment on problem-13682-4456117-pre.txt
on a desktop computer with an Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz
and an Nvidia Quadro RTX 6000 GPU.

A known issue is that the Jacobi preconditioner update is slow - this
can be sped up in the future by multithreading on the CPU, or by
computing the preconditioner on the GPU.

====================================================================
CUDA CGNR + IDENTITY Preconditioner
====================================================================
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  1.126372e+09    0.00e+00    3.02e+15   0.00e+00   0.00e+00  1.00e+04        0    9.17e+00    2.70e+01
   1  3.796903e+07    1.09e+09    1.08e+15   6.48e+03   9.95e-01  3.00e+04       49    1.55e+01    4.24e+01
   2  3.529024e+21   -3.53e+21    1.08e+15   1.93e+04  -2.25e+14  1.50e+04      500    3.79e+01    8.03e+01
   3  7.952482e+09   -7.91e+09    1.08e+15   1.63e+04  -5.16e+02  3.75e+03      386    2.96e+01    1.10e+02
   4  8.842366e+08   -8.46e+08    1.08e+15   1.18e+04  -5.87e+01  4.69e+02      312    2.47e+01    1.35e+02
   5  2.935800e+07    8.61e+06    1.24e+13   3.93e+03   8.19e-01  6.32e+02      129    1.96e+01    1.54e+02
   6  2.529342e+07    4.06e+06    3.22e+12   3.39e+03   9.93e-01  1.90e+03      148    2.16e+01    1.76e+02
   7  2.343028e+07    1.86e+06    1.27e+13   6.00e+03   9.37e-01  5.68e+03      387    3.79e+01    2.14e+02
   8  2.332405e+07    1.06e+05    7.92e+12   6.47e+01   8.26e-01  7.87e+03       46    1.48e+01    2.28e+02
   9  3.591512e+14   -3.59e+14    7.92e+12   5.19e+03  -4.09e+08  3.94e+03      500    3.83e+01    2.67e+02
  10  2.956078e+07   -6.24e+06    7.92e+12   3.94e+03  -8.34e+00  9.84e+02      447    3.40e+01    3.01e+02

Solver Summary (v 2.2.0-eigen-(3.3.7)-lapack-suitesparse-(5.7.1)-metis-(5.1.0)-eigensparse-no_openmp-cuda-(11070))

                                     Original                  Reduced
Parameter blocks                      4469799                  4469799
Parameters                           13491489                 13491489
Residual blocks                      28987644                 28987644
Residuals                            57975288                 57975288

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT
Sparse linear algebra library     CUDA_SPARSE

                                        Given                     Used
Linear solver                            CGNR                     CGNR
Preconditioner                       IDENTITY                 IDENTITY
Threads                                    24                       24
Linear solver ordering          4456117,13682            4456117,13682

Cost:
Initial                          1.126372e+09
Final                            2.332405e+07
Change                           1.103048e+09

Minimizer iterations                       11
Successful steps                            6
Unsuccessful steps                          5

Time (in seconds):
Preprocessor                        17.798455

  Residual only evaluation           8.656350 (10)
  Jacobian & residual evaluation    35.794169 (6)
  Linear solver                    208.586109 (10)
Minimizer                          283.593882

Postprocessor                        1.985112
Total                              303.377449

====================================================================
CUDA CGNR + JACOBI Preconditioner
====================================================================
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  1.126372e+09    0.00e+00    3.02e+15   0.00e+00   0.00e+00  1.00e+04        0    9.18e+00    2.68e+01
   1  1.491225e+14   -1.49e+14    3.02e+15   8.73e+05  -1.36e+05  5.00e+03        6    1.07e+01    3.74e+01
   2  7.461790e+12   -7.46e+12    3.02e+15   4.78e+05  -6.78e+03  1.25e+03        6    8.88e+00    4.63e+01
   3  4.357438e+07    1.08e+09    5.41e+13   1.57e+05   9.87e-01  3.75e+03        6    1.62e+01    6.25e+01
   4  2.928453e+07    1.43e+07    6.85e+13   3.81e+05   6.97e-01  3.99e+03       23    1.82e+01    8.08e+01
   5  5.977433e+14   -5.98e+14    6.85e+13   4.97e+05  -8.70e+07  2.00e+03        9    9.93e+00    9.07e+01
   6  6.423965e+10   -6.42e+10    6.85e+13   2.45e+05  -9.61e+03  4.99e+02        7    8.95e+00    9.97e+01
   7  2.337471e+07    5.91e+06    1.49e+12   8.24e+04   9.31e-01  1.39e+03        4    1.60e+01    1.16e+02
   8  2.257524e+07    7.99e+05    8.71e+12   1.62e+05   8.99e-01  2.81e+03       54    2.03e+01    1.36e+02
   9  2.285090e+07   -2.76e+05    8.71e+12   3.56e+05  -6.13e-01  1.41e+03      104    1.65e+01    1.52e+02
  10  5.909983e+12   -5.91e+12    8.71e+12   1.78e+05  -1.83e+07  3.52e+02       74    1.36e+01    1.66e+02

Solver Summary (v 2.2.0-eigen-(3.3.7)-lapack-suitesparse-(5.7.1)-metis-(5.1.0)-eigensparse-no_openmp-cuda-(11070))

                                     Original                  Reduced
Parameter blocks                      4469799                  4469799
Parameters                           13491489                 13491489
Residual blocks                      28987644                 28987644
Residuals                            57975288                 57975288

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT
Sparse linear algebra library     CUDA_SPARSE

                                        Given                     Used
Linear solver                            CGNR                     CGNR
Preconditioner                         JACOBI                   JACOBI
Threads                                    24                       24
Linear solver ordering          4456117,13682            4456117,13682

Cost:
Initial                          1.126372e+09
Final                            2.257524e+07
Change                           1.103796e+09

Minimizer iterations                       11
Successful steps                            5
Unsuccessful steps                          6

Time (in seconds):
Preprocessor                        17.575990

  Residual only evaluation           8.461414 (10)
  Jacobian & residual evaluation    30.055383 (5)
  Linear solver                     82.550675 (10)
Minimizer                          149.137856

Postprocessor                        1.994014
Total                              168.707860

Change-Id: I458d2445bf062e54de44fc91517ed11a300c7182
