Parallelize block_jacobi_preconditioner

Use ParallelFor to parallelize both versions of the
block Jacobi preconditioner. Also add benchmarks for
a varying number of threads.
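
The idea of applying a parallel-for loop to independent diagonal blocks can be sketched as below. This is a minimal illustration under stated assumptions, not the Ceres implementation: ParallelForSketch, Block2x2, and InvertDiagonalBlocks are hypothetical names, the blocks are fixed at 2x2 so the example needs no linear algebra library, and the work is split statically over std::thread rather than a thread pool.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallel-for helper (not the Ceres API).
// Splits [begin, end) into contiguous chunks and calls fn(i) for every
// index, using at most num_threads threads.
void ParallelForSketch(int begin, int end, int num_threads,
                       const std::function<void(int)>& fn) {
  const int n = end - begin;
  if (n <= 0) return;
  num_threads = std::max(1, std::min(num_threads, n));
  if (num_threads == 1) {
    // Serial path: no threads are created.
    for (int i = begin; i < end; ++i) fn(i);
    return;
  }
  const int chunk = (n + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    const int lo = begin + t * chunk;
    const int hi = std::min(end, lo + chunk);
    if (lo >= hi) break;
    workers.emplace_back([lo, hi, &fn] {
      for (int i = lo; i < hi; ++i) fn(i);
    });
  }
  for (std::thread& w : workers) w.join();
}

// A 2x2 block stored row-major as {a, b, c, d}. Illustration only.
using Block2x2 = std::array<double, 4>;

// Inverts each diagonal block independently. Every iteration writes only
// to inverse[i], so no locking is needed. Assumes every block is
// invertible (non-zero determinant).
std::vector<Block2x2> InvertDiagonalBlocks(const std::vector<Block2x2>& blocks,
                                           int num_threads) {
  std::vector<Block2x2> inverse(blocks.size());
  ParallelForSketch(0, static_cast<int>(blocks.size()), num_threads,
                    [&](int i) {
                      const Block2x2& m = blocks[i];
                      const double det = m[0] * m[3] - m[1] * m[2];
                      inverse[i] = {m[3] / det, -m[1] / det,
                                    -m[2] / det, m[0] / det};
                    });
  return inverse;
}
```

Because each index touches a distinct output block, the result does not depend on the thread count. The serial path for num_threads == 1 avoids thread creation, though a general-purpose helper can still add some overhead compared with a plain loop.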

Benchmark on M1 Mac Pro

Benchmark                                               Time             CPU   Iterations
BM_BlockSparseJacobiPreconditionerBA             44847927 ns     44788313 ns           16
BM_BlockCRSJacobiPreconditionerBA                48772330 ns     48723571 ns           14
BM_BlockSparseJacobiPreconditionerUnstructured   62385231 ns     62306818 ns           11
BM_BlockCRSJacobiPreconditionerUnstructured      60671473 ns     60577727 ns           11

Benchmark                                                  Time             CPU   Iterations
BM_BlockSparseJacobiPreconditionerBA/1              53314862 ns     53302308 ns           13
BM_BlockSparseJacobiPreconditionerBA/2              33601214 ns     33295143 ns           21
BM_BlockSparseJacobiPreconditionerBA/4              28162794 ns     27224167 ns           30
BM_BlockSparseJacobiPreconditionerBA/8              31402448 ns     28038760 ns           25
BM_BlockSparseJacobiPreconditionerBA/16             30820813 ns     22625233 ns           30
BM_BlockCRSJacobiPreconditionerBA/1                 60348194 ns     60332167 ns           12
BM_BlockCRSJacobiPreconditionerBA/2                 35489954 ns     34782050 ns           20
BM_BlockCRSJacobiPreconditionerBA/4                 23636360 ns     22547032 ns           31
BM_BlockCRSJacobiPreconditionerBA/8                 31688798 ns     27857800 ns           25
BM_BlockCRSJacobiPreconditionerBA/16                30806695 ns     20562516 ns           31
BM_BlockSparseJacobiPreconditionerUnstructured/1    59793396 ns     59788583 ns           12
BM_BlockSparseJacobiPreconditionerUnstructured/2    35192900 ns     34968900 ns           20
BM_BlockSparseJacobiPreconditionerUnstructured/4    30171145 ns     28924480 ns           25
BM_BlockSparseJacobiPreconditionerUnstructured/8    24982583 ns     23193172 ns           29
BM_BlockSparseJacobiPreconditionerUnstructured/16   23370546 ns     18389694 ns           36
BM_BlockCRSJacobiPreconditionerUnstructured/1       63204538 ns     63204545 ns           11
BM_BlockCRSJacobiPreconditionerUnstructured/2       34466060 ns     34193429 ns           21
BM_BlockCRSJacobiPreconditionerUnstructured/4       22712230 ns     20491147 ns           34
BM_BlockCRSJacobiPreconditionerUnstructured/8       16701833 ns     16190395 ns           43
BM_BlockCRSJacobiPreconditionerUnstructured/16      16762565 ns     12857304 ns           56

Note that single-threaded performance gets worse. Performance improves with 2 and 4 threads
and then essentially stalls.
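
To make the scaling in the table above easier to read, the wall-clock times can be turned into speedup and parallel efficiency figures. The sketch below does this for the BM_BlockSparseJacobiPreconditionerBA rows only. The Timing struct and the helper names are hypothetical; the numbers are copied from the Time column.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// One benchmark row: thread count and wall-clock time in nanoseconds.
struct Timing {
  int threads;
  double wall_ns;
};

// Wall-clock times for BM_BlockSparseJacobiPreconditionerBA/{1,2,4,8,16},
// copied from the benchmark output above.
std::vector<Timing> SparseBaTimings() {
  return {{1, 53314862.0},
          {2, 33601214.0},
          {4, 28162794.0},
          {8, 31402448.0},
          {16, 30820813.0}};
}

// Speedup relative to the single-threaded run.
double Speedup(double t1_ns, double tn_ns) { return t1_ns / tn_ns; }

// Parallel efficiency: speedup divided by the number of threads.
double Efficiency(double t1_ns, double tn_ns, int threads) {
  return Speedup(t1_ns, tn_ns) / threads;
}

// Prints one line per thread count with speedup and efficiency.
void PrintScaling(const std::vector<Timing>& timings) {
  if (timings.empty()) return;
  const double t1 = timings.front().wall_ns;
  for (const Timing& t : timings) {
    std::printf("threads=%2d speedup=%.2f efficiency=%.2f\n", t.threads,
                Speedup(t1, t.wall_ns),
                Efficiency(t1, t.wall_ns, t.threads));
  }
}
```

Calling PrintScaling(SparseBaTimings()) shows a speedup of roughly 1.9x at 4 threads for this benchmark, with 8 and 16 threads slightly slower than 4.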

Change-Id: I96a5d2f719545e14c03d73e71c8c0564e8c1c729
5 files changed
tree: 5f5c83079a19d9676d692b925e479154f2980079


Ceres Solver

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature, and performant library that has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.

  1. Non-linear Least Squares problems with bounds constraints.
  2. General unconstrained optimization problems.
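
As a rough guide to what the two problem classes look like, they can be written as follows. The symbols are assumptions introduced here for illustration: f_i are residual functions, rho_i are optional loss functions, l_j and u_j are lower and upper bounds on the parameters, and f is a general smooth objective.

```latex
% 1. Non-linear least squares with bounds constraints
\min_{x} \; \frac{1}{2} \sum_{i} \rho_i\!\left( \left\| f_i(x_{i_1}, \ldots, x_{i_k}) \right\|^2 \right)
\quad \text{subject to} \quad l_j \le x_j \le u_j

% 2. General unconstrained optimization
\min_{x} \; f(x)
```

With rho_i set to the identity and no bounds, the first form reduces to the ordinary non-linear least squares problem.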

Please see the project documentation for more information.