Parallelize block_jacobi_preconditioner

Use ParallelFor to parallelize both versions of the
block Jacobi preconditioner. Also add benchmarks that
vary the number of threads.
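
As an illustration of the idea only, here is a minimal sketch of a parallel block Jacobi update. It is not the code in this change. It assumes the cells of the Jacobian are already grouped by column block, so each task writes one diagonal block and no locking is needed. `SimpleParallelFor`, `Cell`, and `BlockDiagonalOfJtJ` are hypothetical names introduced for this example; `SimpleParallelFor` is a std::thread stand-in, not the ParallelFor used by the library.

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parallel-for helper: splits [begin, end) into
// contiguous chunks and calls fn(i) for every index, using at most
// num_threads threads.
void SimpleParallelFor(int begin, int end, int num_threads,
                       const std::function<void(int)>& fn) {
  const int n = end - begin;
  if (n <= 0) return;
  num_threads = std::max(1, std::min(num_threads, n));
  if (num_threads == 1) {
    for (int i = begin; i < end; ++i) fn(i);
    return;
  }
  const int chunk = (n + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    const int lo = begin + t * chunk;
    const int hi = std::min(end, lo + chunk);
    if (lo >= hi) break;
    workers.emplace_back([lo, hi, &fn] {
      for (int i = lo; i < hi; ++i) fn(i);
    });
  }
  for (std::thread& w : workers) w.join();
}

// Hypothetical dense block of the Jacobian, stored row-major (rows x cols).
struct Cell {
  int rows;
  int cols;
  std::vector<double> values;
};

// Computes the block diagonal of J^T J. column_blocks[c] holds every cell in
// column block c (assumed to share the same number of columns). The result
// for block c is the sum of m^T m over those cells, stored row-major.
std::vector<std::vector<double>> BlockDiagonalOfJtJ(
    const std::vector<std::vector<Cell>>& column_blocks, int num_threads) {
  std::vector<std::vector<double>> diagonal(column_blocks.size());
  SimpleParallelFor(
      0, static_cast<int>(column_blocks.size()), num_threads, [&](int c) {
        const std::vector<Cell>& cells = column_blocks[c];
        if (cells.empty()) return;
        const int n = cells[0].cols;
        std::vector<double> block(n * n, 0.0);
        for (const Cell& cell : cells) {
          for (int r = 0; r < cell.rows; ++r) {
            for (int i = 0; i < n; ++i) {
              for (int j = 0; j < n; ++j) {
                block[i * n + j] +=
                    cell.values[r * n + i] * cell.values[r * n + j];
              }
            }
          }
        }
        // Each task owns exactly one output slot, so no mutex is required.
        diagonal[c] = std::move(block);
      });
  return diagonal;
}
```

Calling `BlockDiagonalOfJtJ(column_blocks, 4)` should give the same blocks as a call with one thread. Spawning threads on every call has a fixed cost, which is one plausible reason a parallel path can be slower than a plain loop when only one thread is used.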

Benchmark on M1 Mac Pro

Before:
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
BM_BlockSparseJacobiPreconditionerBA             44847927 ns     44788313 ns           16
BM_BlockCRSJacobiPreconditionerBA                48772330 ns     48723571 ns           14
BM_BlockSparseJacobiPreconditionerUnstructured   62385231 ns     62306818 ns           11
BM_BlockCRSJacobiPreconditionerUnstructured      60671473 ns     60577727 ns           11

After:
--------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations
--------------------------------------------------------------------------------------------
BM_BlockSparseJacobiPreconditionerBA/1              53314862 ns     53302308 ns           13
BM_BlockSparseJacobiPreconditionerBA/2              33601214 ns     33295143 ns           21
BM_BlockSparseJacobiPreconditionerBA/4              28162794 ns     27224167 ns           30
BM_BlockSparseJacobiPreconditionerBA/8              31402448 ns     28038760 ns           25
BM_BlockSparseJacobiPreconditionerBA/16             30820813 ns     22625233 ns           30
BM_BlockCRSJacobiPreconditionerBA/1                 60348194 ns     60332167 ns           12
BM_BlockCRSJacobiPreconditionerBA/2                 35489954 ns     34782050 ns           20
BM_BlockCRSJacobiPreconditionerBA/4                 23636360 ns     22547032 ns           31
BM_BlockCRSJacobiPreconditionerBA/8                 31688798 ns     27857800 ns           25
BM_BlockCRSJacobiPreconditionerBA/16                30806695 ns     20562516 ns           31
BM_BlockSparseJacobiPreconditionerUnstructured/1    59793396 ns     59788583 ns           12
BM_BlockSparseJacobiPreconditionerUnstructured/2    35192900 ns     34968900 ns           20
BM_BlockSparseJacobiPreconditionerUnstructured/4    30171145 ns     28924480 ns           25
BM_BlockSparseJacobiPreconditionerUnstructured/8    24982583 ns     23193172 ns           29
BM_BlockSparseJacobiPreconditionerUnstructured/16   23370546 ns     18389694 ns           36
BM_BlockCRSJacobiPreconditionerUnstructured/1       63204538 ns     63204545 ns           11
BM_BlockCRSJacobiPreconditionerUnstructured/2       34466060 ns     34193429 ns           21
BM_BlockCRSJacobiPreconditionerUnstructured/4       22712230 ns     20491147 ns           34
BM_BlockCRSJacobiPreconditionerUnstructured/8       16701833 ns     16190395 ns           43
BM_BlockCRSJacobiPreconditionerUnstructured/16      16762565 ns     12857304 ns           56

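Note that single-threaded performance gets worse in three of the four benchmarks
(for example, BM_BlockSparseJacobiPreconditionerBA goes from 44.8 ms to 53.3 ms).
Wall time improves with 2 and 4 threads in every benchmark. The BA benchmarks then
stall, while the Unstructured ones keep improving up to 8 threads and gain little
or nothing at 16.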
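
The scaling can be quantified as speedup relative to the single-threaded run. The following is a minimal sketch under that definition; `SpeedupVsFirst` and `PrintSpeedups` are hypothetical helpers written for this note, and the sample times are the wall times of BM_BlockCRSJacobiPreconditionerUnstructured from the table above.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Returns times_ns[0] / times_ns[i] for every entry, i.e. the speedup of each
// run relative to the first (single-threaded) run.
std::vector<double> SpeedupVsFirst(const std::vector<double>& times_ns) {
  std::vector<double> speedups;
  speedups.reserve(times_ns.size());
  for (double t : times_ns) speedups.push_back(times_ns.front() / t);
  return speedups;
}

// Prints one line per thread count; both vectors must have the same length.
void PrintSpeedups(const std::vector<int>& threads,
                   const std::vector<double>& speedups) {
  for (std::size_t i = 0; i < threads.size(); ++i) {
    std::printf("%2d threads: %.2fx\n", threads[i], speedups[i]);
  }
}

// Wall times (ns) for BM_BlockCRSJacobiPreconditionerUnstructured at
// 1, 2, 4, 8 and 16 threads, copied from the "After" table.
const std::vector<double> kCrsUnstructuredWallNs = {
    63204538, 34466060, 22712230, 16701833, 16762565};
```

For example, `PrintSpeedups({1, 2, 4, 8, 16}, SpeedupVsFirst(kCrsUnstructuredWallNs))` reports the speedup at each thread count for that benchmark.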

Change-Id: I96a5d2f719545e14c03d73e71c8c0564e8c1c729
5 files changed
tree: 5f5c83079a19d9676d692b925e479154f2980079
README.md

Ceres Solver

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature, and performant library that has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.

  1. Non-linear Least Squares problems with bounds constraints.
  2. General unconstrained optimization problems.

Please see ceres-solver.org for more information.