commit | 739f2a25aea83fc3db119b85779b08fae465d0dd | |
---|---|---|
author | Sameer Agarwal <sameeragarwal@google.com> | Mon Aug 29 20:40:30 2022 -0700 |
committer | Sameer Agarwal <sameeragarwal@google.com> | Tue Sep 20 08:43:19 2022 -0700 |
tree | 5f5c83079a19d9676d692b925e479154f2980079 | |
parent | c0c4f93940f86e8d108e84a60104bfda4aad66b3 | |
Parallelize block_jacobi_preconditioner

Use ParallelFor to parallelize both versions of the block Jacobi preconditioner. Also add benchmarks for a varying number of threads.

Benchmark on M1 Mac Pro. In the After table, the /N suffix on each benchmark name is the number of threads.

Before:

Benchmark | Time (ns) | CPU (ns) | Iterations |
---|---|---|---|
BM_BlockSparseJacobiPreconditionerBA | 44847927 | 44788313 | 16 |
BM_BlockCRSJacobiPreconditionerBA | 48772330 | 48723571 | 14 |
BM_BlockSparseJacobiPreconditionerUnstructured | 62385231 | 62306818 | 11 |
BM_BlockCRSJacobiPreconditionerUnstructured | 60671473 | 60577727 | 11 |

After:

Benchmark | Time (ns) | CPU (ns) | Iterations |
---|---|---|---|
BM_BlockSparseJacobiPreconditionerBA/1 | 53314862 | 53302308 | 13 |
BM_BlockSparseJacobiPreconditionerBA/2 | 33601214 | 33295143 | 21 |
BM_BlockSparseJacobiPreconditionerBA/4 | 28162794 | 27224167 | 30 |
BM_BlockSparseJacobiPreconditionerBA/8 | 31402448 | 28038760 | 25 |
BM_BlockSparseJacobiPreconditionerBA/16 | 30820813 | 22625233 | 30 |
BM_BlockCRSJacobiPreconditionerBA/1 | 60348194 | 60332167 | 12 |
BM_BlockCRSJacobiPreconditionerBA/2 | 35489954 | 34782050 | 20 |
BM_BlockCRSJacobiPreconditionerBA/4 | 23636360 | 22547032 | 31 |
BM_BlockCRSJacobiPreconditionerBA/8 | 31688798 | 27857800 | 25 |
BM_BlockCRSJacobiPreconditionerBA/16 | 30806695 | 20562516 | 31 |
BM_BlockSparseJacobiPreconditionerUnstructured/1 | 59793396 | 59788583 | 12 |
BM_BlockSparseJacobiPreconditionerUnstructured/2 | 35192900 | 34968900 | 20 |
BM_BlockSparseJacobiPreconditionerUnstructured/4 | 30171145 | 28924480 | 25 |
BM_BlockSparseJacobiPreconditionerUnstructured/8 | 24982583 | 23193172 | 29 |
BM_BlockSparseJacobiPreconditionerUnstructured/16 | 23370546 | 18389694 | 36 |
BM_BlockCRSJacobiPreconditionerUnstructured/1 | 63204538 | 63204545 | 11 |
BM_BlockCRSJacobiPreconditionerUnstructured/2 | 34466060 | 34193429 | 21 |
BM_BlockCRSJacobiPreconditionerUnstructured/4 | 22712230 | 20491147 | 34 |
BM_BlockCRSJacobiPreconditionerUnstructured/8 | 16701833 | 16190395 | 43 |
BM_BlockCRSJacobiPreconditionerUnstructured/16 | 16762565 | 12857304 | 56 |

Note that single-threaded performance gets worse (for example, BM_BlockSparseJacobiPreconditionerBA goes from 44847927 ns before to 53314862 ns with one thread after). Performance improves with 2 and 4 threads and then essentially stalls.

Change-Id: I96a5d2f719545e14c03d73e71c8c0564e8c1c729
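The diff itself is not shown here, so what follows is only a minimal sketch of the idea, not Ceres's implementation. It assumes a hypothetical layout in which all Jacobian rows touching parameter block i are gathered into one dense matrix J_i. The block Jacobi preconditioner then needs inverse(J_i^T J_i) for every i; any diagonal regularization term is omitted for brevity. Each diagonal block depends only on its own column block, so the blocks can be processed independently, and each task writes only its own output slot, so no locking is needed. `ColumnBlock`, `ParallelForSketch`, `InvertDense` and `ComputeBlockJacobi` are hypothetical names. `ParallelForSketch` is a plain `std::thread` stand-in with static chunking and does not reproduce the signature or scheduling of Ceres's `ParallelFor`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical: all rows of the Jacobian that touch one parameter block,
// gathered into a dense rows x cols matrix stored row-major.
struct ColumnBlock {
  int rows;
  int cols;
  std::vector<double> values;  // size rows * cols
};

// Hypothetical stand-in for a ParallelFor helper: splits [start, end) into
// contiguous chunks, runs each chunk on its own std::thread, and calls
// fn(i) once per index. Runs serially for one thread or at most one index.
void ParallelForSketch(int start, int end, int num_threads,
                       const std::function<void(int)>& fn) {
  if (num_threads <= 1 || end - start <= 1) {
    for (int i = start; i < end; ++i) fn(i);
    return;
  }
  const int n = end - start;
  const int chunk = (n + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    const int lo = start + t * chunk;
    const int hi = std::min(end, lo + chunk);
    if (lo >= hi) break;
    workers.emplace_back([lo, hi, &fn] {
      for (int i = lo; i < hi; ++i) fn(i);
    });
  }
  for (std::thread& w : workers) w.join();
}

// Inverts a small dense n x n row-major matrix with Gauss-Jordan
// elimination and partial pivoting. Assumes the matrix is nonsingular.
std::vector<double> InvertDense(std::vector<double> a, int n) {
  std::vector<double> inv(n * n, 0.0);
  for (int i = 0; i < n; ++i) inv[i * n + i] = 1.0;
  for (int c = 0; c < n; ++c) {
    // Pick the row with the largest entry in column c as the pivot row.
    int p = c;
    for (int r = c + 1; r < n; ++r) {
      if (std::fabs(a[r * n + c]) > std::fabs(a[p * n + c])) p = r;
    }
    for (int k = 0; k < n; ++k) {
      std::swap(a[c * n + k], a[p * n + k]);
      std::swap(inv[c * n + k], inv[p * n + k]);
    }
    const double pivot = a[c * n + c];
    for (int k = 0; k < n; ++k) {
      a[c * n + k] /= pivot;
      inv[c * n + k] /= pivot;
    }
    // Eliminate column c from every other row.
    for (int r = 0; r < n; ++r) {
      if (r == c) continue;
      const double f = a[r * n + c];
      for (int k = 0; k < n; ++k) {
        a[r * n + k] -= f * a[c * n + k];
        inv[r * n + k] -= f * inv[c * n + k];
      }
    }
  }
  return inv;
}

// Computes inverse(J_i^T J_i) for every column block i, one task per block.
// Each task writes only inverses[i], so the threads share no mutable state.
std::vector<std::vector<double>> ComputeBlockJacobi(
    const std::vector<ColumnBlock>& blocks, int num_threads) {
  std::vector<std::vector<double>> inverses(blocks.size());
  ParallelForSketch(0, static_cast<int>(blocks.size()), num_threads,
                    [&](int i) {
    const ColumnBlock& b = blocks[i];
    assert(static_cast<int>(b.values.size()) == b.rows * b.cols);
    // Form the cols x cols diagonal block J_i^T J_i.
    std::vector<double> m(b.cols * b.cols, 0.0);
    for (int r = 0; r < b.rows; ++r) {
      for (int j = 0; j < b.cols; ++j) {
        for (int k = 0; k < b.cols; ++k) {
          m[j * b.cols + k] +=
              b.values[r * b.cols + j] * b.values[r * b.cols + k];
        }
      }
    }
    inverses[i] = InvertDense(std::move(m), b.cols);
  });
  return inverses;
}
```

For example, `ComputeBlockJacobi(blocks, 1)` and `ComputeBlockJacobi(blocks, 4)` should return the same inverses; only the division of blocks among threads changes. Static chunking keeps the sketch short, but it can leave some threads idle when block sizes vary widely.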
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.
Please see ceres-solver.org for more information.