commit	b158515089a85be8425db66ed43546605f86a00e
author	Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com>	Sun Nov 27 21:35:44 2022 +0300
committer	Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com>	Sat Dec 17 02:52:27 2022 +0300
tree	1e88ea96569aa49f06b970d0936e01daf44860d2
parent	2fd81de12d619f8a32769a6775dc8cb06355aa70

Parallel operations on vectors

The main focus of this change is to parallelize the remaining operations (most
of which are operations on vectors) in the code path used by the iterative
Schur complement solver.

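Parallelization relies on lazy evaluation of Eigen expressions: because an
expression is not evaluated until it is assigned, the assignment can be split
into disjoint segments of the output vector that are evaluated by different
threads.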
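
A minimal standard-library sketch of this idea, assuming nothing about Ceres
internals: the expression is represented as a callable that computes one
element on demand, and `ParallelAssign` (a hypothetical name, not the Ceres
API) evaluates it over contiguous, disjoint index ranges, one per thread. In
Eigen the per-range step would roughly correspond to assigning
`lhs.segment(start, size) = rhs.segment(start, size)` for an unevaluated
expression `rhs`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch (not the Ceres API): evaluates a lazily described
// element-wise expression into `lhs`. The index range [0, lhs.size()) is
// split into contiguous, disjoint chunks, one per thread; each thread writes
// only its own chunk, so no locking is needed.
template <typename Expr>
void ParallelAssign(std::vector<double>& lhs, const Expr& expr,
                    int num_threads) {
  assert(num_threads >= 1);
  const std::size_t n = lhs.size();
  const std::size_t threads = static_cast<std::size_t>(num_threads);
  const std::size_t chunk = (n + threads - 1) / threads;
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < threads; ++t) {
    const std::size_t begin = std::min(n, t * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    workers.emplace_back([&lhs, &expr, begin, end] {
      // The expression is evaluated only here, element by element.
      for (std::size_t i = begin; i < end; ++i) lhs[i] = expr(i);
    });
  }
  for (std::thread& worker : workers) worker.join();
}

// Example expression: z = a * x + b * y (an "Axpby"), described as a
// callable that computes a single element on demand.
std::vector<double> Axpby(double a, const std::vector<double>& x, double b,
                          const std::vector<double>& y, int num_threads) {
  assert(x.size() == y.size());
  std::vector<double> z(x.size());
  auto expr = [&](std::size_t i) { return a * x[i] + b * y[i]; };
  ParallelAssign(z, expr, num_threads);
  return z;
}
```

For example, `Axpby(3.0, x, -0.5, y, 4)` with an all-ones `x` and an all-twos
`y` fills the result with 2.0. Splitting by contiguous ranges keeps each
thread's memory accesses sequential, which tends to matter for memory-bound
operations such as these.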

On a Linux PC with an Intel 8176 processor, parallelizing vector operations has
the following effect:

Running ./bin/parallel_vector_operations_benchmark
Run on (112 X 3200.32 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x56)
  L1 Instruction 32 KiB (x56)
  L2 Unified 1024 KiB (x56)
  L3 Unified 39424 KiB (x2)
Load Average: 3.30, 8.41, 11.82
-----------------------------------
Benchmark                      Time
-----------------------------------
SetZero                 10009532 ns
SetZeroParallel/1       10024139 ns
...
SetZeroParallel/16        877606 ns

Negate                   4978856 ns
NegateParallel/1         5145413 ns
...
NegateParallel/16         721823 ns

Assign                  10731408 ns
AssignParallel/1        10749944 ns
...
AssignParallel/16        1829381 ns

D2X                     15214399 ns
D2XParallel/1           15623245 ns
...
D2XParallel/16           2687060 ns

DivideSqrt               8220050 ns
DivideSqrtParallel/1     9088467 ns
...
DivideSqrtParallel/16     905569 ns

Clamp                    3502010 ns
ClampParallel/1          4507897 ns
...
ClampParallel/16          759576 ns

Norm                     4426782 ns
NormParallel/1           4442805 ns
...
NormParallel/16           430290 ns

Dot                      9023276 ns
DotParallel/1            9031304 ns
...
DotParallel/16           1157267 ns

Axpby                   14608289 ns
AxpbyParallel/1         14570825 ns
...
AxpbyParallel/16         2672220 ns
-----------------------------------
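
Norm and Dot in the list above differ from the other operations: they reduce a
vector to a single number, so threads cannot simply write disjoint parts of an
output. A common approach, sketched below with the standard library only, is to
let each thread accumulate a partial sum over its own range and combine the
partial sums once all threads finish. `ParallelDot` and `ParallelNorm` are
hypothetical names, and this is not necessarily how the commit implements these
reductions. The result may differ in the last bits from a sequential sum,
because the floating-point additions happen in a different order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical sketch: parallel dot product via per-thread partial sums.
double ParallelDot(const std::vector<double>& x, const std::vector<double>& y,
                   int num_threads) {
  assert(num_threads >= 1 && x.size() == y.size());
  const std::size_t n = x.size();
  const std::size_t threads = static_cast<std::size_t>(num_threads);
  const std::size_t chunk = (n + threads - 1) / threads;
  // One slot per thread, so threads never write to the same location.
  std::vector<double> partial(threads, 0.0);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < threads; ++t) {
    const std::size_t begin = std::min(n, t * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    workers.emplace_back([&x, &y, &partial, t, begin, end] {
      double sum = 0.0;
      for (std::size_t i = begin; i < end; ++i) sum += x[i] * y[i];
      partial[t] = sum;
    });
  }
  for (std::thread& worker : workers) worker.join();
  // Combine the partial sums sequentially once all threads are done.
  return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// Euclidean norm expressed through the dot product of x with itself.
double ParallelNorm(const std::vector<double>& x, int num_threads) {
  return std::sqrt(ParallelDot(x, x, num_threads));
}
```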

Multi-threading vector operations in ISC (iterative Schur complement) and in
program evaluation results in the following improvements:

Running ./bin/evaluation_benchmark
--------------------------------------------------------------------------------------
Benchmark                                                               this   2fd81de
--------------------------------------------------------------------------------------
Residuals<problem-13682-4456117-pre.txt>/1                           4136 ms   4292 ms
Residuals<problem-13682-4456117-pre.txt>/2                           2919 ms   2670 ms
Residuals<problem-13682-4456117-pre.txt>/4                           2065 ms   2198 ms
Residuals<problem-13682-4456117-pre.txt>/8                           1458 ms   1609 ms
Residuals<problem-13682-4456117-pre.txt>/16                          1152 ms   1227 ms

ResidualsAndJacobian<problem-13682-4456117-pre.txt>/1               19759 ms  20084 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/2               10921 ms  10977 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/4                6220 ms   6941 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/8                3490 ms   4398 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/16               2277 ms   3172 ms

Plus<problem-13682-4456117-pre.txt>/1                                 339 ms    322 ms
Plus<problem-13682-4456117-pre.txt>/2                                 220 ms
Plus<problem-13682-4456117-pre.txt>/4                                 128 ms
Plus<problem-13682-4456117-pre.txt>/8                                78.0 ms
Plus<problem-13682-4456117-pre.txt>/16                               49.8 ms

ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/1       2434 ms   2478 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/2       2706 ms   2688 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/4       1430 ms   1548 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/8        742 ms    883 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/16       438 ms    555 ms

ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/1   2438 ms   2481 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/2   2565 ms   2790 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/4   1434 ms   1551 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/8    765 ms    892 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/16   435 ms    559 ms

JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/1           1278 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/2           1555 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/4            833 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/8            459 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/16           250 ms

JacobianScaleColumns<problem-13682-4456117-pre.txt>/1                1468 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/2                1871 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/4                 957 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/8                 528 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/16                294 ms

End-to-end improvements for bundle_adjuster, invoked as follows with each
problem file passed to --input:
./bin/bundle_adjuster --num_threads 28 --num_iterations 40 \
                      --linear_solver iterative_schur \
                      --preconditioner jacobi --input
---------------------------------------------
Problem                         this  2fd81de
---------------------------------------------
problem-13682-4456117-pre.txt  508.6    892.7
problem-1778-993923-pre.txt    763.8   1129.9
problem-1723-156502-pre.txt      6.3     14.4
problem-356-226730-pre.txt      76.3    116.2
problem-257-65132-pre.txt       38.6     52.0
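
The end-to-end speedup for each problem is the ratio of the 2fd81de column to
the `this` column; computed from the table above, it ranges from roughly 1.35x
(problem-257-65132-pre.txt) to roughly 2.29x (problem-1723-156502-pre.txt). The
snippet below only illustrates that arithmetic; `Row`, `Speedup` and
`EndToEndRows` are hypothetical names, and the numbers are copied from the
table in the units it reports.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the end-to-end table above (values as reported there).
struct Row {
  std::string problem;
  double this_commit;
  double baseline;  // parent commit 2fd81de
};

// Speedup of this commit over the baseline: baseline time / new time.
double Speedup(const Row& row) { return row.baseline / row.this_commit; }

std::vector<Row> EndToEndRows() {
  return {
      {"problem-13682-4456117-pre.txt", 508.6, 892.7},
      {"problem-1778-993923-pre.txt", 763.8, 1129.9},
      {"problem-1723-156502-pre.txt", 6.3, 14.4},
      {"problem-356-226730-pre.txt", 76.3, 116.2},
      {"problem-257-65132-pre.txt", 38.6, 52.0},
  };
}
```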

Change-Id: Ie31cc5015f13fa479c16ffb5ce48c9b880990d49

31 files changed

README.md

Ceres Solver

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature, and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems:

Non-linear Least Squares problems with bounds constraints.
General unconstrained optimization problems.

Please see ceres-solver.org for more information.