| commit | b158515089a85be8425db66ed43546605f86a00e | |
|---|---|---|
| author | Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com> | Sun Nov 27 21:35:44 2022 +0300 |
| committer | Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com> | Sat Dec 17 02:52:27 2022 +0300 |
| tree | 1e88ea96569aa49f06b970d0936e01daf44860d2 | |
| parent | 2fd81de12d619f8a32769a6775dc8cb06355aa70 | |
Parallel operations on vectors
The main focus of this change is to parallelize the remaining operations (most of
them operations on vectors) in the code path used by the iterative Schur
complement solver.
Parallelization is handled using lazy evaluation of Eigen expressions.
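A minimal sketch of why lazy evaluation helps here, written in plain C++ instead of Eigen so that it compiles on its own: the right-hand side is an expression object that computes element `i` only when asked, so each thread can evaluate just its own contiguous segment and write it into a disjoint part of the result. The names `Axpby` and `ParallelAssign` are hypothetical, and `Axpby` is assumed here to mean `a * x + b * y`; with Eigen, the per-thread step would roughly correspond to assigning `rhs.segment(begin, size)` to `lhs.segment(begin, size)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical expression object for a * x + b * y. Nothing is computed when
// it is constructed; element i is evaluated lazily by operator[].
struct Axpby {
  double a;
  const std::vector<double>& x;
  double b;
  const std::vector<double>& y;

  double operator[](std::size_t i) const { return a * x[i] + b * y[i]; }
  std::size_t size() const { return x.size(); }
};

// Evaluates any element-wise expression into lhs, giving each thread one
// contiguous segment. Segments are disjoint, so no locking is needed.
template <typename Expr>
void ParallelAssign(int num_threads, std::vector<double>& lhs,
                    const Expr& expr) {
  assert(num_threads >= 1);
  const std::size_t n = expr.size();
  lhs.resize(n);  // no-op when lhs aliases an input of the same size
  const std::size_t threads = static_cast<std::size_t>(num_threads);
  const std::size_t chunk = (n + threads - 1) / threads;

  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < threads; ++t) {
    const std::size_t begin = std::min(n, t * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    workers.emplace_back([&lhs, &expr, begin, end] {
      for (std::size_t i = begin; i < end; ++i) lhs[i] = expr[i];
    });
  }
  for (std::thread& worker : workers) worker.join();
}
```

Because every thread writes a disjoint range and element `i` of the result depends only on element `i` of the inputs, the result may also alias an input (for example `ParallelAssign(3, x, Axpby{2.0, x, -1.0, y})`) without extra synchronization. A production implementation would more likely reuse a thread pool than spawn threads on every call.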
On a Linux PC with an Intel Xeon Platinum 8176 processor, parallelization of vector
operations has the following effect:
Running ./bin/parallel_vector_operations_benchmark
Run on (112 X 3200.32 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x56)
L1 Instruction 32 KiB (x56)
L2 Unified 1024 KiB (x56)
L3 Unified 39424 KiB (x2)
Load Average: 3.30, 8.41, 11.82
-----------------------------------
Benchmark Time
-----------------------------------
SetZero 10009532 ns
SetZeroParallel/1 10024139 ns
...
SetZeroParallel/16 877606 ns
Negate 4978856 ns
NegateParallel/1 5145413 ns
...
NegateParallel/16 721823 ns
Assign 10731408 ns
AssignParallel/1 10749944 ns
...
AssignParallel/16 1829381 ns
D2X 15214399 ns
D2XParallel/1 15623245 ns
...
D2XParallel/16 2687060 ns
DivideSqrt 8220050 ns
DivideSqrtParallel/1 9088467 ns
...
DivideSqrtParallel/16 905569 ns
Clamp 3502010 ns
ClampParallel/1 4507897 ns
...
ClampParallel/16 759576 ns
Norm 4426782 ns
NormParallel/1 4442805 ns
...
NormParallel/16 430290 ns
Dot 9023276 ns
DotParallel/1 9031304 ns
...
DotParallel/16 1157267 ns
Axpby 14608289 ns
AxpbyParallel/1 14570825 ns
...
AxpbyParallel/16 2672220 ns
-----------------------------------
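The Norm and Dot rows are reductions rather than element-wise assignments, so each segment cannot simply be written back in place. One common way to parallelize them, sketched below under the same plain-C++ assumptions (the names `ParallelDot` and `ParallelNorm` are hypothetical and not taken from this change), is a two-stage reduction: each thread sums its own segment into a private slot, and the partial sums are combined after all threads have joined.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Two-stage parallel dot product: each thread reduces one contiguous segment
// into its own slot of `partial`, then the partial sums are added serially.
double ParallelDot(int num_threads, const std::vector<double>& x,
                   const std::vector<double>& y) {
  assert(num_threads >= 1);
  assert(x.size() == y.size());
  const std::size_t n = x.size();
  const std::size_t threads = static_cast<std::size_t>(num_threads);
  const std::size_t chunk = (n + threads - 1) / threads;

  std::vector<double> partial(threads, 0.0);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < threads; ++t) {
    const std::size_t begin = std::min(n, t * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    workers.emplace_back([&x, &y, &partial, t, begin, end] {
      double sum = 0.0;  // local accumulator; partial[t] is written once
      for (std::size_t i = begin; i < end; ++i) sum += x[i] * y[i];
      partial[t] = sum;
    });
  }
  for (std::thread& worker : workers) worker.join();
  return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// The Euclidean norm reuses the same reduction: ||x|| = sqrt(x . x).
double ParallelNorm(int num_threads, const std::vector<double>& x) {
  return std::sqrt(ParallelDot(num_threads, x, x));
}
```

Since the summation order depends on the number of threads, results can differ from a serial loop in the last few bits.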
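Multi-threading of vector operations in ISC (iterative Schur complement) and in
program evaluation results in the following improvements ("this" is this change,
2fd81de is the parent commit; rows with a single value have only one measurement):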
Running ./bin/evaluation_benchmark
--------------------------------------------------------------------------------------
Benchmark this 2fd81de
--------------------------------------------------------------------------------------
Residuals<problem-13682-4456117-pre.txt>/1 4136 ms 4292 ms
Residuals<problem-13682-4456117-pre.txt>/2 2919 ms 2670 ms
Residuals<problem-13682-4456117-pre.txt>/4 2065 ms 2198 ms
Residuals<problem-13682-4456117-pre.txt>/8 1458 ms 1609 ms
Residuals<problem-13682-4456117-pre.txt>/16 1152 ms 1227 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/1 19759 ms 20084 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/2 10921 ms 10977 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/4 6220 ms 6941 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/8 3490 ms 4398 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/16 2277 ms 3172 ms
Plus<problem-13682-4456117-pre.txt>/1 339 ms 322 ms
Plus<problem-13682-4456117-pre.txt>/2 220 ms
Plus<problem-13682-4456117-pre.txt>/4 128 ms
Plus<problem-13682-4456117-pre.txt>/8 78.0 ms
Plus<problem-13682-4456117-pre.txt>/16 49.8 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/1 2434 ms 2478 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/2 2706 ms 2688 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/4 1430 ms 1548 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/8 742 ms 883 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/16 438 ms 555 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/1 2438 ms 2481 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/2 2565 ms 2790 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/4 1434 ms 1551 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/8 765 ms 892 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/16 435 ms 559 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/1 1278 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/2 1555 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/4 833 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/8 459 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/16 250 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/1 1468 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/2 1871 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/4 957 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/8 528 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/16 294 ms
End-to-end improvements for bundle_adjuster, invoked as
./bin/bundle_adjuster --num_threads 28 --num_iterations 40 \
--linear_solver iterative_schur \
--preconditioner jacobi --input
---------------------------------------------
Problem this 2fd81de
---------------------------------------------
problem-13682-4456117-pre.txt 508.6 892.7
problem-1778-993923-pre.txt 763.8 1129.9
problem-1723-156502-pre.txt 6.3 14.4
problem-356-226730-pre.txt 76.3 116.2
problem-257-65132-pre.txt 38.6 52.0
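To read the end-to-end table as speedups, divide the parent time by the time of this change. The sketch below only does that arithmetic on numbers copied from the table; the struct and function names are hypothetical, and since the table does not state its time unit, the ratios are unit-free.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the end-to-end table, copied verbatim from the commit message.
struct EndToEndRow {
  std::string problem;
  double this_change;     // time with this change
  double parent_2fd81de;  // time with parent commit 2fd81de
};

// Rows of the bundle_adjuster table (same unit in both columns).
const std::vector<EndToEndRow>& EndToEndTable() {
  static const std::vector<EndToEndRow> rows = {
      {"problem-13682-4456117-pre.txt", 508.6, 892.7},
      {"problem-1778-993923-pre.txt", 763.8, 1129.9},
      {"problem-1723-156502-pre.txt", 6.3, 14.4},
      {"problem-356-226730-pre.txt", 76.3, 116.2},
      {"problem-257-65132-pre.txt", 38.6, 52.0},
  };
  return rows;
}

// Speedup of this change over its parent; values above 1 mean faster.
double Speedup(const EndToEndRow& row) {
  assert(row.this_change > 0.0);
  return row.parent_2fd81de / row.this_change;
}
```

For the largest problem this gives roughly 1.76x (892.7 / 508.6); across the five problems the ratio ranges from about 1.35x (problem-257-65132-pre.txt) to about 2.29x (problem-1723-156502-pre.txt).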
Change-Id: Ie31cc5015f13fa479c16ffb5ce48c9b880990d49
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.
Please see ceres-solver.org for more information.