| commit | b158515089a85be8425db66ed43546605f86a00e | |
|---|---|---|
| author | Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com> | Sun Nov 27 21:35:44 2022 +0300 |
| committer | Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com> | Sat Dec 17 02:52:27 2022 +0300 |
| tree | 1e88ea96569aa49f06b970d0936e01daf44860d2 | |
| parent | 2fd81de12d619f8a32769a6775dc8cb06355aa70 | |
Parallel operations on vectors
The main focus of this change is to parallelize the remaining operations (most
of them operations on vectors) in the code path used with the iterative Schur
complement.
Parallelization is handled using lazy evaluation of Eigen expressions.
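A minimal sketch of the idea, under stated assumptions: the element-wise
expression is kept unevaluated and each worker thread evaluates it only on its
own contiguous block, so no full-size temporary is created. To stay
self-contained, the sketch uses std::vector and a lambda in place of Eigen
expressions. ParallelForBlocks and AxpbyParallel are hypothetical names chosen
for illustration, not the functions added by this change. With Eigen, the
per-block step would roughly correspond to assigning a segment of the
expression to the matching segment of the output vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Ceres or Eigen): splits [0, size) into at
// most num_threads contiguous blocks and calls eval_block(begin, end) for each
// block on its own thread.
template <typename EvalBlock>
void ParallelForBlocks(std::size_t size, int num_threads, EvalBlock eval_block) {
  const std::size_t n = static_cast<std::size_t>(std::max(num_threads, 1));
  const std::size_t block = (size + n - 1) / n;  // ceil(size / n)
  std::vector<std::thread> workers;
  for (std::size_t begin = 0; begin < size; begin += block) {
    const std::size_t end = std::min(size, begin + block);
    workers.emplace_back([=] { eval_block(begin, end); });
  }
  for (std::thread& worker : workers) worker.join();
}

// Computes z = a * x + b * y. The lambda `expr` plays the role of a lazy
// expression: nothing is computed until a worker asks for element i.
void AxpbyParallel(double a, const std::vector<double>& x, double b,
                   const std::vector<double>& y, std::vector<double>* z,
                   int num_threads) {
  assert(x.size() == y.size());
  z->resize(x.size());
  auto expr = [&x, &y, a, b](std::size_t i) { return a * x[i] + b * y[i]; };
  double* out = z->data();
  ParallelForBlocks(x.size(), num_threads,
                    [expr, out](std::size_t begin, std::size_t end) {
                      for (std::size_t i = begin; i < end; ++i) out[i] = expr(i);
                    });
}
```

Blocks are disjoint, so workers write to separate parts of the output and need
no synchronization beyond the final join.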
On a Linux PC with an Intel 8176 processor, parallelization of vector operations
has the following effect:
Running ./bin/parallel_vector_operations_benchmark
Run on (112 X 3200.32 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x56)
  L1 Instruction 32 KiB (x56)
  L2 Unified 1024 KiB (x56)
  L3 Unified 39424 KiB (x2)
Load Average: 3.30, 8.41, 11.82
-----------------------------------
Benchmark                      Time
-----------------------------------
SetZero                 10009532 ns
SetZeroParallel/1       10024139 ns
...
SetZeroParallel/16        877606 ns
Negate                   4978856 ns
NegateParallel/1         5145413 ns
...
NegateParallel/16         721823 ns
Assign                  10731408 ns
AssignParallel/1        10749944 ns
...
AssignParallel/16        1829381 ns
D2X                     15214399 ns
D2XParallel/1           15623245 ns
...
D2XParallel/16           2687060 ns
DivideSqrt               8220050 ns
DivideSqrtParallel/1     9088467 ns
...
DivideSqrtParallel/16     905569 ns
Clamp                    3502010 ns
ClampParallel/1          4507897 ns
...
ClampParallel/16          759576 ns
Norm                     4426782 ns
NormParallel/1           4442805 ns
...
NormParallel/16           430290 ns
Dot                      9023276 ns
DotParallel/1            9031304 ns
...
DotParallel/16           1157267 ns
Axpby                   14608289 ns
AxpbyParallel/1         14570825 ns
...
AxpbyParallel/16         2672220 ns
-----------------------------------
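Norm and Dot in the benchmark above are reductions rather than element-wise
assignments, so a parallel version has to combine per-thread results. A common
way to do this, sketched below as an assumption rather than as the
implementation in this change, is to let each thread accumulate a partial sum
over its block and add the partial sums at the end. The function names mirror
the benchmark names for readability only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Illustrative parallel dot product: one partial sum per block, summed
// serially after all workers have finished.
double DotParallel(const std::vector<double>& x, const std::vector<double>& y,
                   int num_threads) {
  assert(x.size() == y.size());
  const std::size_t n = static_cast<std::size_t>(std::max(num_threads, 1));
  const std::size_t block = (x.size() + n - 1) / n;  // ceil(size / n)
  std::vector<double> partial(n, 0.0);
  std::vector<std::thread> workers;
  std::size_t slot = 0;
  for (std::size_t begin = 0; begin < x.size(); begin += block, ++slot) {
    const std::size_t end = std::min(x.size(), begin + block);
    // Each worker writes only to its own slot, so no locking is needed.
    workers.emplace_back([&x, &y, &partial, begin, end, slot] {
      partial[slot] = std::inner_product(x.data() + begin, x.data() + end,
                                         y.data() + begin, 0.0);
    });
  }
  for (std::thread& worker : workers) worker.join();
  return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// Euclidean norm expressed through the parallel dot product.
double NormParallel(const std::vector<double>& x, int num_threads) {
  return std::sqrt(DotParallel(x, x, num_threads));
}
```

Because floating-point addition is not associative, the result of such a
reduction can differ in the last bits depending on the number of blocks.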
Multi-threading of vector operations in ISC (iterative Schur complement) and
program evaluation results in the following improvements:
Running ./bin/evaluation_benchmark
--------------------------------------------------------------------------------------
Benchmark                                                               this   2fd81de
--------------------------------------------------------------------------------------
Residuals<problem-13682-4456117-pre.txt>/1                           4136 ms   4292 ms
Residuals<problem-13682-4456117-pre.txt>/2                           2919 ms   2670 ms
Residuals<problem-13682-4456117-pre.txt>/4                           2065 ms   2198 ms
Residuals<problem-13682-4456117-pre.txt>/8                           1458 ms   1609 ms
Residuals<problem-13682-4456117-pre.txt>/16                          1152 ms   1227 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/1               19759 ms  20084 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/2               10921 ms  10977 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/4                6220 ms   6941 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/8                3490 ms   4398 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/16               2277 ms   3172 ms
Plus<problem-13682-4456117-pre.txt>/1                                 339 ms    322 ms
Plus<problem-13682-4456117-pre.txt>/2                                 220 ms
Plus<problem-13682-4456117-pre.txt>/4                                 128 ms
Plus<problem-13682-4456117-pre.txt>/8                                78.0 ms
Plus<problem-13682-4456117-pre.txt>/16                               49.8 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/1       2434 ms   2478 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/2       2706 ms   2688 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/4       1430 ms   1548 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/8        742 ms    883 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/16       438 ms    555 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/1   2438 ms   2481 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/2   2565 ms   2790 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/4   1434 ms   1551 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/8    765 ms    892 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/16   435 ms    559 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/1           1278 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/2           1555 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/4            833 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/8            459 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/16           250 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/1                1468 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/2                1871 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/4                 957 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/8                 528 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/16                294 ms
End-to-end improvements, with bundle_adjuster invoked as:
./bin/bundle_adjuster --num_threads 28 --num_iterations 40 \
                      --linear_solver iterative_schur \
                      --preconditioner jacobi --input
---------------------------------------------
Problem                         this  2fd81de
---------------------------------------------
problem-13682-4456117-pre.txt  508.6    892.7
problem-1778-993923-pre.txt    763.8   1129.9
problem-1723-156502-pre.txt      6.3     14.4
problem-356-226730-pre.txt      76.3    116.2
problem-257-65132-pre.txt       38.6     52.0
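The relative gain is easier to read as a ratio of the two columns. The sketch
below computes baseline / this for each row of the end-to-end table. The Timing
struct and function names are hypothetical, and the values are copied from the
table as given (the table does not state units, and the ratio does not depend
on them).

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the end-to-end table: value with this change and with the
// parent commit 2fd81de.
struct Timing {
  std::string problem;
  double current;   // "this" column
  double baseline;  // "2fd81de" column
};

// Speedup factor of this change relative to the parent commit.
double Speedup(const Timing& t) {
  assert(t.current > 0.0);
  return t.baseline / t.current;
}

// Rows copied from the end-to-end table.
std::vector<Timing> EndToEndTimings() {
  return {
      {"problem-13682-4456117-pre.txt", 508.6, 892.7},
      {"problem-1778-993923-pre.txt", 763.8, 1129.9},
      {"problem-1723-156502-pre.txt", 6.3, 14.4},
      {"problem-356-226730-pre.txt", 76.3, 116.2},
      {"problem-257-65132-pre.txt", 38.6, 52.0},
  };
}
```

For these rows the ratio ranges from roughly 1.35 (problem-257-65132) to
roughly 2.29 (problem-1723-156502).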
Change-Id: Ie31cc5015f13fa479c16ffb5ce48c9b880990d49
Ceres Solver is an open-source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature, and performant library that has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.
Please see ceres-solver.org for more information.