commit | b158515089a85be8425db66ed43546605f86a00e | |
---|---|---|
author | Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com> | Sun Nov 27 21:35:44 2022 +0300 |
committer | Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com> | Sat Dec 17 02:52:27 2022 +0300 |
tree | 1e88ea96569aa49f06b970d0936e01daf44860d2 | |
parent | 2fd81de12d619f8a32769a6775dc8cb06355aa70 | |
Parallel operations on vectors

The main focus of this change is to parallelize the remaining operations (most of them operations on vectors) in the code path used by the iterative Schur complement solver. Parallelization is handled using lazy evaluation of Eigen expressions.

On a Linux PC with an Intel 8176 processor, parallelizing vector operations has the following effect:

Running ./bin/parallel_vector_operations_benchmark
Run on (112 X 3200.32 MHz CPU s)
CPU Caches: L1 Data 32 KiB (x56), L1 Instruction 32 KiB (x56), L2 Unified 1024 KiB (x56), L3 Unified 39424 KiB (x2)
Load Average: 3.30, 8.41, 11.82

Benchmark | Time |
---|---|
SetZero | 10009532 ns |
SetZeroParallel/1 | 10024139 ns |
... | |
SetZeroParallel/16 | 877606 ns |
Negate | 4978856 ns |
NegateParallel/1 | 5145413 ns |
... | |
NegateParallel/16 | 721823 ns |
Assign | 10731408 ns |
AssignParallel/1 | 10749944 ns |
... | |
AssignParallel/16 | 1829381 ns |
D2X | 15214399 ns |
D2XParallel/1 | 15623245 ns |
... | |
D2XParallel/16 | 2687060 ns |
DivideSqrt | 8220050 ns |
DivideSqrtParallel/1 | 9088467 ns |
... | |
DivideSqrtParallel/16 | 905569 ns |
Clamp | 3502010 ns |
ClampParallel/1 | 4507897 ns |
... | |
ClampParallel/16 | 759576 ns |
Norm | 4426782 ns |
NormParallel/1 | 4442805 ns |
... | |
NormParallel/16 | 430290 ns |
Dot | 9023276 ns |
DotParallel/1 | 9031304 ns |
... | |
DotParallel/16 | 1157267 ns |
Axpby | 14608289 ns |
AxpbyParallel/1 | 14570825 ns |
... | |
AxpbyParallel/16 | 2672220 ns |

Multi-threading the vector operations in ISC (iterative Schur complement) and in program evaluation results in the following improvement:

Running ./bin/evaluation_benchmark

Benchmark | this commit | 2fd81de (parent) |
---|---|---|
Residuals<problem-13682-4456117-pre.txt>/1 | 4136 ms | 4292 ms |
Residuals<problem-13682-4456117-pre.txt>/2 | 2919 ms | 2670 ms |
Residuals<problem-13682-4456117-pre.txt>/4 | 2065 ms | 2198 ms |
Residuals<problem-13682-4456117-pre.txt>/8 | 1458 ms | 1609 ms |
Residuals<problem-13682-4456117-pre.txt>/16 | 1152 ms | 1227 ms |
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/1 | 19759 ms | 20084 ms |
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/2 | 10921 ms | 10977 ms |
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/4 | 6220 ms | 6941 ms |
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/8 | 3490 ms | 4398 ms |
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/16 | 2277 ms | 3172 ms |
Plus<problem-13682-4456117-pre.txt>/1 | 339 ms | 322 ms |
Plus<problem-13682-4456117-pre.txt>/2 | 220 ms | |
Plus<problem-13682-4456117-pre.txt>/4 | 128 ms | |
Plus<problem-13682-4456117-pre.txt>/8 | 78.0 ms | |
Plus<problem-13682-4456117-pre.txt>/16 | 49.8 ms | |
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/1 | 2434 ms | 2478 ms |
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/2 | 2706 ms | 2688 ms |
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/4 | 1430 ms | 1548 ms |
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/8 | 742 ms | 883 ms |
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/16 | 438 ms | 555 ms |
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/1 | 2438 ms | 2481 ms |
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/2 | 2565 ms | 2790 ms |
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/4 | 1434 ms | 1551 ms |
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/8 | 765 ms | 892 ms |
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/16 | 435 ms | 559 ms |
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/1 | 1278 ms | |
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/2 | 1555 ms | |
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/4 | 833 ms | |
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/8 | 459 ms | |
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/16 | 250 ms | |
JacobianScaleColumns<problem-13682-4456117-pre.txt>/1 | 1468 ms | |
JacobianScaleColumns<problem-13682-4456117-pre.txt>/2 | 1871 ms | |
JacobianScaleColumns<problem-13682-4456117-pre.txt>/4 | 957 ms | |
JacobianScaleColumns<problem-13682-4456117-pre.txt>/8 | 528 ms | |
JacobianScaleColumns<problem-13682-4456117-pre.txt>/16 | 294 ms | |

End-to-end improvements with bundle_adjuster, invoked as ./bin/bundle_adjuster --num_threads 28 --num_iterations 40 --linear_solver iterative_schur --preconditioner jacobi --input followed by the problem file:

Problem | this commit | 2fd81de (parent) |
---|---|---|
problem-13682-4456117-pre.txt | 508.6 | 892.7 |
problem-1778-993923-pre.txt | 763.8 | 1129.9 |
problem-1723-156502-pre.txt | 6.3 | 14.4 |
problem-356-226730-pre.txt | 76.3 | 116.2 |
problem-257-65132-pre.txt | 38.6 | 52.0 |

Change-Id: Ie31cc5015f13fa479c16ffb5ce48c9b880990d49
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature, and performant library that has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems: non-linear least squares problems with bounds constraints, and general unconstrained optimization problems.
Please see ceres-solver.org for more information.