Parallel operations on vectors

The main focus of this change is to parallelize the remaining operations (most
of them operations on vectors) in the code path used by the iterative Schur
complement solver.

Parallelization is handled using lazy evaluation of Eigen expressions.
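
As a rough illustration of the idea (not the code in this change), a lazy
expression can be thought of as a recipe for computing element i that is only
evaluated on assignment, so each thread can evaluate its own block of the
output. The sketch below uses plain C++ and std::thread instead of Eigen and
the Ceres thread pool; ParallelAssignSketch and AxpbySketch are hypothetical
names introduced only for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a Ceres or Eigen API): evaluates the lazy
// element-wise expression expr(i) into out, one contiguous block per thread.
template <typename Expr>
void ParallelAssignSketch(std::vector<double>& out, const Expr& expr,
                          int num_threads) {
  const std::size_t n = out.size();
  const std::size_t threads =
      static_cast<std::size_t>(std::max(1, num_threads));
  // Block size rounded up so that at most `threads` blocks cover [0, n).
  const std::size_t block = (n + threads - 1) / threads;
  std::vector<std::thread> workers;
  for (std::size_t begin = 0; begin < n; begin += block) {
    const std::size_t end = std::min(n, begin + block);
    // Each worker evaluates the expression only for its own index range.
    workers.emplace_back([&out, &expr, begin, end] {
      for (std::size_t i = begin; i < end; ++i) out[i] = expr(i);
    });
  }
  for (std::thread& worker : workers) worker.join();
}

// z = a * x + b * y, assuming x and y have the same size.
std::vector<double> AxpbySketch(double a, const std::vector<double>& x,
                                double b, const std::vector<double>& y,
                                int num_threads) {
  std::vector<double> z(x.size());
  // The lambda only describes the expression; nothing is computed until
  // ParallelAssignSketch asks for individual elements.
  auto expr = [&](std::size_t i) { return a * x[i] + b * y[i]; };
  ParallelAssignSketch(z, expr, num_threads);
  return z;
}
```

Because the expression is never materialized as a temporary vector, each
thread reads its inputs and writes its outputs exactly once. Spawning threads
per call, as done here for brevity, would be too costly in practice; a
persistent thread pool is the usual choice.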

On a Linux PC with an Intel 8176 processor, parallelizing vector operations has
the following effect:

Running ./bin/parallel_vector_operations_benchmark
Run on (112 X 3200.32 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x56)
  L1 Instruction 32 KiB (x56)
  L2 Unified 1024 KiB (x56)
  L3 Unified 39424 KiB (x2)
Load Average: 3.30, 8.41, 11.82
-----------------------------------
Benchmark                      Time
-----------------------------------
SetZero                 10009532 ns
SetZeroParallel/1       10024139 ns
...
SetZeroParallel/16        877606 ns

Negate                   4978856 ns
NegateParallel/1         5145413 ns
...
NegateParallel/16         721823 ns

Assign                  10731408 ns
AssignParallel/1        10749944 ns
...
AssignParallel/16        1829381 ns

D2X                     15214399 ns
D2XParallel/1           15623245 ns
...
D2XParallel/16           2687060 ns

DivideSqrt               8220050 ns
DivideSqrtParallel/1     9088467 ns
...
DivideSqrtParallel/16     905569 ns

Clamp                    3502010 ns
ClampParallel/1          4507897 ns
...
ClampParallel/16          759576 ns

Norm                     4426782 ns
NormParallel/1           4442805 ns
...
NormParallel/16           430290 ns

Dot                      9023276 ns
DotParallel/1            9031304 ns
...
DotParallel/16           1157267 ns

Axpby                   14608289 ns
AxpbyParallel/1         14570825 ns
...
AxpbyParallel/16         2672220 ns
-----------------------------------
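
Norm and Dot in the listing above are reductions rather than element-wise
operations. This message does not say how they are parallelized; one common
approach, shown here purely as an assumption, is to let each thread accumulate
a partial sum over its block and combine the partial sums at the end.
ParallelDotSketch is a hypothetical name, and the sketch uses std::thread
rather than the Ceres thread pool.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not the Ceres implementation): dot product computed
// with one partial sum per thread.
double ParallelDotSketch(const std::vector<double>& x,
                         const std::vector<double>& y, int num_threads) {
  const std::size_t n = std::min(x.size(), y.size());
  const std::size_t threads =
      static_cast<std::size_t>(std::max(1, num_threads));
  const std::size_t block = (n + threads - 1) / threads;
  // One slot per thread, so workers never write to the same element.
  std::vector<double> partial(threads, 0.0);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < threads; ++t) {
    const std::size_t begin = std::min(n, t * block);
    const std::size_t end = std::min(n, begin + block);
    if (begin == end) continue;  // Nothing left for this thread.
    workers.emplace_back([&x, &y, &partial, t, begin, end] {
      double sum = 0.0;
      for (std::size_t i = begin; i < end; ++i) sum += x[i] * y[i];
      partial[t] = sum;
    });
  }
  for (std::thread& worker : workers) worker.join();
  // Combine the per-thread partial sums serially.
  return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

The summation order differs from a serial loop, so the result can differ from
the single-threaded value by floating-point rounding.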

Multi-threading of vector operations in the iterative Schur complement (ISC)
code path and in program evaluation results in the following improvements:

Running ./bin/evaluation_benchmark
--------------------------------------------------------------------------------------
Benchmark                                                               this   2fd81de
--------------------------------------------------------------------------------------
Residuals<problem-13682-4456117-pre.txt>/1                           4136 ms   4292 ms
Residuals<problem-13682-4456117-pre.txt>/2                           2919 ms   2670 ms
Residuals<problem-13682-4456117-pre.txt>/4                           2065 ms   2198 ms
Residuals<problem-13682-4456117-pre.txt>/8                           1458 ms   1609 ms
Residuals<problem-13682-4456117-pre.txt>/16                          1152 ms   1227 ms

ResidualsAndJacobian<problem-13682-4456117-pre.txt>/1               19759 ms  20084 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/2               10921 ms  10977 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/4                6220 ms   6941 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/8                3490 ms   4398 ms
ResidualsAndJacobian<problem-13682-4456117-pre.txt>/16               2277 ms   3172 ms

Plus<problem-13682-4456117-pre.txt>/1                                 339 ms    322 ms
Plus<problem-13682-4456117-pre.txt>/2                                 220 ms
Plus<problem-13682-4456117-pre.txt>/4                                 128 ms
Plus<problem-13682-4456117-pre.txt>/8                                78.0 ms
Plus<problem-13682-4456117-pre.txt>/16                               49.8 ms

ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/1       2434 ms   2478 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/2       2706 ms   2688 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/4       1430 ms   1548 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/8        742 ms    883 ms
ISCRightMultiplyAndAccumulate<problem-13682-4456117-pre.txt>/16       438 ms    555 ms

ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/1   2438 ms   2481 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/2   2565 ms   2790 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/4   1434 ms   1551 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/8    765 ms    892 ms
ISCRightMultiplyAndAccumulateDiag<problem-13682-4456117-pre.txt>/16   435 ms    559 ms

JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/1           1278 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/2           1555 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/4            833 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/8            459 ms
JacobianSquaredColumnNorm<problem-13682-4456117-pre.txt>/16           250 ms

JacobianScaleColumns<problem-13682-4456117-pre.txt>/1                1468 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/2                1871 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/4                 957 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/8                 528 ms
JacobianScaleColumns<problem-13682-4456117-pre.txt>/16                294 ms

End-to-end improvements, with bundle_adjuster invoked as:
./bin/bundle_adjuster --num_threads 28 --num_iterations 40 \
                      --linear_solver iterative_schur \
                      --preconditioner jacobi --input
---------------------------------------------
Problem                         this  2fd81de
---------------------------------------------
problem-13682-4456117-pre.txt  508.6    892.7
problem-1778-993923-pre.txt    763.8   1129.9
problem-1723-156502-pre.txt      6.3     14.4
problem-356-226730-pre.txt      76.3    116.2
problem-257-65132-pre.txt       38.6     52.0

Change-Id: Ie31cc5015f13fa479c16ffb5ce48c9b880990d49
31 files changed