Change implementation of parallel for

Implemented templated invocation routines for ParallelFor backends
in order to improve loop body inlining.

Several modifications of ParallelFor implementation using CXX threads:
 - Index order changed from interleaved to sequential
 - Static task scheduling replaced with dynamic (controlled by
   kWorkBlocksPerThread)
 - Changed index retrieval to atomic

Modifications of OpenMP backend:
 - Changed loop scheduling to guided

Changing index order from interleaved to sequential in parallel seem
to significantly improve run-times of parallel loops, for example in
evaluation of jacobian and residuals.

Other modifications provide minor improvements for unbalanced
sub-problem lengths and parallel for loops with small number of
computation per operation.

Single-threaded performance was improved by avoiding costs of
wrapping parallel loop bodies in std::function.

On BAL dataset the following improvements in time consumed for
evaluation of residuals or jacobian and residuals were observed:

                                     OLD           NEW        OLD/NEW
                 dataset threads     r     J     r     J     r     J
problem-257-65132-pre.txt      1 0.025  0.079  0.025  0.074 1.016 1.056
problem-257-65132-pre.txt      2 0.030  0.062  0.022  0.050 1.333 1.246
problem-257-65132-pre.txt      4 0.023  0.052  0.014  0.034 1.592 1.515
problem-257-65132-pre.txt      8 0.015  0.035  0.010  0.025 1.477 1.401
problem-257-65132-pre.txt     16 0.011  0.027  0.008  0.019 1.365 1.377
problem-356-226730-pre.txt     1 0.150  0.442  0.147  0.412 1.017 1.070
problem-356-226730-pre.txt     2 0.155  0.322  0.100  0.281 1.542 1.145
problem-356-226730-pre.txt     4 0.129  0.291  0.089  0.196 1.439 1.485
problem-356-226730-pre.txt     8 0.091  0.184  0.066  0.139 1.381 1.319
problem-356-226730-pre.txt    16 0.070  0.148  0.055  0.110 1.272 1.340
problem-1723-156502-pre.txt    1 0.084  0.243  0.082  0.229 1.023 1.063
problem-1723-156502-pre.txt    2 0.088  0.188  0.055  0.154 1.589 1.222
problem-1723-156502-pre.txt    4 0.072  0.159  0.049  0.108 1.475 1.475
problem-1723-156502-pre.txt    8 0.050  0.105  0.037  0.077 1.348 1.368
problem-1723-156502-pre.txt   16 0.038  0.083  0.030  0.062 1.269 1.344
problem-1778-993923-pre.txt    1 0.621  1.777  0.609  1.667 1.018 1.065
problem-1778-993923-pre.txt    2 0.621  1.273  0.415  1.199 1.494 1.061
problem-1778-993923-pre.txt    4 0.514  1.140  0.361  0.786 1.421 1.449
problem-1778-993923-pre.txt    8 0.365  0.808  0.277  0.559 1.319 1.443
problem-1778-993923-pre.txt   16 0.279  0.608  0.223  0.441 1.252 1.379
problem-13682-4456117-pre.txt  1 3.877 10.726  3.738 10.082 1.037 1.063
problem-13682-4456117-pre.txt  2 3.310  7.170  2.423  6.448 1.366 1.111
problem-13682-4456117-pre.txt  4 3.070  6.344  2.064  4.474 1.486 1.417
problem-13682-4456117-pre.txt  8 2.051  4.612  1.527  3.133 1.343 1.472
problem-13682-4456117-pre.txt 16 1.549  3.453  1.218  2.488 1.271 1.387

Run time in seconds for a single evaluation, using evaluation_benchmark
numactl -N 0 -m 0 ./bin/evaluation_benchmark --bal_root ${path_to_BAL}
Evaluation was performed on 28-core CPU.

Note: performance when running across numa-nodes degrades in both old
and proposed implementations, thus the test was executed limiting memory
and compute resources allocation to a single numa-node.

Change-Id: Ia195580bdab9d05c95ac983bfe37b045eecfaf49
6 files changed
tree: 2cbbba12cd9d231f1a10926e055d122ba457f416
  1. .github/
  2. bazel/
  3. cmake/
  4. config/
  5. data/
  6. docs/
  7. examples/
  8. include/
  9. internal/
  10. scripts/
  11. .clang-format
  12. .gitignore
  13. BUILD
  14. CITATION.cff
  15. CMakeLists.txt
  16. CONTRIBUTING.md
  17. LICENSE
  18. package.xml
  19. README.md
  20. WORKSPACE
README.md

Android Linux macOS Windows

Ceres Solver

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.

  1. Non-linear Least Squares problems with bounds constraints.
  2. General unconstrained optimization problems.

Please see ceres-solver.org for more information.