commit: c0c4f93940f86e8d108e84a60104bfda4aad66b3
[log]
author: Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com>
Thu Aug 18 22:10:17 2022 +0300
committer: Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com>
Tue Sep 20 11:06:22 2022 +0300
tree: 2cbbba12cd9d231f1a10926e055d122ba457f416
parent: fc826c578032c19054a27e5bcac626a3ac6883ac [diff]

Change implementation of parallel for

Implemented templated invocation routines for ParallelFor backends
in order to improve loop body inlining.

Several modifications of ParallelFor implementation using CXX threads:
 - Index order changed from interleaved to sequential
 - Static task scheduling replaced with dynamic (controlled by
   kWorkBlocksPerThread)
 - Changed index retrieval to atomic

Modifications of OpenMP backend:
 - Changed loop scheduling to guided

Changing index order from interleaved to sequential in parallel seem
to significantly improve run-times of parallel loops, for example in
evaluation of jacobian and residuals.

Other modifications provide minor improvements for unbalanced
sub-problem lengths and parallel for loops with small number of
computation per operation.

Single-threaded performance was improved by avoiding costs of
wrapping parallel loop bodies in std::function.

On BAL dataset the following improvements in time consumed for
evaluation of residuals or jacobian and residuals were observed:

                                     OLD           NEW        OLD/NEW
                 dataset threads     r     J     r     J     r     J
problem-257-65132-pre.txt      1 0.025  0.079  0.025  0.074 1.016 1.056
problem-257-65132-pre.txt      2 0.030  0.062  0.022  0.050 1.333 1.246
problem-257-65132-pre.txt      4 0.023  0.052  0.014  0.034 1.592 1.515
problem-257-65132-pre.txt      8 0.015  0.035  0.010  0.025 1.477 1.401
problem-257-65132-pre.txt     16 0.011  0.027  0.008  0.019 1.365 1.377
problem-356-226730-pre.txt     1 0.150  0.442  0.147  0.412 1.017 1.070
problem-356-226730-pre.txt     2 0.155  0.322  0.100  0.281 1.542 1.145
problem-356-226730-pre.txt     4 0.129  0.291  0.089  0.196 1.439 1.485
problem-356-226730-pre.txt     8 0.091  0.184  0.066  0.139 1.381 1.319
problem-356-226730-pre.txt    16 0.070  0.148  0.055  0.110 1.272 1.340
problem-1723-156502-pre.txt    1 0.084  0.243  0.082  0.229 1.023 1.063
problem-1723-156502-pre.txt    2 0.088  0.188  0.055  0.154 1.589 1.222
problem-1723-156502-pre.txt    4 0.072  0.159  0.049  0.108 1.475 1.475
problem-1723-156502-pre.txt    8 0.050  0.105  0.037  0.077 1.348 1.368
problem-1723-156502-pre.txt   16 0.038  0.083  0.030  0.062 1.269 1.344
problem-1778-993923-pre.txt    1 0.621  1.777  0.609  1.667 1.018 1.065
problem-1778-993923-pre.txt    2 0.621  1.273  0.415  1.199 1.494 1.061
problem-1778-993923-pre.txt    4 0.514  1.140  0.361  0.786 1.421 1.449
problem-1778-993923-pre.txt    8 0.365  0.808  0.277  0.559 1.319 1.443
problem-1778-993923-pre.txt   16 0.279  0.608  0.223  0.441 1.252 1.379
problem-13682-4456117-pre.txt  1 3.877 10.726  3.738 10.082 1.037 1.063
problem-13682-4456117-pre.txt  2 3.310  7.170  2.423  6.448 1.366 1.111
problem-13682-4456117-pre.txt  4 3.070  6.344  2.064  4.474 1.486 1.417
problem-13682-4456117-pre.txt  8 2.051  4.612  1.527  3.133 1.343 1.472
problem-13682-4456117-pre.txt 16 1.549  3.453  1.218  2.488 1.271 1.387

Run time in seconds for a single evaluation, using evaluation_benchmark
numactl -N 0 -m 0 ./bin/evaluation_benchmark --bal_root ${path_to_BAL}
Evaluation was performed on 28-core CPU.

Note: performance when running across numa-nodes degrades in both old
and proposed implementations, thus the test was executed limiting memory
and compute resources allocation to a single numa-node.

Change-Id: Ia195580bdab9d05c95ac983bfe37b045eecfaf49

6 files changed

tree: 2cbbba12cd9d231f1a10926e055d122ba457f416

README.md

Ceres Solver

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.

Non-linear Least Squares problems with bounds constraints.
General unconstrained optimization problems.

Please see ceres-solver.org for more information.