commit | 354002f989e7abdab9a4c17291662ab6e3102283 | [log] [tgz] |
---|---|---|
author | Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com> | Fri Oct 06 18:19:12 2023 +0000 |
committer | Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com> | Fri Oct 06 20:25:16 2023 +0000 |
tree | 36d2fc04af2b52d1a87a22f0ecad4c66a8bf18a7 | |
parent | a9b3fcff426eb33644413f766fdc785af937d9ed [diff] |
Schedule task in ParallerFor from the previous task As pointed out by several users, introduction of parallel operations on vectors severely impacts solver performance on small problems, with time consumption increasing with the number of threads. The problem is two-fold: - Single-threaded execution is faster than multi-threaded - Overhead of multi-threaded execution increases dramaticaly when number of threads is increased Supposedly, the second problem is due to ParallelInvoke scheduling a task for each thread via ThreadPool. When the time required to perform computations is smaller than costs of scheduling task, runtime becomes linear in num_threads. Moreover, main thread competes with working threads for mutex in ConcurrentQueue. In order to limit scheduling overhead and minimize lock contention, each new task is scheduled from the previous one, if: - Number of scheduled tasks is less than num_threads - At the moment of creating the task not all work has been done Correctness is granted by atomicity of thread_id counter. SchedulerBenchmark mini-benchmark was added to illustrate the issue. Each iteration of parallel loop performs change of a single value. With the previous scheduling strategy, increasing number of threads leads to significant increase of runtime: ----------------------------------------------------- Benchmark Time Iterations ----------------------------------------------------- SchedulerBenchmark/128/1 14.1 ns 49496153 SchedulerBenchmark/128/2 3965 ns 240173 SchedulerBenchmark/128/4 13162 ns 71478 SchedulerBenchmark/128/8 30643 ns 29614 SchedulerBenchmark/128/16 63694 ns 10000 SchedulerBenchmark/256/1 24.1 ns 28943598 SchedulerBenchmark/256/2 3878 ns 227498 SchedulerBenchmark/256/4 13293 ns 69817 SchedulerBenchmark/256/8 31117 ns 32640 SchedulerBenchmark/256/16 59503 ns 14910 SchedulerBenchmark/1024/1 56.7 ns 12048398 SchedulerBenchmark/1024/2 4346 ns 203140 SchedulerBenchmark/1024/4 13487 ns 66736 SchedulerBenchmark/1024/8 30982 ns 33090 SchedulerBenchmark/1024/16 63199 ns 14762 SchedulerBenchmark/4096/1 189 ns 3633540 SchedulerBenchmark/4096/2 5932 ns 131884 SchedulerBenchmark/4096/4 14784 ns 61236 SchedulerBenchmark/4096/8 35857 ns 29276 SchedulerBenchmark/4096/16 63934 ns 10000 With new scheduling strategy, increasing requested number of threads does not result in that high increase of runtime ----------------------------------------------------- Benchmark Time Iterations ----------------------------------------------------- SchedulerBenchmark/128/1 14.1 ns 49323498 SchedulerBenchmark/128/2 2411 ns 362916 SchedulerBenchmark/128/4 3556 ns 243026 SchedulerBenchmark/128/8 4346 ns 200626 SchedulerBenchmark/128/16 5066 ns 169698 SchedulerBenchmark/256/1 24.2 ns 28960018 SchedulerBenchmark/256/2 2330 ns 388470 SchedulerBenchmark/256/4 3864 ns 219233 SchedulerBenchmark/256/8 4399 ns 195225 SchedulerBenchmark/256/16 5111 ns 161858 SchedulerBenchmark/1024/1 55.9 ns 12204777 SchedulerBenchmark/1024/2 2541 ns 329807 SchedulerBenchmark/1024/4 3977 ns 222628 SchedulerBenchmark/1024/8 4607 ns 193548 SchedulerBenchmark/1024/16 5031 ns 160285 SchedulerBenchmark/4096/1 188 ns 3714433 SchedulerBenchmark/4096/2 4203 ns 188284 SchedulerBenchmark/4096/4 4832 ns 171811 SchedulerBenchmark/4096/8 5605 ns 159093 SchedulerBenchmark/4096/16 6425 ns 126861 (both runs were executed on 28-core 56-thread cpu) Change-Id: I91eca783280598997bfe6abd28019847731692e4
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.
Please see ceres-solver.org for more information.