Increases the performance of C++11 threading.

Previously, the thread ID was acquired and released on every iteration
of the for loop.  The C++11 concurrent queue implementation is much
slower than TBB's version, so this per-iteration acquisition was a huge
bottleneck.

This introduces another ParallelFor API which takes the thread ID as a
parameter of the evaluation function.  This allows us to acquire and
release the thread ID once per block of work instead of once per
iteration, which drastically improves performance.
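The difference between the two strategies can be sketched in standalone
C++11.  This is a minimal illustration under stated assumptions, not
Ceres's implementation: ThreadIdPool, ParallelForPerIteration and
ParallelForPerBlock are hypothetical names, and a mutex plus condition
variable stands in for the C++11 concurrent queue.  Both loops hand a
thread ID to the callback; they differ only in how often that ID is
acquired and released.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical pool of thread IDs. A mutex and condition variable stand in
// for the C++11 concurrent queue; every Acquire()/Release() takes the lock.
class ThreadIdPool {
 public:
  explicit ThreadIdPool(int num_threads) {
    for (int id = 0; id < num_threads; ++id) ids_.push(id);
  }

  // Blocks until an ID is free, then hands it out exclusively.
  int Acquire() {
    std::unique_lock<std::mutex> lock(mutex_);
    available_.wait(lock, [this] { return !ids_.empty(); });
    const int id = ids_.front();
    ids_.pop();
    ++acquisitions_;
    return id;
  }

  // Returns an ID to the pool and wakes one waiter.
  void Release(int id) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ids_.push(id);
    }
    available_.notify_one();
  }

  // Total number of Acquire() calls so far.
  int acquisitions() const { return acquisitions_.load(); }

 private:
  std::mutex mutex_;
  std::condition_variable available_;
  std::queue<int> ids_;
  std::atomic<int> acquisitions_{0};
};

using ThreadIdFunction = std::function<void(int thread_id, int i)>;

// Old pattern (sketch): workers pull indices one at a time from a shared
// counter and acquire/release a thread ID around every single iteration.
void ParallelForPerIteration(int start, int end, int num_threads,
                             ThreadIdPool* pool, const ThreadIdFunction& fn) {
  assert(num_threads > 0);
  std::atomic<int> next{start};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      for (int i = next++; i < end; i = next++) {
        const int thread_id = pool->Acquire();
        fn(thread_id, i);
        pool->Release(thread_id);
      }
    });
  }
  for (std::thread& w : workers) w.join();
}

// New pattern (sketch): the range is split into at most num_threads
// contiguous blocks; each block acquires one thread ID, reuses it for every
// index in the block, and releases it once at the end.
void ParallelForPerBlock(int start, int end, int num_threads,
                         ThreadIdPool* pool, const ThreadIdFunction& fn) {
  assert(num_threads > 0);
  const int num_items = std::max(end - start, 0);
  const int block_size = (num_items + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (int block_start = start; block_start < end; block_start += block_size) {
    const int block_end = std::min(block_start + block_size, end);
    workers.emplace_back([=, &fn] {
      const int thread_id = pool->Acquire();
      for (int i = block_start; i < block_end; ++i) fn(thread_id, i);
      pool->Release(thread_id);
    });
  }
  for (std::thread& w : workers) w.join();
}
```

With 1000 iterations and 4 threads, the per-iteration version performs
1000 lock-protected acquisitions and the per-block version performs 4,
one per block.  Splitting the range into at most num_threads contiguous
blocks is just one simple blocking choice for the sketch.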

This change brings us on par with OpenMP and TBB.  See below for a
timing comparison.  Note: in this example, this CL's C++11 version
computes the residuals faster than TBB because TBB still must acquire
the thread ID on every iteration, which has some overhead.

Tested by building and running tests for no threading, OpenMP, TBB, and
C++11 threads.  Also ran bazel tests.

./bin/bundle_adjuster --input=problem-744-543562-pre.txt --num_threads=8

C++11 @Head
Time (in seconds):
  Residual only evaluation           7.819692 (5)
  Jacobian & residual evaluation    11.606063 (6)
  Linear solver                     47.860195 (5)
Minimizer                           70.877072

Total                               90.806338
---------------------------------------------------

C++11 (This CL)
Time (in seconds):
  Residual only evaluation           1.217500 (5)
  Jacobian & residual evaluation     5.796112 (6)
  Linear solver                     44.080873 (5)
Minimizer                           54.635524

Total                               77.640072

---------------------------------------------------
OpenMP
Time (in seconds):
  Residual only evaluation           0.797023 (5)
  Jacobian & residual evaluation     5.633916 (6)
  Linear solver                     43.280020 (5)
Minimizer                           53.199058

Total                               76.250861

---------------------------------------------------
TBB
Time (in seconds):
  Residual only evaluation           1.911095 (5)
  Jacobian & residual evaluation     5.557807 (6)
  Linear solver                     44.074680 (5)
Minimizer                           55.002688

Total                               78.052687

---------------------------------------------------
No Threads
Time (in seconds):
  Residual only evaluation           2.939212 (5)
  Jacobian & residual evaluation    18.519874 (6)
  Linear solver                     74.017837 (5)
Minimizer                           98.980080

Total                              122.216391
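The headline claims can be checked with a little arithmetic on the
numbers above.  In this sketch the figures are copied from the timing
output; Timing, TimeRatio and PrintComparison are illustrative names,
not part of Ceres.

```cpp
#include <cassert>
#include <cstdio>

// Timings in seconds, copied from the bundle_adjuster runs above
// (--num_threads=8, problem-744-543562-pre.txt).
struct Timing {
  const char* name;
  double residual_only;  // "Residual only evaluation"
  double total;          // "Total"
};

const Timing kHead = {"C++11 @Head", 7.819692, 90.806338};
const Timing kThisCl = {"C++11 (This CL)", 1.217500, 77.640072};
const Timing kOpenMp = {"OpenMP", 0.797023, 76.250861};
const Timing kTbb = {"TBB", 1.911095, 78.052687};
const Timing kNoThreads = {"No Threads", 2.939212, 122.216391};

// Ratio of two timings; a value above 1.0 means the configuration in the
// denominator took less time.
double TimeRatio(double numerator, double denominator) {
  assert(denominator > 0.0);
  return numerator / denominator;
}

// Prints each other configuration's time divided by this CL's time
// (values above 1.0 mean this CL was faster).
void PrintComparison() {
  const Timing all[] = {kHead, kOpenMp, kTbb, kNoThreads};
  for (const Timing& t : all) {
    std::printf("%-16s residual x%.2f  total x%.3f\n", t.name,
                TimeRatio(t.residual_only, kThisCl.residual_only),
                TimeRatio(t.total, kThisCl.total));
  }
}
```

By these figures, residual-only evaluation is about 6.4x faster than at
head and total time drops by about 14.5%; the total is about 1.8% slower
than OpenMP and about 0.5% faster than TBB.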

Change-Id: I3af959b0771bbdfe8cad8c13896191d6ac903181
9 files changed
tree: db94992836d18367d010b46eef32ca81ee0fd635
README.md

Ceres Solver

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems:

  1. Non-linear Least Squares problems with bounds constraints.
  2. General unconstrained optimization problems.
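For reference, the two problem classes can be written as follows, using
the notation of the Ceres documentation: each f_i is a residual function
of a few parameter blocks x_{i_1}, ..., x_{i_k}, rho_i is a loss
function, l_j and u_j are lower and upper bounds on parameter block x_j,
and f in the second problem is a general scalar objective.

```latex
% 1. Non-linear least squares with bounds constraints
\min_{\mathbf{x}} \quad \frac{1}{2} \sum_{i} \rho_i\left( \left\| f_i\left(x_{i_1}, \ldots, x_{i_k}\right) \right\|^2 \right)
\qquad \text{s.t.} \quad l_j \le x_j \le u_j

% 2. General unconstrained optimization
\min_{\mathbf{x}} \quad f(\mathbf{x})
```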

Please see ceres-solver.org for more information.

WARNING - Do not make GitHub pull requests!

Ceres development happens on Gerrit, including both repository hosting and code reviews. The GitHub repository is a continuously updated mirror which is primarily meant for issue tracking. Please see our Contributing to Ceres Guide for more details.

The upstream Gerrit repository is:

https://ceres-solver.googlesource.com/ceres-solver