{
  "commit": "c0c4f93940f86e8d108e84a60104bfda4aad66b3",
  "tree": "2cbbba12cd9d231f1a10926e055d122ba457f416",
  "parents": [
    "fc826c578032c19054a27e5bcac626a3ac6883ac"
  ],
  "author": {
    "name": "Dmitriy Korchemkin",
    "email": "dmitriy.korchemkin@gmail.com",
    "time": "Thu Aug 18 22:10:17 2022 +0300"
  },
  "committer": {
    "name": "Dmitriy Korchemkin",
    "email": "dmitriy.korchemkin@gmail.com",
    "time": "Tue Sep 20 11:06:22 2022 +0300"
  },
  "message": "Change implementation of parallel for\n\nImplemented templated invocation routines for ParallelFor backends\nin order to improve loop body inlining.\n\nSeveral modifications of ParallelFor implementation using CXX threads:\n - Index order changed from interleaved to sequential\n - Static task scheduling replaced with dynamic (controlled by\n   kWorkBlocksPerThread)\n - Changed index retrieval to atomic\n\nModifications of OpenMP backend:\n - Changed loop scheduling to guided\n\nChanging index order from interleaved to sequential in parallel seems\nto significantly improve run-times of parallel loops, for example in\nevaluation of jacobian and residuals.\n\nOther modifications provide minor improvements for unbalanced\nsub-problem lengths and parallel for loops with a small number of\ncomputations per operation.\n\nSingle-threaded performance was improved by avoiding costs of\nwrapping parallel loop bodies in std::function.\n\nOn the BAL dataset the following improvements in time consumed for\nevaluation of residuals or jacobian and residuals were observed:\n\n                                     OLD           NEW        OLD/NEW\n                 dataset threads     r     J     r     J     r     J\nproblem-257-65132-pre.txt      1 0.025  0.079  0.025  0.074 1.016 1.056\nproblem-257-65132-pre.txt      2 0.030  0.062  0.022  0.050 1.333 1.246\nproblem-257-65132-pre.txt      4 0.023  0.052  0.014  0.034 1.592 1.515\nproblem-257-65132-pre.txt      8 0.015  0.035  0.010  0.025 1.477 1.401\nproblem-257-65132-pre.txt     16 0.011  0.027  0.008  0.019 1.365 1.377\nproblem-356-226730-pre.txt     1 0.150  0.442  0.147  0.412 1.017 1.070\nproblem-356-226730-pre.txt     2 0.155  0.322  0.100  0.281 1.542 1.145\nproblem-356-226730-pre.txt     4 0.129  0.291  0.089  0.196 1.439 1.485\nproblem-356-226730-pre.txt     8 0.091  0.184  0.066  0.139 1.381 1.319\nproblem-356-226730-pre.txt    16 0.070  0.148  0.055  0.110 1.272 1.340\nproblem-1723-156502-pre.txt    1 0.084  0.243  0.082  0.229 1.023 1.063\nproblem-1723-156502-pre.txt    2 0.088  0.188  0.055  0.154 1.589 1.222\nproblem-1723-156502-pre.txt    4 0.072  0.159  0.049  0.108 1.475 1.475\nproblem-1723-156502-pre.txt    8 0.050  0.105  0.037  0.077 1.348 1.368\nproblem-1723-156502-pre.txt   16 0.038  0.083  0.030  0.062 1.269 1.344\nproblem-1778-993923-pre.txt    1 0.621  1.777  0.609  1.667 1.018 1.065\nproblem-1778-993923-pre.txt    2 0.621  1.273  0.415  1.199 1.494 1.061\nproblem-1778-993923-pre.txt    4 0.514  1.140  0.361  0.786 1.421 1.449\nproblem-1778-993923-pre.txt    8 0.365  0.808  0.277  0.559 1.319 1.443\nproblem-1778-993923-pre.txt   16 0.279  0.608  0.223  0.441 1.252 1.379\nproblem-13682-4456117-pre.txt  1 3.877 10.726  3.738 10.082 1.037 1.063\nproblem-13682-4456117-pre.txt  2 3.310  7.170  2.423  6.448 1.366 1.111\nproblem-13682-4456117-pre.txt  4 3.070  6.344  2.064  4.474 1.486 1.417\nproblem-13682-4456117-pre.txt  8 2.051  4.612  1.527  3.133 1.343 1.472\nproblem-13682-4456117-pre.txt 16 1.549  3.453  1.218  2.488 1.271 1.387\n\nRun time in seconds for a single evaluation, using evaluation_benchmark\nnumactl -N 0 -m 0 ./bin/evaluation_benchmark --bal_root ${path_to_BAL}\nEvaluation was performed on a 28-core CPU.\n\nNote: performance when running across numa-nodes degrades in both old\nand proposed implementations, thus the test was executed limiting memory\nand compute resources allocation to a single numa-node.\n\nChange-Id: Ia195580bdab9d05c95ac983bfe37b045eecfaf49\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "e5599cc70449d7dedb4763c39170d7c2a14e8477",
      "old_mode": 33188,
      "old_path": "internal/ceres/parallel_for.h",
      "new_id": "3c3d8874a10d64087f074c9de5c385ed14417d86",
      "new_mode": 33188,
      "new_path": "internal/ceres/parallel_for.h"
    },
    {
      "type": "modify",
      "old_id": "df2f619eadaa3f8ddb363668e9f46bb73a0a546c",
      "old_mode": 33188,
      "old_path": "internal/ceres/parallel_for_cxx.cc",
      "new_id": "13cabf90bd1619447bffbaa9e32b2fb4b27e1de0",
      "new_mode": 33188,
      "new_path": "internal/ceres/parallel_for_cxx.cc"
    },
    {
      "type": "add",
      "old_id": "0000000000000000000000000000000000000000",
      "old_mode": 0,
      "old_path": "/dev/null",
      "new_id": "90edc0774a77ca55bd4e18bd9ccd301a7e930d64",
      "new_mode": 33188,
      "new_path": "internal/ceres/parallel_for_cxx.h"
    },
    {
      "type": "modify",
      "old_id": "1c1871662c8b1ae9afd6473a9ea5e86a25b24786",
      "old_mode": 33188,
      "old_path": "internal/ceres/parallel_for_nothreads.cc",
      "new_id": "8d3611dbc36444b2ef876e73819736a31b67dd2c",
      "new_mode": 33188,
      "new_path": "internal/ceres/parallel_for_nothreads.cc"
    },
    {
      "type": "modify",
      "old_id": "1d44bf9977ab93b34f3e2fea4e37dee1f93ff596",
      "old_mode": 33188,
      "old_path": "internal/ceres/parallel_for_openmp.cc",
      "new_id": "02690f31011f18bd25087fc2df68c1c72db9446c",
      "new_mode": 33188,
      "new_path": "internal/ceres/parallel_for_openmp.cc"
    },
    {
      "type": "add",
      "old_id": "0000000000000000000000000000000000000000",
      "old_mode": 0,
      "old_path": "/dev/null",
      "new_id": "94254c45564efae99d70a4d4384bd26a8c3ef7a9",
      "new_mode": 33188,
      "new_path": "internal/ceres/parallel_for_openmp.h"
    }
  ]
}
