Optimize custom small BLAS multiplication with dynamic
template parameters at the C level.

- Unroll for loops.
- Make matrix access more cache-friendly.
- Keep the code platform independent.
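The first two bullets can be illustrated with a minimal C++ sketch of a matrix-vector product c = A * b. It assumes a row-major A and a hypothetical `kDynamic = -1` sentinel for sizes known only at run time. The function name and signature are illustrative and are not taken from this commit. The inner loop is unrolled by four with independent accumulators, and each row of A is read sequentially so memory access stays contiguous.

```cpp
#include <cassert>

// Hypothetical sentinel for "size known only at run time" (illustrative name).
constexpr int kDynamic = -1;

// Computes c = A * b for a row-major A with num_row_a rows and num_col_a
// columns. kRow/kCol are either compile-time sizes or kDynamic.
template <int kRow, int kCol>
void MatrixVectorMultiply(const double* A, int num_row_a, int num_col_a,
                          const double* b, double* c) {
  // If a size is fixed at compile time, it must agree with the run-time one.
  assert(kRow == kDynamic || kRow == num_row_a);
  assert(kCol == kDynamic || kCol == num_col_a);
  // Prefer the compile-time constant so the compiler can fold loop bounds.
  const int num_row = (kRow != kDynamic) ? kRow : num_row_a;
  const int num_col = (kCol != kDynamic) ? kCol : num_col_a;

  for (int r = 0; r < num_row; ++r) {
    // Rows are contiguous in row-major storage: sequential, cache-friendly reads.
    const double* row = A + r * num_col;
    // Four independent accumulators break the single add dependency chain.
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int col = 0;
    // Main loop, manually unrolled by 4.
    for (; col + 4 <= num_col; col += 4) {
      s0 += row[col] * b[col];
      s1 += row[col + 1] * b[col + 1];
      s2 += row[col + 2] * b[col + 2];
      s3 += row[col + 3] * b[col + 3];
    }
    // Remainder loop for the last 0-3 columns.
    for (; col < num_col; ++col) {
      s0 += row[col] * b[col];
    }
    c[r] = (s0 + s1) + (s2 + s3);
  }
}
```

For example, with A = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]] and b set to all ones, both `MatrixVectorMultiply<kDynamic, kDynamic>(A, 2, 5, b, c)` and `MatrixVectorMultiply<2, 5>(A, 2, 5, b, c)` give c = {15, 40}. Note that splitting the sum across several accumulators reorders the floating-point additions, so results can differ from a naive loop in the last bits.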

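In brief, this commit improves performance by roughly 1~50% (up to
about 55% in the listed results) for most cases in
small_blas_gem(m/v)_benchmark, but causes a slowdown in some corner
cases with small dimensions, especially 1, 2 and 3. A subset of the
results is listed below. Each value is the relative change in
execution time compared to the unoptimized version, so negative
values are speedups (e.g. -0.1082 means 10.82% less time) and
positive values are slowdowns.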
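The relative changes in the table can be reproduced with the arithmetic below. This is a sketch assuming the usual Google Benchmark comparison convention, change = (new - old) / old; the helper names are hypothetical.

```cpp
#include <cassert>

// Relative change of a timing: (new - old) / old.
// Negative means faster; e.g. -0.1082 means 10.82% less time.
double RelativeChange(double old_time, double new_time) {
  assert(old_time > 0.0);
  return (new_time - old_time) / old_time;
}

// Converts a relative change into a speedup factor old / new,
// since new = old * (1 + change).
double SpeedupFactor(double relative_change) {
  assert(relative_change > -1.0);
  return 1.0 / (1.0 + relative_change);
}
```

For example, `SpeedupFactor(-0.5209)` is about 2.09, so BM_MatrixMatrixMultiplyDynamic/8/12/12 runs roughly twice as fast, while +0.2255 for BM_MatrixVectorMultiply/1/20 corresponds to about 22.55% more time.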

Platform: desktop PC (Intel i7-7700 CPU MP8 @ 3.60GHz, Ubuntu 17.10)
(Lenovo Research Device+ Lab, <yangfan34@lenovo.com>)

Benchmark                                   Time        CPU
-----------------------------------------------------------
BM_MatrixMatrixMultiplyDynamic/2/2/2     -0.1082    -0.1083
BM_MatrixMatrixMultiplyDynamic/2/2/15    -0.1270    -0.1270
BM_MatrixMatrixMultiplyDynamic/2/4/2     -0.1433    -0.1433
BM_MatrixMatrixMultiplyDynamic/2/4/15    -0.2069    -0.2068
BM_MatrixMatrixMultiplyDynamic/2/6/2     -0.1446    -0.1446
BM_MatrixMatrixMultiplyDynamic/2/6/15    -0.2156    -0.2156
BM_MatrixMatrixMultiplyDynamic/2/8/2     -0.1788    -0.1788
BM_MatrixMatrixMultiplyDynamic/2/8/15    -0.3316    -0.3316
BM_MatrixMatrixMultiplyDynamic/2/10/2    -0.2025    -0.2025
BM_MatrixMatrixMultiplyDynamic/2/10/15   -0.3444    -0.3444
BM_MatrixMatrixMultiplyDynamic/2/12/2    -0.0515    -0.0515
BM_MatrixMatrixMultiplyDynamic/2/12/15   -0.3733    -0.3733
BM_MatrixMatrixMultiplyDynamic/2/15/2    -0.2784    -0.2784
BM_MatrixMatrixMultiplyDynamic/2/15/15   -0.3704    -0.3704
BM_MatrixMatrixMultiplyDynamic/4/2/2     -0.1839    -0.1839
BM_MatrixMatrixMultiplyDynamic/4/2/15    -0.1922    -0.1922
BM_MatrixMatrixMultiplyDynamic/4/4/2     -0.2248    -0.2248
BM_MatrixMatrixMultiplyDynamic/4/4/15    -0.3132    -0.3132
BM_MatrixMatrixMultiplyDynamic/4/6/2     -0.2311    -0.2311
BM_MatrixMatrixMultiplyDynamic/4/6/15    -0.3239    -0.3239
BM_MatrixMatrixMultiplyDynamic/4/8/2     -0.0574    -0.0574
BM_MatrixMatrixMultiplyDynamic/4/8/15    -0.4173    -0.4173
BM_MatrixMatrixMultiplyDynamic/4/10/2    -0.2861    -0.2861
BM_MatrixMatrixMultiplyDynamic/4/10/15   -0.4065    -0.4064
BM_MatrixMatrixMultiplyDynamic/4/12/2    -0.2976    -0.2975
BM_MatrixMatrixMultiplyDynamic/4/12/15   -0.4218    -0.4218
BM_MatrixMatrixMultiplyDynamic/4/15/2    -0.3116    -0.3116
BM_MatrixMatrixMultiplyDynamic/4/15/15   -0.4242    -0.4241
BM_MatrixMatrixMultiplyDynamic/8/12/2    -0.3675    -0.3674
BM_MatrixMatrixMultiplyDynamic/8/12/4    -0.5055    -0.5055
BM_MatrixMatrixMultiplyDynamic/8/12/6    -0.4302    -0.4302
BM_MatrixMatrixMultiplyDynamic/8/12/8    -0.4854    -0.4854
BM_MatrixMatrixMultiplyDynamic/8/12/10   -0.4882    -0.4882
BM_MatrixMatrixMultiplyDynamic/8/12/12   -0.5209    -0.5209
BM_MatrixMatrixMultiplyDynamic/8/12/15   -0.4558    -0.4558
BM_MatrixMatrixMultiplyDynamic/8/15/2    -0.2319    -0.2319
BM_MatrixMatrixMultiplyDynamic/8/15/4    -0.5105    -0.5105
BM_MatrixMatrixMultiplyDynamic/8/15/6    -0.4477    -0.4477
BM_MatrixMatrixMultiplyDynamic/8/15/8    -0.5479    -0.5479
BM_MatrixMatrixMultiplyDynamic/8/15/10   -0.4843    -0.4843
BM_MatrixMatrixMultiplyDynamic/8/15/12   -0.5212    -0.5212
BM_MatrixMatrixMultiplyDynamic/8/15/15   -0.4459    -0.4459

BM_MatrixVectorMultiply/1/1              +0.0978    +0.0978
BM_MatrixVectorMultiply/1/2              +0.0551    +0.0551
BM_MatrixVectorMultiply/1/3              -0.0019    -0.0020
BM_MatrixVectorMultiply/1/4              +0.0563    +0.0562
BM_MatrixVectorMultiply/1/6              +0.1379    +0.1379
BM_MatrixVectorMultiply/1/7              +0.1090    +0.1090
BM_MatrixVectorMultiply/1/12             +0.0901    +0.0901
BM_MatrixVectorMultiply/1/16             +0.0493    +0.0493
BM_MatrixVectorMultiply/1/20             +0.2255    +0.2255
BM_MatrixVectorMultiply/2/1              +0.1261    +0.1261
BM_MatrixVectorMultiply/2/2              +0.2328    +0.2328
BM_MatrixVectorMultiply/2/3              +0.1404    +0.1403
BM_MatrixVectorMultiply/2/4              +0.0257    +0.0256
BM_MatrixVectorMultiply/2/6              -0.1691    -0.1691
BM_MatrixVectorMultiply/2/7              -0.2619    -0.2619
BM_MatrixVectorMultiply/2/12             -0.4261    -0.4261
BM_MatrixVectorMultiply/2/16             -0.5387    -0.5387
BM_MatrixVectorMultiply/2/20             -0.6171    -0.6171
BM_MatrixVectorMultiply/3/1              +0.1664    +0.1664
BM_MatrixVectorMultiply/3/2              +0.0848    +0.0848
BM_MatrixVectorMultiply/3/3              -0.0044    -0.0044
BM_MatrixVectorMultiply/3/4              -0.0683    -0.0684
BM_MatrixVectorMultiply/3/6              -0.1652    -0.1652
BM_MatrixVectorMultiply/3/7              -0.1633    -0.1633
BM_MatrixVectorMultiply/3/12             -0.1921    -0.1921
BM_MatrixVectorMultiply/3/16             -0.3659    -0.3659
BM_MatrixVectorMultiply/3/20             -0.4137    -0.4137
BM_MatrixVectorMultiply/4/1              -0.0577    -0.0577
BM_MatrixVectorMultiply/4/2              -0.1337    -0.1338
BM_MatrixVectorMultiply/4/3              -0.1443    -0.1443
BM_MatrixVectorMultiply/4/4              +0.0013    +0.0013
BM_MatrixVectorMultiply/4/6              -0.1071    -0.1071
BM_MatrixVectorMultiply/4/7              -0.1396    -0.1397
BM_MatrixVectorMultiply/4/12             -0.2792    -0.2792
BM_MatrixVectorMultiply/4/16             -0.4485    -0.4486
BM_MatrixVectorMultiply/4/20             -0.3588    -0.3588

Change-Id: I64a8cf11391e3d06341a2b8764cd1b4f1b8a23f1
2 files changed
tree: ed178a0ccc72eab5c1fc8ed108057b52671fa3c4
README.md

Ceres Solver

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature, and performant library that has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.

  1. Non-linear Least Squares problems with bounds constraints.
  2. General unconstrained optimization problems.

Please see ceres-solver.org for more information.

WARNING - Do not make GitHub pull requests!

Ceres development happens on Gerrit, including both repository hosting and code reviews. The GitHub Repository is a continuously updated mirror which is primarily meant for issue tracking. Please see our Contributing to Ceres Guide for more details.

The upstream Gerrit repository is

https://ceres-solver.googlesource.com/ceres-solver