Optimize custom small BLAS multiplication with dynamic
template parameters at the C level.

- unroll for loops
- make matrix access more cache coherent
- platform independent
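
To illustrate the first two points, here is a minimal sketch of a matrix-vector product with its inner loop unrolled by four. It is not the code in this commit: the function names, the row-major layout and the unroll factor of 4 are assumptions chosen for illustration only.

```cpp
// Hypothetical reference kernel: c += A * b, with A stored row-major
// as num_rows x num_cols. Not taken from the Ceres sources.
void MatVecNaive(const double* A, int num_rows, int num_cols,
                 const double* b, double* c) {
  for (int r = 0; r < num_rows; ++r) {
    double sum = 0.0;
    for (int k = 0; k < num_cols; ++k) {
      sum += A[r * num_cols + k] * b[k];
    }
    c[r] += sum;
  }
}

// Same operation with the inner loop unrolled by 4 (assumed factor).
// Each row is walked through one pointer, so memory is read
// sequentially, and four independent accumulators shorten the
// dependency chain between additions.
void MatVecUnrolled(const double* A, int num_rows, int num_cols,
                    const double* b, double* c) {
  // Largest multiple of 4 that does not exceed num_cols.
  const int unrolled_end = num_cols - (num_cols % 4);
  for (int r = 0; r < num_rows; ++r) {
    const double* row = A + r * num_cols;
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (int k = 0; k < unrolled_end; k += 4) {
      s0 += row[k + 0] * b[k + 0];
      s1 += row[k + 1] * b[k + 1];
      s2 += row[k + 2] * b[k + 2];
      s3 += row[k + 3] * b[k + 3];
    }
    double sum = (s0 + s1) + (s2 + s3);
    // Remainder loop for the last num_cols % 4 columns.
    for (int k = unrolled_end; k < num_cols; ++k) {
      sum += row[k] * b[k];
    }
    c[r] += sum;
  }
}
```

The unrolled version adds terms in a different order, so its floating-point results can differ from the naive loop in the last bits. With fewer than four columns only the remainder loop runs, so the extra setup is pure overhead, which may be related to the small slowdowns reported for the smallest sizes.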

Briefly, this commit brings a 1~50% performance improvement for
most cases in small_blas_gem(m/v)_benchmark, but a small
regression for corner cases with small dimensions, especially
1, 2 and 3. A partial list of results follows. Each value is the
relative change in execution time compared to the unoptimized
version, expressed as a fraction, so negative values mean faster.

Platform: desktop PC (i7-7700 CPU MP8@3.60GHz + Ubuntu 17.10)
(Lenovo Research Device+ Lab, <yangfan34@lenovo.com>)

Benchmark                                   Time        CPU
-----------------------------------------------------------
BM_MatrixMatrixMultiplyDynamic/1/1/1     -0.0850    -0.0851
BM_MatrixMatrixMultiplyDynamic/1/1/2     -0.1444    -0.1446
BM_MatrixMatrixMultiplyDynamic/1/1/3     -0.1934    -0.1935
BM_MatrixMatrixMultiplyDynamic/1/1/4     -0.2933    -0.2934
BM_MatrixMatrixMultiplyDynamic/1/1/8     -0.1579    -0.1580
BM_MatrixMatrixMultiplyDynamic/1/1/12    -0.1556    -0.1558
BM_MatrixMatrixMultiplyDynamic/1/1/15    -0.1598    -0.1599
BM_MatrixMatrixMultiplyDynamic/1/2/1     -0.0797    -0.0799
BM_MatrixMatrixMultiplyDynamic/1/2/2     -0.2950    -0.2951
BM_MatrixMatrixMultiplyDynamic/1/2/3     -0.1363    -0.1364
BM_MatrixMatrixMultiplyDynamic/1/2/4     -0.2435    -0.2437
BM_MatrixMatrixMultiplyDynamic/1/2/8     -0.2299    -0.2300
BM_MatrixMatrixMultiplyDynamic/1/2/12    -0.2441    -0.2442
BM_MatrixMatrixMultiplyDynamic/1/2/15    -0.1671    -0.1673
BM_MatrixMatrixMultiplyDynamic/1/3/1     -0.0774    -0.0775
BM_MatrixMatrixMultiplyDynamic/1/3/2     -0.2761    -0.2762
BM_MatrixMatrixMultiplyDynamic/1/3/3     -0.0840    -0.0841
BM_MatrixMatrixMultiplyDynamic/1/3/4     -0.2027    -0.2028
BM_MatrixMatrixMultiplyDynamic/1/3/8     -0.2481    -0.2482
BM_MatrixMatrixMultiplyDynamic/1/3/12    -0.2629    -0.2630
BM_MatrixMatrixMultiplyDynamic/1/3/15    -0.1958    -0.1959
BM_MatrixMatrixMultiplyDynamic/1/4/1     -0.1260    -0.1261
BM_MatrixMatrixMultiplyDynamic/1/4/2     -0.1834    -0.1835
BM_MatrixMatrixMultiplyDynamic/1/4/3     -0.1379    -0.1380
BM_MatrixMatrixMultiplyDynamic/1/4/4     -0.2636    -0.2637
BM_MatrixMatrixMultiplyDynamic/1/4/8     -0.2838    -0.2839
BM_MatrixMatrixMultiplyDynamic/1/4/12    -0.3320    -0.3321
BM_MatrixMatrixMultiplyDynamic/1/4/15    -0.2464    -0.2465
BM_MatrixMatrixMultiplyDynamic/1/8/1     -0.0766    -0.0767
BM_MatrixMatrixMultiplyDynamic/1/8/2     -0.1713    -0.1714
BM_MatrixMatrixMultiplyDynamic/1/8/3     -0.1158    -0.1159
BM_MatrixMatrixMultiplyDynamic/1/8/4     -0.3205    -0.3206
BM_MatrixMatrixMultiplyDynamic/1/8/8     -0.3514    -0.3515
BM_MatrixMatrixMultiplyDynamic/1/8/12    -0.3658    -0.3658
BM_MatrixMatrixMultiplyDynamic/1/8/15    -0.3187    -0.3188
BM_MatrixMatrixMultiplyDynamic/1/12/1    -0.0424    -0.0425
BM_MatrixMatrixMultiplyDynamic/1/12/2    -0.1800    -0.1800
BM_MatrixMatrixMultiplyDynamic/1/12/3    -0.1457    -0.1457
BM_MatrixMatrixMultiplyDynamic/1/12/4    -0.3768    -0.3769
BM_MatrixMatrixMultiplyDynamic/1/12/8    -0.4072    -0.4073
BM_MatrixMatrixMultiplyDynamic/1/12/12   -0.4391    -0.4392
BM_MatrixMatrixMultiplyDynamic/1/12/15   -0.3383    -0.3383
BM_MatrixMatrixMultiplyDynamic/1/15/1    -0.0442    -0.0443
BM_MatrixMatrixMultiplyDynamic/1/15/2    -0.2378    -0.2379
BM_MatrixMatrixMultiplyDynamic/1/15/3    -0.1553    -0.1554
BM_MatrixMatrixMultiplyDynamic/1/15/4    -0.3954    -0.3955
BM_MatrixMatrixMultiplyDynamic/1/15/8    -0.4334    -0.4335
BM_MatrixMatrixMultiplyDynamic/1/15/12   -0.4175    -0.4175
BM_MatrixMatrixMultiplyDynamic/1/15/15   -0.3242    -0.3243

BM_MatrixVectorMultiply/1/1              +0.1613    +0.1613
BM_MatrixVectorMultiply/1/2              +0.1715    +0.1715
BM_MatrixVectorMultiply/1/3              +0.1051    +0.1051
BM_MatrixVectorMultiply/1/4              +0.1369    +0.1369
BM_MatrixVectorMultiply/1/8              +0.1180    +0.1180
BM_MatrixVectorMultiply/1/12             +0.0869    +0.0869
BM_MatrixVectorMultiply/1/15             +0.1887    +0.1886
BM_MatrixVectorMultiply/2/1              +0.1152    +0.1152
BM_MatrixVectorMultiply/2/2              +0.1520    +0.1520
BM_MatrixVectorMultiply/2/3              +0.1867    +0.1867
BM_MatrixVectorMultiply/2/4              +0.0173    +0.0173
BM_MatrixVectorMultiply/2/8              -0.0528    -0.0528
BM_MatrixVectorMultiply/2/12             -0.0176    -0.0176
BM_MatrixVectorMultiply/2/15             -0.0753    -0.0753
BM_MatrixVectorMultiply/3/1              +0.0844    +0.0844
BM_MatrixVectorMultiply/3/2              +0.0750    +0.0750
BM_MatrixVectorMultiply/3/3              -0.0153    -0.0153
BM_MatrixVectorMultiply/3/4              +0.0060    +0.0060
BM_MatrixVectorMultiply/3/8              +0.0152    +0.0152
BM_MatrixVectorMultiply/3/12             +0.0101    +0.0101
BM_MatrixVectorMultiply/3/15             -0.0795    -0.0795
BM_MatrixVectorMultiply/4/1              -0.1425    -0.1425
BM_MatrixVectorMultiply/4/2              -0.0869    -0.0869
BM_MatrixVectorMultiply/4/3              -0.1371    -0.1371
BM_MatrixVectorMultiply/4/4              -0.0088    -0.0088
BM_MatrixVectorMultiply/4/8              -0.1049    -0.1049
BM_MatrixVectorMultiply/4/12             -0.2566    -0.2566
BM_MatrixVectorMultiply/4/15             -0.2940    -0.2940
BM_MatrixVectorMultiply/6/1              -0.1798    -0.1798
BM_MatrixVectorMultiply/6/2              -0.0627    -0.0627
BM_MatrixVectorMultiply/6/3              -0.0389    -0.0389
BM_MatrixVectorMultiply/6/4              -0.1088    -0.1088
BM_MatrixVectorMultiply/6/8              -0.1815    -0.1815
BM_MatrixVectorMultiply/6/12             -0.1650    -0.1650
BM_MatrixVectorMultiply/6/15             -0.1855    -0.1855
BM_MatrixVectorMultiply/8/1              -0.1630    -0.1630
BM_MatrixVectorMultiply/8/2              -0.1248    -0.1248
BM_MatrixVectorMultiply/8/3              -0.1911    -0.1911
BM_MatrixVectorMultiply/8/4              -0.1996    -0.1996
BM_MatrixVectorMultiply/8/8              -0.2590    -0.2590
BM_MatrixVectorMultiply/8/12             -0.3266    -0.3266
BM_MatrixVectorMultiply/8/15             -0.3999    -0.3999
BM_MatrixTransposeVectorMultiply/1/1     -0.0234    -0.0234
BM_MatrixTransposeVectorMultiply/1/2     -0.0243    -0.0243
BM_MatrixTransposeVectorMultiply/1/3     -0.1324    -0.1324
BM_MatrixTransposeVectorMultiply/1/4     -0.2635    -0.2635
BM_MatrixTransposeVectorMultiply/1/8     -0.2461    -0.2461
BM_MatrixTransposeVectorMultiply/1/12    -0.2702    -0.2702
BM_MatrixTransposeVectorMultiply/1/15    -0.2538    -0.2538
BM_MatrixTransposeVectorMultiply/2/1     -0.0170    -0.0170
BM_MatrixTransposeVectorMultiply/2/2     -0.1475    -0.1475
BM_MatrixTransposeVectorMultiply/2/3     -0.1082    -0.1082
BM_MatrixTransposeVectorMultiply/2/4     -0.2594    -0.2595
BM_MatrixTransposeVectorMultiply/2/8     -0.2710    -0.2710
BM_MatrixTransposeVectorMultiply/2/12    -0.3053    -0.3053
BM_MatrixTransposeVectorMultiply/2/15    -0.2706    -0.2706
BM_MatrixTransposeVectorMultiply/3/1     -0.0096    -0.0096
BM_MatrixTransposeVectorMultiply/3/2     -0.2885    -0.2886
BM_MatrixTransposeVectorMultiply/3/3     -0.0790    -0.0790
BM_MatrixTransposeVectorMultiply/3/4     -0.2329    -0.2330
BM_MatrixTransposeVectorMultiply/3/8     -0.2742    -0.2742
BM_MatrixTransposeVectorMultiply/3/12    -0.3177    -0.3177
BM_MatrixTransposeVectorMultiply/3/15    -0.2610    -0.2610
BM_MatrixTransposeVectorMultiply/4/1     -0.0024    -0.0024
BM_MatrixTransposeVectorMultiply/4/2     -0.1578    -0.1578
BM_MatrixTransposeVectorMultiply/4/3     -0.0918    -0.0918
BM_MatrixTransposeVectorMultiply/4/4     -0.2570    -0.2570
BM_MatrixTransposeVectorMultiply/4/8     -0.3064    -0.3064
BM_MatrixTransposeVectorMultiply/4/12    -0.3316    -0.3316
BM_MatrixTransposeVectorMultiply/4/15    -0.2794    -0.2794
BM_MatrixTransposeVectorMultiply/6/1     -0.0484    -0.0484
BM_MatrixTransposeVectorMultiply/6/2     -0.1102    -0.1102
BM_MatrixTransposeVectorMultiply/6/3     -0.1188    -0.1188
BM_MatrixTransposeVectorMultiply/6/4     -0.2967    -0.2967
BM_MatrixTransposeVectorMultiply/6/8     -0.3190    -0.3190
BM_MatrixTransposeVectorMultiply/6/12    -0.3441    -0.3441
BM_MatrixTransposeVectorMultiply/6/15    -0.2723    -0.2723
BM_MatrixTransposeVectorMultiply/8/1     -0.0397    -0.0397
BM_MatrixTransposeVectorMultiply/8/2     -0.1453    -0.1453
BM_MatrixTransposeVectorMultiply/8/3     -0.1337    -0.1337
BM_MatrixTransposeVectorMultiply/8/4     -0.3084    -0.3084
BM_MatrixTransposeVectorMultiply/8/8     -0.3444    -0.3444
BM_MatrixTransposeVectorMultiply/8/12    -0.3717    -0.3717
BM_MatrixTransposeVectorMultiply/8/15    -0.3440    -0.3440

Change-Id: I17de05bf94699a07eea880b92a6d08daf1f038bb
2 files changed
tree: 79fda3b0d51c12a65f7fcec4f56acafe69432753
  1. bazel/
  2. cmake/
  3. config/
  4. data/
  5. docs/
  6. examples/
  7. include/
  8. internal/
  9. jni/
  10. scripts/
  11. .gitignore
  12. BUILD
  13. CMakeLists.txt
  14. LICENSE
  15. package.xml
  16. README.md
  17. WORKSPACE
README.md

Ceres Solver

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.

  1. Non-linear Least Squares problems with bounds constraints.
  2. General unconstrained optimization problems.

Please see ceres-solver.org for more information.

WARNING - Do not make GitHub pull requests!

Ceres development happens on Gerrit, including both repository hosting and code reviews. The GitHub Repository is a continuously updated mirror which is primarily meant for issue tracking. Please see our Contributing to Ceres Guide for more details.

The upstream Gerrit repository is

https://ceres-solver.googlesource.com/ceres-solver