commit | 81f413b7205eeaa30cfe72dadf3bcb8812f5a11c | |
---|---|---|
author | yangfan <yang0773@gmail.com> | Tue Apr 10 14:40:09 2018 +0800 |
committer | yangfan <yang0773@gmail.com> | Tue Apr 10 14:42:23 2018 +0800 |
tree | 79fda3b0d51c12a65f7fcec4f56acafe69432753 | |
parent | bdda32bb169f58496d1238323842ddc46415cb6e |
Optimize the custom small BLAS multiplication with dynamic template parameters at the C level.

- Unroll for loops.
- Make matrix access more cache coherent.
- Platform independent.

In brief, this commit brings a 1~50% performance improvement for most cases in small_blas_gem(m/v)_benchmark, but a small drop for corner cases with small dimensions, especially 1, 2 and 3. A partial list of results follows. Each value is the relative change in execution time compared to the unoptimized version, so negative values mean the optimized code is faster.

Platform: desktop PC (i7-7700 CPU MP8@3.60GHz + ubuntu 17.10)

(Lenovo Research Device+ Lab, <yangfan34@lenovo.com>)

Benchmark | Time | CPU |
---|---|---|
BM_MatrixMatrixMultiplyDynamic/1/1/1 | -0.0850 | -0.0851 |
BM_MatrixMatrixMultiplyDynamic/1/1/2 | -0.1444 | -0.1446 |
BM_MatrixMatrixMultiplyDynamic/1/1/3 | -0.1934 | -0.1935 |
BM_MatrixMatrixMultiplyDynamic/1/1/4 | -0.2933 | -0.2934 |
BM_MatrixMatrixMultiplyDynamic/1/1/8 | -0.1579 | -0.1580 |
BM_MatrixMatrixMultiplyDynamic/1/1/12 | -0.1556 | -0.1558 |
BM_MatrixMatrixMultiplyDynamic/1/1/15 | -0.1598 | -0.1599 |
BM_MatrixMatrixMultiplyDynamic/1/2/1 | -0.0797 | -0.0799 |
BM_MatrixMatrixMultiplyDynamic/1/2/2 | -0.2950 | -0.2951 |
BM_MatrixMatrixMultiplyDynamic/1/2/3 | -0.1363 | -0.1364 |
BM_MatrixMatrixMultiplyDynamic/1/2/4 | -0.2435 | -0.2437 |
BM_MatrixMatrixMultiplyDynamic/1/2/8 | -0.2299 | -0.2300 |
BM_MatrixMatrixMultiplyDynamic/1/2/12 | -0.2441 | -0.2442 |
BM_MatrixMatrixMultiplyDynamic/1/2/15 | -0.1671 | -0.1673 |
BM_MatrixMatrixMultiplyDynamic/1/3/1 | -0.0774 | -0.0775 |
BM_MatrixMatrixMultiplyDynamic/1/3/2 | -0.2761 | -0.2762 |
BM_MatrixMatrixMultiplyDynamic/1/3/3 | -0.0840 | -0.0841 |
BM_MatrixMatrixMultiplyDynamic/1/3/4 | -0.2027 | -0.2028 |
BM_MatrixMatrixMultiplyDynamic/1/3/8 | -0.2481 | -0.2482 |
BM_MatrixMatrixMultiplyDynamic/1/3/12 | -0.2629 | -0.2630 |
BM_MatrixMatrixMultiplyDynamic/1/3/15 | -0.1958 | -0.1959 |
BM_MatrixMatrixMultiplyDynamic/1/4/1 | -0.1260 | -0.1261 |
BM_MatrixMatrixMultiplyDynamic/1/4/2 | -0.1834 | -0.1835 |
BM_MatrixMatrixMultiplyDynamic/1/4/3 | -0.1379 | -0.1380 |
BM_MatrixMatrixMultiplyDynamic/1/4/4 | -0.2636 | -0.2637 |
BM_MatrixMatrixMultiplyDynamic/1/4/8 | -0.2838 | -0.2839 |
BM_MatrixMatrixMultiplyDynamic/1/4/12 | -0.3320 | -0.3321 |
BM_MatrixMatrixMultiplyDynamic/1/4/15 | -0.2464 | -0.2465 |
BM_MatrixMatrixMultiplyDynamic/1/8/1 | -0.0766 | -0.0767 |
BM_MatrixMatrixMultiplyDynamic/1/8/2 | -0.1713 | -0.1714 |
BM_MatrixMatrixMultiplyDynamic/1/8/3 | -0.1158 | -0.1159 |
BM_MatrixMatrixMultiplyDynamic/1/8/4 | -0.3205 | -0.3206 |
BM_MatrixMatrixMultiplyDynamic/1/8/8 | -0.3514 | -0.3515 |
BM_MatrixMatrixMultiplyDynamic/1/8/12 | -0.3658 | -0.3658 |
BM_MatrixMatrixMultiplyDynamic/1/8/15 | -0.3187 | -0.3188 |
BM_MatrixMatrixMultiplyDynamic/1/12/1 | -0.0424 | -0.0425 |
BM_MatrixMatrixMultiplyDynamic/1/12/2 | -0.1800 | -0.1800 |
BM_MatrixMatrixMultiplyDynamic/1/12/3 | -0.1457 | -0.1457 |
BM_MatrixMatrixMultiplyDynamic/1/12/4 | -0.3768 | -0.3769 |
BM_MatrixMatrixMultiplyDynamic/1/12/8 | -0.4072 | -0.4073 |
BM_MatrixMatrixMultiplyDynamic/1/12/12 | -0.4391 | -0.4392 |
BM_MatrixMatrixMultiplyDynamic/1/12/15 | -0.3383 | -0.3383 |
BM_MatrixMatrixMultiplyDynamic/1/15/1 | -0.0442 | -0.0443 |
BM_MatrixMatrixMultiplyDynamic/1/15/2 | -0.2378 | -0.2379 |
BM_MatrixMatrixMultiplyDynamic/1/15/3 | -0.1553 | -0.1554 |
BM_MatrixMatrixMultiplyDynamic/1/15/4 | -0.3954 | -0.3955 |
BM_MatrixMatrixMultiplyDynamic/1/15/8 | -0.4334 | -0.4335 |
BM_MatrixMatrixMultiplyDynamic/1/15/12 | -0.4175 | -0.4175 |
BM_MatrixMatrixMultiplyDynamic/1/15/15 | -0.3242 | -0.3243 |
BM_MatrixVectorMultiply/1/1 | +0.1613 | +0.1613 |
BM_MatrixVectorMultiply/1/2 | +0.1715 | +0.1715 |
BM_MatrixVectorMultiply/1/3 | +0.1051 | +0.1051 |
BM_MatrixVectorMultiply/1/4 | +0.1369 | +0.1369 |
BM_MatrixVectorMultiply/1/8 | +0.1180 | +0.1180 |
BM_MatrixVectorMultiply/1/12 | +0.0869 | +0.0869 |
BM_MatrixVectorMultiply/1/15 | +0.1887 | +0.1886 |
BM_MatrixVectorMultiply/2/1 | +0.1152 | +0.1152 |
BM_MatrixVectorMultiply/2/2 | +0.1520 | +0.1520 |
BM_MatrixVectorMultiply/2/3 | +0.1867 | +0.1867 |
BM_MatrixVectorMultiply/2/4 | +0.0173 | +0.0173 |
BM_MatrixVectorMultiply/2/8 | -0.0528 | -0.0528 |
BM_MatrixVectorMultiply/2/12 | -0.0176 | -0.0176 |
BM_MatrixVectorMultiply/2/15 | -0.0753 | -0.0753 |
BM_MatrixVectorMultiply/3/1 | +0.0844 | +0.0844 |
BM_MatrixVectorMultiply/3/2 | +0.0750 | +0.0750 |
BM_MatrixVectorMultiply/3/3 | -0.0153 | -0.0153 |
BM_MatrixVectorMultiply/3/4 | +0.0060 | +0.0060 |
BM_MatrixVectorMultiply/3/8 | +0.0152 | +0.0152 |
BM_MatrixVectorMultiply/3/12 | +0.0101 | +0.0101 |
BM_MatrixVectorMultiply/3/15 | -0.0795 | -0.0795 |
BM_MatrixVectorMultiply/4/1 | -0.1425 | -0.1425 |
BM_MatrixVectorMultiply/4/2 | -0.0869 | -0.0869 |
BM_MatrixVectorMultiply/4/3 | -0.1371 | -0.1371 |
BM_MatrixVectorMultiply/4/4 | -0.0088 | -0.0088 |
BM_MatrixVectorMultiply/4/8 | -0.1049 | -0.1049 |
BM_MatrixVectorMultiply/4/12 | -0.2566 | -0.2566 |
BM_MatrixVectorMultiply/4/15 | -0.2940 | -0.2940 |
BM_MatrixVectorMultiply/6/1 | -0.1798 | -0.1798 |
BM_MatrixVectorMultiply/6/2 | -0.0627 | -0.0627 |
BM_MatrixVectorMultiply/6/3 | -0.0389 | -0.0389 |
BM_MatrixVectorMultiply/6/4 | -0.1088 | -0.1088 |
BM_MatrixVectorMultiply/6/8 | -0.1815 | -0.1815 |
BM_MatrixVectorMultiply/6/12 | -0.1650 | -0.1650 |
BM_MatrixVectorMultiply/6/15 | -0.1855 | -0.1855 |
BM_MatrixVectorMultiply/8/1 | -0.1630 | -0.1630 |
BM_MatrixVectorMultiply/8/2 | -0.1248 | -0.1248 |
BM_MatrixVectorMultiply/8/3 | -0.1911 | -0.1911 |
BM_MatrixVectorMultiply/8/4 | -0.1996 | -0.1996 |
BM_MatrixVectorMultiply/8/8 | -0.2590 | -0.2590 |
BM_MatrixVectorMultiply/8/12 | -0.3266 | -0.3266 |
BM_MatrixVectorMultiply/8/15 | -0.3999 | -0.3999 |
BM_MatrixTransposeVectorMultiply/1/1 | -0.0234 | -0.0234 |
BM_MatrixTransposeVectorMultiply/1/2 | -0.0243 | -0.0243 |
BM_MatrixTransposeVectorMultiply/1/3 | -0.1324 | -0.1324 |
BM_MatrixTransposeVectorMultiply/1/4 | -0.2635 | -0.2635 |
BM_MatrixTransposeVectorMultiply/1/8 | -0.2461 | -0.2461 |
BM_MatrixTransposeVectorMultiply/1/12 | -0.2702 | -0.2702 |
BM_MatrixTransposeVectorMultiply/1/15 | -0.2538 | -0.2538 |
BM_MatrixTransposeVectorMultiply/2/1 | -0.0170 | -0.0170 |
BM_MatrixTransposeVectorMultiply/2/2 | -0.1475 | -0.1475 |
BM_MatrixTransposeVectorMultiply/2/3 | -0.1082 | -0.1082 |
BM_MatrixTransposeVectorMultiply/2/4 | -0.2594 | -0.2595 |
BM_MatrixTransposeVectorMultiply/2/8 | -0.2710 | -0.2710 |
BM_MatrixTransposeVectorMultiply/2/12 | -0.3053 | -0.3053 |
BM_MatrixTransposeVectorMultiply/2/15 | -0.2706 | -0.2706 |
BM_MatrixTransposeVectorMultiply/3/1 | -0.0096 | -0.0096 |
BM_MatrixTransposeVectorMultiply/3/2 | -0.2885 | -0.2886 |
BM_MatrixTransposeVectorMultiply/3/3 | -0.0790 | -0.0790 |
BM_MatrixTransposeVectorMultiply/3/4 | -0.2329 | -0.2330 |
BM_MatrixTransposeVectorMultiply/3/8 | -0.2742 | -0.2742 |
BM_MatrixTransposeVectorMultiply/3/12 | -0.3177 | -0.3177 |
BM_MatrixTransposeVectorMultiply/3/15 | -0.2610 | -0.2610 |
BM_MatrixTransposeVectorMultiply/4/1 | -0.0024 | -0.0024 |
BM_MatrixTransposeVectorMultiply/4/2 | -0.1578 | -0.1578 |
BM_MatrixTransposeVectorMultiply/4/3 | -0.0918 | -0.0918 |
BM_MatrixTransposeVectorMultiply/4/4 | -0.2570 | -0.2570 |
BM_MatrixTransposeVectorMultiply/4/8 | -0.3064 | -0.3064 |
BM_MatrixTransposeVectorMultiply/4/12 | -0.3316 | -0.3316 |
BM_MatrixTransposeVectorMultiply/4/15 | -0.2794 | -0.2794 |
BM_MatrixTransposeVectorMultiply/6/1 | -0.0484 | -0.0484 |
BM_MatrixTransposeVectorMultiply/6/2 | -0.1102 | -0.1102 |
BM_MatrixTransposeVectorMultiply/6/3 | -0.1188 | -0.1188 |
BM_MatrixTransposeVectorMultiply/6/4 | -0.2967 | -0.2967 |
BM_MatrixTransposeVectorMultiply/6/8 | -0.3190 | -0.3190 |
BM_MatrixTransposeVectorMultiply/6/12 | -0.3441 | -0.3441 |
BM_MatrixTransposeVectorMultiply/6/15 | -0.2723 | -0.2723 |
BM_MatrixTransposeVectorMultiply/8/1 | -0.0397 | -0.0397 |
BM_MatrixTransposeVectorMultiply/8/2 | -0.1453 | -0.1453 |
BM_MatrixTransposeVectorMultiply/8/3 | -0.1337 | -0.1337 |
BM_MatrixTransposeVectorMultiply/8/4 | -0.3084 | -0.3084 |
BM_MatrixTransposeVectorMultiply/8/8 | -0.3444 | -0.3444 |
BM_MatrixTransposeVectorMultiply/8/12 | -0.3717 | -0.3717 |
BM_MatrixTransposeVectorMultiply/8/15 | -0.3440 | -0.3440 |

Change-Id: I17de05bf94699a07eea880b92a6d08daf1f038bb
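For readers unfamiliar with the technique, here is a minimal sketch of the kind of loop unrolling and row-wise (cache-friendly) access that the commit message describes. It is not the code from this commit: the function name `MatrixVectorMultiplyUnrolled`, its signature, the row-major layout, and the unroll factor of 4 are all assumptions chosen for illustration.

```cpp
#include <cassert>

// Hypothetical helper (not the Ceres small_blas API): computes c += A * b,
// where A is a row-major num_rows x num_cols matrix with runtime dimensions.
void MatrixVectorMultiplyUnrolled(const double* A, int num_rows, int num_cols,
                                  const double* b, double* c) {
  assert(num_rows >= 0 && num_cols >= 0);
  // Largest multiple of 4 that does not exceed num_cols.
  const int unrolled_cols = num_cols & ~3;
  for (int r = 0; r < num_rows; ++r) {
    // Walk one row at a time so memory is read contiguously.
    const double* row = A + r * num_cols;
    // Four independent accumulators remove the dependency between
    // consecutive additions inside the unrolled body.
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int k = 0;
    for (; k < unrolled_cols; k += 4) {
      s0 += row[k] * b[k];
      s1 += row[k + 1] * b[k + 1];
      s2 += row[k + 2] * b[k + 2];
      s3 += row[k + 3] * b[k + 3];
    }
    double sum = (s0 + s1) + (s2 + s3);
    // Remainder loop handles the last num_cols % 4 columns.
    for (; k < num_cols; ++k) {
      sum += row[k] * b[k];
    }
    c[r] += sum;
  }
}
```

For very small dimensions the unrolled body never runs and only the remainder loop executes, so the extra bookkeeping can cost more than it saves. That is one plausible reading of the slowdowns reported above for dimensions 1, 2 and 3, though the commit message does not state the cause.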
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems: non-linear least squares problems with bounds constraints, and general unconstrained optimization problems.
Please see ceres-solver.org for more information.
Ceres development happens on Gerrit, including both repository hosting and code reviews. The GitHub Repository is a continuously updated mirror which is primarily meant for issue tracking. Please see our Contributing to Ceres Guide for more details.
The upstream Gerrit repository is
https://ceres-solver.googlesource.com/ceres-solver