commit | 68cc71ce5d7ce58751343ccf9cb5aec6dcd60f8e | [log] [tgz] |
---|---|---|
author | yangfan <yang0773@gmail.com> | Mon Mar 12 10:57:52 2018 +0800 |
committer | Sameer Agarwal <sameeragarwal@google.com> | Sun Apr 08 14:23:10 2018 +0000 |
tree | ed178a0ccc72eab5c1fc8ed108057b52671fa3c4 | |
parent | f27082a17463803e4deed9eea28075883e58a31c [diff] |
Optimization for custom small blas multiplication with dynamic template parameters in C level.

- unroll for loops
- matrix access more cache coherent
- platform independent

Briefly, this commit brings 1~50% performance improvements for most cases in small_blas_gem(m/v)_benchmark, but a small drop for corner cases with small dimensions, especially 1, 2, 3. The partial results below show the relative change in execution time compared to the unoptimized version (negative values mean less time, positive values mean more).

Platform: desktop PC (i7-7700 CPU MP8@3.60GHz + ubuntu 17.10) (Lenovo Research Device+ Lab, <yangfan34@lenovo.com>)

Benchmark | Time | CPU |
---|---|---|
BM_MatrixMatrixMultiplyDynamic/2/2/2 | -0.1082 | -0.1083 |
BM_MatrixMatrixMultiplyDynamic/2/2/15 | -0.1270 | -0.1270 |
BM_MatrixMatrixMultiplyDynamic/2/4/2 | -0.1433 | -0.1433 |
BM_MatrixMatrixMultiplyDynamic/2/4/15 | -0.2069 | -0.2068 |
BM_MatrixMatrixMultiplyDynamic/2/6/2 | -0.1446 | -0.1446 |
BM_MatrixMatrixMultiplyDynamic/2/6/15 | -0.2156 | -0.2156 |
BM_MatrixMatrixMultiplyDynamic/2/8/2 | -0.1788 | -0.1788 |
BM_MatrixMatrixMultiplyDynamic/2/8/15 | -0.3316 | -0.3316 |
BM_MatrixMatrixMultiplyDynamic/2/10/2 | -0.2025 | -0.2025 |
BM_MatrixMatrixMultiplyDynamic/2/10/15 | -0.3444 | -0.3444 |
BM_MatrixMatrixMultiplyDynamic/2/12/2 | -0.0515 | -0.0515 |
BM_MatrixMatrixMultiplyDynamic/2/12/15 | -0.3733 | -0.3733 |
BM_MatrixMatrixMultiplyDynamic/2/15/2 | -0.2784 | -0.2784 |
BM_MatrixMatrixMultiplyDynamic/2/15/15 | -0.3704 | -0.3704 |
BM_MatrixMatrixMultiplyDynamic/4/2/2 | -0.1839 | -0.1839 |
BM_MatrixMatrixMultiplyDynamic/4/2/15 | -0.1922 | -0.1922 |
BM_MatrixMatrixMultiplyDynamic/4/4/2 | -0.2248 | -0.2248 |
BM_MatrixMatrixMultiplyDynamic/4/4/15 | -0.3132 | -0.3132 |
BM_MatrixMatrixMultiplyDynamic/4/6/2 | -0.2311 | -0.2311 |
BM_MatrixMatrixMultiplyDynamic/4/6/15 | -0.3239 | -0.3239 |
BM_MatrixMatrixMultiplyDynamic/4/8/2 | -0.0574 | -0.0574 |
BM_MatrixMatrixMultiplyDynamic/4/8/15 | -0.4173 | -0.4173 |
BM_MatrixMatrixMultiplyDynamic/4/10/2 | -0.2861 | -0.2861 |
BM_MatrixMatrixMultiplyDynamic/4/10/15 | -0.4065 | -0.4064 |
BM_MatrixMatrixMultiplyDynamic/4/12/2 | -0.2976 | -0.2975 |
BM_MatrixMatrixMultiplyDynamic/4/12/15 | -0.4218 | -0.4218 |
BM_MatrixMatrixMultiplyDynamic/4/15/2 | -0.3116 | -0.3116 |
BM_MatrixMatrixMultiplyDynamic/4/15/15 | -0.4242 | -0.4241 |
BM_MatrixMatrixMultiplyDynamic/8/12/2 | -0.3675 | -0.3674 |
BM_MatrixMatrixMultiplyDynamic/8/12/4 | -0.5055 | -0.5055 |
BM_MatrixMatrixMultiplyDynamic/8/12/6 | -0.4302 | -0.4302 |
BM_MatrixMatrixMultiplyDynamic/8/12/8 | -0.4854 | -0.4854 |
BM_MatrixMatrixMultiplyDynamic/8/12/10 | -0.4882 | -0.4882 |
BM_MatrixMatrixMultiplyDynamic/8/12/12 | -0.5209 | -0.5209 |
BM_MatrixMatrixMultiplyDynamic/8/12/15 | -0.4558 | -0.4558 |
BM_MatrixMatrixMultiplyDynamic/8/15/2 | -0.2319 | -0.2319 |
BM_MatrixMatrixMultiplyDynamic/8/15/4 | -0.5105 | -0.5105 |
BM_MatrixMatrixMultiplyDynamic/8/15/6 | -0.4477 | -0.4477 |
BM_MatrixMatrixMultiplyDynamic/8/15/8 | -0.5479 | -0.5479 |
BM_MatrixMatrixMultiplyDynamic/8/15/10 | -0.4843 | -0.4843 |
BM_MatrixMatrixMultiplyDynamic/8/15/12 | -0.5212 | -0.5212 |
BM_MatrixMatrixMultiplyDynamic/8/15/15 | -0.4459 | -0.4459 |
BM_MatrixVectorMultiply/1/1 | +0.0978 | +0.0978 |
BM_MatrixVectorMultiply/1/2 | +0.0551 | +0.0551 |
BM_MatrixVectorMultiply/1/3 | -0.0019 | -0.0020 |
BM_MatrixVectorMultiply/1/4 | +0.0563 | +0.0562 |
BM_MatrixVectorMultiply/1/6 | +0.1379 | +0.1379 |
BM_MatrixVectorMultiply/1/7 | +0.1090 | +0.1090 |
BM_MatrixVectorMultiply/1/12 | +0.0901 | +0.0901 |
BM_MatrixVectorMultiply/1/16 | +0.0493 | +0.0493 |
BM_MatrixVectorMultiply/1/20 | +0.2255 | +0.2255 |
BM_MatrixVectorMultiply/2/1 | +0.1261 | +0.1261 |
BM_MatrixVectorMultiply/2/2 | +0.2328 | +0.2328 |
BM_MatrixVectorMultiply/2/3 | +0.1404 | +0.1403 |
BM_MatrixVectorMultiply/2/4 | +0.0257 | +0.0256 |
BM_MatrixVectorMultiply/2/6 | -0.1691 | -0.1691 |
BM_MatrixVectorMultiply/2/7 | -0.2619 | -0.2619 |
BM_MatrixVectorMultiply/2/12 | -0.4261 | -0.4261 |
BM_MatrixVectorMultiply/2/16 | -0.5387 | -0.5387 |
BM_MatrixVectorMultiply/2/20 | -0.6171 | -0.6171 |
BM_MatrixVectorMultiply/3/1 | +0.1664 | +0.1664 |
BM_MatrixVectorMultiply/3/2 | +0.0848 | +0.0848 |
BM_MatrixVectorMultiply/3/3 | -0.0044 | -0.0044 |
BM_MatrixVectorMultiply/3/4 | -0.0683 | -0.0684 |
BM_MatrixVectorMultiply/3/6 | -0.1652 | -0.1652 |
BM_MatrixVectorMultiply/3/7 | -0.1633 | -0.1633 |
BM_MatrixVectorMultiply/3/12 | -0.1921 | -0.1921 |
BM_MatrixVectorMultiply/3/16 | -0.3659 | -0.3659 |
BM_MatrixVectorMultiply/3/20 | -0.4137 | -0.4137 |
BM_MatrixVectorMultiply/4/1 | -0.0577 | -0.0577 |
BM_MatrixVectorMultiply/4/2 | -0.1337 | -0.1338 |
BM_MatrixVectorMultiply/4/3 | -0.1443 | -0.1443 |
BM_MatrixVectorMultiply/4/4 | +0.0013 | +0.0013 |
BM_MatrixVectorMultiply/4/6 | -0.1071 | -0.1071 |
BM_MatrixVectorMultiply/4/7 | -0.1396 | -0.1397 |
BM_MatrixVectorMultiply/4/12 | -0.2792 | -0.2792 |
BM_MatrixVectorMultiply/4/16 | -0.4485 | -0.4486 |
BM_MatrixVectorMultiply/4/20 | -0.3588 | -0.3588 |

Change-Id: I64a8cf11391e3d06341a2b8764cd1b4f1b8a23f1
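To illustrate the kind of change the commit message describes (unrolled loops and more cache-friendly matrix access), here is a minimal sketch of a matrix-vector multiply-accumulate with runtime dimensions. It is not the code from this commit: the function names `MatVecNaive`, `MatVecUnrolled4`, and `MaxAbsDiff` are hypothetical, the row-major storage layout and the unroll factor of 4 are assumptions made for the example, and any speedup depends on the compiler and hardware.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference version: c += A * b, where A is num_row x num_col, row-major.
void MatVecNaive(const double* A, int num_row, int num_col,
                 const double* b, double* c) {
  assert(num_row >= 0 && num_col >= 0);
  for (int r = 0; r < num_row; ++r) {
    double sum = 0.0;
    for (int col = 0; col < num_col; ++col) {
      sum += A[r * num_col + col] * b[col];
    }
    c[r] += sum;
  }
}

// Sketch of an unrolled version (assumed unroll factor: 4 rows per pass).
// Each b[col] is loaded once and reused for four rows, and every row of A
// is read sequentially, which matches its row-major layout in memory.
void MatVecUnrolled4(const double* A, int num_row, int num_col,
                     const double* b, double* c) {
  assert(num_row >= 0 && num_col >= 0);
  const int row_main = num_row - (num_row % 4);  // rows covered by full blocks

  for (int r = 0; r < row_main; r += 4) {
    const double* a0 = A + (r + 0) * num_col;
    const double* a1 = A + (r + 1) * num_col;
    const double* a2 = A + (r + 2) * num_col;
    const double* a3 = A + (r + 3) * num_col;
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (int col = 0; col < num_col; ++col) {
      const double bv = b[col];
      s0 += a0[col] * bv;
      s1 += a1[col] * bv;
      s2 += a2[col] * bv;
      s3 += a3[col] * bv;
    }
    c[r + 0] += s0;
    c[r + 1] += s1;
    c[r + 2] += s2;
    c[r + 3] += s3;
  }

  // Remainder: the last num_row % 4 rows are handled one at a time.
  for (int r = row_main; r < num_row; ++r) {
    double sum = 0.0;
    for (int col = 0; col < num_col; ++col) {
      sum += A[r * num_col + col] * b[col];
    }
    c[r] += sum;
  }
}

// Helper for comparing results: largest absolute element-wise difference.
double MaxAbsDiff(const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size());
  double max_diff = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double d = std::fabs(x[i] - y[i]);
    if (d > max_diff) max_diff = d;
  }
  return max_diff;
}
```

Both functions accumulate into `c`, so the caller initializes the output vector first. Comparing `MatVecNaive` and `MatVecUnrolled4` on the same inputs with `MaxAbsDiff` is a simple way to check that an unrolled variant still computes the same product. The fixed per-call bookkeeping of the blocked loop and its remainder loop is one plausible reason why very small dimensions, such as the 1, 2, and 3 cases in the results, can get slower rather than faster.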
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems: non-linear least squares problems with bounds constraints, and general unconstrained optimization problems.
Please see ceres-solver.org for more information.
Ceres development happens on Gerrit, including both repository hosting and code reviews. The GitHub Repository is a continuously updated mirror which is primarily meant for issue tracking. Please see our Contributing to Ceres Guide for more details.
The upstream Gerrit repository is
https://ceres-solver.googlesource.com/ceres-solver