commit | 6c835f81a7c4315518c0ed766b1eef511335bf0b | |
---|---|---|
author | Sameer Agarwal <sameeragarwal@google.com> | Sun Feb 25 13:36:33 2018 -0800 |
committer | Sameer Agarwal <sameeragarwal@google.com> | Sun Feb 25 13:44:22 2018 -0800 |
tree | c7398b8f2bd27ae334eaeddbce3caa6901d009a0 | |
parent | 72cb66cb8f9ffa8d482d00a5dfe4784d7978fe2e | |
Improve the performance of MatrixTransposeVectorMultiply.

This is done by making the matrix access more cache coherent. For small matrices there isn't much difference, but for larger matrices like 4x20, this leads to ~50% performance improvement.

Benchmark | Time | CPU | Time Old | Time New | CPU Old | CPU New |
---|---|---|---|---|---|---|
BM_MatrixVectorMultiply/1/1 | +0.0138 | +0.0046 | 10 | 10 | 10 | 10 |
BM_MatrixVectorMultiply/1/2 | +0.0215 | +0.0146 | 10 | 11 | 10 | 10 |
BM_MatrixVectorMultiply/1/3 | +0.0469 | +0.0422 | 10 | 10 | 10 | 10 |
BM_MatrixVectorMultiply/1/4 | +0.0696 | +0.0671 | 10 | 11 | 10 | 11 |
BM_MatrixVectorMultiply/1/6 | +0.0827 | +0.0795 | 12 | 13 | 12 | 13 |
BM_MatrixVectorMultiply/1/7 | +0.1408 | +0.1152 | 13 | 15 | 13 | 14 |
BM_MatrixVectorMultiply/1/12 | +0.0317 | +0.0232 | 17 | 18 | 17 | 17 |
BM_MatrixVectorMultiply/1/16 | -0.0362 | -0.0168 | 21 | 20 | 20 | 20 |
BM_MatrixVectorMultiply/1/20 | +0.0304 | +0.0290 | 22 | 22 | 22 | 22 |
BM_MatrixVectorMultiply/2/1 | +0.0019 | +0.0009 | 10 | 10 | 10 | 10 |
BM_MatrixVectorMultiply/2/2 | +0.0285 | +0.0291 | 11 | 12 | 11 | 12 |
BM_MatrixVectorMultiply/2/3 | -0.1178 | -0.0977 | 16 | 14 | 15 | 14 |
BM_MatrixVectorMultiply/2/4 | +0.0938 | +0.1144 | 15 | 17 | 15 | 17 |
BM_MatrixVectorMultiply/2/6 | +0.0617 | +0.0629 | 17 | 18 | 17 | 18 |
BM_MatrixVectorMultiply/2/7 | +0.0956 | +0.0876 | 20 | 22 | 19 | 21 |
BM_MatrixVectorMultiply/2/12 | +0.0043 | +0.0119 | 24 | 24 | 24 | 24 |
BM_MatrixVectorMultiply/2/16 | +0.0361 | +0.0337 | 29 | 30 | 29 | 30 |
BM_MatrixVectorMultiply/2/20 | +0.0463 | +0.0365 | 33 | 35 | 33 | 34 |
BM_MatrixVectorMultiply/3/1 | -0.0201 | -0.0210 | 12 | 12 | 12 | 11 |
BM_MatrixVectorMultiply/3/2 | -0.0741 | -0.0766 | 16 | 15 | 16 | 15 |
BM_MatrixVectorMultiply/3/3 | -0.0076 | -0.0118 | 18 | 18 | 18 | 18 |
BM_MatrixVectorMultiply/3/4 | +0.1071 | +0.0963 | 19 | 21 | 19 | 21 |
BM_MatrixVectorMultiply/3/6 | +0.0449 | +0.0390 | 23 | 24 | 23 | 23 |
BM_MatrixVectorMultiply/3/7 | +0.1099 | +0.1018 | 25 | 28 | 24 | 27 |
BM_MatrixVectorMultiply/3/12 | +0.1512 | +0.0992 | 33 | 38 | 32 | 35 |
BM_MatrixVectorMultiply/3/16 | +0.2281 | +0.2005 | 37 | 46 | 37 | 44 |
BM_MatrixVectorMultiply/3/20 | +0.2387 | +0.1799 | 49 | 61 | 48 | 57 |
BM_MatrixVectorMultiply/4/1 | +0.4444 | +0.2574 | 14 | 21 | 14 | 18 |
BM_MatrixVectorMultiply/4/2 | +0.0313 | +0.0230 | 19 | 20 | 19 | 20 |
BM_MatrixVectorMultiply/4/3 | +0.0626 | +0.0596 | 23 | 24 | 23 | 24 |
BM_MatrixVectorMultiply/4/4 | +0.2322 | +0.1440 | 23 | 28 | 23 | 26 |
BM_MatrixVectorMultiply/4/6 | +0.0936 | +0.0768 | 26 | 29 | 26 | 28 |
BM_MatrixVectorMultiply/4/7 | +0.0848 | +0.0835 | 28 | 30 | 28 | 30 |
BM_MatrixVectorMultiply/4/12 | +0.1607 | +0.1101 | 39 | 46 | 39 | 43 |
BM_MatrixVectorMultiply/4/16 | +0.0752 | +0.0687 | 48 | 52 | 48 | 51 |
BM_MatrixVectorMultiply/4/20 | +0.1782 | +0.1463 | 61 | 72 | 60 | 69 |
BM_MatrixTransposeVectorMultiply/1/1 | +0.3609 | +0.2857 | 9 | 13 | 9 | 12 |
BM_MatrixTransposeVectorMultiply/1/2 | +0.3106 | +0.2970 | 10 | 13 | 10 | 12 |
BM_MatrixTransposeVectorMultiply/1/3 | +0.3018 | +0.2383 | 11 | 14 | 11 | 13 |
BM_MatrixTransposeVectorMultiply/1/4 | -0.0795 | -0.0819 | 14 | 13 | 14 | 12 |
BM_MatrixTransposeVectorMultiply/1/6 | -0.0108 | -0.0629 | 18 | 18 | 18 | 16 |
BM_MatrixTransposeVectorMultiply/1/7 | -0.1073 | -0.0879 | 20 | 18 | 19 | 17 |
BM_MatrixTransposeVectorMultiply/1/12 | -0.3035 | -0.3016 | 26 | 18 | 26 | 18 |
BM_MatrixTransposeVectorMultiply/1/16 | -0.4909 | -0.4951 | 39 | 20 | 38 | 19 |
BM_MatrixTransposeVectorMultiply/1/20 | -0.4619 | -0.4985 | 43 | 23 | 42 | 21 |
BM_MatrixTransposeVectorMultiply/2/1 | +0.3471 | +0.2906 | 10 | 13 | 10 | 13 |
BM_MatrixTransposeVectorMultiply/2/2 | +0.2323 | +0.2266 | 12 | 15 | 12 | 15 |
BM_MatrixTransposeVectorMultiply/2/3 | +0.0802 | +0.0779 | 16 | 17 | 16 | 17 |
BM_MatrixTransposeVectorMultiply/2/4 | -0.0951 | -0.0983 | 19 | 17 | 19 | 17 |
BM_MatrixTransposeVectorMultiply/2/6 | -0.0974 | -0.1064 | 24 | 21 | 24 | 21 |
BM_MatrixTransposeVectorMultiply/2/7 | +0.0612 | -0.0457 | 27 | 29 | 27 | 26 |
BM_MatrixTransposeVectorMultiply/2/12 | -0.3777 | -0.3838 | 41 | 25 | 41 | 25 |
BM_MatrixTransposeVectorMultiply/2/16 | -0.4783 | -0.4843 | 53 | 28 | 53 | 27 |
BM_MatrixTransposeVectorMultiply/2/20 | -0.5567 | -0.5566 | 71 | 32 | 70 | 31 |
BM_MatrixTransposeVectorMultiply/3/1 | +0.4607 | +0.4753 | 10 | 15 | 10 | 15 |
BM_MatrixTransposeVectorMultiply/3/2 | +0.2870 | +0.2754 | 14 | 19 | 14 | 18 |
BM_MatrixTransposeVectorMultiply/3/3 | +0.1270 | +0.1245 | 19 | 21 | 19 | 21 |
BM_MatrixTransposeVectorMultiply/3/4 | +0.0160 | +0.0076 | 22 | 22 | 22 | 22 |
BM_MatrixTransposeVectorMultiply/3/6 | -0.0612 | -0.0635 | 27 | 26 | 27 | 25 |
BM_MatrixTransposeVectorMultiply/3/7 | -0.0531 | -0.0695 | 31 | 29 | 30 | 28 |
BM_MatrixTransposeVectorMultiply/3/12 | -0.4009 | -0.3938 | 49 | 29 | 47 | 29 |
BM_MatrixTransposeVectorMultiply/3/16 | -0.4584 | -0.4537 | 64 | 35 | 62 | 34 |
BM_MatrixTransposeVectorMultiply/3/20 | -0.5087 | -0.5098 | 78 | 38 | 77 | 38 |
BM_MatrixTransposeVectorMultiply/4/1 | +0.6696 | +0.6837 | 11 | 18 | 11 | 18 |
BM_MatrixTransposeVectorMultiply/4/2 | +0.3085 | +0.3085 | 17 | 22 | 17 | 22 |
BM_MatrixTransposeVectorMultiply/4/3 | +0.2908 | +0.2821 | 21 | 26 | 20 | 26 |
BM_MatrixTransposeVectorMultiply/4/4 | +0.0076 | -0.0031 | 24 | 25 | 24 | 24 |
BM_MatrixTransposeVectorMultiply/4/6 | -0.0884 | -0.0841 | 34 | 31 | 34 | 31 |
BM_MatrixTransposeVectorMultiply/4/7 | -0.0834 | -0.0825 | 37 | 34 | 36 | 33 |
BM_MatrixTransposeVectorMultiply/4/12 | -0.4477 | -0.4453 | 62 | 34 | 61 | 34 |
BM_MatrixTransposeVectorMultiply/4/16 | -0.5324 | -0.5203 | 86 | 40 | 83 | 40 |
BM_MatrixTransposeVectorMultiply/4/20 | -0.4905 | -0.4933 | 99 | 50 | 98 | 50 |

Change-Id: I7f2a1c986e4a345bb67cb9eb0235234573024889
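The commit message does not show the code change itself, so the following is only a minimal sketch of what "more cache coherent" access can mean for a transposed matrix-vector product. It assumes a row-major matrix stored in a flat array. The function names `TransposeMultiplyStrided` and `TransposeMultiplyRowWise` are hypothetical and are not the Ceres API or its actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the Ceres API): c += A^T * b, one output at a time.
// A is assumed row-major with num_rows x num_cols entries, so walking down a
// column skips num_cols doubles between consecutive reads.
void TransposeMultiplyStrided(const std::vector<double>& a,
                              std::size_t num_rows, std::size_t num_cols,
                              const std::vector<double>& b,
                              std::vector<double>& c) {
  assert(a.size() == num_rows * num_cols);
  assert(b.size() == num_rows && c.size() == num_cols);
  for (std::size_t col = 0; col < num_cols; ++col) {
    double sum = 0.0;
    for (std::size_t row = 0; row < num_rows; ++row) {
      sum += a[row * num_cols + col] * b[row];  // strided read of A
    }
    c[col] += sum;
  }
}

// Hypothetical helper (not the Ceres API): the same product, but each row of
// A is read contiguously and scattered into c, scaled by b[row].
void TransposeMultiplyRowWise(const std::vector<double>& a,
                              std::size_t num_rows, std::size_t num_cols,
                              const std::vector<double>& b,
                              std::vector<double>& c) {
  assert(a.size() == num_rows * num_cols);
  assert(b.size() == num_rows && c.size() == num_cols);
  for (std::size_t row = 0; row < num_rows; ++row) {
    const double* a_row = a.data() + row * num_cols;
    const double b_row = b[row];
    for (std::size_t col = 0; col < num_cols; ++col) {
      c[col] += a_row[col] * b_row;  // contiguous read of A
    }
  }
}
```

Both functions accumulate the same product into `c`. The row-wise variant reads memory sequentially, which plausibly matters more as the number of columns grows, in line with the larger gains reported above for the wider matrices.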
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature, and performant library that has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems: non-linear least squares problems with bounds constraints, and general unconstrained optimization problems.
Please see ceres-solver.org for more information.
Ceres development happens on Gerrit, including both repository hosting and code reviews. The GitHub Repository is a continuously updated mirror which is primarily meant for issue tracking. Please see our Contributing to Ceres Guide for more details.
The upstream Gerrit repository is
https://ceres-solver.googlesource.com/ceres-solver