Improve the performance of MatrixTransposeVectorMultiply.

This is done by making the matrix access pattern more cache
friendly. For small matrices the absolute difference is small
(a few sizes are somewhat slower), but for larger matrices
like 4x20 it reduces run time by roughly 50%.
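The general idea of a more cache-friendly access pattern can be illustrated with a minimal C++ sketch. It assumes A is a num_rows x num_cols matrix stored row-major in a flat array, and the function names are hypothetical. It shows the generic loop-reordering technique, not the actual Ceres implementation, which may differ in detail.

```cpp
#include <cassert>

// Hypothetical helpers, not the Ceres API. A is assumed to be a
// num_rows x num_cols matrix stored row-major; both functions compute
// c = A^T * b, where b has num_rows entries and c has num_cols entries.

// Column order: each output c[j] is a dot product down column j of A,
// so consecutive reads of A are num_cols doubles apart in memory.
void TransposeMultiplyColumnOrder(const double* A, int num_rows, int num_cols,
                                  const double* b, double* c) {
  assert(A != nullptr && b != nullptr && c != nullptr);
  for (int j = 0; j < num_cols; ++j) {
    double sum = 0.0;
    for (int r = 0; r < num_rows; ++r) {
      sum += A[r * num_cols + j] * b[r];
    }
    c[j] = sum;
  }
}

// Row order: walk A one contiguous row at a time and accumulate
// b[r] * A(r, :) into c, so A is read sequentially from memory.
void TransposeMultiplyRowOrder(const double* A, int num_rows, int num_cols,
                               const double* b, double* c) {
  assert(A != nullptr && b != nullptr && c != nullptr);
  for (int j = 0; j < num_cols; ++j) {
    c[j] = 0.0;
  }
  for (int r = 0; r < num_rows; ++r) {
    const double b_r = b[r];
    const double* row = A + r * num_cols;
    for (int j = 0; j < num_cols; ++j) {
      c[j] += row[j] * b_r;
    }
  }
}
```

Both versions accumulate the terms of each c[j] in the same order of r. The row-order version reads A sequentially but makes num_rows passes over c, so for very small matrices the reordering is not guaranteed to pay off.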

Benchmark                                        Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------
BM_MatrixVectorMultiply/1/1                   +0.0138         +0.0046            10            10            10            10
BM_MatrixVectorMultiply/1/2                   +0.0215         +0.0146            10            11            10            10
BM_MatrixVectorMultiply/1/3                   +0.0469         +0.0422            10            10            10            10
BM_MatrixVectorMultiply/1/4                   +0.0696         +0.0671            10            11            10            11
BM_MatrixVectorMultiply/1/6                   +0.0827         +0.0795            12            13            12            13
BM_MatrixVectorMultiply/1/7                   +0.1408         +0.1152            13            15            13            14
BM_MatrixVectorMultiply/1/12                  +0.0317         +0.0232            17            18            17            17
BM_MatrixVectorMultiply/1/16                  -0.0362         -0.0168            21            20            20            20
BM_MatrixVectorMultiply/1/20                  +0.0304         +0.0290            22            22            22            22
BM_MatrixVectorMultiply/2/1                   +0.0019         +0.0009            10            10            10            10
BM_MatrixVectorMultiply/2/2                   +0.0285         +0.0291            11            12            11            12
BM_MatrixVectorMultiply/2/3                   -0.1178         -0.0977            16            14            15            14
BM_MatrixVectorMultiply/2/4                   +0.0938         +0.1144            15            17            15            17
BM_MatrixVectorMultiply/2/6                   +0.0617         +0.0629            17            18            17            18
BM_MatrixVectorMultiply/2/7                   +0.0956         +0.0876            20            22            19            21
BM_MatrixVectorMultiply/2/12                  +0.0043         +0.0119            24            24            24            24
BM_MatrixVectorMultiply/2/16                  +0.0361         +0.0337            29            30            29            30
BM_MatrixVectorMultiply/2/20                  +0.0463         +0.0365            33            35            33            34
BM_MatrixVectorMultiply/3/1                   -0.0201         -0.0210            12            12            12            11
BM_MatrixVectorMultiply/3/2                   -0.0741         -0.0766            16            15            16            15
BM_MatrixVectorMultiply/3/3                   -0.0076         -0.0118            18            18            18            18
BM_MatrixVectorMultiply/3/4                   +0.1071         +0.0963            19            21            19            21
BM_MatrixVectorMultiply/3/6                   +0.0449         +0.0390            23            24            23            23
BM_MatrixVectorMultiply/3/7                   +0.1099         +0.1018            25            28            24            27
BM_MatrixVectorMultiply/3/12                  +0.1512         +0.0992            33            38            32            35
BM_MatrixVectorMultiply/3/16                  +0.2281         +0.2005            37            46            37            44
BM_MatrixVectorMultiply/3/20                  +0.2387         +0.1799            49            61            48            57
BM_MatrixVectorMultiply/4/1                   +0.4444         +0.2574            14            21            14            18
BM_MatrixVectorMultiply/4/2                   +0.0313         +0.0230            19            20            19            20
BM_MatrixVectorMultiply/4/3                   +0.0626         +0.0596            23            24            23            24
BM_MatrixVectorMultiply/4/4                   +0.2322         +0.1440            23            28            23            26
BM_MatrixVectorMultiply/4/6                   +0.0936         +0.0768            26            29            26            28
BM_MatrixVectorMultiply/4/7                   +0.0848         +0.0835            28            30            28            30
BM_MatrixVectorMultiply/4/12                  +0.1607         +0.1101            39            46            39            43
BM_MatrixVectorMultiply/4/16                  +0.0752         +0.0687            48            52            48            51
BM_MatrixVectorMultiply/4/20                  +0.1782         +0.1463            61            72            60            69
BM_MatrixTransposeVectorMultiply/1/1          +0.3609         +0.2857             9            13             9            12
BM_MatrixTransposeVectorMultiply/1/2          +0.3106         +0.2970            10            13            10            12
BM_MatrixTransposeVectorMultiply/1/3          +0.3018         +0.2383            11            14            11            13
BM_MatrixTransposeVectorMultiply/1/4          -0.0795         -0.0819            14            13            14            12
BM_MatrixTransposeVectorMultiply/1/6          -0.0108         -0.0629            18            18            18            16
BM_MatrixTransposeVectorMultiply/1/7          -0.1073         -0.0879            20            18            19            17
BM_MatrixTransposeVectorMultiply/1/12         -0.3035         -0.3016            26            18            26            18
BM_MatrixTransposeVectorMultiply/1/16         -0.4909         -0.4951            39            20            38            19
BM_MatrixTransposeVectorMultiply/1/20         -0.4619         -0.4985            43            23            42            21
BM_MatrixTransposeVectorMultiply/2/1          +0.3471         +0.2906            10            13            10            13
BM_MatrixTransposeVectorMultiply/2/2          +0.2323         +0.2266            12            15            12            15
BM_MatrixTransposeVectorMultiply/2/3          +0.0802         +0.0779            16            17            16            17
BM_MatrixTransposeVectorMultiply/2/4          -0.0951         -0.0983            19            17            19            17
BM_MatrixTransposeVectorMultiply/2/6          -0.0974         -0.1064            24            21            24            21
BM_MatrixTransposeVectorMultiply/2/7          +0.0612         -0.0457            27            29            27            26
BM_MatrixTransposeVectorMultiply/2/12         -0.3777         -0.3838            41            25            41            25
BM_MatrixTransposeVectorMultiply/2/16         -0.4783         -0.4843            53            28            53            27
BM_MatrixTransposeVectorMultiply/2/20         -0.5567         -0.5566            71            32            70            31
BM_MatrixTransposeVectorMultiply/3/1          +0.4607         +0.4753            10            15            10            15
BM_MatrixTransposeVectorMultiply/3/2          +0.2870         +0.2754            14            19            14            18
BM_MatrixTransposeVectorMultiply/3/3          +0.1270         +0.1245            19            21            19            21
BM_MatrixTransposeVectorMultiply/3/4          +0.0160         +0.0076            22            22            22            22
BM_MatrixTransposeVectorMultiply/3/6          -0.0612         -0.0635            27            26            27            25
BM_MatrixTransposeVectorMultiply/3/7          -0.0531         -0.0695            31            29            30            28
BM_MatrixTransposeVectorMultiply/3/12         -0.4009         -0.3938            49            29            47            29
BM_MatrixTransposeVectorMultiply/3/16         -0.4584         -0.4537            64            35            62            34
BM_MatrixTransposeVectorMultiply/3/20         -0.5087         -0.5098            78            38            77            38
BM_MatrixTransposeVectorMultiply/4/1          +0.6696         +0.6837            11            18            11            18
BM_MatrixTransposeVectorMultiply/4/2          +0.3085         +0.3085            17            22            17            22
BM_MatrixTransposeVectorMultiply/4/3          +0.2908         +0.2821            21            26            20            26
BM_MatrixTransposeVectorMultiply/4/4          +0.0076         -0.0031            24            25            24            24
BM_MatrixTransposeVectorMultiply/4/6          -0.0884         -0.0841            34            31            34            31
BM_MatrixTransposeVectorMultiply/4/7          -0.0834         -0.0825            37            34            36            33
BM_MatrixTransposeVectorMultiply/4/12         -0.4477         -0.4453            62            34            61            34
BM_MatrixTransposeVectorMultiply/4/16         -0.5324         -0.5203            86            40            83            40
BM_MatrixTransposeVectorMultiply/4/20         -0.4905         -0.4933            99            50            98            50
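Assuming the table was produced by Google Benchmark's compare.py tool (its column layout matches), the Time and CPU columns are relative changes computed as (new - old) / |old|, so negative values mean the new code is faster. A small sketch of that arithmetic, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Relative change as assumed for the Time and CPU columns:
// (new - old) / |old|. Negative means the new code is faster.
// The zero-baseline case is not handled in this sketch.
double RelativeChange(double old_value, double new_value) {
  assert(old_value != 0.0);
  return (new_value - old_value) / std::fabs(old_value);
}
```

For example, RelativeChange(99, 50) is about -0.495 for the 4/20 transpose row, close to the reported -0.4905; the small gap comes from the Old/New columns being rounded.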

Change-Id: I7f2a1c986e4a345bb67cb9eb0235234573024889
3 files changed
tree: c7398b8f2bd27ae334eaeddbce3caa6901d009a0
README.md

Ceres Solver

Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature-rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems:

  1. Non-linear Least Squares problems with bounds constraints.
  2. General unconstrained optimization problems.

Please see ceres-solver.org for more information.

WARNING - Do not make GitHub pull requests!

Ceres development happens on Gerrit, including both repository hosting and code reviews. The GitHub repository is a continuously updated mirror which is primarily meant for issue tracking. Please see our Contributing to Ceres Guide for more details.

The upstream Gerrit repository is:

https://ceres-solver.googlesource.com/ceres-solver