Further BLAS improvements.
1. Switch to Eigen's implementation when all dimensions are fixed.
2. Use lazyProduct for Eigen's matrix-vector product. This brings
Eigen's performance on iterative_schur closer to what it was
before the last commit. However, there is still an improvement
to be had by using the naive implementation when the matrix
and vector have dynamic dimensions.
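
The dispatch described in the two items above can be sketched as follows.
This is a minimal, dependency-free illustration, not the code in this
change: the names kDynamic, FixedMatrixVectorMultiply,
NaiveMatrixVectorMultiply and MatrixVectorMultiply are hypothetical, the
matrix is assumed to be row-major, and a loop with compile-time bounds
stands in for the Eigen fixed-size path so that the sketch needs no
external library.

```cpp
// Hypothetical sentinel meaning "size known only at run time"
// (the same idea as Eigen::Dynamic; the name is an assumption).
constexpr int kDynamic = -1;

// Naive path: y += A * x with run-time sizes. A is row-major,
// num_rows x num_cols.
inline void NaiveMatrixVectorMultiply(const double* A, int num_rows,
                                      int num_cols, const double* x,
                                      double* y) {
  for (int r = 0; r < num_rows; ++r) {
    double sum = 0.0;
    for (int c = 0; c < num_cols; ++c) {
      sum += A[r * num_cols + c] * x[c];
    }
    y[r] += sum;
  }
}

// Fixed-size path: y += A * x with sizes known at compile time, so the
// compiler can fully unroll the loops. In the change described above this
// role is played by Eigen; plain loops are used here as a stand-in.
template <int kRows, int kCols>
inline void FixedMatrixVectorMultiply(const double* A, const double* x,
                                      double* y) {
  for (int r = 0; r < kRows; ++r) {
    double sum = 0.0;
    for (int c = 0; c < kCols; ++c) {
      sum += A[r * kCols + c] * x[c];
    }
    y[r] += sum;
  }
}

// Dispatcher: take the fixed-size path only when all dimensions are fixed,
// otherwise fall back to the naive loops. The condition is a compile-time
// constant, so an optimizing compiler can drop the untaken branch.
template <int kRows, int kCols>
inline void MatrixVectorMultiply(const double* A, int num_rows, int num_cols,
                                 const double* x, double* y) {
  if (kRows != kDynamic && kCols != kDynamic) {
    FixedMatrixVectorMultiply<kRows, kCols>(A, x, y);
  } else {
    NaiveMatrixVectorMultiply(A, num_rows, num_cols, x, y);
  }
}
```

A caller that knows the block sizes at compile time would write
MatrixVectorMultiply<2, 3>(A, 2, 3, x, y), while a caller with run-time
sizes would write MatrixVectorMultiply<kDynamic, kDynamic>(A, rows, cols, x, y).
Both accumulate into y rather than overwrite it.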

BENCHMARK

                                  HEAD    CHANGE

problem-16-22106-pre.txt
gcc-eigen    sparse_schur        0.859     0.853
clang-eigen  sparse_schur        0.848     0.850
gcc-blas     sparse_schur        0.956     0.865
clang-blas   sparse_schur        0.954     0.858
gcc-eigen    iterative_schur     4.656     3.271
clang-eigen  iterative_schur     4.664     3.307
gcc-blas     iterative_schur     2.598     2.620
clang-blas   iterative_schur     2.554     2.567

problem-49-7776-pre.txt
gcc-eigen    sparse_schur        0.477     0.472
clang-eigen  sparse_schur        0.475     0.479
gcc-blas     sparse_schur        0.521     0.469
clang-blas   sparse_schur        0.508     0.471
gcc-eigen    iterative_schur     3.172     2.088
clang-eigen  iterative_schur     3.161     2.079
gcc-blas     iterative_schur     1.701     1.720
clang-blas   iterative_schur     1.708     1.694

problem-245-198739-pre.txt
gcc-eigen    sparse_schur       28.092    28.233
clang-eigen  sparse_schur       28.148    28.400
gcc-blas     sparse_schur       30.919    28.110
clang-blas   sparse_schur       31.001    28.407
gcc-eigen    iterative_schur    63.095    43.694
clang-eigen  iterative_schur    63.412    43.473
gcc-blas     iterative_schur    33.353    33.321
clang-blas   iterative_schur    33.276    33.278

problem-257-65132-pre.txt
gcc-eigen    sparse_schur        3.687     3.629
clang-eigen  sparse_schur        3.669     3.652
gcc-blas     sparse_schur        3.947     3.673
clang-blas   sparse_schur        3.952     3.678
gcc-eigen    iterative_schur   121.512    76.833
clang-eigen  iterative_schur   123.547    78.763
gcc-blas     iterative_schur    68.334    68.612
clang-blas   iterative_schur    67.793    68.266

Notes:
1. Naive BLAS was slightly slower than Eigen on fixed-size matrices. We did
not see this before because of the different inlining thresholds. Fixing
this boosted eigen's performance. Also, the disparity between gcc and clang
has gone away.
2. SPARSE_SCHUR performance remains the same, since it only exercises
statically sized matrices.
3. ITERATIVE_SCHUR performance improves substantially for the Eigen builds
due to the lazyProduct change, but even there, since most of the products
are dynamically sized, the naive implementation wins handily.
Change-Id: Idc17f35b9c68aaebb1b2e131adf3af8374a85a4c
1 file changed