commit | b1fe60330559371c9e985886412ae2d39b581fe3 | [log] [tgz] |
---|---|---|
author | Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com> | Wed Sep 07 16:06:51 2022 +0300 |
committer | Dmitriy Korchemkin <dmitriy.korchemkin@gmail.com> | Fri Sep 30 17:23:13 2022 +0300 |
tree | ef572ab94ee4f7215e75aa88245d150baf8927a4 | |
parent | 16668eedf34f1bba70649b381a60ad9d26972064 [diff] |
Parallel right products for partitioned view Parallel implementations for right-multiply by dense vector for: - Partitioned matrix view - Block-sparse matrix - CRS matrix (non-symmetric only) When coupled with non-interleaving indexes in parallel for, this simple aproach provides a reasonable speedup. For example, in CRS case difference with GPGPU approach reduces closer to memory throughput ratio for high enough core count. ./bin/spmv_benchmark ------------------------------------------------------------------- Benchmark Time ------------------------------------------------------------------- BM_BlockSparseRightMultiplyAndAccumulateBA/1 28.5 ms BM_BlockSparseRightMultiplyAndAccumulateBA/2 15.7 ms BM_BlockSparseRightMultiplyAndAccumulateBA/4 9.01 ms BM_BlockSparseRightMultiplyAndAccumulateBA/8 5.60 ms BM_BlockSparseRightMultiplyAndAccumulateBA/16 3.86 ms BM_BlockSparseRightMultiplyAndAccumulateBA/28 3.84 ms BM_BlockSparseRightMultiplyAndAccumulateUnstructured/1 23.8 ms BM_BlockSparseRightMultiplyAndAccumulateUnstructured/2 15.0 ms BM_BlockSparseRightMultiplyAndAccumulateUnstructured/4 8.01 ms BM_BlockSparseRightMultiplyAndAccumulateUnstructured/8 4.02 ms BM_BlockSparseRightMultiplyAndAccumulateUnstructured/16 2.39 ms BM_BlockSparseRightMultiplyAndAccumulateUnstructured/28 1.68 ms BM_BlockSparseLeftMultiplyAndAccumulateBA 30.7 ms BM_BlockSparseLeftMultiplyAndAccumulateUnstructured 41.5 ms BM_CRSRightMultiplyAndAccumulateBA/1 24.1 ms BM_CRSRightMultiplyAndAccumulateBA/2 13.6 ms BM_CRSRightMultiplyAndAccumulateBA/4 8.70 ms BM_CRSRightMultiplyAndAccumulateBA/8 5.34 ms BM_CRSRightMultiplyAndAccumulateBA/16 3.99 ms BM_CRSRightMultiplyAndAccumulateBA/28 4.00 ms BM_CRSRightMultiplyAndAccumulateUnstructured/1 21.1 ms BM_CRSRightMultiplyAndAccumulateUnstructured/2 10.83 ms BM_CRSRightMultiplyAndAccumulateUnstructured/4 5.88 ms BM_CRSRightMultiplyAndAccumulateUnstructured/8 3.68 ms BM_CRSRightMultiplyAndAccumulateUnstructured/16 2.21 ms BM_CRSRightMultiplyAndAccumulateUnstructured/28 1.71 ms BM_CRSLeftMultiplyAndAccumulateBA 23.6 ms BM_CRSLeftMultiplyAndAccumulateUnstructured 22.5 ms BM_CudaRightMultiplyAndAccumulateBA 0.679 ms BM_CudaRightMultiplyAndAccumulateUnstructured 0.480 ms BM_CudaLeftMultiplyAndAccumulateBA 0.774 ms BM_CudaLeftMultiplyAndAccumulateUnstructured 0.361 ms ./bin/partitioned_matrix_view_benchmark ----------------------------------------------------------------- Benchmark Time ----------------------------------------------------------------- BM_PatitionedViewRightMultiplyAndAccumulateE_Static/1 18.5 ms BM_PatitionedViewRightMultiplyAndAccumulateE_Static/2 10.7 ms BM_PatitionedViewRightMultiplyAndAccumulateE_Static/4 6.34 ms BM_PatitionedViewRightMultiplyAndAccumulateE_Static/8 4.26 ms BM_PatitionedViewRightMultiplyAndAccumulateE_Static/16 3.86 ms BM_PatitionedViewRightMultiplyAndAccumulateE_Static/28 3.75 ms BM_PatitionedViewRightMultiplyAndAccumulateF_Static/1 18.8 ms BM_PatitionedViewRightMultiplyAndAccumulateF_Static/2 11.9 ms BM_PatitionedViewRightMultiplyAndAccumulateF_Static/4 6.94 ms BM_PatitionedViewRightMultiplyAndAccumulateF_Static/8 4.41 ms BM_PatitionedViewRightMultiplyAndAccumulateF_Static/16 3.63 ms BM_PatitionedViewRightMultiplyAndAccumulateF_Static/28 3.86 ms Timings correspond to intel 8176 cpu and 2080ti nvidia gpu, with OpenMP threading backend. Change-Id: Idc07d0563103d057ca3c8412de81a7823fe232af
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.
Please see ceres-solver.org for more information.