Replace Eigen block operations with small GEMM and GEMV loops.

1. Add matrix-matrix and matrix-vector multiply functions.
2. Replace Eigen usage in SchurEliminator with these custom
   matrix operations.
3. Avoid some memory allocations in ChunkOuterProduct.
4. Replace LDLT with LLT.
5. Add a CUSTOM_BLAS CMake option (ON by default); turning it off
   defines CERES_NO_CUSTOM_BLAS.

As a result, the linear solver time on problem-16-22106-pre.txt
goes down from 1.2s to 0.64s.
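
For readers unfamiliar with the terms, the following is a minimal
sketch of what a small GEMM loop and a small GEMV loop look like. The
function names, the row-major layout and the accumulate-into-output
behaviour are assumptions made for illustration only; they are not
the functions added by this change.

```cpp
#include <cstddef>

// Hypothetical helper, for illustration only.
// Computes C += A * B for dense row-major matrices, where
// A is num_row_a x num_col_a, B is num_col_a x num_col_b and
// C is num_row_a x num_col_b.
inline void SmallGemmAccumulate(const double* A,
                                std::size_t num_row_a,
                                std::size_t num_col_a,
                                const double* B,
                                std::size_t num_col_b,
                                double* C) {
  for (std::size_t row = 0; row < num_row_a; ++row) {
    for (std::size_t col = 0; col < num_col_b; ++col) {
      // Dot product of row `row` of A with column `col` of B.
      double sum = 0.0;
      for (std::size_t k = 0; k < num_col_a; ++k) {
        sum += A[row * num_col_a + k] * B[k * num_col_b + col];
      }
      C[row * num_col_b + col] += sum;
    }
  }
}

// Hypothetical helper, for illustration only.
// Computes c += A * b, where A is num_row_a x num_col_a (row-major),
// b has num_col_a entries and c has num_row_a entries.
inline void SmallGemvAccumulate(const double* A,
                                std::size_t num_row_a,
                                std::size_t num_col_a,
                                const double* b,
                                double* c) {
  for (std::size_t row = 0; row < num_row_a; ++row) {
    // Dot product of row `row` of A with the vector b.
    double sum = 0.0;
    for (std::size_t k = 0; k < num_col_a; ++k) {
      sum += A[row * num_col_a + k] * b[k];
    }
    c[row] += sum;
  }
}
```

Plain loops over raw pointers like these need no temporaries, which
is one plausible reason they can be cheaper than general-purpose
block expressions for very small matrices.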

Change-Id: I2daa667960e0a1e8834489965a30be31f37fd87f
diff --git a/CMakeLists.txt b/CMakeLists.txt
index a7b77fe..1f234c7 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -451,10 +451,19 @@
        ON)
 
 IF (NOT ${LINE_SEARCH_MINIMIZER})
- ADD_DEFINITIONS(-DCERES_NO_LINE_SEARCH_MINIMIZER)
- MESSAGE("-- Disabling line search minimizer")
+  ADD_DEFINITIONS(-DCERES_NO_LINE_SEARCH_MINIMIZER)
+  MESSAGE("-- Disabling line search minimizer")
 ENDIF (NOT ${LINE_SEARCH_MINIMIZER})
 
+OPTION(CUSTOM_BLAS
+       "Use handcoded BLAS routines (usually faster) instead of Eigen."
+       ON)
+
+IF (NOT ${CUSTOM_BLAS})
+  ADD_DEFINITIONS(-DCERES_NO_CUSTOM_BLAS)
+  MESSAGE("-- Disabling custom blas")
+ENDIF (NOT ${CUSTOM_BLAS})
+
 # Multithreading using OpenMP
 OPTION(OPENMP
        "Enable threaded solving in Ceres (requires OpenMP)"