Let ITERATIVE_SCHUR use an explicit Schur Complement matrix.

Up to now, ITERATIVE_SCHUR has evaluated matrix-vector products
with the Schur complement implicitly, by exploiting the algebraic
expression for the Schur complement.

The cost of this evaluation scales with the number of non-zeros
in the Jacobian.

For small to medium sized problems there is a sweet spot where
computing the Schur complement is cheap enough that it is much
more efficient to explicitly compute it and use it for evaluating
the matrix-vector products.

This change implements support for an explicit Schur complement
in ITERATIVE_SCHUR, in combination with the SCHUR_JACOBI preconditioner.

API-wise, a new bool Solver::Options::use_explicit_schur_complement
has been added.
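Enabling the new option is a one-line change to the solver configuration; a minimal usage sketch (the ceres::Problem setup is assumed to exist elsewhere):

```cpp
#include "ceres/ceres.h"

// `problem` is assumed to be an already-populated ceres::Problem.
ceres::Solver::Options options;
options.linear_solver_type = ceres::ITERATIVE_SCHUR;
// Required: the option can only be used with the SCHUR_JACOBI preconditioner.
options.preconditioner_type = ceres::SCHUR_JACOBI;
options.use_explicit_schur_complement = true;

ceres::Solver::Summary summary;
ceres::Solve(options, &problem, &summary);
```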

The implementation extends the SparseSchurComplementSolver to use
Conjugate Gradients.

Example speedup:

use_explicit_schur_complement = false

Time (in seconds):
Preprocessor                            0.585

  Residual evaluation                   0.319
  Jacobian evaluation                   1.590
  Linear solver                        25.685
Minimizer                              27.990

Postprocessor                           0.010
Total                                  28.585

use_explicit_schur_complement = true

Time (in seconds):
Preprocessor                            0.638

  Residual evaluation                   0.318
  Jacobian evaluation                   1.507
  Linear solver                         5.930
Minimizer                               8.144

Postprocessor                           0.010
Total                                   8.791

This is an end-to-end speedup of more than 3x, with the linear
solver sped up by more than 4x.

The idea to explore this optimization was inspired by the recent paper:

Mining structure fragments for smart bundle adjustment
L. Carlone, P. Alcantarilla, H. Chiu, Z. Kira, F. Dellaert
British Machine Vision Conference, 2014

which uses a more complicated algorithm to compute parts of the
Schur complement to speed up the matrix-vector product.

Change-Id: I95324af0ab351faa1600f5204039a1d2a64ae61d
diff --git a/include/ceres/solver.h b/include/ceres/solver.h
index 0af34ca..a5efa2a 100644
--- a/include/ceres/solver.h
+++ b/include/ceres/solver.h
@@ -118,6 +118,7 @@
 #endif
 
       num_linear_solver_threads = 1;
+      use_explicit_schur_complement = false;
       use_postordering = false;
       dynamic_sparsity = false;
       min_linear_solver_iterations = 0;
@@ -496,6 +497,29 @@
     // smaller than the group containing cameras.
     shared_ptr<ParameterBlockOrdering> linear_solver_ordering;
 
+    // Use an explicitly computed Schur complement matrix with
+    // ITERATIVE_SCHUR.
+    //
+    // By default this option is disabled and ITERATIVE_SCHUR
+    // evaluates matrix-vector products between the Schur
+    // complement and a vector implicitly by exploiting the algebraic
+    // expression for the Schur complement.
+    //
+    // The cost of this evaluation scales with the number of non-zeros
+    // in the Jacobian.
+    //
+    // For small to medium sized problems there is a sweet spot where
+    // computing the Schur complement is cheap enough that it is much
+    // more efficient to explicitly compute it and use it for evaluating
+    // the matrix-vector products.
+    //
+    // Enabling this option tells ITERATIVE_SCHUR to use an explicitly
+    // computed Schur complement.
+    //
+    // NOTE: This option can only be used with the SCHUR_JACOBI
+    // preconditioner.
+    bool use_explicit_schur_complement;
+
     // Sparse Cholesky factorization algorithms use a fill-reducing
     // ordering to permute the columns of the Jacobian matrix. There
     // are two ways of doing this.