Align Jet matrices where supported

We currently don't align the infinitesimal part of a Jet to a 16-byte
boundary, which forces Eigen to avoid SSE ops.  Because a Jet is often a
member of a larger struct, we couldn't guarantee that Jets would be
allocated on appropriately aligned boundaries.  However, C++11 adds
better support for requesting alignment: we can use alignas to guarantee
that the member is properly aligned, and tell Eigen to vectorize.
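
As a rough illustration of the member-alignment point (not code from
this patch), the sketch below uses hypothetical ToyJet and Holder types
and a hypothetical IsAligned16 helper.  It assumes a C++11 compiler that
honors alignas(16):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a Jet: a scalar part plus a small fixed-size
// infinitesimal part that we want on a 16-byte boundary.
struct ToyJet {
  double a;
  alignas(16) double v[2];
};

// Hypothetical enclosing struct; without the alignas above, the leading
// char could leave the Jet's v at an offset that is not a multiple of 16.
struct Holder {
  char tag;
  ToyJet jet;
};

// The alignas on the member raises the alignment of every enclosing type,
// so the compiler pads both structs to keep v on a 16-byte boundary.
static_assert(alignof(ToyJet) >= 16, "ToyJet should be 16-byte aligned");
static_assert(offsetof(ToyJet, v) % 16 == 0, "v should be at a 16-byte offset");
static_assert(offsetof(Holder, jet) % 16 == 0, "jet should be at a 16-byte offset");

// Runtime check that an address is a multiple of 16.
inline bool IsAligned16(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

The static_asserts cover layout only; whether every allocation path
honors the requested alignment is the subject of the next paragraph.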

There is a significant gotcha here: the standard gives wide latitude to
implementations as to which alignments they choose to support.  If we
ask for 16 and the system only supports 8, we may have misaligned
Jets. So we test (using alignof(std::max_align_t)) that the current
system supports 16-byte aligned values; if not, we fall back to the
current solution.
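
A minimal sketch of that gate, under stated assumptions: the names
kWantedBytes, kCanAlign, kChosenBytes, GatedJet and IsAlignedTo are
hypothetical, and the standard library is assumed to provide
std::max_align_t in namespace std.  The sketch falls back to alignas(0),
which the standard says is always ignored:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; the patch keeps its own
// equivalents in port.h.
constexpr std::size_t kWantedBytes = 16;

// Only request 16-byte alignment if the platform's largest fundamental
// alignment is at least that large.
constexpr bool kCanAlign = kWantedBytes <= alignof(std::max_align_t);

// alignas(0) is always ignored, so 0 means "keep the natural alignment".
constexpr std::size_t kChosenBytes = kCanAlign ? kWantedBytes : 0;

// Hypothetical stand-in for a Jet whose infinitesimal part is aligned
// only when the gate above allows it.
struct GatedJet {
  double a;
  alignas(kChosenBytes) double v[2];
};

static_assert(!kCanAlign || alignof(GatedJet) >= 16,
              "when the gate is open, GatedJet should be 16-byte aligned");

// Runtime check that an address is a multiple of the given byte count.
inline bool IsAlignedTo(const void* p, std::size_t bytes) {
  return reinterpret_cast<std::uintptr_t>(p) % bytes == 0;
}
```

On a platform where the gate is closed, GatedJet keeps the layout it
would have had without any alignment specifier.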

Two other small notes:
- This is gated on C++11 support, so the logic lives in port.h, which
  exports some useful #defines.

- GCC 4.8.x has a bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56019)
  that puts max_align_t in the global namespace instead of std.  This is
  not a problem with a modern GCC, but we add a small workaround since
  many systems still ship 4.8.

On an x86 workstation, this yields a 60% speedup in Jacobian
evaluation on bin/simple_bundle_adjuster problem-16-22106-pre.txt.

Change-Id: I169b637a1e2a106956b536c41d6a514a266e7cc0
diff --git a/include/ceres/internal/port.h b/include/ceres/internal/port.h
index e57049d..1193fcf 100644
--- a/include/ceres/internal/port.h
+++ b/include/ceres/internal/port.h
@@ -35,7 +35,7 @@
 #ifdef __cplusplus
 
 #include "ceres/internal/config.h"
-
+#include "Eigen/Core"
 #if defined(CERES_TR1_MEMORY_HEADER)
 #include <tr1/memory>
 #else
@@ -50,6 +50,41 @@
 using std::shared_ptr;
 #endif
 
+// We allocate some Eigen objects on the stack and in other places where
+// they might not be aligned to 16-byte boundaries.  If we have C++11, we
+// can specify their alignment anyway, and thus can safely enable
+// vectorization on those matrices; before C++11, we are out of luck.
+// Figure out which case we're in and define macros that do the right thing.
+#ifdef CERES_USE_CXX11
+namespace port_constants {
+
+static constexpr size_t kMaxAlignBytes =
+    // Work around a GCC 4.8 bug
+    // (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56019) where
+    // std::max_align_t is misplaced.
+#if defined (__GNUC__) && __GNUC__ == 4 && __GNUC_MINOR__ == 8
+    alignof(::max_align_t);
+#else
+    alignof(std::max_align_t);
+#endif
+
+static constexpr bool kShouldAlignMatrix = 16 <= kMaxAlignBytes;
+static constexpr size_t kAlignment = kShouldAlignMatrix ? 16 : 0;  // 0: no-op
+
+static constexpr int kEigenAlignmentHint =
+    kShouldAlignMatrix ? Eigen::AutoAlign : Eigen::DontAlign;
+}  // namespace port_constants
+
+#define CERES_ALIGNMENT_SPECIFIER alignas(::ceres::port_constants::kAlignment)
+#define CERES_MATRIX_ALIGN_HINT ::ceres::port_constants::kEigenAlignmentHint
+
+#else  // !CERES_USE_CXX11
+
+#define CERES_ALIGNMENT_SPECIFIER
+#define CERES_MATRIX_ALIGN_HINT Eigen::DontAlign
+
+#endif
+
 }  // namespace ceres
 
 #endif  // __cplusplus
diff --git a/include/ceres/jet.h b/include/ceres/jet.h
index a21fd7a..8515be8 100644
--- a/include/ceres/jet.h
+++ b/include/ceres/jet.h
@@ -164,6 +164,7 @@
 
 #include "Eigen/Core"
 #include "ceres/fpclassify.h"
+#include "ceres/internal/port.h"
 
 namespace ceres {
 
@@ -227,21 +228,8 @@
   T a;
 
   // The infinitesimal part.
-  //
-  // Note the Eigen::DontAlign bit is needed here because this object
-  // gets allocated on the stack and as part of other arrays and
-  // structs. Forcing the right alignment there is the source of much
-  // pain and suffering. Even if that works, passing Jets around to
-  // functions by value has problems because the C++ ABI does not
-  // guarantee alignment for function arguments.
-  //
-  // Setting the DontAlign bit prevents Eigen from using SSE for the
-  // various operations on Jets. This is a small performance penalty
-  // since the AutoDiff code will still expose much of the code as
-  // statically sized loops to the compiler. But given the subtle
-  // issues that arise due to alignment, especially when dealing with
-  // multiple platforms, it seems to be a trade off worth making.
-  Eigen::Matrix<T, N, 1, Eigen::DontAlign> v;
+  // See include/ceres/internal/port.h for the meaning of the #defines here.
+  CERES_ALIGNMENT_SPECIFIER Eigen::Matrix<T, N, 1, CERES_MATRIX_ALIGN_HINT> v;
 };
 
 // Unary +