Enable support for AVX instructions for Jets.
- Eigen versions < 3.3 only supported SIMD instructions that required
16-byte alignment (SSE), whereas Eigen >= 3.3 also supports AVX
instructions which require 32+-byte alignment.
- Previously we only ever requested 16-byte alignment for Jets (if >=
C++11 was enabled) which resulted in Eigen assertions being triggered
on some compilers as reported as Issue #251:
https://github.com/ceres-solver/ceres-solver/issues/251.
- Now we use Eigen’s EIGEN_MAX_ALIGN_BYTES macro, defined in Eigen >=
3.3 to specify the byte alignment for Jets when C++11 is enabled for
Eigen >= 3.3. For Eigen versions < 3.3, we maintain the previous
behaviour of 16.
Change-Id: I749af7a70ae794e0c2a59301128db781e338422c
diff --git a/include/ceres/internal/port.h b/include/ceres/internal/port.h
index 8660a66..652f6fb 100644
--- a/include/ceres/internal/port.h
+++ b/include/ceres/internal/port.h
@@ -50,13 +50,13 @@
#endif
// We allocate some Eigen objects on the stack and other places they
-// might not be aligned to 16-byte boundaries. If we have C++11, we
-// can specify their alignment (which is desirable, as it means we can safely
-// enable vectorisation on matrices). However, the standard gives wide
-// lattitude as to what alignments are legal. It must be the case that
+// might not be aligned to X(=16 [SSE], 32 [AVX] etc)-byte boundaries. If we
+// have C++11, we can specify their alignment (which is desirable, as it means
+// we can safely enable vectorisation on matrices). However, the standard gives
+// wide lattitude as to what alignments are legal. It must be the case that
// alignments up to alignof(std::max_align_t) are valid, but this might be < 16
// on some platforms, in which case even if using C++11, on these platforms
-// we should not attempt to align to 16-byte boundaries. If using < C++11,
+// we should not attempt to align to X-byte boundaries. If using < C++11,
// we cannot specify the alignment.
#ifdef CERES_USE_CXX11
namespace port_constants {
diff --git a/include/ceres/jet.h b/include/ceres/jet.h
index 4f517f7..a0505a2 100644
--- a/include/ceres/jet.h
+++ b/include/ceres/jet.h
@@ -228,27 +228,51 @@
T a;
// The infinitesimal part.
-
+ //
// We allocate Jets on the stack and other places they might not be aligned
- // to 16-byte boundaries, which would prevent the safe use of vectorisation.
- // If we have C++11, we can specify the alignment. However, the standard
- // gives wide lattitude as to what alignments are valid, and it might be that
- // the maximum supported alignment is < 16, in which case even with C++11, we
- // cannot specify 16-byte alignment. If using < C++11, we cannot specify
- // alignment.
+ // to X(=16 [SSE], 32 [AVX] etc)-byte boundaries, which would prevent the safe
+ // use of vectorisation. If we have C++11, we can specify the alignment.
+ // However, the standard gives wide lattitude as to what alignments are valid,
+ // and it might be that the maximum supported alignment *guaranteed* to be
+ // supported is < 16, in which case we do not specify an alignment, as this
+ // implies the host is not a modern x86 machine. If using < C++11, we cannot
+ // specify alignment.
#ifndef CERES_USE_CXX11
- // fall back to safe version:
+ // Without >= C++11, we cannot specify the alignment so fall back to safe,
+ // unvectorised version.
Eigen::Matrix<T, N, 1, Eigen::DontAlign> v;
#else
- static constexpr bool kShouldAlignMatrix =
- 16 <= ::ceres::port_constants::kMaxAlignBytes;
- static constexpr int kAlignHint = kShouldAlignMatrix ?
- Eigen::AutoAlign : Eigen::DontAlign;
- // Default to the native alignment of double if 16-byte alignment is not
- // supported. We cannot use alignof(T) as if we do, GCC complains that the
- // alignment 'is not an integer constant', although Clang accepts it.
- static constexpr size_t kAlignment = kShouldAlignMatrix ? 16 : alignof(double);
- alignas(kAlignment) Eigen::Matrix<T, N, 1, kAlignHint> v;
+ // Enable vectorisation iff the maximum supported scalar alignment is >=
+ // 16 bytes, as this is the minimum required by Eigen for any vectorisation.
+ //
+ // NOTE: It might be the case that we could get >= 16-byte alignment even if
+ // kMaxAlignBytes < 16. However we can't guarantee that this
+ // would happen (and it should not for any modern x86 machine) and if it
+ // didn't, we could get misaligned Jets.
+ static constexpr int kAlignOrNot =
+ 16 <= ::ceres::port_constants::kMaxAlignBytes
+ ? Eigen::AutoAlign : Eigen::DontAlign;
+#if defined(EIGEN_MAX_ALIGN_BYTES)
+ // Eigen >= 3.3 supports AVX & FMA instructions that require 32-byte alignment
+ // (greater for AVX512). Rather than duplicating the detection logic, use
+ // Eigen's macro for the alignment size.
+ //
+ // NOTE: EIGEN_MAX_ALIGN_BYTES can be > 16 (e.g. 32 for AVX), even though
+ // kMaxAlignBytes will max out at 16. We are therefore relying on
+ // Eigen's detection logic to ensure that this does not result in
+ // misaligned Jets.
+#define CERES_JET_ALIGN_BYTES EIGEN_MAX_ALIGN_BYTES
+#else
+ // Eigen < 3.3 only supported 16-byte alignment.
+#define CERES_JET_ALIGN_BYTES 16
+#endif
+ // Default to the native alignment if 16-byte alignment is not guaranteed to
+ // be supported. We cannot use alignof(T) as if we do, GCC 4.8 complains that
+ // the alignment 'is not an integer constant', although Clang accepts it.
+ static constexpr size_t kAlignment = kAlignOrNot == Eigen::AutoAlign
+ ? CERES_JET_ALIGN_BYTES : alignof(double);
+#undef CERES_JET_ALIGN_BYTES
+ alignas(kAlignment) Eigen::Matrix<T, N, 1, kAlignOrNot> v;
#endif
};