CUDA Cleanup

* All Cuda* objects now take in a ContextImpl* during
  construction, and save the context instead of individual
* Since we no longer use the legacy default stream, the stream must
  be explicitly synchronized before any GPU->CPU transfer. CudaBuffer
  is responsible for this synchronization whenever it is asked to
  perform such a transfer.
* Remove all manual syncs and delegate syncing to CudaBuffer, which
  synchronizes before performing GPU->CPU transfers.
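
The synchronization rule in the bullets above can be illustrated with a small CUDA-free model. This is only a sketch under stated assumptions: FakeStream, FakeContext, and FakeCudaBuffer are hypothetical stand-ins invented for illustration, not the classes touched by this change. In real code the queued work would be kernel launches or cudaMemcpyAsync calls on a non-default stream, and the sync would be cudaStreamSynchronize.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a non-default CUDA stream: work is queued and
// only executes when the stream is synchronized.
class FakeStream {
 public:
  void enqueue(std::function<void()> work) { pending_.push(std::move(work)); }

  // Run all queued work in order (models cudaStreamSynchronize).
  void synchronize() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> pending_;
};

// Hypothetical stand-in for ContextImpl: owns the stream.
struct FakeContext {
  FakeStream stream;
};

// Hypothetical stand-in for CudaBuffer: keeps the context it was built with.
class FakeCudaBuffer {
 public:
  FakeCudaBuffer(FakeContext* context, std::size_t size)
      : context_(context), device_(size, 0.0f) {
    assert(context_ != nullptr);
  }

  // Models an asynchronous kernel launch that fills the buffer.
  void fill_async(float value) {
    context_->stream.enqueue([this, value] {
      for (float& x : device_) x = value;
    });
  }

  // GPU->CPU transfer: the buffer synchronizes the stream itself,
  // so callers never need a manual sync.
  std::vector<float> copy_to_host() {
    context_->stream.synchronize();
    return device_;
  }

  // Unsynchronized read, present only to show what a missing sync returns.
  std::vector<float> copy_to_host_unsynchronized() const { return device_; }

 private:
  FakeContext* context_;
  std::vector<float> device_;
};
```

In this model, calling copy_to_host_unsynchronized() right after fill_async() still sees the old contents, while copy_to_host() sees the filled values. That is the reason the sync lives inside the transfer path rather than at each call site.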

