.. role:: cpp(code) :language: cpp Multi-GPU Support ================= Kokkos has support for launching kernels on multiple GPUs from a single host process, e.g., a single MPI rank. This feature currently exists for the CUDA, HIP, and SYCL backends. Using this feature requires knowledge of backend specific API calls for creating non-default execution space instances. Once the execution space has been created, it can be used to create execution policies, allocate views, etc. all on the device chosen by the user. Constructing Execution Spaces ----------------------------- CUDA ~~~~ For CUDA backend, the user creates a ``cudaStream_t`` object and passes it to the ``Kokkos::Cuda`` constructor. .. note:: The lifetime of all Kokkos objects related to the stream (e.g., the execution space) needs to end before the stream itself is invalidated via ``cudaStreamDestroy``. This can be done by adding a scope around such objects, as is done below. .. code-block:: cpp // Query number of devices available int n_devices; cudaGetDeviceCount(&n_devices); // Choose some 0 <= N < n_devices int N = ...; // Create stream on device N cudaStream_t stream; cudaSetDevice(N); cudaStreamCreate(&stream); // Scope execution space to ensure stream // is destroyed *after* execution space { // Create execution space Kokkos::Cuda exec_space(stream); // Use execution space /* ... */ } // Destroying stream after use cudaSetDevice(N); cudaStreamDestroy(stream); HIP ~~~ For the HIP backend, like with CUDA, the user creates a ``hipStream_t`` object and passes it to the ``Kokkos::HIP`` constructor. .. note:: The lifetime of all Kokkos objects related to the stream (e.g., the execution space) needs to end before the stream itself is invalidated via ``hipStreamDestroy``. This can be done by adding a scope around such objects, as is done below. .. warning:: Multi-GPU only supported for ROCm 5.6 and later. Because of the lack of HIP API functions for querying a stream's device before ROCm 5.6, constructing a ``Kokkos::HIP`` instance on a non-default device isn't supported. .. code-block:: cpp // Query number of devices available int n_devices; hipGetDeviceCount(&n_devices); // Choose some 0 <= N < n_devices int N = ...; // Create stream on device N hipStream_t stream; hipSetDevice(N); hipStreamCreate(&stream); // Scope execution space to ensure stream // is destroyed *after* execution space { // Create execution space Kokkos::HIP exec_space(stream); // Use execution space /* ... */ } // Destroying stream after use hipSetDevice(N); hipStreamDestroy(stream); SYCL ~~~~ For the SYCL backend, the user creates a ``sycl::queue`` object and passes it to the ``Kokkos::SYCL`` constructor. .. code-block:: cpp // Get list of devices available std::vector gpu_devices = sycl::device::get_devices(sycl::info::device_type::gpu); // Choose some 0 <= N < gpu_devices.size() int N = ...; // Create a queue on device N. // Note: Kokkos requires SYCL queues to be "in_order" sycl::queue queue{gpu_devices[N], sycl::property::queue::in_order()}; // Create execution space Kokkos::SYCL exec_space(queue); // Use execution space /* ... */ Using Kokkos Methods -------------------- Once an execution space has been created on the chosen device, the execution space must be passed to all Kokkos methods intended to be used on the chosen device. If no execution space is passed, Kokkos will use the default execution space instance associated with the device with which Kokkos was initialized. Allocating Managed Views ~~~~~~~~~~~~~~~~~~~~~~~~ To allocate a managed view on device, pass the execution space to ``Kokkos::view_alloc()``. Example: .. code-block:: cpp using ExecutionSpace = decltype(exec_space); Kokkos::View V(Kokkos::view_alloc("V", exec_space), 10); Launching Kernels ~~~~~~~~~~~~~~~~~ To launch a kernel on device, pass the execution space to the policy constructor. Example: .. code-block:: cpp Kokkos::parallel_for("inc_V", Kokkos::RangePolicy(exec_space, 0, 10), KOKKOS_LAMBDA (const int i) { V(i) += i; }); Notes ----- - A `tutorial `_ for using multi-GPU on CUDA is available.