Core
Data Structures
Communicators
-
template<CommunicationSpace Comm = DefaultCommunicationSpace, KokkosExecutionSpace Exec = Kokkos::DefaultExecutionSpace>
class Communicator
Template class for communicator wrappers of different communication space types.
Communicators wrap a communication library-specific communicator (e.g. MPI_Comm) and a Kokkos execution space, tightly coupling the two. Communicator objects are constructed via static factory member functions (from_raw, split_from_raw, duplicate_from_raw): the parameterized constructor is private, and no default constructor is defined. They are move-only objects: copy construction and copy assignment are explicitly deleted. Use the duplicate member functions to create equivalent “copies” of communicators. There is always exactly one owner of a Communicator.
- Template Parameters:
Comm – The communication space (transport backend) to use. Defaults to DefaultCommunicationSpace.
Exec – The Kokkos execution space to use. Defaults to Kokkos::DefaultExecutionSpace.
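The ownership rules above (private constructor, factory functions, deleted copies, explicit duplication) can be illustrated with a self-contained model in plain C++. This is a sketch under stated assumptions, not the KokkosComm implementation: RawHandle, null_handle, backend_duplicate, and ModelComm are hypothetical stand-ins, with an int playing the role of a backend handle such as MPI_Comm. Wrapping a raw handle does not take ownership, while a duplicate owns (and would free) its new handle.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical stand-in for a backend handle such as MPI_Comm; 0 plays the role of a null handle.
using RawHandle = int;
inline constexpr RawHandle null_handle = 0;

// Hypothetical backend call that returns a fresh handle, as a duplicate operation would.
inline auto backend_duplicate() -> RawHandle {
  static RawHandle counter = 100;
  return ++counter;
}

class ModelComm {
 public:
  // Non-owning wrap of an existing handle: the caller stays responsible for freeing it.
  static auto wrap_raw(RawHandle h) -> std::optional<ModelComm> {
    if (h == null_handle) return std::nullopt;
    return ModelComm(h, /*owns=*/false);
  }

  // The only way to obtain a second, independent object: a new handle owned by the result.
  auto duplicate() const -> std::optional<ModelComm> {
    return ModelComm(backend_duplicate(), /*owns=*/true);
  }

  // Copies are deleted: there is exactly one owner at any time.
  ModelComm(const ModelComm&) = delete;
  auto operator=(const ModelComm&) -> ModelComm& = delete;

  // Moving transfers the handle and leaves the source empty.
  ModelComm(ModelComm&& other) noexcept
      : handle_(std::exchange(other.handle_, null_handle)),
        owns_(std::exchange(other.owns_, false)) {}

  auto operator=(ModelComm&& other) noexcept -> ModelComm& {
    if (this != &other) {
      release();
      handle_ = std::exchange(other.handle_, null_handle);
      owns_ = std::exchange(other.owns_, false);
    }
    return *this;
  }

  ~ModelComm() noexcept { release(); }

  auto handle() const noexcept -> RawHandle { return handle_; }
  auto owns() const noexcept -> bool { return owns_; }

  // Counts handles released by owning objects, for illustration only.
  static inline int released = 0;

 private:
  // Private: objects can only be created through the factory functions above.
  ModelComm(RawHandle h, bool owns) : handle_(h), owns_(owns) { assert(h != null_handle); }

  auto release() noexcept -> void {
    if (owns_) ++released;  // a real wrapper would free the backend handle here
    handle_ = null_handle;
    owns_ = false;
  }

  RawHandle handle_;
  bool owns_;
};
```

In this model only the object produced by duplicate() releases a handle when destroyed; a wrapped raw handle is left to whoever created it, which mirrors the non-owning behavior documented for from_raw.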
Common interfaces
Both specializations share the following interface:
-
~Communicator() noexcept
Destructor.
-
Communicator(const Communicator&) = delete
Copy constructor is deleted because a Communicator cannot be implicitly copied. Use duplicate instead.
-
auto operator=(const Communicator&) -> Communicator& = delete
Copy assignment operator is deleted because a Communicator cannot be implicitly copied. Use duplicate instead.
-
Communicator(Communicator&&) noexcept
Move-constructs a Communicator.
-
auto operator=(Communicator&&) noexcept -> Communicator&
Move-assigns a Communicator.
-
[[nodiscard]] constexpr auto size() const noexcept -> size_type
- Returns:
The size (i.e., number of processes) of the communicator.
-
[[nodiscard]] constexpr auto rank() const noexcept -> rank_type
- Returns:
The rank that identifies the calling process within the communicator.
-
[[nodiscard]] auto split(int color, int key) noexcept -> std::optional<Communicator<Comm, Exec>>
Splits a Communicator.
Given a color and a key, creates as many new communicators as there are distinct values of color, ordering processes within each new communicator according to the value of key. All processes with the same color join the same communicator.
- Parameters:
color – A value controlling in which split communicator the calling process should be in.
key – A value ordering the calling process within the split communicator.
- Returns:
A communicator if the calling process is part of one of the split communicators,
std::nullopt if color is the backend's special value that excludes the calling process (MPI_UNDEFINED for MPI, NCCL_SPLIT_NOCOLOR for NCCL), or on error.
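The grouping rule that split describes can be simulated without any communication library, which makes it easy to check which communicator and rank each process ends up with. The sketch assumes MPI_Comm_split semantics: processes are grouped by color, ordered within each group by key, and ties in key are broken by rank in the original communicator. NO_COLOR, SplitResult, and simulate_split are hypothetical names; NO_COLOR stands in for MPI_UNDEFINED or NCCL_SPLIT_NOCOLOR.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sentinel standing in for MPI_UNDEFINED / NCCL_SPLIT_NOCOLOR.
inline constexpr int NO_COLOR = -1;

struct SplitResult {
  int color;     // which new communicator the process joined
  int new_rank;  // its rank within that communicator
  int new_size;  // size of that communicator
};

// colors[r] and keys[r] are the arguments passed by original rank r.
// Returns each original rank's placement, or nullopt if it passed NO_COLOR.
auto simulate_split(const std::vector<int>& colors, const std::vector<int>& keys)
    -> std::vector<std::optional<SplitResult>> {
  assert(colors.size() == keys.size());
  // color -> list of (key, original rank)
  std::map<int, std::vector<std::pair<int, int>>> groups;
  for (int r = 0; r < static_cast<int>(colors.size()); ++r) {
    if (colors[r] != NO_COLOR) groups[colors[r]].emplace_back(keys[r], r);
  }
  std::vector<std::optional<SplitResult>> result(colors.size());
  for (auto& [color, members] : groups) {
    // Lexicographic sort: by key first, then by original rank to break ties.
    std::sort(members.begin(), members.end());
    const int size = static_cast<int>(members.size());
    for (int i = 0; i < size; ++i) {
      result[members[i].second] = SplitResult{color, i, size};
    }
  }
  return result;
}
```

For example, with colors {0, 1, 0, 1, NO_COLOR} and keys {5, 0, 1, 0, 0}, original rank 2 becomes rank 0 of the color-0 communicator (its key is smaller than rank 0's), ranks 1 and 3 keep their relative order in the color-1 communicator because their keys tie, and rank 4 joins nothing.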
-
[[nodiscard]] auto duplicate() noexcept -> std::optional<Communicator<Comm, Exec>>
Duplicates a Communicator.
- Returns:
A communicator on success,
std::nullopt on error.
MPI specialization
-
template<KokkosExecutionSpace Exec>
class Communicator<MpiSpace, Exec>
Communicator specialization for the MpiSpace communication space. Wraps an MPI_Comm handle.
-
using communicator_type = MPI_Comm
-
using size_type = int
-
using rank_type = int
-
[[nodiscard]] static auto from_raw(MPI_Comm comm, const Exec &exec = Exec{}) noexcept -> Communicator<MpiSpace, Exec>
Constructs a Communicator from a raw MPI_Comm handle and a Kokkos execution space instance. Defaults exec to a default-constructed Exec{}. The passed handle must be valid and must not be an inter-communicator parent handle. The returned communicator does not own the underlying handle; the user remains responsible for freeing it.
- Parameters:
comm – A valid communicator handle.
exec – A Kokkos execution space instance. Defaults to Exec{}.
- Returns:
A communicator on success,
std::nullopt if the passed handle was MPI_COMM_NULL.
-
[[nodiscard]] static auto split_from_raw(const MPI_Comm comm, int color, int key, const Exec &exec = Exec{}) noexcept -> std::optional<Communicator<MpiSpace, Exec>>
Splits a raw MPI communicator and associates the resulting communicator with a Kokkos execution space instance. Defaults exec to a default-constructed Exec{}.
Creates as many new communicators as there are distinct values of color, and orders processes within each according to the value of key. All processes with the same value of color join the same communicator. A process that passes MPI_UNDEFINED as color does not join any new communicator.
- Parameters:
comm – A valid communicator handle.
color – A value controlling in which split communicator the calling process should be in.
key – A value ordering the calling process within the split communicator.
exec – A Kokkos execution space instance. Defaults to Exec{}.
- Returns:
A split communicator on success,
std::nullopt if the passed color was MPI_UNDEFINED or on error.
-
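[[nodiscard]] static auto duplicate_from_raw(const MPI_Comm comm, const Exec &exec = Exec{}) noexcept -> std::optional<Communicator<MpiSpace, Exec>>
Duplicates a raw MPI communicator and associates the duplicate with a Kokkos execution space instance. Defaults exec to a default-constructed Exec{}.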
- Parameters:
comm – A valid communicator handle.
exec – A Kokkos execution space instance. Defaults to Exec{}.
- Returns:
A communicator on success,
std::nullopt on error.
-
[[nodiscard]] auto comm() noexcept -> MPI_Comm&
-
[[nodiscard]] auto comm() const noexcept -> const MPI_Comm&
- Returns:
A reference to the underlying MPI_Comm object.
-
[[nodiscard]] auto exec() const noexcept -> const execution_space&
- Returns:
A const reference to the associated execution space instance.
NCCL specialization
-
template<>
class Communicator<Experimental::NcclSpace, Kokkos::Cuda>
Communicator specialization for the Experimental::NcclSpace communication space. Wraps an ncclComm_t handle.
-
using execution_space = Kokkos::Cuda
-
using communicator_type = ncclComm_t
-
using size_type = int
-
using rank_type = int
-
[[nodiscard]] static auto from_raw(ncclComm_t comm, const Kokkos::Cuda &exec = Kokkos::Cuda{}) noexcept -> Communicator<Experimental::NcclSpace, Kokkos::Cuda>
Constructs a Communicator from a raw ncclComm_t handle and a Kokkos CUDA execution space instance. Defaults exec to a default-constructed Kokkos::Cuda{}. The returned communicator does not own the underlying handle; the user remains responsible for destroying it.
- Parameters:
comm – A valid communicator handle.
exec – A Kokkos CUDA execution space instance. Defaults to Kokkos::Cuda{}.
- Returns:
A communicator on success,
std::nullopt if the passed handle was nullptr.
-
[[nodiscard]] static auto split_from_raw(const ncclComm_t comm, int color, int key, const Kokkos::Cuda &exec = Kokkos::Cuda{}) noexcept -> std::optional<Communicator<Experimental::NcclSpace, Kokkos::Cuda>>
Splits a raw NCCL communicator and associates the resulting communicator with a Kokkos CUDA execution space instance. Defaults exec to a default-constructed Kokkos::Cuda{}.
Creates as many new communicators as there are distinct values of color, and orders processes within each according to the value of key. All processes with the same value of color join the same communicator. A process that passes NCCL_SPLIT_NOCOLOR as color does not join any new communicator.
- Parameters:
comm – A valid communicator handle.
color – A value controlling in which split communicator the calling process should be in.
key – A value ordering the calling process within the split communicator.
exec – A Kokkos CUDA execution space instance. Defaults to
Kokkos::Cuda.
- Returns:
A split communicator on success,
std::nullopt if the passed color was NCCL_SPLIT_NOCOLOR or on error.
-
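[[nodiscard]] static auto duplicate_from_raw(const ncclComm_t comm, const Kokkos::Cuda &exec = Kokkos::Cuda{}) noexcept -> std::optional<Communicator<Experimental::NcclSpace, Kokkos::Cuda>>
Duplicates a raw NCCL communicator and associates the duplicate with a Kokkos CUDA execution space instance. Defaults exec to a default-constructed Kokkos::Cuda{}.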
- Parameters:
comm – A valid communicator handle.
exec – A Kokkos CUDA execution space instance. Defaults to
Kokkos::Cuda.
- Returns:
A communicator on success,
std::nullopt on error.
-
[[nodiscard]] auto comm() noexcept -> ncclComm_t&
-
[[nodiscard]] auto comm() const noexcept -> const ncclComm_t&
- Returns:
A reference to the underlying ncclComm_t object.
-
[[nodiscard]] auto exec() const noexcept -> const Kokkos::Cuda&
- Returns:
A const reference to the associated Kokkos::Cuda execution space instance.
Requests
-
template<CommunicationSpace C = DefaultCommSpace>
class Request
Template class for request wrappers of different communication space types.
Request objects are move-only: copy construction and copy assignment are explicitly deleted. There is always exactly one owner of a Request and its associated callbacks, which guarantees that the same callback cannot be executed more than once.
- Template Parameters:
C – The communication backend to use. Defaults to DefaultCommunicationSpace.
Common interfaces
Both specializations share the following interface:
-
auto operator=(const Request&) -> Request& = delete
Copy assignment operator is deleted because a Request can only be moved.
-
template<KokkosView V>
auto extend_view_lifetime(const V &view) -> void
Captures a Kokkos View to extend its lifetime until the request completes. Has no effect on unmanaged Views.
- Template Parameters:
V – A Kokkos View type.
- Parameters:
view – The view whose lifetime should be extended.
-
auto add_callback(std::function<void()> &&cb) -> void
Registers a callback to be invoked after the request completes.
- Parameters:
cb – The callback function to register.
-
auto wait() -> void
Blocks until the associated operation completes. Executes all registered callbacks upon completion.
-
auto test() -> bool
Non-blocking query for the completion of the associated operation. Executes all registered callbacks if the operation has completed.
- Returns:
true if the request has completed, false otherwise.
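The single-owner rule for callbacks, together with test(), wait(), and lifetime extension, can be illustrated with a simplified, self-contained model. In this sketch ModelRequest and its shared done flag are hypothetical: a real request would query its backend (for example MPI_Test, or a recorded CUDA event for NCCL) instead of reading a flag. Callbacks are moved out of the request before they run, so a later wait() or test() finds nothing left to execute, and a std::shared_ptr stands in for a reference-counted Kokkos View whose lifetime is extended until completion.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

class ModelRequest {
 public:
  // Hypothetical completion source; a real request would poll its backend instead.
  explicit ModelRequest(std::shared_ptr<bool> done) : done_(std::move(done)) {}

  // Move-only, like Request: one owner for the pending callbacks.
  ModelRequest(const ModelRequest&) = delete;
  auto operator=(const ModelRequest&) -> ModelRequest& = delete;
  ModelRequest(ModelRequest&&) noexcept = default;
  auto operator=(ModelRequest&&) noexcept -> ModelRequest& = default;

  // Keep a reference-counted resource alive until completion
  // (stand-in for extend_view_lifetime on a managed View).
  template <typename T>
  auto extend_lifetime(std::shared_ptr<T> resource) -> void {
    add_callback([keep = std::move(resource)]() mutable { keep.reset(); });
  }

  auto add_callback(std::function<void()>&& cb) -> void { callbacks_.push_back(std::move(cb)); }

  // Non-blocking: runs the callbacks only once the operation has completed.
  auto test() -> bool {
    if (!*done_) return false;
    run_callbacks();
    return true;
  }

  // A real wait() would block; this model requires completion to be signaled first.
  auto wait() -> void {
    assert(*done_ && "model cannot block: signal completion first");
    run_callbacks();
  }

 private:
  auto run_callbacks() -> void {
    auto pending = std::move(callbacks_);  // take the callbacks so each runs at most once
    callbacks_.clear();
    for (auto& cb : pending) cb();
  }

  std::shared_ptr<bool> done_;
  std::vector<std::function<void()>> callbacks_;
};
```

Because the model is move-only, moving a request hands its pending callbacks to the new owner, and calling test() or wait() again after completion is harmless.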
-
auto wait(Request &request) -> void
Free function overload. Waits on request until the associated operation completes.
- Parameters:
request – A reference to the request to wait on.
-
auto wait(Request &&request) -> void
Free function overload. Waits on an r-value request, consuming it upon completion.
- Parameters:
request – An r-value reference to the request to wait on.
-
auto wait_all(std::span<Request> requests) -> void
Waits for the completion of all requests in requests.
- Parameters:
requests – The list of requests to complete.
MPI specialization
-
template<>
class Request<MpiSpace>
Request specialization for the MpiSpace communication space. Wraps an MPI_Request handle.
-
using request_type = MpiSpace::request_type
-
explicit Request(request_type request = MPI_REQUEST_NULL)
Constructs a Request from an MPI_Request handle.
- Parameters:
request – The MPI_Request to encapsulate. Defaults to MPI_REQUEST_NULL.
-
auto request() noexcept -> request_type&
-
auto request() const noexcept -> const request_type&
- Returns:
A reference to the underlying MPI_Request object.
-
auto request_ptr() noexcept -> request_type*
-
auto request_ptr() const noexcept -> const request_type*
- Returns:
A pointer to the underlying MPI_Request object.
Note
Both wait_all and wait_any copy the underlying MPI_Request objects into an intermediate container before calling MPI_Waitall and MPI_Waitany, respectively, which incurs an allocation overhead.
NCCL specialization
-
template<>
class Request<Experimental::NcclSpace>
Request specialization for the Experimental::NcclSpace communication space. Wraps a cudaEvent_t handle to track the completion of CUDA stream operations.
-
using request_type = Experimental::NcclSpace::request_type
-
explicit Request()
Constructs an empty Request with a null event handle.
-
~Request() noexcept
Destructor. Destroys the underlying cudaEvent_t if one has been created.
-
auto capture_stream_state(cudaStream_t stream) noexcept -> void
Records a CUDA event on stream to capture its current state for completion tracking. If a cudaEvent_t was previously created on this request, it is destroyed first.
- Parameters:
stream – The CUDA stream whose state to capture.
-
auto request() noexcept -> request_type&
-
auto request() const noexcept -> const request_type&
- Returns:
A reference to the underlying cudaEvent_t object.
-
auto request_ptr() noexcept -> request_type*
-
auto request_ptr() const noexcept -> const request_type*
- Returns:
A pointer to the underlying cudaEvent_t object.
Note
Both wait_all and wait_any use active polling loops rather than blocking synchronization. While this increases CPU utilization, it avoids the overhead of spawning threads or completing requests sequentially.
Communication Primitives
Point-to-point
Send
Warning
This is not a blocking operation despite being named like MPI_Send.
-
template<KokkosView SendView, KokkosExecutionSpace ExecSpace = Kokkos::DefaultExecutionSpace, CommunicationSpace CommSpace = DefaultCommunicationSpace>
auto send(Communicator<CommSpace, ExecSpace> &h, SendView &sv, int dest) -> Request<CommSpace>
Initiates a non-blocking send operation.
- Template Parameters:
SendView – The type of the Kokkos view to send.
ExecSpace – The execution space to use. Defaults to Kokkos::DefaultExecutionSpace.
CommSpace – The communication backend to use. Defaults to DefaultCommunicationSpace.
- Parameters:
h – The communicator, which couples the execution space and the transport mechanism.
sv – The Kokkos view to send.
dest – The destination rank.
- Returns:
A request object of type Request<CommSpace> representing the non-blocking send operation.
-
template<KokkosView SendView, KokkosExecutionSpace ExecSpace = Kokkos::DefaultExecutionSpace, CommunicationSpace CommSpace = DefaultCommunicationSpace>
auto send(SendView &sv, int dest) -> Request<CommSpace>
Initiates a non-blocking send operation using a default handle.
- Template Parameters:
SendView – The type of the Kokkos view to send.
ExecSpace – The execution space to use. Defaults to Kokkos::DefaultExecutionSpace.
CommSpace – The communication backend to use. Defaults to DefaultCommunicationSpace.
- Parameters:
sv – The Kokkos view to send.
dest – The destination rank.
- Returns:
A request object of type Request<CommSpace> representing the non-blocking send operation.
Example usage:
#include <Kokkos_Core.hpp>
#include <KokkosComm/KokkosComm.hpp>
// Create an execution space instance
auto exec = Kokkos::DefaultExecutionSpace();
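// Create a communicator by duplicating an existing raw handle
// (raw_comm_handle is assumed to exist, e.g. MPI_COMM_WORLD when the default communication space is MpiSpace)
auto comm = KokkosComm::Communicator<>::duplicate_from_raw(raw_comm_handle, exec).value();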
// Create a Kokkos view
Kokkos::View<double*> data("send_data", 100);
// Fill the view with some data
Kokkos::parallel_for(
"fill_data", Kokkos::RangePolicy(exec, 0, 100), KOKKOS_LAMBDA(int i) { data(i) = static_cast<double>(i); }
);
exec.fence();
// Destination rank
int dst_rank = 1;
// Initiate a non-blocking send with a handle
auto req1 = KokkosComm::send(comm, data, dst_rank);
// Simulate a blocking send by waiting immediately
KokkosComm::send(comm, data, dst_rank).wait();
// Wait for a request to complete
KokkosComm::wait(req1);
Receive¶
Warning
This is not a blocking operation despite being named like MPI_Recv.
-
template<KokkosView RecvView, KokkosExecutionSpace ExecSpace = Kokkos::DefaultExecutionSpace, CommunicationSpace CommSpace = DefaultCommunicationSpace>
auto recv(Communicator<CommSpace, ExecSpace> &h, RecvView &rv, int src) -> Request<CommSpace>
Initiates a non-blocking receive operation.
- Template Parameters:
RecvView – The type of the Kokkos view for receiving data.
ExecSpace – The execution space where the operation will be performed. Defaults to Kokkos::DefaultExecutionSpace.
CommSpace – The communication backend to use. Defaults to DefaultCommunicationSpace.
- Parameters:
h – The communicator, which couples the execution space and the transport mechanism.
rv – The Kokkos view where the received data will be stored.
src – The source rank from which to receive data.
- Returns:
A request object of type Request<CommSpace> representing the non-blocking receive operation.
This function initiates a non-blocking receive operation using the specified execution space and transport mechanism. The data will be received into the provided view from the specified source rank. The function returns a request object that can be used to check the status of the receive operation or to wait for its completion.
-
template<KokkosView RecvView, KokkosExecutionSpace ExecSpace = Kokkos::DefaultExecutionSpace, CommunicationSpace CommSpace = DefaultCommunicationSpace>
auto recv(RecvView &rv, int src) -> Request<CommSpace>
Initiates a non-blocking receive operation using a default handle.
- Template Parameters:
RecvView – The type of the Kokkos view for receiving data.
ExecSpace – The execution space where the operation will be performed. Defaults to Kokkos::DefaultExecutionSpace.
CommSpace – The communication backend to use. Defaults to
DefaultCommunicationSpace.
- Parameters:
rv – The Kokkos view where the received data will be stored.
src – The source rank from which to receive data.
- Returns:
A request object of type Request<CommSpace> representing the non-blocking receive operation.
Example usage:
#include <Kokkos_Core.hpp>
#include <KokkosComm/KokkosComm.hpp>
// Create an execution space instance
auto exec = Kokkos::DefaultExecutionSpace();
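// Create a communicator by duplicating an existing raw handle
// (raw_comm_handle is assumed to exist, e.g. MPI_COMM_WORLD when the default communication space is MpiSpace)
auto comm = KokkosComm::Communicator<>::duplicate_from_raw(raw_comm_handle, exec).value();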
// Allocate views to receive the data; two receives that are pending
// at the same time must not share a buffer
Kokkos::View<double*> data("recv_view", 100);
Kokkos::View<double*> data2("recv_view_2", 100);
// Source rank
int src_rank = 1;
// Initiate a non-blocking receive with a handle
auto req1 = KokkosComm::recv(comm, data, src_rank);
// Simulate a blocking receive by waiting immediately
KokkosComm::recv(comm, data2, src_rank).wait();
// Wait for the first request to complete
KokkosComm::wait(req1);
Collectives
Important
Collective operations act element-wise on the input Views. Multi-dimensional Views are treated as a logically flattened sequence of values, and the reduction is applied over that sequence. All participating Views must have identical extents; mismatched shapes result in undefined behavior.
The reduction operator must be associative, because the grouping in which partial results are combined is not guaranteed. The operator is not required to be commutative.
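To make the flattening rule concrete, the sketch below performs an element-wise sum across the contributions of several simulated processes, each stored as a row-major 2-D array and treated as one flat sequence of values. Grid2D and reduce_sum are hypothetical, the processes share a single address space, and integer addition stands in for the reduction operator. Element i of the result combines element i of every input, which is why all extents must match.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major 2-D array; data holds rows * cols values.
struct Grid2D {
  std::size_t rows, cols;
  std::vector<int> data;
};

// Element-wise sum over one contribution per simulated process.
// Each contribution is viewed as a flat sequence of rows * cols values.
auto reduce_sum(const std::vector<Grid2D>& contributions) -> Grid2D {
  assert(!contributions.empty());
  Grid2D result{contributions[0].rows, contributions[0].cols,
                std::vector<int>(contributions[0].data.size(), 0)};
  for (const auto& c : contributions) {
    // Mismatched extents are undefined behavior for a real collective; the model rejects them.
    assert(c.rows == result.rows && c.cols == result.cols);
    assert(c.data.size() == c.rows * c.cols);
    for (std::size_t i = 0; i < c.data.size(); ++i) result.data[i] += c.data[i];
  }
  return result;
}
```

Integer addition is associative (and here also commutative), so the result does not depend on how the partial sums are grouped.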
Utilities
Warning
Only data types natively supported by the communication space (system data types) can be converted; all other types, notably user-defined types, are not convertible.
-
template<CommunicationSpace C, typename T>
auto datatype() -> C::datatype_type
Converts a type T to its equivalent representation in communication space C.
When C is MpiSpace, returns the corresponding MPI_Datatype; when C is NcclSpace, returns the corresponding ncclDataType_t.
- Template Parameters:
C – The target communication space backend to use for data type conversion.
T – The C++-native data type to convert from.
- Returns:
The communication space representation of the C++-native data type.
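One way to implement a compile-time conversion like datatype<C, T>() is a trait specialized for each supported type, so that requesting an unsupported type (such as a user-defined struct) fails to compile instead of producing a wrong datatype. The sketch below assumes that approach; ToySpace, ToyDatatype, toy_datatype_trait, and toy_datatype are hypothetical stand-ins for a communication space, its datatype_type (MPI_Datatype for MpiSpace), and the library function. It is not the KokkosComm implementation.

```cpp
#include <cstdint>

// Hypothetical communication space with its own datatype representation.
enum class ToyDatatype { Int32, Float64 };
struct ToySpace {
  using datatype_type = ToyDatatype;
};

// Primary template is declared but never defined, so unsupported types
// (including user-defined structs) are rejected at compile time.
template <typename Space, typename T>
struct toy_datatype_trait;

template <>
struct toy_datatype_trait<ToySpace, std::int32_t> {
  static constexpr ToyDatatype value = ToyDatatype::Int32;
};

template <>
struct toy_datatype_trait<ToySpace, double> {
  static constexpr ToyDatatype value = ToyDatatype::Float64;
};

// Converts the C++ type T to Space's datatype representation.
template <typename Space, typename T>
constexpr auto toy_datatype() -> typename Space::datatype_type {
  return toy_datatype_trait<Space, T>::value;
}

static_assert(toy_datatype<ToySpace, double>() == ToyDatatype::Float64);
```

Adding support for another native type means adding one specialization; a call such as toy_datatype<ToySpace, MyStruct>() does not compile because the primary template has no definition.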
-
template<CommunicationSpace C, KokkosView V>
auto datatype_for([[maybe_unused]] const V &view) -> C::datatype_type
Converts the value type of a Kokkos View to its equivalent representation in communication space C.
- Template Parameters:
C – The target communication space backend to use for data type conversion.
V – A Kokkos View type.
- Parameters:
view – The Kokkos View to convert the value type from.
- Returns:
The communication space representation of the Kokkos View value type.
-
template<CommunicationSpace C, KokkosView V>
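auto datatype_for([[maybe_unused]] C &&comm, [[maybe_unused]] const V &view) -> C::datatype_type
Same as the overload above, but takes a communication space object as its first argument, from which C can be deduced.
- Template Parameters: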
C – The target communication space backend to use for data type conversion.
V – A Kokkos View type.
- Parameters:
comm – A communication space object, immediately consumed.
view – The Kokkos View to convert the value type from.
- Returns:
The communication space representation of the Kokkos View value type.