Core

Data Structures

Communicators

template<CommunicationSpace Comm = DefaultCommunicationSpace, KokkosExecutionSpace Exec = Kokkos::DefaultExecutionSpace>
class Communicator

Template class for communicator wrappers of different communication space types.

Communicators wrap a communication library-specific communicator (e.g. MPI_Comm) and a Kokkos execution space, tightly coupling the two.

Communicator objects are constructed via factory member functions. The parameterized constructor is private, and no default constructor is defined. They are move-only objects: copy construction and copy assignment are explicitly deleted. Use the duplicate member functions to create equivalent “copies” of communicators. There is always exactly one owner of a Communicator.

Template Parameters:
  • Comm – The communication space (transport backend) to use. Defaults to DefaultCommunicationSpace.

  • Exec – The Kokkos execution space to use. Defaults to Kokkos::DefaultExecutionSpace.

using execution_space = Exec
using communication_space = Comm
using communicator_type = Comm::communicator_type
using size_type = Comm::size_type
using rank_type = Comm::rank_type

Common interfaces

Both specializations share the following interface:

~Communicator() noexcept

Destructor.

Communicator(const Communicator&) = delete

Copy constructor is deleted because a Communicator cannot be implicitly copied. Use duplicate instead.

auto operator=(const Communicator&) -> Communicator& = delete

Copy assignment operator is deleted because a Communicator cannot be implicitly copied. Use duplicate instead.

Communicator(Communicator&&) noexcept

Move-constructs a Communicator.

auto operator=(Communicator&&) noexcept -> Communicator&

Move-assigns a Communicator.

[[nodiscard]] constexpr auto size() const noexcept -> size_type
Returns:

The size (i.e., number of processes) in the communicator.

[[nodiscard]] constexpr auto rank() const noexcept -> rank_type
Returns:

The rank that identifies the calling process within the communicator.

[[nodiscard]] auto split(int color, int key) noexcept -> std::optional<Communicator<Comm, Exec>>

Splits a Communicator.

Given a color and a key, creates as many new communicators as distinct values of color are given, ordering processes according to the value of key. All processes with the same color join the same communicator.

Parameters:
  • color – A value selecting which split communicator the calling process joins.

  • key – A value ordering the calling process within the split communicator.

Returns:

A communicator if the calling process is part of one of the split communicators, std::nullopt if the color is a special value excluding the process at this rank or on error.
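The color/key rule can be illustrated without any communication library. The sketch below simulates a split over all processes at once; simulate_split, SplitResult and kNoColor are hypothetical names, not part of this API. Breaking key ties by the old rank follows the MPI_Comm_split convention and is an assumption here.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a "no color" value such as MPI_UNDEFINED.
constexpr int kNoColor = -1;

struct SplitResult {
  int color;     // which new group the process joined
  int new_rank;  // rank of the process inside that group
  int new_size;  // number of processes in that group
};

// Simulates a split over all processes at once. colors[r] and keys[r] are the
// arguments passed by old rank r. Returns one entry per old rank; processes
// that passed kNoColor get std::nullopt.
inline std::vector<std::optional<SplitResult>> simulate_split(
    const std::vector<int>& colors, const std::vector<int>& keys) {
  assert(colors.size() == keys.size());
  // Group old ranks by color, remembering (key, old rank) for ordering.
  std::map<int, std::vector<std::pair<int, int>>> groups;
  for (int r = 0; r < static_cast<int>(colors.size()); ++r) {
    if (colors[r] != kNoColor) groups[colors[r]].emplace_back(keys[r], r);
  }
  std::vector<std::optional<SplitResult>> out(colors.size());
  for (auto& [color, members] : groups) {
    // Sort by key; ties fall back to the old rank (pair comparison).
    std::sort(members.begin(), members.end());
    for (int i = 0; i < static_cast<int>(members.size()); ++i) {
      out[members[i].second] =
          SplitResult{color, i, static_cast<int>(members.size())};
    }
  }
  return out;
}
```

With colors {0, 1, 0, kNoColor} and keys {5, 0, 1, 0}, old ranks 0 and 2 share a group in which rank 2 comes first (smaller key), rank 1 forms a group of its own, and rank 3 receives std::nullopt.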

[[nodiscard]] auto duplicate() noexcept -> std::optional<Communicator<Comm, Exec>>

Duplicates a Communicator.

Returns:

A communicator on success, std::nullopt on error.
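The ownership rules above (private constructor, factory functions, move-only, explicit duplication, std::optional instead of exceptions) form a general C++ pattern. A minimal sketch with a hypothetical Handle class, which stands in for a wrapped library handle and is not KokkosComm code:

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a wrapped library handle; not KokkosComm code.
class Handle {
 public:
  // Factory: the only way to obtain a Handle from outside the class.
  static std::optional<Handle> create(int raw) noexcept {
    if (raw < 0) return std::nullopt;  // report failure without exceptions
    return Handle(raw);
  }

  // Copies are forbidden, so there is always exactly one owner.
  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;

  // Moves transfer ownership and leave the source invalid.
  Handle(Handle&& other) noexcept : raw_(std::exchange(other.raw_, -1)) {}
  Handle& operator=(Handle&& other) noexcept {
    if (this != &other) raw_ = std::exchange(other.raw_, -1);
    return *this;
  }

  // Explicit "copy": a new, independently owned object.
  std::optional<Handle> duplicate() const noexcept { return create(raw_); }

  bool valid() const noexcept { return raw_ >= 0; }
  int raw() const noexcept { return raw_; }

 private:
  explicit Handle(int raw) noexcept : raw_(raw) { assert(raw >= 0); }
  int raw_;
};

static_assert(!std::is_copy_constructible_v<Handle>);
static_assert(std::is_nothrow_move_constructible_v<Handle>);
```

Callers check the returned std::optional before use, which mirrors how the results of split and duplicate are meant to be consumed.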

MPI specialization

template<KokkosExecutionSpace Exec>
class Communicator<MpiSpace, Exec>

Communicator specialization for the MpiSpace communication space. Wraps an MPI_Comm handle.

using execution_space = Exec
using communication_space = MpiSpace
using communicator_type = MPI_Comm
using size_type = int
using rank_type = int
[[nodiscard]] static auto from_raw(MPI_Comm comm, const Exec &exec = Exec{}) noexcept -> Communicator<MpiSpace, Exec>

Constructs a Communicator from a raw MPI_Comm handle and a Kokkos execution space instance. If exec is omitted, a default-constructed Exec is used. The passed handle must be valid and must not be an inter-communicator parent handle. The returned communicator does not own the underlying handle, and the user is responsible for destroying it.

Parameters:
  • comm – A valid communicator handle.

  • exec – A Kokkos execution space instance. Defaults to a default-constructed Exec.

Returns:

A communicator wrapping the passed handle. The return type is not a std::optional, so MPI_COMM_NULL is not reported as std::nullopt; pass only valid handles.

[[nodiscard]] static auto split_from_raw(const MPI_Comm comm, int color, int key, const Exec &exec = Exec{}) noexcept -> std::optional<Communicator<MpiSpace, Exec>>

Splits from a raw MPI communicator and associates the result with a Kokkos execution space instance. If exec is omitted, a default-constructed Exec is used.

Creates as many new communicators as distinct values of color are given, and orders processes according to the value of key. All processes with the same value of color join the same communicator. A process that passes MPI_UNDEFINED as color will not join a new communicator.

Parameters:
  • comm – A valid communicator handle.

  • color – A value selecting which split communicator the calling process joins.

  • key – A value ordering the calling process within the split communicator.

  • exec – A Kokkos execution space instance. Defaults to a default-constructed Exec.

Returns:

A split communicator on success, std::nullopt if the passed color was MPI_UNDEFINED or on error.

[[nodiscard]] static auto duplicate_from_raw(const MPI_Comm comm, const Exec &exec = Exec{}) noexcept -> std::optional<Communicator<MpiSpace, Exec>>

Duplicates from a raw MPI communicator.

Parameters:
  • comm – A valid communicator handle.

  • exec – A Kokkos execution space instance. Defaults to a default-constructed Exec.

Returns:

A communicator on success, std::nullopt on error.

[[nodiscard]] auto comm() noexcept -> MPI_Comm&
[[nodiscard]] auto comm() const noexcept -> const MPI_Comm&
Returns:

A reference to the underlying MPI_Comm object.

[[nodiscard]] auto exec() const noexcept -> const execution_space&
Returns:

A const reference to the associated execution space instance.

NCCL specialization

template<>
class Communicator<Experimental::NcclSpace, Kokkos::Cuda>

Communicator specialization for the Experimental::NcclSpace communication space. Wraps an ncclComm_t handle.

using execution_space = Kokkos::Cuda
using communication_space = Experimental::NcclSpace
using communicator_type = ncclComm_t
using size_type = int
using rank_type = int
[[nodiscard]] static auto from_raw(ncclComm_t comm, const Kokkos::Cuda &exec = Kokkos::Cuda{}) noexcept -> Communicator<Experimental::NcclSpace, Kokkos::Cuda>

Constructs a Communicator from a raw ncclComm_t handle and a Kokkos CUDA execution space instance. If exec is omitted, a default-constructed Kokkos::Cuda is used. The returned communicator does not own the underlying handle, and the user is responsible for destroying it.

Parameters:
  • comm – A valid communicator handle.

  • exec – A Kokkos CUDA execution space instance. Defaults to Kokkos::Cuda.

Returns:

A communicator wrapping the passed handle. The return type is not a std::optional, so a nullptr handle is not reported as std::nullopt; pass only valid handles.

[[nodiscard]] static auto split_from_raw(const ncclComm_t comm, int color, int key, const Kokkos::Cuda &exec = Kokkos::Cuda{}) noexcept -> std::optional<Communicator<Experimental::NcclSpace, Kokkos::Cuda>>

Splits from a raw NCCL communicator and associates the result with a Kokkos CUDA execution space instance. If exec is omitted, a default-constructed Kokkos::Cuda is used.

Creates as many new communicators as distinct values of color are given, and orders processes according to the value of key. All processes with the same value of color join the same communicator. A process that passes NCCL_SPLIT_NOCOLOR as color will not join a new communicator.

Parameters:
  • comm – A valid communicator handle.

  • color – A value selecting which split communicator the calling process joins.

  • key – A value ordering the calling process within the split communicator.

  • exec – A Kokkos CUDA execution space instance. Defaults to Kokkos::Cuda.

Returns:

A split communicator on success, std::nullopt if the passed color was NCCL_SPLIT_NOCOLOR or on error.

[[nodiscard]] static auto duplicate_from_raw(const ncclComm_t comm, const Kokkos::Cuda &exec = Kokkos::Cuda{}) noexcept -> std::optional<Communicator<Experimental::NcclSpace, Kokkos::Cuda>>

Duplicates from a raw NCCL communicator.

Parameters:
  • comm – A valid communicator handle.

  • exec – A Kokkos CUDA execution space instance. Defaults to Kokkos::Cuda.

Returns:

A communicator on success, std::nullopt on error.

[[nodiscard]] auto comm() noexcept -> ncclComm_t&
[[nodiscard]] auto comm() const noexcept -> const ncclComm_t&
Returns:

A reference to the underlying ncclComm_t object.

[[nodiscard]] auto exec() const noexcept -> const Kokkos::Cuda&
Returns:

A const reference to the associated Kokkos::Cuda execution space instance.

Requests

template<CommunicationSpace C = DefaultCommunicationSpace>
class Request

Template class for request wrappers of different communication space types.

Request objects are move-only: copy construction and copy assignment are explicitly deleted. There is always exactly one owner of a Request and its associated callbacks. This design ensures it is impossible for the same callback to be executed more than once.

Template Parameters:

C – The communication backend to use. Defaults to DefaultCommunicationSpace.

using communication_space = C
using request_type = C::request_type
using rank_type = C::rank_type

Common interfaces

Both specializations share the following interface:

Request(const Request&) = delete

Copy constructor is deleted because a Request can only be moved.

auto operator=(const Request&) -> Request& = delete

Copy assignment operator is deleted because a Request can only be moved.

Request(Request&&) = default

Move constructor is defaulted.

auto operator=(Request&&) -> Request& = default

Move assignment operator is defaulted.

template<KokkosView V>
auto extend_view_lifetime(const V &view) -> void

Captures a Kokkos View to extend its lifetime until the request completes. Has no effect on unmanaged Views.

Template Parameters:

V – A Kokkos View type.

Parameters:

view – The view whose lifetime should be extended.
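Managed Kokkos Views are reference-counted, so holding a copy keeps the allocation alive. The sketch below shows the same idea with std::shared_ptr as a stand-in for a managed View; KeepAliveRequest, extend_lifetime, release and demo_keep_alive are hypothetical names, and how the real Request stores the captured View is not specified here. A raw pointer, like an unmanaged View, carries no reference count, which is why capturing it would have no effect.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a request; not the KokkosComm implementation.
class KeepAliveRequest {
 public:
  // Store a copy of a reference-counted buffer inside a closure. The copy
  // keeps the allocation alive until release() drops the closure.
  template <class T>
  void extend_lifetime(const std::shared_ptr<T>& buffer) {
    assert(buffer != nullptr);
    keep_alive_.emplace_back([buffer]() { (void)buffer; });
  }

  // Called once the operation has completed.
  void release() { keep_alive_.clear(); }

 private:
  std::vector<std::function<void()>> keep_alive_;
};

// Returns the reference counts observed while the request holds the buffer
// and after the request has released it.
inline std::pair<long, long> demo_keep_alive() {
  KeepAliveRequest req;
  std::weak_ptr<std::vector<double>> watcher;
  {
    auto buffer = std::make_shared<std::vector<double>>(100, 1.0);
    watcher = buffer;
    req.extend_lifetime(buffer);
  }  // the caller's own reference goes out of scope here
  const long held = watcher.use_count();  // the request still holds one
  req.release();
  return {held, watcher.use_count()};
}
```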

auto add_callback(std::function<void()> &&cb) -> void

Registers a callback to be invoked after the request completes.

Parameters:

cb – The callback function to register.

auto wait() -> void

Blocks until the associated operation completes. Executes all registered callbacks upon completion.

auto test() -> bool

Non-blocking query for the completion of the associated operation. Executes all registered callbacks if the operation has completed.

Returns:

true if the request has completed, false otherwise.
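One way to guarantee that callbacks run at most once is to move the callback list out of the request before invoking it. The sketch below uses a hypothetical CallbackRequest with a mark_done hook in place of a real backend operation; it illustrates the contract described above and is not the library's implementation.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a request; not the KokkosComm implementation.
class CallbackRequest {
 public:
  CallbackRequest() = default;
  CallbackRequest(const CallbackRequest&) = delete;
  CallbackRequest& operator=(const CallbackRequest&) = delete;
  CallbackRequest(CallbackRequest&&) = default;
  CallbackRequest& operator=(CallbackRequest&&) = default;

  void add_callback(std::function<void()>&& cb) {
    assert(static_cast<bool>(cb));  // an empty std::function cannot be invoked
    callbacks_.push_back(std::move(cb));
  }

  // Stand-in for the backend signalling that the operation finished.
  void mark_done() noexcept { done_ = true; }

  // Non-blocking query: runs the callbacks if the operation has completed.
  bool test() {
    if (done_) run_callbacks();
    return done_;
  }

  // "Blocking" wait: completion is simply forced here, since there is no
  // real operation to wait on.
  void wait() {
    done_ = true;
    run_callbacks();
  }

 private:
  void run_callbacks() {
    // Take the list out first so that each callback can run only once,
    // even if wait() or test() is called again afterwards.
    auto pending = std::exchange(callbacks_, {});
    for (auto& cb : pending) cb();
  }

  std::vector<std::function<void()>> callbacks_;
  bool done_ = false;
};
```

Calling test() before completion returns false and leaves the callbacks registered; the first test() or wait() after completion runs them, and later calls find an empty list.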

auto wait(Request &request) -> void

Free function overload. Waits on request until the associated operation completes.

Parameters:

request – A reference to the request to wait on.

auto wait(Request &&request) -> void

Free function overload. Waits on an r-value request, consuming it upon completion.

Parameters:

request – An r-value reference to the request to wait on.

auto wait_all(std::span<Request> requests) -> void

Waits for the completion of all requests in requests.

Parameters:

requests – The list of requests to complete.

auto wait_any(std::span<Request> requests) -> std::optional<rank_type>

Waits for the completion of at least one request in requests.

Parameters:

requests – The list of requests to poll.

Returns:

The index of the completed request, or std::nullopt if requests is empty.

auto test(Request &request) -> bool

Free function overload. Queries request for completion of the associated operation.

Parameters:

request – A reference to the request to query.

Returns:

true if the request has completed, false otherwise.

MPI specialization

template<>
class Request<MpiSpace>

Request specialization for the MpiSpace communication space. Wraps an MPI_Request handle.

using communication_space = MpiSpace
using request_type = MpiSpace::request_type
using rank_type = MpiSpace::rank_type
explicit Request(request_type request = MPI_REQUEST_NULL)

Constructs a Request from an MPI_Request handle.

Parameters:

request – The MPI_Request to encapsulate. Defaults to MPI_REQUEST_NULL.

auto request() noexcept -> request_type&
auto request() const noexcept -> const request_type&
Returns:

A reference to the underlying MPI_Request object.

auto request_ptr() noexcept -> request_type*
auto request_ptr() const noexcept -> const request_type*
Returns:

A pointer to the underlying MPI_Request object.

Note

Both wait_all and wait_any copy the underlying MPI_Request objects into an intermediate container before calling MPI_Waitall and MPI_Waitany, respectively, which incurs an allocation overhead.
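The copy is needed because array-based calls such as MPI_Waitall expect the handles in one contiguous array, while each wrapper also carries other state. A minimal sketch of the gathering step, using hypothetical stand-in types (RawHandle for MPI_Request, WrappedRequest for the wrapper) rather than the real ones:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins: RawHandle plays the role of MPI_Request, and
// WrappedRequest the role of a wrapper that carries extra state.
using RawHandle = int;
struct WrappedRequest {
  RawHandle handle;
  std::vector<int> extra_state;  // handles are therefore not contiguous
};

// Copies the raw handles into one contiguous array. This is the allocation
// the note refers to.
inline std::vector<RawHandle> gather_handles(
    const std::vector<WrappedRequest>& requests) {
  std::vector<RawHandle> raw;
  raw.reserve(requests.size());
  for (const auto& r : requests) raw.push_back(r.handle);
  assert(raw.size() == requests.size());
  return raw;
}
```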

NCCL specialization

template<>
class Request<Experimental::NcclSpace>

Request specialization for the Experimental::NcclSpace communication space. Wraps a cudaEvent_t handle to track the completion of CUDA stream operations.

using communication_space = Experimental::NcclSpace
using request_type = Experimental::NcclSpace::request_type
using rank_type = Experimental::NcclSpace::rank_type
explicit Request()

Constructs an empty Request with a null event handle.

~Request() noexcept

Destructor. Destroys the underlying cudaEvent_t if one has been created.

auto capture_stream_state(cudaStream_t stream) noexcept -> void

Records a CUDA event on stream to capture its current state for completion tracking. If a cudaEvent_t was previously created on this request, it is destroyed first.

Parameters:

stream – The CUDA stream whose state to capture.

auto request() noexcept -> request_type&
auto request() const noexcept -> const request_type&
Returns:

A reference to the underlying cudaEvent_t object.

auto request_ptr() noexcept -> request_type*
auto request_ptr() const noexcept -> const request_type*
Returns:

A pointer to the underlying cudaEvent_t object.

Note

Both wait_all and wait_any use active polling loops rather than blocking synchronization. While this increases CPU utilization, it avoids the overhead of spawning threads or completing requests sequentially.
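An active polling loop of this kind can be sketched as follows. PollRequest, poll_any and poll_all are hypothetical names; the stand-in request completes after a fixed number of polls, whereas a real backend would query an event or handle.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a request that completes after a fixed number
// of polls; a real backend would query an event or a handle instead.
struct PollRequest {
  int polls_left;
  bool test() noexcept {
    assert(polls_left >= 0);
    if (polls_left > 0) --polls_left;
    return polls_left == 0;
  }
};

// Spins over all requests until one reports completion. Returns its index,
// or std::nullopt when there is nothing to wait on.
inline std::optional<std::size_t> poll_any(std::vector<PollRequest>& requests) {
  if (requests.empty()) return std::nullopt;
  for (;;) {
    for (std::size_t i = 0; i < requests.size(); ++i) {
      if (requests[i].test()) return i;
    }
  }
}

// Spins until every request has completed, skipping finished ones.
inline void poll_all(std::vector<PollRequest>& requests) {
  std::vector<bool> done(requests.size(), false);
  std::size_t remaining = requests.size();
  while (remaining > 0) {
    for (std::size_t i = 0; i < requests.size(); ++i) {
      if (!done[i] && requests[i].test()) {
        done[i] = true;
        --remaining;
      }
    }
  }
}
```

The loops keep a CPU core busy while they spin, which is the trade-off the note describes: no helper threads and no forced sequential completion, at the cost of higher CPU utilization.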

Communication Primitives

Point-to-point

Send

Warning

This is not a blocking operation despite being named like MPI_Send.

template<KokkosView SendView, KokkosExecutionSpace ExecSpace = Kokkos::DefaultExecutionSpace, CommunicationSpace CommSpace = DefaultCommunicationSpace>
auto send(Communicator<CommSpace, ExecSpace> &h, SendView &sv, int dest) -> Request<CommSpace>

Initiates a non-blocking send operation.

Template Parameters:
  • SendView – The type of the Kokkos view to send.

  • ExecSpace – The execution space to use. Defaults to Kokkos::DefaultExecutionSpace.

  • CommSpace – The communication backend to use. Defaults to DefaultCommunicationSpace.

Parameters:
  • h – A handle to the execution space and transport mechanism.

  • sv – The Kokkos view to send.

  • dest – The destination rank.

Returns:

A request object of type Request<CommSpace> representing the non-blocking send operation.

template<KokkosView SendView, KokkosExecutionSpace ExecSpace = Kokkos::DefaultExecutionSpace, CommunicationSpace CommSpace = DefaultCommunicationSpace>
auto send(SendView &sv, int dest) -> Request<CommSpace>

Initiates a non-blocking send operation using a default handle.

Template Parameters:
  • SendView – The type of the Kokkos view to send.

  • ExecSpace – The execution space to use. Defaults to Kokkos::DefaultExecutionSpace.

  • CommSpace – The communication backend to use. Defaults to DefaultCommunicationSpace.

Parameters:
  • sv – The Kokkos view to send.

  • dest – The destination rank.

Returns:

A request object of type Request<CommSpace> representing the non-blocking send operation.

Example usage:

#include <Kokkos_Core.hpp>
#include <KokkosComm/KokkosComm.hpp>

// Create an execution space instance
auto exec = Kokkos::DefaultExecutionSpace();
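// Create a communicator by duplicating an existing raw handle;
// raw_comm_handle is assumed to be defined elsewhere (e.g. MPI_COMM_WORLD with the MPI backend)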
auto comm = KokkosComm::Communicator<>::duplicate_from_raw(raw_comm_handle, exec).value();

// Create a Kokkos view
Kokkos::View<double*> data("send_data", 100);

// Fill the view with some data
Kokkos::parallel_for(
    "fill_data", Kokkos::RangePolicy(exec, 0, 100), KOKKOS_LAMBDA(int i) { data(i) = static_cast<double>(i); }
);
exec.fence();

// Destination rank
int dst_rank = 1;

// Initiate a non-blocking send with a handle
auto req1 = KokkosComm::send(comm, data, dst_rank);

// Simulate a blocking send by waiting immediately
KokkosComm::send(comm, data, dst_rank).wait();

// Wait for a request to complete
KokkosComm::wait(req1);

Receive

Warning

This is not a blocking operation despite being named like MPI_Recv.

template<KokkosView RecvView, KokkosExecutionSpace ExecSpace = Kokkos::DefaultExecutionSpace, CommunicationSpace CommSpace = DefaultCommunicationSpace>
auto recv(Communicator<CommSpace, ExecSpace> &h, RecvView &rv, int src) -> Request<CommSpace>

Initiates a non-blocking receive operation.

Template Parameters:
  • RecvView – The type of the Kokkos view for receiving data.

  • ExecSpace – The execution space where the operation will be performed. Defaults to Kokkos::DefaultExecutionSpace.

  • CommSpace – The communication backend to use. Defaults to DefaultCommunicationSpace.

Parameters:
  • h – A handle to the execution space and transport mechanism.

  • rv – The Kokkos view where the received data will be stored.

  • src – The source rank from which to receive data.

Returns:

A request object of type Request<CommSpace> representing the non-blocking receive operation.

This function initiates a non-blocking receive operation using the specified execution space and transport mechanism. The data is received into the provided view from the specified source rank. The returned request object can be used to check the status of the receive operation or to wait for its completion.

template<KokkosView RecvView, KokkosExecutionSpace ExecSpace = Kokkos::DefaultExecutionSpace, CommunicationSpace CommSpace = DefaultCommunicationSpace>
auto recv(RecvView &rv, int src) -> Request<CommSpace>

Initiates a non-blocking receive operation using a default handle.

Template Parameters:
  • RecvView – The type of the Kokkos view for receiving data.

  • ExecSpace – The execution space where the operation will be performed. Defaults to Kokkos::DefaultExecutionSpace.

  • CommSpace – The communication backend to use. Defaults to DefaultCommunicationSpace.

Parameters:
  • rv – The Kokkos view where the received data will be stored.

  • src – The source rank from which to receive data.

Returns:

A request object of type Request<CommSpace> representing the non-blocking receive operation.

Example usage:

#include <Kokkos_Core.hpp>
#include <KokkosComm/KokkosComm.hpp>

// Create an execution space instance
auto exec = Kokkos::DefaultExecutionSpace();
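// Create a communicator by duplicating an existing raw handle;
// raw_comm_handle is assumed to be defined elsewhere (e.g. MPI_COMM_WORLD with the MPI backend)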
auto comm = KokkosComm::Communicator<>::duplicate_from_raw(raw_comm_handle, exec).value();

// Allocate a view to receive the data
Kokkos::View<double*> data("recv_view", 100);

// Source rank
int src_rank = 1;

// Initiate a non-blocking receive with a handle
auto req1 = KokkosComm::recv(comm, data, src_rank);

// Wait for the request to complete before reusing the receive buffer
KokkosComm::wait(req1);

// Simulate a blocking receive by waiting immediately
KokkosComm::recv(comm, data, src_rank).wait();

Collectives

Important

Collective operations act element-wise on the input Views. Multi-dimensional Views are treated as a logically flattened sequence of values, and the reduction is applied over that sequence. All participating Views must have identical extents; mismatched shapes result in undefined behavior.

The reduction operator must be associative, but ordering of partial combinations is not guaranteed, and the operation is not required to be commutative.
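To make the element-wise rule and the associativity requirement concrete, the sketch below reduces one flattened buffer per rank under two different groupings. It reads "ordering of partial combinations" as the grouping of operands, which is an interpretation; Buffer, combine, reduce_linear, reduce_tree and concat are hypothetical names. String concatenation serves as an operator that is associative but not commutative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Buffer = std::vector<std::string>;  // one flattened "view" per rank

// Combines two buffers element by element; extents must match.
template <class Op>
Buffer combine(const Buffer& a, const Buffer& b, Op op) {
  assert(a.size() == b.size());
  Buffer out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = op(a[i], b[i]);
  return out;
}

// Left-to-right grouping: ((r0 op r1) op r2) op r3 ...
template <class Op>
Buffer reduce_linear(const std::vector<Buffer>& ranks, Op op) {
  assert(!ranks.empty());
  Buffer acc = ranks.front();
  for (std::size_t r = 1; r < ranks.size(); ++r) acc = combine(acc, ranks[r], op);
  return acc;
}

// Pairwise (tree) grouping: (r0 op r1) op (r2 op r3) ...
// Rank order is preserved; only the grouping differs.
template <class Op>
Buffer reduce_tree(std::vector<Buffer> ranks, Op op) {
  assert(!ranks.empty());
  while (ranks.size() > 1) {
    std::vector<Buffer> next;
    for (std::size_t r = 0; r + 1 < ranks.size(); r += 2)
      next.push_back(combine(ranks[r], ranks[r + 1], op));
    if (ranks.size() % 2 == 1) next.push_back(ranks.back());
    ranks = std::move(next);
  }
  return ranks.front();
}

// Associative but not commutative.
inline std::string concat(const std::string& x, const std::string& y) {
  return x + y;
}
```

Because concat is associative, both groupings produce the same buffer as long as rank order is kept; an operator that is not associative could give different results for the two.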

Utilities

Warning

Non-system data types (i.e. the data types not natively supported by the communication space) are not convertible. This notably includes user-defined types.

template<CommunicationSpace C, typename T>
auto datatype() -> C::datatype_type

Converts a type T to its communication space C equivalent representation.

When C is:

  • MpiSpace, returns the corresponding MPI_Datatype value.

  • Experimental::NcclSpace, returns the corresponding ncclDataType_t value.

Template Parameters:
  • C – The target communication space backend to use for data type conversion.

  • T – The C++-native data type to convert from.

Returns:

The communication space representation of the C++-native data type.
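A type-to-tag mapping of this kind is commonly written with template specializations. The sketch below uses a hypothetical WireType enum in place of a backend type such as MPI_Datatype; wire_type_of, wire_datatype, is_wire_convertible and Particle are likewise illustrative names. Types without a specialization, including user-defined ones, have no mapping and fail at compile time when converted.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical tag enum standing in for a backend type such as MPI_Datatype.
enum class WireType { Int32, Float32, Float64 };

// Primary template has no `value` member, so unsupported types do not compile
// when passed to wire_datatype().
template <class T>
struct wire_type_of {};

template <>
struct wire_type_of<std::int32_t> {
  static constexpr WireType value = WireType::Int32;
};
template <>
struct wire_type_of<float> {
  static constexpr WireType value = WireType::Float32;
};
template <>
struct wire_type_of<double> {
  static constexpr WireType value = WireType::Float64;
};

// Maps a C++ type to its tag, ignoring const/volatile qualifiers.
template <class T>
constexpr WireType wire_datatype() noexcept {
  return wire_type_of<std::remove_cv_t<T>>::value;
}

// Detects whether a mapping exists without triggering a hard error.
template <class T, class = void>
struct is_wire_convertible : std::false_type {};
template <class T>
struct is_wire_convertible<
    T, std::void_t<decltype(wire_type_of<std::remove_cv_t<T>>::value)>>
    : std::true_type {};

struct Particle {  // user-defined type: no mapping is provided
  double x, y, z;
};

static_assert(wire_datatype<const double>() == WireType::Float64);
static_assert(!is_wire_convertible<Particle>::value);
```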

template<CommunicationSpace C, KokkosView V>
auto datatype_for([[maybe_unused]] const V &view) -> C::datatype_type
Template Parameters:
  • C – The target communication space backend to use for data type conversion.

  • V – A Kokkos View type.

Parameters:

view – The Kokkos View to convert the value type from.

Returns:

The communication space representation of the Kokkos View value type.

template<CommunicationSpace C, KokkosView V>
auto datatype_for([[maybe_unused]] C &&comm, [[maybe_unused]] const V &view) -> C::datatype_type
Template Parameters:
  • C – The target communication space backend to use for data type conversion.

  • V – A Kokkos View type.

Parameters:
  • comm – A communication space object, immediately consumed.

  • view – The Kokkos View to convert the value type from.

Returns:

The communication space representation of the Kokkos View value type.
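Both overloads use their arguments only for type deduction, which is why they are marked [[maybe_unused]]. A minimal sketch of that deduction, assuming a container that exposes a value_type member as Kokkos Views do; ElemTag, elem_tag_of, FakeSpace and elem_tag_for are hypothetical names, and std::vector stands in for a View.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

enum class ElemTag { Int32, Float64 };  // hypothetical backend tags

template <class T>
struct elem_tag_of {};
template <>
struct elem_tag_of<int> {
  static constexpr ElemTag value = ElemTag::Int32;
};
template <>
struct elem_tag_of<double> {
  static constexpr ElemTag value = ElemTag::Float64;
};

struct FakeSpace {};  // hypothetical communication space tag type

// Form 1: the container is used only to deduce its element type.
template <class V>
constexpr ElemTag elem_tag_for([[maybe_unused]] const V& view) noexcept {
  using value_type = std::remove_cv_t<typename V::value_type>;
  return elem_tag_of<value_type>::value;
}

// Form 2: a space object is passed as well, so both template parameters
// are deduced and none has to be spelled out at the call site.
template <class S, class V>
constexpr ElemTag elem_tag_for([[maybe_unused]] S&& space,
                               const V& view) noexcept {
  return elem_tag_for(view);
}
```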