Low-level NCCL interfaces

NCCL API Support

NCCL routines  KokkosComm::Experimental::nccl:: namespace  Kokkos::View support
-------------  ------------------------------------------  --------------------
ncclSend       send                                        ✓
ncclRecv       recv                                        ✓
ncclAllGather  allgather                                   ✓
ncclAllReduce  allreduce                                   ✓
ncclReduce     reduce                                      ✓
ncclBroadcast  broadcast                                   ✓
ncclAllToAll   alltoall                                    ✓

Point-to-point

template<KokkosExecutionSpace ExecSpace, KokkosView SendView>
auto send(const ExecSpace &space, const SendView &sv, int peer, ncclComm_t comm) -> Request<NcclSpace>

Initiates a non-blocking send of sv to rank peer, enqueued on the CUDA stream associated with space.

Template Parameters:
  • ExecSpace – The execution space (e.g. Kokkos::Cuda).

  • SendView – The type of the view to be sent.

Parameters:
  • space – The execution space instance.

  • sv – The view to be sent.

  • peer – The destination rank.

  • comm – The NCCL communicator.

Returns:

A request object representing the asynchronous send operation.

template<KokkosExecutionSpace ExecSpace, KokkosView RecvView>
auto recv(const ExecSpace &space, RecvView &rv, int peer, ncclComm_t comm) -> Request<NcclSpace>

Initiates a non-blocking receive into rv from rank peer, enqueued on the CUDA stream associated with space.

Template Parameters:
  • ExecSpace – The execution space (e.g. Kokkos::Cuda).

  • RecvView – The type of the view that receives the data.

Parameters:
  • space – The execution space instance.

  • rv – The view into which the data is received.

  • peer – The source rank.

  • comm – The NCCL communicator.

Returns:

A request object representing the asynchronous receive operation.
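Every send must be matched by a recv on the peer rank, and vice versa. The following minimal host-only C++ sketch (no NCCL, CUDA, or Kokkos calls) models a ring exchange built from such matched pairs: simulated rank r sends to (r + 1) % nranks and receives from (r - 1 + nranks) % nranks. The function name ring_exchange is hypothetical and is not part of KokkosComm. With raw NCCL, a rank that both sends and receives in this pattern usually brackets the calls with ncclGroupStart()/ncclGroupEnd() to avoid deadlock; whether these wrappers do so is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only model of a ring exchange made of matched send/recv pairs.
// send_bufs[r] is what simulated rank r passes to send(); the returned
// recv_bufs[r] is what simulated rank r obtains from recv().
std::vector<std::vector<double>> ring_exchange(const std::vector<std::vector<double>> &send_bufs) {
  const int nranks = static_cast<int>(send_bufs.size());
  std::vector<std::vector<double>> recv_bufs(nranks);
  for (int r = 0; r < nranks; ++r) {
    // Rank r calls recv with peer == src ...
    const int src = (r - 1 + nranks) % nranks;
    // ... which is matched by rank src calling send with peer == (src + 1) % nranks == r.
    assert((src + 1) % nranks == r);
    recv_bufs[r] = send_bufs[src];
  }
  return recv_bufs;
}
```

For three ranks holding {0.0}, {1.0}, and {2.0}, rank 0 ends up with {2.0}, rank 1 with {0.0}, and rank 2 with {1.0}.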

Collectives

template<KokkosExecutionSpace ExecSpace, KokkosView SendView, KokkosView RecvView>
auto allgather(const ExecSpace &space, const SendView &sv, const RecvView &rv, ncclComm_t comm) -> Request<NcclSpace>

Performs an all-gather operation: every process contributes sv, and every process receives the contributions of all processes, concatenated in rank order, in rv.

Template Parameters:
  • ExecSpace – The execution space (e.g. Kokkos::Cuda).

  • SendView – The type of the view to be sent.

  • RecvView – The type of the view that receives the gathered data.

Parameters:
  • space – The execution space instance.

  • sv – The view to be sent.

  • rv – The view that receives the gathered data from all ranks.

  • comm – The NCCL communicator.

Returns:

A request object representing the asynchronous all-gather operation.
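The buffer layout of an all-gather can be illustrated with a minimal host-only C++ model (no NCCL, CUDA, or Kokkos calls). It follows the semantics of ncclAllGather, where rank r's count elements land at offset r × count of every receive buffer, so each receive buffer holds nranks × count elements. The function name allgather_model is hypothetical; it is a sketch of the semantics, not the KokkosComm implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only model of all-gather over simulated ranks.
// send_bufs[r] is rank r's contribution; the result holds one receive
// buffer per rank, each of size nranks * count.
std::vector<std::vector<int>> allgather_model(const std::vector<std::vector<int>> &send_bufs) {
  const std::size_t nranks = send_bufs.size();
  const std::size_t count = nranks ? send_bufs[0].size() : 0;
  std::vector<int> gathered(nranks * count);
  for (std::size_t r = 0; r < nranks; ++r) {
    assert(send_bufs[r].size() == count);  // every rank sends the same count
    for (std::size_t i = 0; i < count; ++i) {
      gathered[r * count + i] = send_bufs[r][i];  // rank r's block starts at r * count
    }
  }
  // Every rank ends up with an identical copy of the gathered data.
  return std::vector<std::vector<int>>(nranks, gathered);
}
```

With three ranks sending {1, 2}, {3, 4}, and {5, 6}, every rank receives {1, 2, 3, 4, 5, 6}.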

template<KokkosExecutionSpace ExecSpace, KokkosView SendView, KokkosView RecvView>
auto allreduce(const ExecSpace &space, const SendView &sv, const RecvView &rv, ncclRedOp_t op, ncclComm_t comm) -> Request<NcclSpace>

Performs an all-reduce operation, combining the data from all processes element-wise with op and distributing the result to all processes.

Template Parameters:
  • ExecSpace – The execution space (e.g. Kokkos::Cuda).

  • SendView – The type of the view to be sent.

  • RecvView – The type of the view that receives the reduced result.

Parameters:
  • space – The execution space instance.

  • sv – The view to be sent.

  • rv – The view that receives the reduced result.

  • op – The NCCL reduction operation to be applied.

  • comm – The NCCL communicator.

Returns:

A request object representing the asynchronous all-reduce operation.
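As a rough illustration, the all-reduce semantics can be modeled on the host in plain C++ (no NCCL, CUDA, or Kokkos calls). Here a std::function stands in for an ncclRedOp_t such as ncclSum or ncclMax; the function name allreduce_model is hypothetical. The reduction is applied element by element across ranks, and every rank receives the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-only model of element-wise all-reduce.
// `op` plays the role of an ncclRedOp_t (e.g. std::plus for ncclSum).
std::vector<std::vector<double>> allreduce_model(const std::vector<std::vector<double>> &send_bufs,
                                                 const std::function<double(double, double)> &op) {
  const std::size_t nranks = send_bufs.size();
  if (nranks == 0) return {};
  std::vector<double> result = send_bufs[0];
  for (std::size_t r = 1; r < nranks; ++r) {
    assert(send_bufs[r].size() == result.size());  // same extent on every rank
    for (std::size_t i = 0; i < result.size(); ++i) {
      result[i] = op(result[i], send_bufs[r][i]);  // combine element i across ranks
    }
  }
  // Unlike reduce, every rank gets a copy of the result.
  return std::vector<std::vector<double>>(nranks, result);
}
```

Summing {1.0, 2.0} and {10.0, 20.0} yields {11.0, 22.0} on both ranks; taking the maximum yields {10.0, 20.0}.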

template<KokkosExecutionSpace ExecSpace, KokkosView SendView, KokkosView RecvView>
auto reduce(const ExecSpace &space, const SendView &sv, RecvView &rv, ncclRedOp_t op, int root, int rank, ncclComm_t comm) -> Request<NcclSpace>

Performs a reduction operation, combining data from all processes and placing the result on the root process.

Template Parameters:
  • ExecSpace – The execution space (e.g. Kokkos::Cuda).

  • SendView – The type of the view to be sent.

  • RecvView – The type of the view that receives the reduced result.

Parameters:
  • space – The execution space instance.

  • sv – The view to be sent.

  • rv – The view that receives the reduced result (only used on the root process).

  • op – The NCCL reduction operation to be applied.

  • root – The rank of the root process.

  • rank – The rank of the calling process.

  • comm – The NCCL communicator.

Returns:

A request object representing the asynchronous reduce operation.
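The difference from allreduce is where the result ends up. The minimal host-only C++ sketch below (no NCCL, CUDA, or Kokkos calls) models a sum reduction, analogous to ncclSum: all ranks contribute, but only the root's receive buffer is written, while the other ranks' buffers are left untouched in this model. The function name reduce_sum_model is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only model of a rooted sum reduction.
// send_bufs[r] is rank r's contribution; only recv_bufs[root] is written.
void reduce_sum_model(const std::vector<std::vector<long>> &send_bufs,
                      std::vector<std::vector<long>> &recv_bufs, int root) {
  const int nranks = static_cast<int>(send_bufs.size());
  assert(root >= 0 && root < nranks);
  assert(static_cast<int>(recv_bufs.size()) == nranks);
  std::vector<long> acc(send_bufs[root].size(), 0L);
  for (const auto &buf : send_bufs) {
    assert(buf.size() == acc.size());  // same extent on every rank
    for (std::size_t i = 0; i < acc.size(); ++i) acc[i] += buf[i];
  }
  recv_bufs[root] = acc;  // non-root receive buffers are not modified
}
```

With contributions {1, 1}, {2, 2}, and {3, 3} and root 1, rank 1 receives {6, 6}; ranks 0 and 2 keep whatever their receive buffers held before.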

template<KokkosView View>
auto broadcast(const Kokkos::Cuda &space, View &v, int root, ncclComm_t comm) -> Request<NcclSpace>

Broadcasts data from the root process to all other processes in the communicator.

Template Parameters:
  • View – The type of the view to be broadcast.

Parameters:
  • space – The Kokkos::Cuda execution space instance.

  • v – The view to be broadcast (in-place on all ranks).

  • root – The rank of the root process.

  • comm – The NCCL communicator.

Returns:

A request object representing the asynchronous broadcast operation.
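Because broadcast takes a single view v, the same buffer is the source on the root and the destination on every other rank. A minimal host-only C++ model of this in-place behavior (no NCCL, CUDA, or Kokkos calls) is sketched below; the function name broadcast_model is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-only model of an in-place broadcast.
// bufs[r] is rank r's view v; afterwards every rank holds the root's data.
void broadcast_model(std::vector<std::vector<float>> &bufs, int root) {
  const int nranks = static_cast<int>(bufs.size());
  assert(root >= 0 && root < nranks);
  for (int r = 0; r < nranks; ++r) {
    if (r == root) continue;  // the root's buffer is the source and stays as-is
    assert(bufs[r].size() == bufs[root].size());  // same extent on every rank
    bufs[r] = bufs[root];
  }
}
```

If rank 2 is the root and holds {7.f, 8.f}, every rank holds {7.f, 8.f} afterwards.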

template<KokkosExecutionSpace ExecSpace, KokkosView SendView, KokkosView RecvView>
auto alltoall(const ExecSpace &space, const SendView &sv, const RecvView &rv, int count, ncclComm_t comm) -> Request<NcclSpace>

Performs an all-to-all exchange in which each process sends count elements to every process in the communicator, itself included.

Template Parameters:
  • ExecSpace – The execution space (e.g. Kokkos::Cuda).

  • SendView – The type of the view to be sent.

  • RecvView – The type of the view that receives the data.

Parameters:
  • space – The execution space instance.

  • sv – The view to be sent.

  • rv – The view that receives the data from all ranks.

  • count – The number of elements sent to each process.

  • comm – The NCCL communicator.

Returns:

A request object representing the asynchronous all-to-all operation.
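The following host-only C++ model (no NCCL, CUDA, or Kokkos calls) sketches the buffer layout of an all-to-all, assuming the conventional contiguous-block layout also used by MPI_Alltoall: each send and receive buffer holds nranks × count elements, and block j of rank r's send buffer (elements [j × count, (j + 1) × count)) lands in block r of rank j's receive buffer. The function name alltoall_model is hypothetical, and the layout is an assumption rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only model of all-to-all with `count` elements per peer.
// The exchange is a block transpose: send block j of rank r -> recv block r of rank j.
std::vector<std::vector<int>> alltoall_model(const std::vector<std::vector<int>> &send_bufs,
                                             std::size_t count) {
  const std::size_t nranks = send_bufs.size();
  std::vector<std::vector<int>> recv_bufs(nranks, std::vector<int>(nranks * count));
  for (std::size_t r = 0; r < nranks; ++r) {
    assert(send_bufs[r].size() == nranks * count);
    for (std::size_t j = 0; j < nranks; ++j) {
      for (std::size_t i = 0; i < count; ++i) {
        recv_bufs[j][r * count + i] = send_bufs[r][j * count + i];
      }
    }
  }
  return recv_bufs;
}
```

With two ranks and count = 2, send buffers {0, 1, 2, 3} and {10, 11, 12, 13} become receive buffers {0, 1, 10, 11} on rank 0 and {2, 3, 12, 13} on rank 1.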