Low-level NCCL interfaces
| NCCL routines | | |
|---|---|---|
| | | ✓ |
| | | ✓ |
| | | ✓ |
| | | ✓ |
| | | ✓ |
| | | ✓ |
| | | ✓ |
Point-to-point
-
template<KokkosExecutionSpace ExecSpace, KokkosView SendView>
auto send(const ExecSpace &space, const SendView &sv, int peer, ncclComm_t comm) -> Request<NcclSpace>
Initiates a non-blocking send operation on the CUDA stream of the given execution space instance.
- Template Parameters:
ExecSpace – The execution space (e.g. Kokkos::Cuda).
SendView – The type of the view to be sent.
- Parameters:
space – The execution space instance.
sv – The view to be sent.
peer – The destination rank.
comm – The NCCL communicator.
- Returns:
A request object representing the asynchronous send operation.
-
template<KokkosExecutionSpace ExecSpace, KokkosView RecvView>
auto recv(const ExecSpace &space, RecvView &rv, int peer, ncclComm_t comm) -> Request<NcclSpace>
Initiates a non-blocking receive operation on the CUDA stream of the given execution space instance.
- Template Parameters:
ExecSpace – The execution space (e.g. Kokkos::Cuda).
RecvView – The type of the view that receives the data.
- Parameters:
space – The execution space instance.
rv – The view that receives the data.
peer – The source rank.
comm – The NCCL communicator.
- Returns:
A request object representing the asynchronous receive operation.
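The peer argument plays a different role on each side: send names the destination rank, while recv names the source rank, and a receive only makes sense against a matching send of the same size. The sketch below is a host-only model of that pairing in plain C++. It does not call NCCL or KokkosComm; MockComm, mock_send, and mock_recv are hypothetical names introduced purely for illustration, and std::vector<double> stands in for a Kokkos view.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for a communicator (NOT ncclComm_t):
// one FIFO of messages per (source rank, destination rank) pair.
struct MockComm {
  std::map<std::pair<int, int>, std::queue<std::vector<double>>> channels;
};

// Models send: rank `self` posts a copy of `sv` addressed to rank `peer`.
inline void mock_send(MockComm &comm, int self, int peer,
                      const std::vector<double> &sv) {
  assert(self >= 0 && peer >= 0);
  comm.channels[std::make_pair(self, peer)].push(sv);
}

// Models recv: rank `self` takes the oldest message posted by rank `peer`.
// Returns false when no send is pending or the sizes do not match.
inline bool mock_recv(MockComm &comm, int self, int peer,
                      std::vector<double> &rv) {
  assert(self >= 0 && peer >= 0);
  auto it = comm.channels.find(std::make_pair(peer, self));
  if (it == comm.channels.end() || it->second.empty()) return false;
  if (it->second.front().size() != rv.size()) return false;
  rv = it->second.front();
  it->second.pop();
  return true;
}
```

In this model, rank 0 calling mock_send(comm, 0, 1, data) is matched by rank 1 calling mock_recv(comm, 1, 0, buffer): the peer values are mirrored, which is the same convention the peer parameter follows above.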
Collectives
-
template<KokkosExecutionSpace ExecSpace, KokkosView SendView, KokkosView RecvView>
auto allgather(const ExecSpace &space, const SendView &sv, const RecvView &rv, ncclComm_t comm) -> Request<NcclSpace>
Performs an all-gather operation, gathering data from all processes and distributing it to all processes.
- Template Parameters:
ExecSpace – The execution space (e.g. Kokkos::Cuda).
SendView – The type of the view to be sent.
RecvView – The type of the view that receives the gathered data.
- Parameters:
space – The execution space instance.
sv – The view to be sent.
rv – The view that receives the gathered data.
comm – The NCCL communicator.
- Returns:
A request object representing the asynchronous all-gather operation.
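To make the result layout concrete, the following host-only reference model reproduces what an all-gather leaves in the receive buffer: the contributions of all ranks concatenated in rank order, so the receive buffer must hold (number of ranks) × (send size) elements. This is a sketch in plain C++, not a call into NCCL or KokkosComm; allgather_reference is a hypothetical helper and std::vector<int> stands in for a view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model: send_bufs[r] is the send buffer of rank r.
// Returns the receive buffer that every rank holds after the all-gather.
inline std::vector<int> allgather_reference(
    const std::vector<std::vector<int>> &send_bufs) {
  assert(!send_bufs.empty());
  const std::size_t count = send_bufs.front().size();
  std::vector<int> recv;
  recv.reserve(count * send_bufs.size());
  for (const auto &sv : send_bufs) {
    assert(sv.size() == count);  // every rank contributes the same amount
    recv.insert(recv.end(), sv.begin(), sv.end());  // rank r lands at r * count
  }
  return recv;
}
```

A model like this can serve as the expected value when checking device results copied back to the host.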
-
template<KokkosExecutionSpace ExecSpace, KokkosView SendView, KokkosView RecvView>
auto allreduce(const ExecSpace &space, const SendView &sv, const RecvView &rv, ncclRedOp_t op, ncclComm_t comm) -> Request<NcclSpace>
Performs an all-reduce operation, combining data from all processes and distributing the result to all processes.
- Template Parameters:
ExecSpace – The execution space (e.g. Kokkos::Cuda).
SendView – The type of the view to be sent.
RecvView – The type of the view that receives the reduced result.
- Parameters:
space – The execution space instance.
sv – The view to be sent.
rv – The view that receives the reduced result.
op – The NCCL reduction operation to be applied.
comm – The NCCL communicator.
- Returns:
A request object representing the asynchronous all-reduce operation.
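The reduction is applied element by element across ranks, and every rank ends up with the same result. The host-only sketch below models that behavior in plain C++ without touching NCCL. RedOp, combine, and allreduce_reference are hypothetical names; the enum merely mirrors four of the NCCL operations (ncclSum, ncclProd, ncclMax, ncclMin) and is not ncclRedOp_t itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a subset of ncclRedOp_t values.
enum class RedOp { Sum, Prod, Max, Min };

// Combines two elements according to the chosen reduction.
inline int combine(RedOp op, int a, int b) {
  switch (op) {
    case RedOp::Sum: return a + b;
    case RedOp::Prod: return a * b;
    case RedOp::Max: return std::max(a, b);
    case RedOp::Min: return std::min(a, b);
  }
  return a;  // not reached for valid enum values
}

// Reference model: send_bufs[r] is the send buffer of rank r.
// Returns the buffer every rank receives after the all-reduce.
inline std::vector<int> allreduce_reference(
    const std::vector<std::vector<int>> &send_bufs, RedOp op) {
  assert(!send_bufs.empty());
  std::vector<int> result = send_bufs.front();
  for (std::size_t r = 1; r < send_bufs.size(); ++r) {
    assert(send_bufs[r].size() == result.size());
    for (std::size_t i = 0; i < result.size(); ++i) {
      result[i] = combine(op, result[i], send_bufs[r][i]);
    }
  }
  return result;
}
```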
-
template<KokkosExecutionSpace ExecSpace, KokkosView SendView, KokkosView RecvView>
auto reduce(const ExecSpace &space, const SendView &sv, RecvView &rv, ncclRedOp_t op, int root, int rank, ncclComm_t comm) -> Request<NcclSpace>
Performs a reduction operation, combining data from all processes and placing the result on the root process.
- Template Parameters:
ExecSpace – The execution space (e.g. Kokkos::Cuda).
SendView – The type of the view to be sent.
RecvView – The type of the view that receives the result.
- Parameters:
space – The execution space instance.
sv – The view to be sent.
rv – The view that receives the result (only used on the root process).
op – The NCCL reduction operation to be applied.
root – The rank of the root process.
rank – The rank of the calling process.
comm – The NCCL communicator.
- Returns:
A request object representing the asynchronous reduce operation.
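Unlike all-reduce, only the root ends up with the combined data. The host-only sketch below models a sum reduction in plain C++ and, as an assumption consistent with the note that rv is used on the root process, leaves the receive buffers of non-root ranks untouched. reduce_sum_reference is a hypothetical helper, not part of NCCL or KokkosComm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of a sum reduction to `root`.
// send_bufs[r] / recv_bufs[r] are the send and receive buffers of rank r.
// Only recv_bufs[root] is written; other receive buffers are left as-is.
inline void reduce_sum_reference(
    const std::vector<std::vector<int>> &send_bufs,
    std::vector<std::vector<int>> &recv_bufs, std::size_t root) {
  assert(!send_bufs.empty() && send_bufs.size() == recv_bufs.size());
  assert(root < send_bufs.size());
  std::vector<int> &out = recv_bufs[root];
  assert(out.size() == send_bufs.front().size());
  std::fill(out.begin(), out.end(), 0);  // start from the additive identity
  for (const auto &sv : send_bufs) {
    assert(sv.size() == out.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
      out[i] += sv[i];
    }
  }
}
```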
-
template<KokkosView View>
auto broadcast(const Kokkos::Cuda &space, View &v, int root, ncclComm_t comm) -> Request<NcclSpace>
Broadcasts data from the root process to all other processes in the communicator.
- Template Parameters:
View – The type of the view to be broadcast.
- Parameters:
space – The Kokkos::Cuda execution space instance.
v – The view to be broadcast (in-place on all ranks).
root – The rank of the root process.
comm – The NCCL communicator.
- Returns:
A request object representing the asynchronous broadcast operation.
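Because the broadcast is in-place, the same view is the source on the root and the destination everywhere else. The host-only sketch below models that in plain C++: one buffer per rank, with the root's contents overwriting all others. broadcast_reference is a hypothetical helper and does not call NCCL or KokkosComm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of an in-place broadcast.
// bufs[r] is the buffer of rank r; after the call every buffer equals
// the contents the root held on entry.
inline void broadcast_reference(std::vector<std::vector<int>> &bufs,
                                std::size_t root) {
  assert(root < bufs.size());
  const std::vector<int> src = bufs[root];  // copy so the loop cannot alias it
  for (auto &v : bufs) {
    assert(v.size() == src.size());  // all ranks pass equally sized views
    v = src;
  }
}
```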
-
template<KokkosExecutionSpace ExecSpace, KokkosView SendView, KokkosView RecvView>
auto alltoall(const ExecSpace &space, const SendView &sv, const RecvView &rv, int count, ncclComm_t comm) -> Request<NcclSpace>
Performs an all-to-all exchange where each process sends count elements to every other process.
- Template Parameters:
ExecSpace – The execution space (e.g. Kokkos::Cuda).
SendView – The type of the view to be sent.
RecvView – The type of the view that receives the exchanged data.
- Parameters:
space – The execution space instance.
sv – The view to be sent.
rv – The view that receives the exchanged data.
count – The number of elements sent to each process.
comm – The NCCL communicator.
- Returns:
A request object representing the asynchronous all-to-all operation.
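The buffer layout is the part of all-to-all that is easiest to get wrong. The host-only sketch below assumes the conventional block layout used by MPI_Alltoall, which the text above does not spell out: each send buffer holds one block of count elements per rank (including the sender's own block), block j goes to rank j, and the block received from rank i is stored at offset i × count. alltoall_reference is a hypothetical helper in plain C++ and does not call NCCL or KokkosComm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of an all-to-all exchange under the assumed block layout.
// send_bufs[r] is the send buffer of rank r, holding nranks * count elements.
// Returns one receive buffer per rank.
inline std::vector<std::vector<int>> alltoall_reference(
    const std::vector<std::vector<int>> &send_bufs, std::size_t count) {
  const std::size_t nranks = send_bufs.size();
  std::vector<std::vector<int>> recv_bufs(nranks,
                                          std::vector<int>(nranks * count));
  for (std::size_t src = 0; src < nranks; ++src) {
    assert(send_bufs[src].size() == nranks * count);
    for (std::size_t dst = 0; dst < nranks; ++dst) {
      // Block `dst` of the sender becomes block `src` of the receiver.
      const auto from = send_bufs[src].begin() +
                        static_cast<std::ptrdiff_t>(dst * count);
      const auto to = recv_bufs[dst].begin() +
                      static_cast<std::ptrdiff_t>(src * count);
      std::copy_n(from, count, to);
    }
  }
  return recv_bufs;
}
```

Under this assumption, both the send and the receive view need room for (number of ranks) × count elements.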