16. Graphs
16.1. Usage
Kokkos::Graph is an abstraction that describes asynchronous workloads organised as a directed acyclic graph (DAG).
Once defined, the graph can be executed many times.
Kokkos::Graph is specialized for some backends:
Cuda
HIP
SYCL
On these backends, the Kokkos::Graph specialisations map to the native graph API, namely the CUDA Graph API, the HIP Graph API, and the SYCL (command) Graph API, respectively.
For other backends, Kokkos::Graph provides a default implementation.
16.1.1. Execution space instance versus graph
Workloads submitted to Kokkos execution space instances execute eagerly, i.e., once a Kokkos::parallel_* function (for example, Kokkos::parallel_for) is called, the workload is immediately launched on the device.
By contrast, the Kokkos::Graph abstraction follows lazy execution, i.e., workloads added to a Kokkos::Graph are not executed until the whole graph is ready and submitted.
16.1.2. Always in 3 phases
Typically, 3 phases are needed:
definition
instantiation
submission
The definition phase consists of describing the workloads: what they do and how they depend on one another. In other words, this phase builds the topology of the graph of workloads.
The instantiation phase locks the topology, i.e., it can no longer be changed. During this phase, the graph is checked for flaws and the backend creates an executable graph.
The last phase is submission. It executes the workloads while observing their dependencies. This phase can be run multiple times.
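One way to picture the three phases, independently of Kokkos, is a toy graph class in plain C++. This is a conceptual sketch only: ToyGraph and its member functions are hypothetical names, it is not how Kokkos or any backend implements graphs, and the cycle check merely stands in for whatever validation a real backend performs at instantiation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model only: not Kokkos API.
class ToyGraph {
 public:
  using Node = std::size_t;

  // Definition phase: record a workload without running it.
  Node add_node(std::function<void()> work) {
    require_unlocked();
    work_.push_back(std::move(work));
    successors_.emplace_back();
    return work_.size() - 1;
  }

  // Definition phase: `after` may only run once `before` has run.
  void add_edge(Node before, Node after) {
    require_unlocked();
    if (after >= work_.size()) throw std::out_of_range("unknown node");
    successors_.at(before).push_back(after);
  }

  // Instantiation phase: validate the graph (here: reject cycles using
  // Kahn's algorithm), fix an execution order, and lock the topology.
  void instantiate() {
    std::vector<std::size_t> pending(work_.size(), 0);
    for (const auto& succ : successors_)
      for (Node s : succ) ++pending[s];
    std::queue<Node> ready;
    for (Node n = 0; n < work_.size(); ++n)
      if (pending[n] == 0) ready.push(n);
    std::vector<Node> order;
    while (!ready.empty()) {
      const Node n = ready.front();
      ready.pop();
      order.push_back(n);
      for (Node s : successors_[n])
        if (--pending[s] == 0) ready.push(s);
    }
    if (order.size() != work_.size()) throw std::logic_error("cycle detected");
    order_ = std::move(order);
    locked_ = true;
  }

  // Submission phase: run the workloads in dependency order; repeatable.
  void submit() const {
    if (!locked_) throw std::logic_error("instantiate() must be called first");
    for (Node n : order_) work_[n]();
  }

 private:
  void require_unlocked() const {
    if (locked_) throw std::logic_error("topology is locked");
  }
  std::vector<std::function<void()>> work_;
  std::vector<std::vector<Node>> successors_;
  std::vector<Node> order_;
  bool locked_ = false;
};

// Defines B before A, but makes B depend on A, then submits twice.
std::string demo_three_phases() {
  std::string trace;
  ToyGraph graph;
  const auto b = graph.add_node([&] { trace += 'B'; });
  const auto a = graph.add_node([&] { trace += 'A'; });
  graph.add_edge(a, b);
  assert(trace.empty());  // definition alone runs nothing
  graph.instantiate();
  graph.submit();
  graph.submit();
  return trace;
}
```

In this toy model, demo_three_phases() returns "ABAB": the dependency, not the order of definition, decides that A runs before B, and the instantiated graph is reused for the second submission without being redefined.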
16.1.3. Advantages
Describing workloads as a graph has several advantages, including:
Since the workloads are described ahead of execution, the backend driver and/or compiler can leverage optimization opportunities.
Launch overhead is reduced, benefitting DAGs consisting of small workloads.
16.2. Examples
16.2.1. Diamond DAG
Consider a diamond-like DAG: workload A runs first, workloads B and C each depend on A, and workload D depends on both B and C.

The following snippet defines, instantiates and submits a Kokkos::Graph for this DAG.
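// Definition: describe the workloads and their dependencies.
auto graph = Kokkos::create_graph([&](auto root) {
  auto node_A = root.then_parallel_for("workload A", ...policy..., ...functor...);
  // B and C both depend on A.
  auto node_B = node_A.then_parallel_for("workload B", ...policy..., ...functor...);
  auto node_C = node_A.then_parallel_for("workload C", ...policy..., ...functor...);
  // D runs only after both B and C have completed.
  auto node_D = Kokkos::when_all(node_B, node_C).then_parallel_for("workload D", ...policy..., ...functor...);
});
// Instantiation: lock the topology and let the backend build the executable graph.
graph.instantiate();
// Submission: execute the workloads; this call can be repeated.
graph.submit();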