16. Graphs

16.1. Usage

Kokkos::Graph is an abstraction that describes asynchronous workloads organised as a directed acyclic graph (DAG).

Once defined, the graph can be executed many times.

Kokkos::Graph is specialized for some backends:

  • Cuda

  • HIP

  • SYCL

On these backends, the Kokkos::Graph specialisations map to the native graph APIs, namely the CUDA Graph API, the HIP Graph API, and the SYCL (command) Graph API, respectively.

For other backends, Kokkos::Graph provides a default implementation.

16.1.1. Execution space instance versus graph

Workloads submitted on Kokkos execution space instances execute eagerly, i.e., as soon as a Kokkos::parallel_* function (such as Kokkos::parallel_for) is called, the workload is launched on the execution space instance.

By contrast, the Kokkos::Graph abstraction follows lazy execution, i.e., workloads added to a Kokkos::Graph are not executed until the whole graph has been defined and submitted.

16.1.2. Always in 3 phases

A graph goes through three phases:

  1. definition

  2. instantiation

  3. submission

The definition phase describes the workloads (what they do) and their dependencies. In other words, this phase builds the topology of the workload graph.

The instantiation phase locks the topology: once instantiated, the graph can no longer be modified. During this phase, the graph is checked for errors and the backend creates an executable graph.

The last phase is submission, which executes the workloads while respecting their dependencies. This phase can be repeated: an instantiated graph can be submitted multiple times.

16.1.3. Advantages

Graphs offer several advantages, including:

  • Since the workloads are described ahead of execution, the backend driver and/or compiler can leverage optimization opportunities.

  • Launch overhead is reduced, which particularly benefits DAGs consisting of many small workloads.

16.2. Examples

16.2.1. Diamond DAG

Consider a diamond-like DAG.

digraph diamond {
    A -> B;
    A -> C;
    B -> D;
    C -> D;
}

The following snippet defines, instantiates and submits a Kokkos::Graph for this DAG.

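// 1. Definition: record the workloads and their dependencies; nothing runs yet.
auto graph = Kokkos::create_graph([&](auto root) {
    auto node_A = root.then_parallel_for("workload A", ...policy..., ...functor...);

    // B and C both depend on A only.
    auto node_B = node_A.then_parallel_for("workload B", ...policy..., ...functor...);
    auto node_C = node_A.then_parallel_for("workload C", ...policy..., ...functor...);

    // D depends on both B and C.
    auto node_D = Kokkos::when_all(node_B, node_C).then_parallel_for("workload D", ...policy..., ...functor...);
});

// 2. Instantiation: lock the topology and create the executable graph.
graph.instantiate();

// 3. Submission: run the workloads asynchronously (can be repeated);
//    fence before using the results.
graph.submit();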