Graphs ====== Usage ----- :cpp:`Kokkos::Graph` is an abstraction that describes asynchronous workloads organised as a direct acyclic graph (DAG). Once defined, the graph can be executed many times. :cpp:`Kokkos::Graph` is specialized for some backends: * :cpp:`Cuda` * :cpp:`HIP` * :cpp:`SYCL` On these backends, the :cpp:`Kokkos::Graph` specialisations map to the native graph API, namely, the CUDA Graph API, the HIP Graph API, and the SYCL (command) Graph API, respectively. For other backends, :cpp:`Kokkos::Graph` provides a defaulted implementation. Execution space instance versus graph ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Workloads submitted on :cpp:`Kokkos` execution space instances execute *eagerly*, *i.e.*, once the :cpp:`Kokkos::parallel_` function is called, the workload is immediately launched on the device. By contrast, the :cpp:`Kokkos::Graph` abstraction follows *lazy* execution, *i.e*, workloads added to a :cpp:`Kokkos::Graph` are **not** executed *until* the whole graph is ready and submitted. Always in 3 phases ~~~~~~~~~~~~~~~~~~ Typically, 3 phases are needed: 1. definition 2. instantiation 3. submission The *definition* phase consists in describing the workloads: what they do, as well as their dependencies. In other words, this phase creates a *topological* graph of workloads. The *instantiation* phase **locks** the topology, *i.e.*, it cannot be changed anymore. During this phase, the graph will be checked for flaws. The backend creates an *executable* graph. The last phase is *submission*. It will execute the workloads, observing their dependencies. This phase can be run multiple times. Advantages ~~~~~~~~~~ There are many advantages. Here are a few: * Since the workloads are described ahead of execution, the backend driver and/or compiler can leverage optimization opportunities. * Launch overhead is reduced, benefitting DAGs consisting of small workloads. Examples -------- Diamond DAG ~~~~~~~~~~~ Consider a diamond-like DAG. .. graphviz:: digraph diamond { A -> B; A -> C; B -> D; C -> D; } The following snippet defines, instantiates and submits a :cpp:`Kokkos::Graph` for this DAG. .. code-block:: c++ auto graph = Kokkos::create_graph([&](auto root) { auto node_A = root.then_parallel_for("workload A", ...policy..., ...functor...); auto node_B = node_A.then_parallel_for("workload B", ...policy..., ...functor...); auto node_C = node_A.then_parallel_for("workload C", ...policy..., ...functor...); auto node_D = Kokkos::when_all(node_B, node_C).then_parallel_for("workload D", ...policy..., ...functor...); }); graph.instantiate(); graph.submit();