Patterns

Patterns are the first of the three key concepts in PyKokkos (see Concepts). A pattern specifies the structure of a computation. PyKokkos provides three key patterns:

  • parallel_for, which is also known as a map operation in other frameworks/languages

  • parallel_reduce, which is also known as a fold operation in other frameworks/languages

  • parallel_scan, which implements a prefix scan

Parallel for

The most commonly used pattern is parallel_for. The pattern is available as a function in the pykokkos library and has the following signature:

parallel_for([label], policy, workunit, [keyword arguments])
  • label is an optional string value helpful for debugging and profiling

  • policy specifies the way computations are executed (execution place and number of work units to run in parallel). In its simplest form, policy is an integer value that specifies a range of values. More details about policies are provided on a separate page (Policies)

  • workunit is the name of the @pk.workunit function that performs one unit of work

  • arguments are keyword arguments passed to the workunit

Based on the policy, parallel_for executes a number of work units. Each work unit runs independently, and there are no guarantees about the execution order: any number of work units may run in parallel, or they may run sequentially if the runtime determines that doing so benefits overall performance.

Below is an example to illustrate the parallel_for pattern.

import pykokkos as pk

@pk.workunit
def hello(i: int):
    pk.printf("Hello, World! from i = %d\n", i)

def main():
    pk.parallel_for(10, hello)

main()

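In this example, the policy is simply an integer value (10) that specifies a range (0..9) of unique ids for the work units to be spawned (one work unit per id). Here is one possible output for the example; because the execution order is not guaranteed, the order of the lines can differ between runs: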

Hello, World! from i = 0
Hello, World! from i = 8
Hello, World! from i = 4
Hello, World! from i = 1
Hello, World! from i = 9
Hello, World! from i = 5
Hello, World! from i = 6
Hello, World! from i = 2
Hello, World! from i = 3
Hello, World! from i = 7
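
The unordered output above suggests a useful rule of thumb: a work unit should not depend on when other work units run. The following is a minimal plain-Python sketch of these semantics, not PyKokkos code. The names model_parallel_for and square are hypothetical and exist only for illustration; the shuffle merely imitates the lack of ordering guarantees.

```python
import random

def model_parallel_for(n, workunit, **kwargs):
    """Hypothetical stand-in for parallel_for with an integer policy:
    call workunit once per id in [0, n), in an arbitrary order."""
    ids = list(range(n))
    random.shuffle(ids)  # imitate the absence of ordering guarantees
    for i in ids:
        workunit(i, **kwargs)

def square(i, out):
    # Each work unit writes only its own slot, so the order is irrelevant.
    out[i] = i * i

out = [0] * 10
model_parallel_for(10, square, out=out)
print(out)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] regardless of order
```

Because every work unit touches a distinct element, the result is the same no matter how the ids are scheduled, which is the property a map-style work unit should have.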

Parallel reduce

The pattern parallel_reduce implements a reduction. This pattern is similar in many ways to parallel_for except that each work unit produces a value, and all the values are eventually accumulated into a single value (known as an accumulator). This pattern is available as a function in the pykokkos library and has the following signature:

parallel_reduce([label], policy, workunit, [keyword arguments])
  • label is an optional string value helpful for debugging and profiling

  • policy specifies the way computations are executed (execution place and number of work units to run in parallel). In its simplest form, policy is an integer value that specifies a range of values. More details about policies are provided on a separate page (Policies)

  • workunit is the name of the @pk.workunit function that performs one unit of work

  • arguments are keyword arguments passed to the workunit

Based on the policy, parallel_reduce runs a number of work units. Each work unit receives two arguments in addition to the specified keyword arguments: (1) the unique id of the work unit, and (2) an accumulator.

Below is an example to illustrate the parallel_reduce pattern:

import pykokkos as pk
import numpy as np

@pk.workunit
def work(wid, acc, a):
    acc += a[wid]

def main():
    N = 10
    a = np.random.randint(100, size=(N))
    print(a)
    total = pk.parallel_reduce("work", N, work, a=a)
    print(total)

main()

In the example, we run N (which is set to 10) work units to compute the sum of all elements in a numpy array (a). Note that the first two arguments to the workunit (wid which is a unique identifier of a work unit, and acc which is an accumulator) are provided at runtime by the framework.
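
To make the role of the accumulator concrete, here is a plain-Python sketch of what a sum reduction computes. It is a model under stated assumptions, not the PyKokkos implementation: Acc, model_parallel_reduce, and the chunks parameter are hypothetical names, and splitting the range into chunks with private accumulators is just one common way a runtime might organize a reduction.

```python
class Acc:
    """Hypothetical mutable accumulator standing in for the one the
    framework supplies; += updates it in place."""
    def __init__(self, value=0):
        self.value = value

    def __iadd__(self, other):
        self.value += other
        return self

def model_parallel_reduce(n, workunit, chunks=3, **kwargs):
    """Hypothetical model: reduce each chunk of ids into a private
    accumulator, then combine the partial results."""
    bounds = [(c * n) // chunks for c in range(chunks + 1)]
    partials = []
    for lo, hi in zip(bounds, bounds[1:]):
        acc = Acc()  # private accumulator for this chunk
        for wid in range(lo, hi):
            workunit(wid, acc, **kwargs)
        partials.append(acc.value)
    return sum(partials)  # join the partial sums

def work(wid, acc, a):
    acc += a[wid]

a = [59, 60, 48, 65, 41, 22, 64, 59, 91, 24]  # sample values
total = model_parallel_reduce(len(a), work, a=a)
print(total)  # 533
```

The result does not depend on how the ids are grouped because addition is associative; a work unit that only adds its own contribution to the accumulator fits this model.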

Parallel scan

The pattern parallel_scan implements a prefix scan. This pattern is very much like parallel_reduce, but it also stores all intermediate results. The pattern is available as a function in the pykokkos library and has the following signature:

parallel_scan([label], policy, workunit, [keyword arguments])
  • label is an optional string value helpful for debugging and profiling

  • policy specifies the way computations are executed (execution place and number of work units to run in parallel). In its simplest form, policy is an integer value that specifies a range of values. More details about policies are provided on a separate page (Policies)

  • workunit is the name of the @pk.workunit function that performs one unit of work

  • arguments are keyword arguments passed to the workunit

As before, based on the policy, parallel_scan runs a number of work units. Each work unit receives three arguments in addition to the given keyword arguments: (1) the unique id of the work unit, (2) an accumulator, and (3) a boolean flag that indicates whether the scan for the current work unit is complete.

Below is an example to illustrate the parallel_scan pattern:

import pykokkos as pk
import numpy as np

@pk.workunit
def work(wid, acc, final, a):
    acc += a[wid]
    if final:
        a[wid] = acc

def main():
    N = 10
    a = np.random.randint(100, size=(N))
    print(a)

    pk.parallel_scan("work", N, work, a=a)
    print(a)

main()

The output of a single run of the example above is shown below. The first line is the input array and the second line is its inclusive prefix sum:

[59 60 48 65 41 22 64 59 91 24]
[ 59 119 167 232 273 295 359 418 509 533]
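
One way to see why the work unit writes its result only when final is true is to model the scan in two passes. The sketch below is plain Python, not PyKokkos, and rests on assumptions: Acc, model_parallel_scan, and chunks are hypothetical names, and the two-pass scheme is one plausible strategy rather than a description of what the runtime actually does. In this model acc is an object, so the work unit stores acc.value instead of acc.

```python
from itertools import accumulate

class Acc:
    """Hypothetical mutable accumulator; += updates it in place."""
    def __init__(self, value=0):
        self.value = value

    def __iadd__(self, other):
        self.value += other
        return self

def model_parallel_scan(n, workunit, chunks=3, **kwargs):
    """Hypothetical two-pass model of a chunked prefix scan."""
    bounds = [(c * n) // chunks for c in range(chunks + 1)]
    ranges = list(zip(bounds, bounds[1:]))
    # Pass 1 (final=False): compute each chunk's total, write nothing.
    totals = []
    for lo, hi in ranges:
        acc = Acc()
        for wid in range(lo, hi):
            workunit(wid, acc, False, **kwargs)
        totals.append(acc.value)
    # Each chunk starts from the sum of all chunks before it.
    offsets = [0] + list(accumulate(totals))[:-1]
    # Pass 2 (final=True): accumulators now hold true prefix values.
    for (lo, hi), start in zip(ranges, offsets):
        acc = Acc(start)
        for wid in range(lo, hi):
            workunit(wid, acc, True, **kwargs)

def work(wid, acc, final, a):
    acc += a[wid]
    if final:
        a[wid] = acc.value  # .value is an artifact of this model

a = [59, 60, 48, 65, 41, 22, 64, 59, 91, 24]
model_parallel_scan(len(a), work, a=a)
print(a)  # [59, 119, 167, 232, 273, 295, 359, 418, 509, 533]
```

In this model the work unit runs twice per id, so an unconditional write would overwrite the input during the first pass with a partial, chunk-local sum. Guarding the write with final keeps the input intact until the accumulator holds the complete prefix value.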