Policies

Policy is the second key concept (out of three) in PyKokkos (Concepts). Each pattern (Patterns) accepts a policy as an argument. A policy specifies the way computation is executed and provides:

Number of units of work to run by providing a set of unique ids
Space in which the execution should happen (e.g., Cuda)

PyKokkos currently supports the following policies:

RangePolicy
TeamPolicy

In this document, we will use the parallel_for pattern for illustrations in our examples, but similar reasoning applies to other patterns.

RangePolicy

This is the simplest policy, which specifies unique ids for units of work as a 1-D range of values. One can create an instance of RangePolicy using the following function from the library:

pk.RangePolicy([ExecutionSpace], begin_value, end_value)

ExecutionSpace is optional and discussed in detail in ExecutionSpace. If ExecutionSpace is not provided, the default space will be used.

We can use a range policy like so:

pk.parallel_for("work", pk.RangePolicy(0, N), work, kwargs_for_work)

This is equivalent to the simpler form used on other pages of this documentation, where the number of work units is passed directly:

pk.parallel_for("work", N, work, kwargs_for_work)

Below is a complete example:

import pykokkos as pk
import numpy as np

@pk.workunit
def work(wid, a):
    a[wid] += 1

def main():
    N = 10
    a = np.random.randint(100, size=(N))
    print(a)

    pk.parallel_for("work", pk.RangePolicy(0, N), work, a=a)
    # OR
    # pk.parallel_for("work", N, work, a=a)
    print(a)

main()

An example output is shown below:

[68 30 75 59  0 25 54 80 36 62]
[69 31 76 60  1 26 55 81 37 63]
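To make the id range concrete, the following plain-Python sketch emulates which work ids a range policy hands out when begin_value is not zero. It is not PyKokkos code: emulate_range_policy is a hypothetical helper, it runs sequentially rather than in parallel, and it assumes the range is half-open, [begin_value, end_value), which is consistent with pk.RangePolicy(0, N) being equivalent to passing N.

```python
def emulate_range_policy(begin, end, workunit, **kwargs):
    """Call workunit once per id in the assumed half-open range [begin, end)."""
    for wid in range(begin, end):
        workunit(wid, **kwargs)


def work(wid, a):
    # Same body as the workunit above, minus the decorator.
    a[wid] += 1


a = [0] * 10
emulate_range_policy(2, 5, work, a=a)
print(a)  # only ids 2, 3 and 4 were visited
```

Under that assumption, only the elements at indices 2, 3, and 4 are incremented; the end value itself is never used as an id.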

TeamPolicy

TeamPolicy is used to implement hierarchical parallelism. Threads are grouped into teams, and there can be arbitrarily many teams. Each team has a fixed number of threads (the team size). All threads in a team are guaranteed to run concurrently. One can create an instance of TeamPolicy using the following function from the library:

pk.TeamPolicy([ExecutionSpace], league_size, team_size)

ExecutionSpace is optional and discussed in detail in ExecutionSpace. If ExecutionSpace is not provided, the default space will be used.

Below is an example of adding one to each element of an array, which uses TeamPolicy to organize the computation.

import numpy as np
import pykokkos as pk

@pk.workunit
def work(team_member, view):
    j: int = team_member.league_rank()
    k: int = team_member.team_size()

    def inner(i: int):
        view[j * k + i] = view[j * k + i] + 1

    pk.parallel_for(pk.TeamThreadRange(team_member, k), inner)

def main():
    pk.set_default_space(pk.OpenMP)
    a = np.zeros(100)
    pk.parallel_for("work", pk.TeamPolicy(50, 2), work, view=a)
    print(a)

main()

Note

Those familiar with Cuda might want to think of league_rank as the block id, team_size as the block size, and team_rank as the thread id within a block.
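The index arithmetic in the example (j * k + i) can be checked with a sequential plain-Python sketch. This is only an emulation of the mapping, not PyKokkos code: the two nested loops stand in for the league of teams and for the threads inside each team, and the inner loop variable is named team_rank here by analogy with the note above.

```python
league_size, team_size = 50, 2

view = [0.0] * (league_size * team_size)
visits = [0] * len(view)  # how many times each element is touched

for league_rank in range(league_size):   # one iteration per team
    for team_rank in range(team_size):   # one iteration per thread in the team
        idx = league_rank * team_size + team_rank
        view[idx] += 1
        visits[idx] += 1

print(min(visits), max(visits))
```

With 50 teams of 2 threads, every one of the 100 elements is visited exactly once, which is why the example sizes the array as league_size * team_size.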

Execution Space

ExecutionSpace specifies the place where units of work will be executed. The following are valid values:

pk.OpenMP - execution on the host using OpenMP
pk.Cuda - execution on a Cuda device
pk.HIP - execution on an AMD GPU

If the execution space is not provided in a policy (at the time of a pattern execution), then the default execution space will be used. The default execution space is set to pk.OpenMP when an application starts. It can be changed at any point using the pk.set_default_space() function.
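The fallback rule can be illustrated with a toy model in plain Python. Nothing here is a PyKokkos API or a description of how PyKokkos is implemented: ToyRangePolicy, set_default_space, and space_used_at_launch are hypothetical names, and spaces are represented as strings. The sketch reads the paragraph above literally, assuming the default is looked up when the pattern runs rather than when the policy is created.

```python
# Toy model only: none of these names are PyKokkos APIs.
DEFAULT_SPACE = {"name": "OpenMP"}  # the text says the default starts as OpenMP


def set_default_space(name):
    DEFAULT_SPACE["name"] = name


class ToyRangePolicy:
    def __init__(self, *args):
        # Accept (begin, end) or (space, begin, end), mirroring the
        # optional leading ExecutionSpace argument.
        if len(args) == 3:
            self.space, self.begin, self.end = args
        elif len(args) == 2:
            self.space = None
            self.begin, self.end = args
        else:
            raise TypeError("expected (begin, end) or (space, begin, end)")


def space_used_at_launch(policy):
    # Assumption: the space is resolved when the pattern executes.
    return policy.space if policy.space is not None else DEFAULT_SPACE["name"]


implicit = ToyRangePolicy(0, 10)
explicit = ToyRangePolicy("Cuda", 0, 10)

before = space_used_at_launch(implicit)
set_default_space("HIP")
after = space_used_at_launch(implicit)
print(before, after, space_used_at_launch(explicit))
```

In this model a policy built without a space follows later changes to the default, while a policy built with an explicit space is unaffected by them.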