ThreadVectorMDRange
#
Header File: <Kokkos_Core.hpp>
Description#
ThreadVectorMDRange is a nested execution policy used inside of hierarchical parallelism.
Interface#
-
template<class Rank, typename TeamHandle>
class ThreadVectorMDRange# Constructor
-
ThreadVectorMDRange(team, extent_1, extent_2, ...);#
Splits the index range
0
toextent
over the vector lanes of the calling thread, whereextent
is the backend-dependent rank that will be vectorized- Parameters:
team – TeamHandle to the calling team execution context
extent_1, extent_2, ... – index range lengths of each rank
Requirements
TeamHandle
is a type that models TeamHandleextent_1, extent_2, ...
are intsextent_i
is such thati >= 2 && i <= 8
is true. For example:ThreadVectorMDRange(team, 4); // NOT OK, violates i>=2 ThreadVectorMDRange(team, 4,5); // OK ThreadVectorMDRange(team, 4,5,6); // OK ThreadVectorMDRange(team, 4,5,6,2,3,4,5,6); // OK, max num of extents allowed
The constructor can not be called inside a parallel operation dispatched using a
TeamVectorRange
policy,TeamVectorRange
policy,TeamVectorMDRange
policy orThreadVectorMDRange
policy.
-
ThreadVectorMDRange(team, extent_1, extent_2, ...);#
Examples#
using TeamHandle = TeamPolicy<>::member_type;
parallel_for(TeamPolicy<>(N, Kokkos::AUTO),
KOKKOS_LAMBDA(TeamHandle const& team) {
int leagueRank = team.league_rank();
auto teamThreadRange = TeamThreadRange(team, n0);
auto threadVectorMDRange =
ThreadVectorMDRange<Rank<3>, TeamHandle>(
team, n1, n2, n3);
parallel_for(teamThreadRange, [=](int i0) {
parallel_for(threadVectorMDRange, [=](int i1, int i2, int i3) {
A(leagueRank, i0, i1, i2, i3) += B(leagueRank, i1) + C(i1, i2, i3);
});
});
team.team_barrier();
int teamSum = 0;
parallel_for(teamThreadRange, [=, &teamSum](int const& i0) {
int threadSum = 0;
parallel_reduce(threadVectorMDRange,
[=](int i1, int i2, int i3, int& vectorSum) {
vectorSum += D(leagueRank, i0, i1, i2, i3);
}, threadSum
);
teamSum += threadSum;
});
});