KokkosBatched::Dot

Defined in header: KokkosBatched_Dot.hpp

template <typename ArgTrans, int Axis>
struct SerialDot {
  template <typename XViewType, typename YViewType, typename NormViewType>
  KOKKOS_INLINE_FUNCTION static int invoke(const XViewType &X, const YViewType &Y, const NormViewType &dot);
};

template <typename MemberType, typename ArgTrans, int Axis>
struct TeamDot {
  template <typename XViewType, typename YViewType, typename NormViewType>
  KOKKOS_INLINE_FUNCTION static int invoke(const MemberType &member, const XViewType &X, const YViewType &Y, const NormViewType &dot);
};

template <typename MemberType, typename ArgTrans, int Axis>
struct TeamVectorDot {
  template <typename XViewType, typename YViewType, typename NormViewType>
  KOKKOS_INLINE_FUNCTION static int invoke(const MemberType &member, const XViewType &X, const YViewType &Y, const NormViewType &dot);
};

Performs the dot product of two vectors \(X\) and \(Y\).

\[\begin{aligned} dot &= X^T * Y \quad \text{(if ArgTrans == KokkosBatched::Trans::Transpose)} \\ dot &= X^H * Y \quad \text{(if ArgTrans == KokkosBatched::Trans::ConjTranspose)} \end{aligned}\]

If ArgTrans == KokkosBatched::Trans::Transpose, this operation is equivalent to the BLAS routines SDOT and DDOT for real vectors, or CDOTU and ZDOTU for complex vectors, in single and double precision respectively.
If ArgTrans == KokkosBatched::Trans::ConjTranspose, this operation is equivalent to the BLAS routines CDOTC (single precision) and ZDOTC (double precision) for complex vectors.
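
The difference between the two ArgTrans modes only shows up for complex inputs. The following is a minimal plain C++ sketch of the arithmetic, not the KokkosBatched implementation; the function name reference_dot and its conjugate_x flag are hypothetical and exist only for illustration.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Plain C++ reference (hypothetical helper, not part of KokkosBatched).
// conjugate_x == false mirrors Trans::Transpose     (dot = X^T * Y, like ZDOTU)
// conjugate_x == true  mirrors Trans::ConjTranspose (dot = X^H * Y, like ZDOTC)
// x and y are assumed to have the same length.
template <typename T>
std::complex<T> reference_dot(const std::vector<std::complex<T>>& x,
                              const std::vector<std::complex<T>>& y,
                              bool conjugate_x) {
  std::complex<T> acc(0, 0);
  for (std::size_t i = 0; i < x.size(); ++i) {
    // Conjugate the element of x only in the ConjTranspose case.
    acc += (conjugate_x ? std::conj(x[i]) : x[i]) * y[i];
  }
  return acc;
}
```

For x = [1+2i, 3-i] and y = [2+i, i], the unconjugated product is 1+8i while the conjugated product is 3. For real inputs the two modes give the same result.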

Note

This kernel does not provide an equivalent of the BLAS routine SDSDOT, which returns the single-precision dot product of two single-precision vectors accumulated in double precision. To reproduce DSDOT, provide \(X\) and \(Y\) in single precision and the output \(dot\) in double precision.
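
To illustrate why the output precision matters, here is a plain C++ sketch of a DSDOT-style dot product in which the accumulator takes the output type. It is not the KokkosBatched implementation; reference_dot_as is a hypothetical name, and promoting each operand before the multiply is a choice of this sketch that follows the reference BLAS DSDOT.

```cpp
#include <cstddef>
#include <vector>

// Plain C++ reference (hypothetical helper, not part of KokkosBatched).
// The accumulator has the output type Out. With In = float and Out = double,
// each operand is promoted before the multiply, as in BLAS DSDOT.
// x and y are assumed to have the same length.
template <typename Out, typename In>
Out reference_dot_as(const std::vector<In>& x, const std::vector<In>& y) {
  Out acc = Out(0);
  for (std::size_t i = 0; i < x.size(); ++i) {
    acc += static_cast<Out>(x[i]) * static_cast<Out>(y[i]);
  }
  return acc;
}
```

With float inputs x = [1e8, 1, -1e8] and y = [1, 1, 1], calling reference_dot_as<double>(x, y) returns exactly 1, because 1e8 + 1 is representable in double precision. A single-precision accumulator cannot represent that intermediate sum exactly.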

Parameters

X: On input, \(X\) is a length \(n\) vector or an \(m\) by \(n\) matrix.
Y: On input, \(Y\) is a length \(n\) vector or an \(m\) by \(n\) matrix.
dot: On output, \(dot\) is the computed dot product if \(X\) and \(Y\) are vectors, or the computed dot products along the specified axis if \(X\) and \(Y\) are matrices.

Type Requirements

MemberType must be a Kokkos team member handle (only for TeamDot and TeamVectorDot)
ArgTrans must be one of the following:
- KokkosBatched::Trans::Transpose for \(dot = X^T * Y\)
- KokkosBatched::Trans::ConjTranspose for \(dot = X^H * Y\)
Axis must be one of the following:
- 0 to perform the operation along the first dimension (columns) when \(X\) and \(Y\) are matrices
- 1 to perform the operation along the second dimension (rows) when \(X\) and \(Y\) are matrices
XViewType must be a Kokkos View of rank 1 or 2 containing a vector or matrix \(X\)
YViewType must be a Kokkos View of rank 1 or 2 containing a vector or matrix \(Y\)
NormViewType must be a Kokkos View of rank 0 or 1 containing the output \(dot\). The dot product is accumulated in the element type of NormViewType.

Note

This kernel supports both vector and matrix operations. When the input views \(X\) and \(Y\) are of rank 1, the kernel performs a single vector operation (BLAS dot), and Axis must be set to 0. When they are of rank 2, the kernel performs one dot product per column or row along the specified axis (0 or 1), treating each column or row as a separate vector. The Axis template argument is required as of version 5.2.0.
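
The rank-2 behavior can be sketched in plain C++ as follows. This is one reading of the description above, not the KokkosBatched implementation: it assumes that Axis 0 reduces over the first index (one result per column, output length \(n\)) and Axis 1 reduces over the second index (one result per row, output length \(m\)). The function reference_dot_axis and its runtime axis parameter are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Plain C++ reference (hypothetical helper, not part of KokkosBatched).
// X and Y are m-by-n matrices stored as vectors of rows with equal shapes.
// axis == 0: reduce over the first index, one result per column (length n).
// axis == 1: reduce over the second index, one result per row (length m).
std::vector<double> reference_dot_axis(const std::vector<std::vector<double>>& X,
                                       const std::vector<std::vector<double>>& Y,
                                       int axis) {
  const std::size_t m = X.size();
  const std::size_t n = m > 0 ? X[0].size() : 0;
  std::vector<double> dot(axis == 0 ? n : m, 0.0);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      // Each product contributes to the column result (axis 0) or row result (axis 1).
      dot[axis == 0 ? j : i] += X[i][j] * Y[i][j];
    }
  }
  return dot;
}
```

For X = [[1, 2, 3], [4, 5, 6]] and Y = [[1, 1, 1], [2, 2, 2]], this sketch yields [9, 12, 15] with axis 0 and [6, 30] with axis 1.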

Example

#include <iostream>
#include <Kokkos_Core.hpp>
#include <KokkosBatched_Dot.hpp>

using ExecutionSpace = Kokkos::DefaultExecutionSpace;

/// \brief Example of batched dot product
/// computing dot = x^T * y for a batch of vectors x and y.
///
/// Usage example:
///        x: [1, 2, 3]
///        y: [4, 5, 6]
///        dot: 32
///
int main(int /*argc*/, char** /*argv*/) {
  Kokkos::initialize();
  {
    using View1DType = Kokkos::View<double*, ExecutionSpace>;
    using View2DType = Kokkos::View<double**, ExecutionSpace>;
    const int Nb = 10, n = 3;

    // Vector x and y
    View2DType x("x", Nb, n), y("y", Nb, n);
    View1DType dot("dot", Nb);

    // Initialize x and y
    auto h_x = Kokkos::create_mirror_view(x);
    auto h_y = Kokkos::create_mirror_view(y);
    for (int ib = 0; ib < Nb; ib++) {
      for (int j = 0; j < n; j++) {
        h_x(ib, j) = j + 1;  // x: [1, 2, 3]
        h_y(ib, j) = j + 4;  // y: [4, 5, 6]
      }
    }
    Kokkos::deep_copy(x, h_x);
    Kokkos::deep_copy(y, h_y);

    // Compute dot = x^T * y
    ExecutionSpace exec;
    using policy_type = Kokkos::RangePolicy<ExecutionSpace, Kokkos::IndexType<int>>;
    policy_type policy{exec, 0, Nb};
    Kokkos::parallel_for(
        "dot", policy, KOKKOS_LAMBDA(int ib) {
          auto sub_x   = Kokkos::subview(x, ib, Kokkos::ALL());
          auto sub_y   = Kokkos::subview(y, ib, Kokkos::ALL());
          auto sub_dot = Kokkos::subview(dot, ib);
          KokkosBatched::SerialDot<KokkosBatched::Trans::Transpose, 0>::invoke(sub_x, sub_y, sub_dot);
        });

    // Confirm that the results are correct
    auto h_dot   = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, dot);
    bool correct = true;
    double eps   = 1.0e-12;
    for (int ib = 0; ib < Nb; ib++) {
      if (Kokkos::abs(h_dot(ib) - 32) > eps) correct = false;
    }

    if (correct) {
      std::cout << "dot works correctly!" << std::endl;
    }
  }
  Kokkos::finalize();
}

output:

dot works correctly!