KokkosBatched::Nrm

Defined in header: KokkosBatched_Nrm.hpp

template <typename NrmType>
struct SerialNrm {
  template <typename XViewType, typename NormViewType>
  KOKKOS_INLINE_FUNCTION static int invoke(const XViewType &X, const NormViewType &norm);
};

template <typename MemberType, typename NrmType>
struct TeamNrm {
  template <typename XViewType, typename NormViewType>
  KOKKOS_INLINE_FUNCTION static int invoke(const MemberType &member, const XViewType &X, const NormViewType &norm);
};

template <typename MemberType, typename NrmType>
struct TeamVectorNrm {
  template <typename XViewType, typename NormViewType>
  KOKKOS_INLINE_FUNCTION static int invoke(const MemberType &member, const XViewType &X, const NormViewType &norm);
};

Computes the \(L1\), \(L2\) or \(L_\infty\) norm of a vector \(X\).

\[\begin{split}\begin{align} norm &= ||x||_1 \: \text{(if NrmType == KokkosBatched::Norm::L1)} \\ norm &= ||x||_2 \: \text{(if NrmType == KokkosBatched::Norm::L2 or NrmType == KokkosBatched::Norm::ScaledL2)} \\ norm &= ||x||_\infty \: \text{(if NrmType == KokkosBatched::Norm::LInf)} \end{align}\end{split}\]
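
As a worked instance, take the real vector \(x = (1, 3, 5)\) used in the example at the end of this page. Assuming the standard definitions of the three norms for a real vector, the values would be:

\[\begin{split}\begin{align} ||x||_1 &= |1| + |3| + |5| = 9 \\ ||x||_2 &= \sqrt{1^2 + 3^2 + 5^2} = \sqrt{35} \approx 5.916 \\ ||x||_\infty &= \max(|1|, |3|, |5|) = 5 \end{align}\end{split}\]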

If NrmType == KokkosBatched::Norm::L1, this operation is equivalent to the BLAS routine SASUM (SCASUM) or DASUM (DZASUM) for single or double precision for real (complex) vectors.
If NrmType == KokkosBatched::Norm::L2 or NrmType == KokkosBatched::Norm::ScaledL2, this operation is equivalent to the BLAS routine SNRM2 (SCNRM2) or DNRM2 (DZNRM2) for single or double precision for real (complex) vectors.
If NrmType == KokkosBatched::Norm::LInf, this operation is related to the BLAS routine ISAMAX (ICAMAX) or IDAMAX (IZAMAX) for single or double precision for real (complex) vectors. Those BLAS routines return the index of the element with the maximum absolute value, whereas this routine stores the maximum absolute value itself in the output \(norm\).
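
To make the BLAS correspondence concrete, the following is a minimal host-side sketch of the three quantities for real vectors. It uses only the C++ standard library. The helper names `ref_nrm_l1`, `ref_nrm_l2` and `ref_nrm_linf` are hypothetical and are not part of Kokkos Kernels. They are intended only as reference values to compare against, not as a description of how the library computes its results.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical host-side reference helpers (not part of Kokkos Kernels).

// L1 norm: sum of absolute values, the quantity DASUM returns for real input.
double ref_nrm_l1(const std::vector<double>& x) {
  double sum = 0.0;
  for (double v : x) sum += std::abs(v);
  return sum;
}

// L2 norm: square root of the sum of squares, the quantity DNRM2 returns.
double ref_nrm_l2(const std::vector<double>& x) {
  double sum = 0.0;
  for (double v : x) sum += v * v;
  return std::sqrt(sum);
}

// L-infinity norm: the maximum absolute value itself.
// IDAMAX would instead report the position of that element.
double ref_nrm_linf(const std::vector<double>& x) {
  double max_abs = 0.0;
  for (double v : x) max_abs = std::max(max_abs, std::abs(v));
  return max_abs;
}
```

For `x = {1, -3, 5}`, these helpers give 9, sqrt(35) and 5 respectively.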

Note

Although NrmType == KokkosBatched::Norm::L2 is more efficient, it may overflow for large vectors. For large vectors, NrmType == KokkosBatched::Norm::ScaledL2 is recommended: it uses a numerically stable algorithm to compute the \(L2\) norm that avoids overflow and underflow by scaling the elements of \(X\) by the largest absolute value encountered so far.
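
The difference between the two variants can be illustrated with a small host-side sketch in standard C++. The function names `naive_l2` and `scaled_l2` are hypothetical. The scaled version follows the classic one-pass scaled sum-of-squares idea, which is assumed here purely for illustration. The actual ScaledL2 implementation in Kokkos Kernels may differ in detail.

```cpp
#include <cmath>
#include <vector>

// Plain sum of squares: cheap, but v * v overflows to infinity for very
// large |v| and underflows to zero for very small |v|.
double naive_l2(const std::vector<double>& x) {
  double sum = 0.0;
  for (double v : x) sum += v * v;
  return std::sqrt(sum);
}

// One-pass scaled sum of squares (illustrative sketch).
// Invariant after each element: sum of squares so far == scale^2 * ssq,
// where scale is the largest absolute value encountered so far.
double scaled_l2(const std::vector<double>& x) {
  double scale = 0.0;
  double ssq   = 1.0;
  for (double v : x) {
    const double a = std::abs(v);
    if (a == 0.0) continue;  // zeros do not change the norm
    if (scale < a) {
      // New maximum: rescale the accumulated sum to the new scale.
      const double r = scale / a;
      ssq            = 1.0 + ssq * r * r;
      scale          = a;
    } else {
      // Ratio is at most 1, so squaring it cannot overflow.
      const double r = a / scale;
      ssq += r * r;
    }
  }
  return scale * std::sqrt(ssq);
}
```

With `x = {1e200, 1e200}`, `naive_l2` returns infinity because each square exceeds the double range, while `scaled_l2` returns a finite value close to `1e200 * sqrt(2)`. The price is an extra division and branch per element, which matches the note that the unscaled variant is the more efficient one.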

Parameters

X: On input, \(X\) is a length \(n\) vector.
norm: On output, \(norm\) is the computed norm of the vector \(X\).

Type Requirements

MemberType must be a Kokkos team member handle (only for TeamNrm and TeamVectorNrm)
NrmType must be one of the following:
- KokkosBatched::Norm::L1 for \(L1\) norm
- KokkosBatched::Norm::L2 or KokkosBatched::Norm::ScaledL2 for \(L2\) norm
- KokkosBatched::Norm::LInf for \(L_\infty\) norm
XViewType must be a Kokkos View of rank 1 containing the vector \(X\)
NormViewType must be a Kokkos View of rank 0 containing the output \(norm\). The norm is accumulated in the type of the elements of NormViewType

Example

#include <iostream>
#include <Kokkos_Core.hpp>
#include <KokkosBatched_Nrm.hpp>

using ExecutionSpace = Kokkos::DefaultExecutionSpace;

/// \brief Example of batched nrm
/// computing nrm = ||x||_2 for a batch of vectors x.
///
/// Usage example:
///        x: [1, 3, 5]
///        nrm: sqrt(1^2 + 3^2 + 5^2) = sqrt(35)
///
int main(int /*argc*/, char** /*argv*/) {
  Kokkos::initialize();
  {
    using View1DType = Kokkos::View<double*, ExecutionSpace>;
    using View2DType = Kokkos::View<double**, ExecutionSpace>;
    const int Nb = 10, n = 3;

    // Vector x
    View2DType x("x", Nb, n);
    View1DType norm("norm", Nb);

    // Initialize x
    auto h_x = Kokkos::create_mirror_view(x);
    for (int ib = 0; ib < Nb; ib++) {
      h_x(ib, 0) = 1;
      h_x(ib, 1) = 3;
      h_x(ib, 2) = 5;
    }
    Kokkos::deep_copy(x, h_x);

    // Compute L2 norm of x
    ExecutionSpace exec;
    using policy_type = Kokkos::RangePolicy<ExecutionSpace, Kokkos::IndexType<int>>;
    policy_type policy{exec, 0, Nb};
    Kokkos::parallel_for(
        "nrm", policy, KOKKOS_LAMBDA(int ib) {
          auto sub_x    = Kokkos::subview(x, ib, Kokkos::ALL());
          auto sub_norm = Kokkos::subview(norm, ib);
          KokkosBatched::SerialNrm<KokkosBatched::Norm::L2>::invoke(sub_x, sub_norm);
        });

    // Confirm that the results are correct
    auto h_norm  = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, norm);
    bool correct = true;
    double eps   = 1.0e-12;
    for (int ib = 0; ib < Nb; ib++) {
      if (Kokkos::abs(h_norm(ib) - Kokkos::sqrt(35)) > eps) correct = false;
    }

    if (correct) {
      std::cout << "nrm works correctly!" << std::endl;
    }
  }
  Kokkos::finalize();
}

output:

nrm works correctly!