KokkosBatched::Getrf

Defined in header: KokkosBatched_Getrf.hpp

template <typename ArgAlgo>
struct SerialGetrf {
  template <typename AViewType, typename PivViewType>
  KOKKOS_INLINE_FUNCTION
  static int
  invoke(const AViewType &A, const PivViewType &piv);
};

Computes a LU factorization of a general m-by-n matrix \(A\) using partial pivoting with row interchanges. The factorization has the format \(A = P \cdot L \cdot U\)

where

  • \(P\) is a permutation matrix

  • \(L\) is a lower triangular matrix with unit diagonal elements (lower trapezoidal if m > n)

  • \(U\) is an upper triangular matrix (upper trapezoidal if m < n)

This operation is equivalent to the LAPACK routine SGETRF (CGETRF) or DGETRF (ZGETRF) for single or double precision for real (complex) matrix. This is the recusive version of the algorithm. It divides the matrix \(A\) into four submatrices:

\[\begin{split}A = \begin{bmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{bmatrix}\end{split}\]

where \(A_{00}\) is a square matrix of size n0, \(A_{11}\) is a matrix of size n1 by n1 with n0 = min(m, n) / 2 and n1 = n - n0. This function calls itself to factorize

\[\begin{split}A_{0} = \begin{bmatrix} A_{00} \\ A_{10} \end{bmatrix}\end{split}\]

do the swaps on

\[\begin{split}A_{1} = \begin{bmatrix} A_{01} \\ A_{11} \end{bmatrix}\end{split}\]

solve \(A_{01}\), update \(A_{11}\), then calls itself to factorize \(A_{11}\) and do the swaps on \(A_{10}\).

Note

On GPUs, the maximum matrix size of \(A\) is limited to 4096 by 4096 as we rely on static stack to achive the recursion.

Parameters

A:

On input, \(A\) is a m by n general matrix. On output, the factors \(L\) and \(U\) from the factorization \(A = P \cdot L \cdot U\). The unit diagonal elements of \(L\) are not stored.

piv:

On output, the pivot indices.

Type Requirements

  • ArgAlgo must be KokkosBatched::Algo::Getrf::Unblocked for the unblocked algorithm

  • AViewType must be a Kokkos View of rank 2 containing the general matrix \(A\)

  • PivViewType must be a Kokkos View of rank 1 containing the pivot indices

Example

//@HEADER
// ************************************************************************
//
//                        Kokkos v. 4.0
//       Copyright (2022) National Technology & Engineering
//               Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Part of Kokkos, under the Apache License v2.0 with LLVM Exceptions.
// See https://kokkos.org/LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//@HEADER

#include <Kokkos_Core.hpp>
#include <Kokkos_Random.hpp>
#include <KokkosBatched_Getrf.hpp>
#include <KokkosBatched_Getrs.hpp>

using ExecutionSpace = Kokkos::DefaultExecutionSpace;

/// \brief Example of batched getrf/getrs
/// Solving A * x = b, where
///   A: [[1, 2, 3],
///       [2, -4, 6],
///       [3, -9, -3]]
///   b: [1, 1, 1]
///   x: [23/32, 1/8, 1/96]
///   A is a real general matrix
///
/// This corresponds to the following system of equations:
///        1 x0 + 2 x1 + 3 x2 = 1
///        2 x0 - 4 x1 + 6 x2 = 1
///        3 x0 - 9 x1 - 3 x2 = 1
///
int main(int /*argc*/, char** /*argv*/) {
  Kokkos::initialize();
  {
    using View2DType  = Kokkos::View<double**, ExecutionSpace>;
    using View3DType  = Kokkos::View<double***, ExecutionSpace>;
    using PivViewType = Kokkos::View<int**, ExecutionSpace>;
    const int Nb = 10, n = 3;

    // Matrix A
    View3DType A("A", Nb, n, n);

    // Solution
    View2DType x("x", Nb, n);

    // Pivot
    PivViewType ipiv("ipiv", Nb, n);

    // Initialize A and x
    Kokkos::deep_copy(x, 1.0);
    auto h_A = Kokkos::create_mirror_view(A);

    // Upper triangular matrix
    for (int ib = 0; ib < Nb; ib++) {
      // Fill the matrix
      h_A(ib, 0, 0) = 1.0;
      h_A(ib, 0, 1) = 2.0;
      h_A(ib, 0, 2) = 3.0;
      h_A(ib, 1, 0) = 2.0;
      h_A(ib, 1, 1) = -4.0;
      h_A(ib, 1, 2) = 6.0;
      h_A(ib, 2, 0) = 3.0;
      h_A(ib, 2, 1) = -9.0;
      h_A(ib, 2, 2) = -3.0;
    }
    Kokkos::deep_copy(A, h_A);

    // solve A * x = b with getrf and getrs
    ExecutionSpace exec;
    using policy_type = Kokkos::RangePolicy<ExecutionSpace, Kokkos::IndexType<int>>;
    policy_type policy{exec, 0, Nb};
    Kokkos::parallel_for(
        "getrf-getrs", policy, KOKKOS_LAMBDA(int ib) {
          auto sub_A    = Kokkos::subview(A, ib, Kokkos::ALL, Kokkos::ALL);
          auto sub_ipiv = Kokkos::subview(ipiv, ib, Kokkos::ALL);
          auto sub_x    = Kokkos::subview(x, ib, Kokkos::ALL);

          // PLU Factorize A by getrf
          KokkosBatched::SerialGetrf<KokkosBatched::Algo::Getrf::Unblocked>::invoke(sub_A, sub_ipiv);

          // Solve A * x = b with getrs
          KokkosBatched::SerialGetrs<KokkosBatched::Trans::NoTranspose, KokkosBatched::Algo::Getrs::Unblocked>::invoke(
              sub_A, sub_ipiv, sub_x);
        });

    // Confirm that the results are correct
    auto h_x     = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, x);
    bool correct = true;
    double eps   = 1.0e-12;
    for (int ib = 0; ib < Nb; ib++) {
      if (Kokkos::abs(h_x(ib, 0) - 23.0 / 32.0) > eps) correct = false;
      if (Kokkos::abs(h_x(ib, 1) - 1.0 / 8.0) > eps) correct = false;
      if (Kokkos::abs(h_x(ib, 2) - 1.0 / 96.0) > eps) correct = false;
    }

    if (correct) {
      std::cout << "getrf/getrs works correctly!" << std::endl;
    }
  }
  Kokkos::finalize();
}

output:

getrf/getrs works correctly!