KokkosBatched::Getrf¶
Defined in header: KokkosBatched_Getrf.hpp
template <typename ArgAlgo>
struct SerialGetrf {
template <typename AViewType, typename PivViewType>
KOKKOS_INLINE_FUNCTION
static int
invoke(const AViewType &A, const PivViewType &piv);
};
Computes a LU factorization of a general m-by-n matrix \(A\) using partial pivoting with row interchanges. The factorization has the format \(A = P \cdot L \cdot U\)
where
\(P\) is a permutation matrix
\(L\) is a lower triangular matrix with unit diagonal elements (lower trapezoidal if m > n)
\(U\) is an upper triangular matrix (upper trapezoidal if m < n)
This operation is equivalent to the LAPACK routine SGETRF
(CGETRF
) or DGETRF
(ZGETRF
) for single or double precision for real (complex) matrix.
This is the recusive version of the algorithm. It divides the matrix \(A\) into four submatrices:
where \(A_{00}\) is a square matrix of size n0, \(A_{11}\) is a matrix of size n1 by n1 with n0 = min(m, n) / 2 and n1 = n - n0. This function calls itself to factorize
do the swaps on
solve \(A_{01}\), update \(A_{11}\), then calls itself to factorize \(A_{11}\) and do the swaps on \(A_{10}\).
Note
On GPUs, the maximum matrix size of \(A\) is limited to 4096 by 4096 as we rely on static stack to achive the recursion.
Parameters¶
- A:
On input, \(A\) is a m by n general matrix. On output, the factors \(L\) and \(U\) from the factorization \(A = P \cdot L \cdot U\). The unit diagonal elements of \(L\) are not stored.
- piv:
On output, the pivot indices.
Type Requirements¶
Example¶
//@HEADER
// ************************************************************************
//
// Kokkos v. 4.0
// Copyright (2022) National Technology & Engineering
// Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Part of Kokkos, under the Apache License v2.0 with LLVM Exceptions.
// See https://kokkos.org/LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//@HEADER
#include <Kokkos_Core.hpp>
#include <Kokkos_Random.hpp>
#include <KokkosBatched_Getrf.hpp>
#include <KokkosBatched_Getrs.hpp>
using ExecutionSpace = Kokkos::DefaultExecutionSpace;
/// \brief Example of batched getrf/getrs
/// Solving A * x = b, where
/// A: [[1, 2, 3],
/// [2, -4, 6],
/// [3, -9, -3]]
/// b: [1, 1, 1]
/// x: [23/32, 1/8, 1/96]
/// A is a real general matrix
///
/// This corresponds to the following system of equations:
/// 1 x0 + 2 x1 + 3 x2 = 1
/// 2 x0 - 4 x1 + 6 x2 = 1
/// 3 x0 - 9 x1 - 3 x2 = 1
///
int main(int /*argc*/, char** /*argv*/) {
Kokkos::initialize();
{
using View2DType = Kokkos::View<double**, ExecutionSpace>;
using View3DType = Kokkos::View<double***, ExecutionSpace>;
using PivViewType = Kokkos::View<int**, ExecutionSpace>;
const int Nb = 10, n = 3;
// Matrix A
View3DType A("A", Nb, n, n);
// Solution
View2DType x("x", Nb, n);
// Pivot
PivViewType ipiv("ipiv", Nb, n);
// Initialize A and x
Kokkos::deep_copy(x, 1.0);
auto h_A = Kokkos::create_mirror_view(A);
// Upper triangular matrix
for (int ib = 0; ib < Nb; ib++) {
// Fill the matrix
h_A(ib, 0, 0) = 1.0;
h_A(ib, 0, 1) = 2.0;
h_A(ib, 0, 2) = 3.0;
h_A(ib, 1, 0) = 2.0;
h_A(ib, 1, 1) = -4.0;
h_A(ib, 1, 2) = 6.0;
h_A(ib, 2, 0) = 3.0;
h_A(ib, 2, 1) = -9.0;
h_A(ib, 2, 2) = -3.0;
}
Kokkos::deep_copy(A, h_A);
// solve A * x = b with getrf and getrs
ExecutionSpace exec;
using policy_type = Kokkos::RangePolicy<ExecutionSpace, Kokkos::IndexType<int>>;
policy_type policy{exec, 0, Nb};
Kokkos::parallel_for(
"getrf-getrs", policy, KOKKOS_LAMBDA(int ib) {
auto sub_A = Kokkos::subview(A, ib, Kokkos::ALL, Kokkos::ALL);
auto sub_ipiv = Kokkos::subview(ipiv, ib, Kokkos::ALL);
auto sub_x = Kokkos::subview(x, ib, Kokkos::ALL);
// PLU Factorize A by getrf
KokkosBatched::SerialGetrf<KokkosBatched::Algo::Getrf::Unblocked>::invoke(sub_A, sub_ipiv);
// Solve A * x = b with getrs
KokkosBatched::SerialGetrs<KokkosBatched::Trans::NoTranspose, KokkosBatched::Algo::Getrs::Unblocked>::invoke(
sub_A, sub_ipiv, sub_x);
});
// Confirm that the results are correct
auto h_x = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, x);
bool correct = true;
double eps = 1.0e-12;
for (int ib = 0; ib < Nb; ib++) {
if (Kokkos::abs(h_x(ib, 0) - 23.0 / 32.0) > eps) correct = false;
if (Kokkos::abs(h_x(ib, 1) - 1.0 / 8.0) > eps) correct = false;
if (Kokkos::abs(h_x(ib, 2) - 1.0 / 96.0) > eps) correct = false;
}
if (correct) {
std::cout << "getrf/getrs works correctly!" << std::endl;
}
}
Kokkos::finalize();
}
output:
getrf/getrs works correctly!