exec_dist_penta_compact Subroutine

public subroutine exec_dist_penta_compact(du, u, u_recv_s, u_recv_e, tdsops, blocks, threads)

Single-GPU non-periodic pentadiagonal compact-FD solve.

Calls der_penta_full, which performs the complete forward elimination and backward substitution (a Thomas-style five-band LU sweep) in a single kernel launch. No MPI exchange is needed on a single GPU. A multi-GPU periodic extension would require a distributed pentadiagonal reduction algorithm (future work).
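To illustrate the forward-then-backward sweep described above, here is a minimal serial sketch in plain Fortran that runs on the CPU. It is not the der_penta_full kernel: the routine name penta_solve, the band names (e, a, d, c, f) and the constant coefficients are assumptions chosen for illustration, and the sketch assumes a diagonally dominant system so that no pivoting is needed. How the real kernel assigns lines to threads is not shown here.

```fortran
program penta_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8
  real(dp) :: e(n), a(n), d(n), c(n), f(n), b(n), x(n), x_ref(n)
  integer :: i

  ! Hypothetical diagonally dominant bands (not the scheme's real coefficients).
  ! Row i reads: e*x(i-2) + a*x(i-1) + d*x(i) + c*x(i+1) + f*x(i+2) = b(i)
  e = 1.0_dp; a = 2.0_dp; d = 10.0_dp; c = 2.0_dp; f = 1.0_dp

  ! Manufacture a right-hand side from a known solution x_ref(i) = i.
  do i = 1, n
    x_ref(i) = real(i, dp)
  end do
  do i = 1, n
    b(i) = d(i)*x_ref(i)
    if (i > 1) b(i) = b(i) + a(i)*x_ref(i - 1)
    if (i > 2) b(i) = b(i) + e(i)*x_ref(i - 2)
    if (i < n) b(i) = b(i) + c(i)*x_ref(i + 1)
    if (i < n - 1) b(i) = b(i) + f(i)*x_ref(i + 2)
  end do

  call penta_solve(n, e, a, d, c, f, b, x)

  print '(a, es10.3)', 'max abs error: ', maxval(abs(x - x_ref))
  if (maxval(abs(x - x_ref)) > 1.0e-12_dp) error stop 'penta_solve failed'

contains

  ! Non-periodic pentadiagonal solve without pivoting (needs n >= 2).
  ! The bands a, d, c and the right-hand side b are overwritten.
  subroutine penta_solve(n, e, a, d, c, f, b, x)
    integer, intent(in) :: n
    real(dp), intent(in) :: e(n), f(n)
    real(dp), intent(inout) :: a(n), d(n), c(n), b(n)
    real(dp), intent(out) :: x(n)
    real(dp) :: w
    integer :: i

    ! Forward elimination: row i clears the two sub-diagonal entries below it.
    do i = 1, n - 1
      w = a(i + 1)/d(i)                 ! remove x(i) from row i+1
      d(i + 1) = d(i + 1) - w*c(i)
      if (i + 2 <= n) c(i + 1) = c(i + 1) - w*f(i)
      b(i + 1) = b(i + 1) - w*b(i)
      if (i + 2 <= n) then
        w = e(i + 2)/d(i)               ! remove x(i) from row i+2
        a(i + 2) = a(i + 2) - w*c(i)
        d(i + 2) = d(i + 2) - w*f(i)
        b(i + 2) = b(i + 2) - w*b(i)
      end if
    end do

    ! Backward substitution on the remaining upper-triangular bands d, c, f.
    x(n) = b(n)/d(n)
    x(n - 1) = (b(n - 1) - c(n - 1)*x(n))/d(n - 1)
    do i = n - 2, 1, -1
      x(i) = (b(i) - c(i)*x(i + 1) - f(i)*x(i + 2))/d(i)
    end do
  end subroutine penta_solve

end program penta_sketch
```

The program checks itself against the manufactured solution and stops with an error if the maximum absolute error exceeds 1e-12. Both sweeps are sequential along a line, which is why a single process holding the whole line needs no exchange with neighbours.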

Arguments

Type    Intent    Optional    Attributes    Name
real(kind=dp), intent(out), device, dimension(:, :, :) :: du
real(kind=dp), intent(in), device, dimension(:, :, :) :: u
real(kind=dp), intent(in), device, dimension(:, :, :) :: u_recv_s
real(kind=dp), intent(in), device, dimension(:, :, :) :: u_recv_e
type(cuda_tdsops_t), intent(in) :: tdsops
type(dim3), intent(in) :: blocks
type(dim3), intent(in) :: threads

Calls

m_cuda_exec_dist::exec_dist_penta_compact -> m_cuda_kernels_dist::der_penta_full