Single-GPU, non-periodic pentadiagonal compact finite-difference solve.
Calls der_penta_full, which performs the complete pentadiagonal Thomas solve (forward elimination followed by back substitution, i.e. a 5-band LU) in a single kernel launch. No MPI exchange is needed in the single-GPU case. Extending this to multi-GPU periodic cases would require a distributed pentadiagonal reduction algorithm (future work).
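As a point of reference for the forward and backward sweeps mentioned above, here is a minimal serial CPU sketch of a pentadiagonal Thomas solve. It is not the der_penta_full kernel, whose internals are not shown on this page. The names penta_sketch, penta_solve and demo, the coefficient values alpha and beta, and the array layout are all hypothetical. The sketch covers only the linear solve, assumes a diagonally dominant matrix (no pivoting), and leaves out the right-hand-side stencil evaluation of a compact scheme.

```fortran
module penta_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Solve e(i)*x(i-2) + a(i)*x(i-1) + d(i)*x(i) + c(i)*x(i+1) + f(i)*x(i+2) = b(i)
  ! for i = 1..n without pivoting (assumes diagonal dominance and n >= 3).
  ! Entries e(1:2), a(1), c(n) and f(n-1:n) lie outside the matrix and are never read.
  subroutine penta_solve(e, a, d, c, f, b, x)
    real(dp), intent(in) :: e(:), a(:), d(:), c(:), f(:), b(:)
    real(dp), intent(out) :: x(:)
    real(dp), allocatable :: aw(:), dw(:), cw(:), bw(:)
    real(dp) :: m1, m2
    integer :: i, n

    n = size(d)
    aw = a; dw = d; cw = c; bw = b  ! working copies, inputs stay untouched

    ! Forward sweep: use row i to clear the two sub-diagonal entries below it.
    do i = 1, n - 1
      m1 = aw(i + 1)/dw(i)
      dw(i + 1) = dw(i + 1) - m1*cw(i)
      bw(i + 1) = bw(i + 1) - m1*bw(i)
      if (i + 2 <= n) then
        cw(i + 1) = cw(i + 1) - m1*f(i)
        m2 = e(i + 2)/dw(i)
        aw(i + 2) = aw(i + 2) - m2*cw(i)
        dw(i + 2) = dw(i + 2) - m2*f(i)
        bw(i + 2) = bw(i + 2) - m2*bw(i)
      end if
    end do

    ! Backward sweep: back substitution on the remaining upper band.
    x(n) = bw(n)/dw(n)
    x(n - 1) = (bw(n - 1) - cw(n - 1)*x(n))/dw(n - 1)
    do i = n - 2, 1, -1
      x(i) = (bw(i) - cw(i)*x(i + 1) - f(i)*x(i + 2))/dw(i)
    end do
  end subroutine penta_solve

end module penta_sketch

program demo
  use penta_sketch
  implicit none
  integer, parameter :: n = 16
  ! Hypothetical band coefficients, chosen only to keep the matrix diagonally dominant.
  real(dp), parameter :: alpha = 0.3_dp, beta = 0.05_dp
  real(dp) :: e(n), a(n), d(n), c(n), f(n), b(n), x(n), x_ref(n)
  integer :: i

  e = beta; a = alpha; d = 1.0_dp; c = alpha; f = beta
  x_ref = [(sin(0.4_dp*i), i = 1, n)]

  ! Build b = A*x_ref, dropping terms that fall outside 1..n (non-periodic ends).
  do i = 1, n
    b(i) = d(i)*x_ref(i)
    if (i > 1) b(i) = b(i) + a(i)*x_ref(i - 1)
    if (i > 2) b(i) = b(i) + e(i)*x_ref(i - 2)
    if (i < n) b(i) = b(i) + c(i)*x_ref(i + 1)
    if (i < n - 1) b(i) = b(i) + f(i)*x_ref(i + 2)
  end do

  call penta_solve(e, a, d, c, f, b, x)
  print *, 'max abs error:', maxval(abs(x - x_ref))
  if (maxval(abs(x - x_ref)) > 1.0e-12_dp) error stop 'penta_solve failed'
end program demo
```

The demo builds a right-hand side from a known solution and checks that the solver recovers it, which is a convenient way to validate a GPU kernel against a serial reference. Each unknown in both sweeps depends on its neighbours along the line, so a GPU version would most likely assign one thread per independent line and run the sweeps sequentially within that thread.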
| Type | Intent | Optional | Attributes | Name | |
|---|---|---|---|---|---|
| real(kind=dp) | intent(out) | | device, dimension(:, :, :) | du | |
| real(kind=dp) | intent(in) | | device, dimension(:, :, :) | u | |
| real(kind=dp) | intent(in) | | device, dimension(:, :, :) | u_recv_s | |
| real(kind=dp) | intent(in) | | device, dimension(:, :, :) | u_recv_e | |
| type(cuda_tdsops_t) | intent(in) | | | tdsops | |
| type(dim3) | intent(in) | | | blocks | |
| type(dim3) | intent(in) | | | threads | |