GPU Vendor/Programming Model Compatibility Table
For a recent talk at DKRZ in the scope of the natESM project, I created a table summarizing the current state of using a certain programming model on a GPU of a certain vendor, for C++ and Fortran. Since it led to quite a discussion in the session, I made a standalone version of it with some updates and elaborations here and there.
I present, the GPU Vendor/Programming Model Compatibility Table!
Update available (Jun 2024): Dedicated paper and page with updated content!
Compatibility Table
Read below for some caveats and technical background! There is also a PDF and an SVG version available.
In the table, each vendor/model combination is marked with one of the following support levels (rendered as colored symbols in the PDF/SVG versions):
- Full vendor support
- Indirect, but comprehensive support, by vendor
- Vendor support, but not (yet) entirely comprehensive
- Comprehensive support, but not by vendor
- Limited, probably indirect support -- but at least some
- No direct support available, but of course one could ISO-C-bind your way through it or directly link the libraries

The column headers abbreviate the languages:
- C: C++ (sometimes also C)
- F: Fortran

The numbers in the cells refer to the correspondingly numbered notes below the table.
|        | CUDA    | HIP     | SYCL    | OpenACC | OpenMP  | Standard | Kokkos  | ALPAKA  | etc    |
|        | C  | F  | C  | F  | C  | F  | C  | F  | C  | F  | C  | F   | C  | F  | C  | F  | Python |
| NVIDIA | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12  | 13 | 14 | 15 | 16 | 17     |
| AMD    | 18 | 19 | 20 | 4  | 21 | 6  | 22 | 23 | 24 | 24 | 25 | 26  | 27 | 14 | 28 | 16 | 29     |
| Intel  | 30 | 31 | 32 | 33 | 34 | 6  | 35 | 35 | 36 | 36 | 37 | 38  | 39 | 14 | 40 | 16 | 41     |
- 1: CUDA C/C++ is supported on NVIDIA GPUs through the CUDA Toolkit
- 2: CUDA Fortran, a proprietary Fortran extension, is supported on NVIDIA GPUs via the NVIDIA HPC SDK
- 3: HIP programs can directly use NVIDIA GPUs via a CUDA backend; HIP is maintained by AMD
- 4: No such thing as HIP for Fortran, but AMD offers Fortran interfaces to HIP and ROCm libraries in hipfort
- 5: SYCL can be used on NVIDIA GPUs with experimental support, either in SYCL directly or in DPC++, or via hipSYCL
- 6: No such thing as SYCL for Fortran
- 7: OpenACC C/C++ is supported on NVIDIA GPUs directly (and best) through the NVIDIA HPC SDK; additional, somewhat limited support exists in the GCC C compiler and in LLVM through Clacc
- 8: OpenACC Fortran is supported on NVIDIA GPUs directly (and best) through the NVIDIA HPC SDK; additional, somewhat limited support exists in the GCC Fortran compiler and in Flacc
- 9: OpenMP in C++ is supported on NVIDIA GPUs through the NVIDIA HPC SDK (albeit with a few limitations), by GCC, and by Clang; see the OpenMP ECP BoF on the status in 2022
- 10: OpenMP in Fortran is supported on NVIDIA GPUs through the NVIDIA HPC SDK (though not the full OpenMP feature set), by GCC, and by Flang
- 11: pSTL features are supported on NVIDIA GPUs through the NVIDIA HPC SDK
- 12: Standard-language parallel features are supported on NVIDIA GPUs through the NVIDIA HPC SDK
- 13: Kokkos supports NVIDIA GPUs by calling CUDA as part of the compilation process
- 14: Kokkos is a C++ model, but an official compatibility layer (Fortran Language Compatibility Layer, FLCL) is available
- 15: Alpaka supports NVIDIA GPUs by calling CUDA as part of the compilation process; an OpenMP backend can also be used
- 16: Alpaka is a C++ model
- 17: There is a vast ecosystem for offloading Python code to NVIDIA GPUs, including CuPy, Numba, cuNumeric, and many others; NVIDIA actively supports many of them but has no direct product like CUDA for Python, so the status is somewhere in between (see the sketch after this list)
- 18: hipify by AMD can translate CUDA calls to HIP calls, which then run natively on AMD GPUs
- 19: AMD offers a source-to-source translator, gpufort, to convert some CUDA Fortran functionality to OpenMP for AMD GPUs; in addition, there are ROCm library bindings for Fortran in hipfort
- 20: HIP is the preferred native programming model for AMD GPUs
- 21: SYCL can use AMD GPUs, for example with hipSYCL or DPC++ for HIP AMD
- 22: OpenACC C/C++ can be used on AMD GPUs via GCC or Clacc; also, Intel's OpenACC-to-OpenMP source-to-source translator can be used to generate OpenMP directives from OpenACC directives
- 23: OpenACC Fortran can be used on AMD GPUs via GCC; also, AMD's gpufort source-to-source translator can move OpenACC Fortran code to OpenMP Fortran code, and Intel's translator can work as well
- 24: AMD offers a dedicated, Clang-based compiler for using OpenMP on AMD GPUs: AOMP; it supports both C/C++ (Clang) and Fortran (Flang)
- 25: Intel's DPC++ (oneAPI) can be compiled with an experimental HIP AMD backend, allowing STL algorithms to be launched on AMD GPUs; caveats from Intel's STL support apply
- 26: Currently, there is no (known) way to launch Standard-based parallel algorithms on AMD GPUs
- 27: Kokkos supports AMD GPUs through HIP
- 28: Alpaka supports AMD GPUs through HIP or through an OpenMP backend
- 29: AMD does not officially support GPU programming with Python (also not semi-officially, as NVIDIA does), but third-party support is available, for example through Numba (currently inactive) or a HIP version of CuPy
- 30: SYCLomatic translates CUDA code to SYCL code, allowing it to run on Intel GPUs; also, Intel's DPC++ Compatibility Tool can transform CUDA to SYCL
- 31: No direct support, only via ISO C bindings, but at least an example can be found on GitHub; it's pretty scarce and not by Intel itself, though
- 32: CHIP-SPV supports mapping CUDA and HIP to OpenCL and Intel's Level Zero, making them run on Intel GPUs
- 33: No such thing as HIP for Fortran
- 34: SYCL is the prime programming model for Intel GPUs; actually, SYCL is only a standard, while Intel's implementation of it is called DPC++ (Data Parallel C++), which extends the SYCL standard in various places; actually actually, Intel namespaces everything oneAPI these days, so the full proper name is Intel oneAPI DPC++ (which incorporates a C++ compiler and also a library)
- 35: OpenACC can be used on Intel GPUs by translating the code to OpenMP with Intel's source-to-source translator
- 36: Intel has extensive support for OpenMP through their latest compilers
- 37: Intel supports pSTL algorithms through their DPC++ Library (oneDPL; GitHub); it's heavily namespaced and not yet on the same level as NVIDIA's support
- 38: With Intel oneAPI 2022.3, Intel supports DO CONCURRENT with GPU offloading
- 39: Kokkos supports Intel GPUs through SYCL
- 40: Alpaka v0.9.0 introduces experimental SYCL support; also, Alpaka can use OpenMP backends
- 41: Not a lot of support is available at the moment, but notably DPNP, a SYCL-based drop-in replacement for NumPy, and numba-dpex, an extension of Numba for DPC++
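To give a flavor of what the Python route looks like in practice (note 17), here is a minimal, illustrative sketch using CuPy on an NVIDIA GPU; the data and operations are made up for demonstration and not taken from any particular project above.

```python
# Minimal, illustrative sketch of GPU offloading from Python with CuPy (note 17).
# Assumes an NVIDIA GPU, a matching CUDA installation, and the cupy package.
import cupy as cp

# Allocate and initialize data directly on the GPU
x = cp.arange(1_000_000, dtype=cp.float32)

# Element-wise operations and reductions execute as GPU kernels
y = cp.sin(x) * 2.0
total = cp.sum(y)

# Transfer the scalar result back to the host only when needed
print(float(cp.asnumpy(total)))
```

Numba, cuNumeric, and the other projects mentioned above differ in their interfaces, but the common theme is the same: keep the data on the device between operations and only copy results back at the end.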
Caveats
Although the table and its descriptions do a decent job of summarizing the state of the art (I think), there are some caveats going along with it.
- This is the state as of Nov 2022; things are moving along quickly and might be outdated when you read this
- This is (partly) opinionated, based on my practical experience with things; chat me up if you disagree with my assessment
- Most importantly: It does not say anything about performance; adding performance to the mix (which is, like, a very important metric in HPC) would make this two-dimensional table three-dimensional, and would be really hard to judge
Technical Background
The origin of the table is in slides (which I, of course, create with LaTeX), but I also want to present it here (in HTML form), so I looked for a way to generate one from the other. Nothing really worked perfectly: LaTeXML looks great, but is still a little complicated. So, I did what any reasonable programmer would do and spent way too much time scripting my way out of things.
I recreated the table as a machine-readable YAML file which is transformed to TeX and HTML by using respective templates with Jinja. Jinja is really amazing and I’m a huge fan. All the data, all files, and all scripts are in a GitHub repository: https://github.com/AndiH/gpu-lang-compat. Feel free to remix, it’s MIT!
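For illustration, here is a minimal sketch of such a YAML-to-HTML step with Jinja; the file names (compatibility.yml, templates/table.html.j2) are placeholders and not necessarily those used in the actual repository.

```python
# Minimal sketch of the YAML -> Jinja -> HTML step; file names are placeholders.
import yaml
from jinja2 import Environment, FileSystemLoader

# Load the machine-readable table data
with open("compatibility.yml") as f:
    table = yaml.safe_load(f)

# Render an HTML template with the data; a TeX template works the same way
env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("table.html.j2")

with open("table.html", "w") as f:
    f.write(template.render(table=table))
```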
Changelog
- 2022-Nov-04: Fixed wrong icon for HIP with Fortran on NVIDIA GPUs
- 2022-Nov-08: Added hipfort, fixed a wrong symbol for ALPAKA, added OpenMP for ALPAKA (commit for both)
- 2022-Nov-12: Fine-tuned colors of symbols; tagged new v1.2 release
- 2022-Dec-06: Update Intel's support for Standard Parallelism, which was displayed wrongly because of a bug in my scripts
- 2022-Dec-07: Update Standard Parallelism support on AMD GPUs via Intel DPC++