Multi-GPU Computing Tutorial at ISC22
On May 29, we held our first in-person tutorial since the start of the Covid pandemic. And while it was a little weird being physically back among people, it was also great at the same time. Gone are the challenges of teaching in video conferences, but so are all the benefits. But that’s a wholly different text still to be written.
Between 15 and 7 people attended the tutorial titled “Efficient Distributed GPU Programming for Exascale”[1]: 15 attendees at the beginning, and 7 at the end. We held it for the second time, with only incremental changes over the inaugural (but virtual) version at SC21 at the end of 2021. It was a full-day tutorial with a steep learning curve. We covered the basics in the morning (system, GPU-MPI, profiling) and advanced topics in the afternoon (NCCL, NVSHMEM, NVSHMEM without CPU). Talking to people on the side[2], I learned that they especially enjoyed the advanced material, but were already a bit tired by that time. We have to find a remedy for that, but so far all other options have more serious caveats. Still, with a new number one machine leading the Top500 and jumping over the ExaFLOP/s line using 37 000 GPUs, the tutorial is more relevant than ever.
As we already did for the SC21 version, we share the entirety of the material on GitHub under the open-source MIT license: github.com/FZJ-JSC/tutorial-multi-gpu/. In addition, to have a static, quotable version of it, we created a Zenodo release for it, with DOI 10.5281/zenodo.6603470. The workflow is actually pretty cool: By adding Zenodo-relevant metadata to a .zenodo.json file as part of the GitHub repo, publishing the tutorial through Zenodo just means creating a GitHub release for it. It took me a while to get right the first time, but now for the second time (v2.0-isc22), everything worked like a charm.
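To make the Zenodo workflow a bit more concrete, here is a minimal sketch of how such a .zenodo.json file could be generated. It is not the file from the tutorial repository: the helper names (build_zenodo_metadata, write_zenodo_file) are hypothetical, the creator entry is a placeholder, and the field selection is only an assumption about what a tutorial release might want to set. The field names themselves (title, version, upload_type, license, creators) follow Zenodo's deposit metadata.

```python
import json
from pathlib import Path


def build_zenodo_metadata(title, version, creators, license_id="MIT"):
    """Build a metadata dict with field names from Zenodo's deposit metadata.

    `creators` is a list of (name, affiliation) tuples. All values passed in
    here are illustrative placeholders, not the real tutorial metadata.
    """
    return {
        "title": title,
        "version": version,
        "upload_type": "lesson",  # assumption: a tutorial fits Zenodo's "lesson" type
        "license": license_id,
        "creators": [
            {"name": name, "affiliation": affiliation}
            for name, affiliation in creators
        ],
    }


def write_zenodo_file(metadata, repo_dir="."):
    """Write the metadata as .zenodo.json into the given repository directory."""
    path = Path(repo_dir) / ".zenodo.json"
    path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return path


metadata = build_zenodo_metadata(
    title="Efficient Distributed GPU Programming for Exascale",
    version="v2.0-isc22",
    creators=[("Doe, Jane", "Example Institute")],  # placeholder creator
)

# Show what would end up in the file; call write_zenodo_file(metadata)
# from the repository root to actually create it.
print(json.dumps(metadata, indent=2))
```

Once a file like this is committed and the repository is connected to Zenodo, creating a GitHub release is what triggers the Zenodo publication, as described above.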
Let’s see if we get the chance to present the material yet another time. We still have some cool ideas which didn’t make the cut because of some missing software functionality. But it will be a challenge to make time in the curriculum for them.