Poster: OpenGPT-X - Training Large Language Models on HPC Systems
Poster publication: http://hdl.handle.net/2128/32006
The 14th JLESC workshop (JLESC: Joint Laboratory for Extreme-Scale Computing) was hosted by the National Center for Supercomputing Applications (NCSA) in Urbana, Illinois, from 28 to 30 September.
We had the opportunity to present the OpenGPT-X project in the form of a poster.
The poster presents the project partners within OpenGPT-X, the project's goals, and the use cases of large language models it explores, together with an overview of various available language models.
Recent breakthroughs in language modeling were made possible by a novel neural network architecture, the transformer, which is built on so-called self-attention layers. These layers process the entire input in parallel using highly efficient matrix products.
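As a minimal illustration (not code from the poster), a single-head self-attention layer boils down to a handful of dense matrix products; all names, shapes, and values below are chosen purely for demonstration.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x             : (seq_len, d_model) input token embeddings
    w_q, w_k, w_v : (d_model, d_head)  projection matrices
    """
    q = x @ w_q                                       # queries
    k = x @ w_k                                       # keys
    v = x @ w_v                                       # values
    scores = q @ k.T / np.sqrt(k.shape[-1])           # all-pairs similarities in one matrix product
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ v                                # weighted sum of values, again a matrix product

# Toy example: 8 tokens, model width 16, head width 8
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (8, 8) = (seq_len, d_head)
```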
To scale training to a full supercomputer, three dimensions of parallelism are combined: data parallelism, pipeline parallelism, and tensor parallelism. The total number of GPU devices used is the product of the three parallel degrees.
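For example, a hypothetical layout with tensor-parallel degree 2, pipeline-parallel degree 4, and data-parallel degree 4 occupies 2 × 4 × 4 = 32 GPUs; the sketch below simply makes this bookkeeping explicit (the degrees are illustrative, not those used in OpenGPT-X).

```python
def total_gpus(tensor_parallel: int, pipeline_parallel: int, data_parallel: int) -> int:
    """Total number of GPU devices required for a given 3D-parallel layout."""
    return tensor_parallel * pipeline_parallel * data_parallel

# Illustrative layout: the product must match the GPUs actually allocated to the job.
assert total_gpus(tensor_parallel=2, pipeline_parallel=4, data_parallel=4) == 32
```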
Novel AI architectures explored at the Jülich Supercomputing Centre include AMD Instinct MI250 GPUs and Graphcore IPUs.
Using the Megatron-DeepSpeed training framework, one can readily achieve about 50% of peak performance on NVIDIA A100 GPUs. In our tests on 32 GPUs across 8 nodes of JUWELS Booster, the highest throughput (in TFLOP/s) is achieved when data parallelism is prioritized and pipeline parallelism is used to reduce the memory footprint of the model.
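As a back-of-the-envelope check (not the measurement procedure from the poster), the achieved fraction of peak can be estimated by dividing the measured per-GPU throughput by the A100's nominal 312 TFLOP/s BF16/FP16 tensor-core peak; the throughput value below is a placeholder.

```python
A100_PEAK_TFLOPS = 312.0  # nominal BF16/FP16 tensor-core peak of an NVIDIA A100

def fraction_of_peak(measured_tflops_per_gpu: float) -> float:
    """Fraction of the A100's nominal peak reached by a measured per-GPU throughput."""
    return measured_tflops_per_gpu / A100_PEAK_TFLOPS

# Placeholder value: ~156 TFLOP/s per GPU corresponds to ~50% of peak.
print(f"{fraction_of_peak(156.0):.0%}")  # 50%
```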
Project challenges include hardware-related spurious errors and energy consumption.
Our runs in OpenGPT-X are made possible by compute time on JUWELS Booster, granted by the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) through the John von Neumann Institute for Computing at JSC.