Lower Precision Training: Achieving Speedup Without Compromising Performance
As the provider of HPC and AI services on supercomputing platforms at KAUST, the KAUST Supercomputing Lab (KSL) is committed to enabling cutting-edge AI/ML workloads for our research community. Building on this foundation, KSL is pleased to present an expert-led talk from NVIDIA.
Please register to reserve your place.
Abstract:
This talk explains how training large language models in low precision (mainly FP8 today, with FP4 emerging) can significantly speed up GPU training while preserving accuracy. It covers why low precision is numerically hard (wide gradient distributions, underflow/overflow), and how modern formats like MXFP8 and NVFP4 use fine-grained scaling and specialized data types to keep tensors within range. Some FP8 recipes are available on Hopper, while the newer MXFP8 variants and all FP4 training support are specific to Blackwell GPUs. Results show that MXFP8 and NVFP4 match BF16/FP8 task accuracy on large transformer models, and these recipes are already exposed through NVIDIA’s Transformer Engine, NeMo, and Megatron-LM for practical deployment at scale.
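For context, the sketch below shows how FP8 training is typically enabled through NVIDIA's Transformer Engine in PyTorch, as mentioned in the abstract. It is a minimal illustrative example only: the delayed-scaling recipe, layer sizes, and tensor shapes are assumptions for demonstration, not the specific recipes or models the talk will cover.

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Define an FP8 recipe (delayed scaling; HYBRID uses E4M3 for forward
# activations/weights and E5M2 for gradients in the backward pass).
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# A Transformer Engine layer; its matrix multiplies run in FP8 whenever
# the fp8_autocast context below is active. Sizes here are arbitrary.
layer = te.Linear(1024, 1024, bias=True).cuda()

inp = torch.randn(16, 1024, device="cuda")

# Forward pass under FP8 autocast; the backward pass reuses the recipe.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)

out.sum().backward()

Running this requires a GPU with FP8 support (Hopper or Blackwell) and the transformer_engine package; higher-level frameworks such as NeMo and Megatron-LM expose the same recipes through their configuration files.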
Speaker:
Giuseppe Fiameni is a senior solutions architect specializing in AI and accelerated computing at NVIDIA, where he manages the NVIDIA AI Technology Center (NVAITC) program in EMEA. He holds a PhD in Computer Engineering from the University of Modena and Reggio Emilia, where he also co-lectures the Scalable AI master's course, which provides comprehensive insights into the challenges and techniques of training and deploying large-scale AI models. He has extensive experience in high-performance computing, having previously worked at CINECA, the Italian supercomputing center. His research focuses on AI, HPC, and scientific data analysis, and he has contributed to numerous publications in these fields.
Registration is mandatory to attend this event. Please contact mohsin.shaikh@kaust.edu.sa with any questions.