Description
Optimizing Graphics Processing Unit (GPU) configurations for machine learning (ML) workloads is critical for enhancing performance in federated learning scenarios, where data privacy and decentralized processing are paramount. This thesis presents an exploration of GPU configurations and their impact on ML tasks, including both inference and training. By benchmarking various hardware setups, including edge devices, cloudlet machines, and servers, we demonstrate the significant performance benefits of GPU acceleration over CPU-only systems at every level of the federated learning hierarchy. GPUs delivered up to tenfold reductions in latency and substantial increases in throughput across diverse batch sizes and models, underscoring their capability to handle parallel computations efficiently. Further, we explore the use of Accel-Sim, a simulation framework designed to model and analyze GPU accelerators. This tool enables detailed hardware design-space exploration by allowing users to simulate GPU applications on highly customizable simulated GPUs and collect performance metrics. The results of this project advocate for the integration of GPUs in federated learning environments to optimize system performance, scalability, and efficiency, justifying their initial costs and power consumption. This research contributes to the development of co-optimized hardware and software solutions involving GPUs tailored to the specific needs of ML tasks within federated learning frameworks.
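The latency and throughput benchmarking described above can be sketched as a simple timing harness. The sketch below is hypothetical and not taken from the thesis: the model, input shape, and batch sizes are placeholder assumptions standing in for the actual ML workloads and hardware tiers that were benchmarked.

```python
import time

def benchmark(model_fn, batch_sizes, runs=10):
    """Measure mean latency (s) and throughput (samples/s) per batch size."""
    results = {}
    for bs in batch_sizes:
        batch = [[0.0] * 64 for _ in range(bs)]  # dummy input batch (placeholder)
        model_fn(batch)  # warm-up run, excluded from timing
        start = time.perf_counter()
        for _ in range(runs):
            model_fn(batch)
        latency = (time.perf_counter() - start) / runs
        results[bs] = {"latency_s": latency, "throughput": bs / latency}
    return results

def dummy_model(batch):
    # stand-in for an ML inference call (e.g., a forward pass on GPU or CPU)
    return [sum(row) for row in batch]

metrics = benchmark(dummy_model, batch_sizes=[1, 8, 32])
```

Running the same harness against CPU-only and GPU-accelerated backends at each tier (edge, cloudlet, server) yields directly comparable latency/throughput curves across batch sizes, which is the shape of comparison the abstract reports.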
Details
Title
- Developing a Methodology to Optimize Graphics Processing Unit (GPU) Configurations to Accelerate Machine Learning Workloads
Contributors
- Kailas, Shankar (Author)
- Vrudhula, Sarma (Thesis director)
- Heidari, Soroush (Committee member)
- Barrett, The Honors College (Contributor)
Date Created
2024-05