
Programmable and Intelligent Accelerator-aware Load Balancers in Data Centers

Date

2023-12-26

Authors

Tajbakhsh, Hesam

Abstract

The slowdown in CPU progress has prompted system designers to incorporate diverse programmable accelerators (e.g., graphics processing units (GPUs) and smart network interface cards (SmartNICs)) to supply the computational capacity that various components of computer systems need. While these programmable accelerators enhance computational capabilities, their architectures and capacities differ from those of standard CPUs. Thus, it is essential to judiciously distribute computing tasks among servers and their accelerators to avoid performance degradation. Software-defined networking is a paradigm that enables network programmability for agile and efficient network management and operations. Programmable hardware (e.g., switches) has recently become a promising alternative for making task distribution decisions. A programmable switch can process packets in real time at line rate (Tbps), significantly faster than legacy server-based load balancers (LBs). Furthermore, such in-network load balancers reduce decision-making delay by eliminating the latency of sending packets from the switch to load-balancing servers. Several load balancers have been deployed in programmable switches, but none incorporates the capabilities of accelerators in its design. In this thesis, we propose the first in-network accelerator-aware load balancers for improving the performance of machine learning applications in data centers. The first load balancer, P4Mite, deploys agents in application processing servers and accelerators to measure their capacity and share these statuses with the switch. P4Mite then uses this information and load balancing policies (e.g., weighted round robin) to dispatch loads among servers and their accelerators. However, P4Mite supports a limited number of policies. Thus, we introduce P4Hauler, which provides a load balancing framework to support a wide range of policies. Within this framework, we propose configurable building blocks that operators can dynamically select to implement various policies on the fly, without rebooting the switch or interrupting its services. In addition to knowing the policies and the statuses of accelerators, an LB must be aware of traffic conditions, which makes operating the LB tedious. Thus, we propose P4Wise, a learning-based LB that automatically selects the most suitable distribution policy. We implement a prototype of the proposed load balancer and deploy it on a testbed consisting of a programmable switch (Intel Tofino), SmartNICs (Mellanox BlueField), and legacy servers to demonstrate deployment feasibility and efficiency over existing solutions. Then, we develop a realistic simulator to show the performance at scale. Specifically, P4Hauler can handle 27% more load than traditional LBs using only a single accelerator. In the case of hundreds of servers with multiple accelerators, the performance improvement is proportional to the number of available accelerators. Finally, P4Wise consistently selects appropriate weights with an accuracy of at least 90%. Furthermore, it responds to changes in the environment by adapting the load balancing approach accordingly.
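
For readers unfamiliar with the weighted round robin policy mentioned above, the following is a minimal host-side sketch in Python. It is not the thesis's implementation, which runs in a programmable switch; it uses the well-known "smooth" weighted round robin variant, and the `Backend` class, the `pick` and `dispatch` functions, and the example weights are all hypothetical names and values chosen for illustration. The weights stand in for the capacity statuses that agents would report.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int       # hypothetical capacity score, e.g. as reported by an agent
    current: int = 0  # running credit used by the smooth WRR algorithm


def pick(backends):
    """Return the next backend using smooth weighted round robin."""
    total = sum(b.weight for b in backends)
    # Every backend earns credit proportional to its weight.
    for b in backends:
        b.current += b.weight
    # The backend with the most credit is chosen and pays back the total.
    chosen = max(backends, key=lambda b: b.current)
    chosen.current -= total
    return chosen


def dispatch(backends, n):
    """Assign n requests and return the chosen backend names in order."""
    return [pick(backends).name for _ in range(n)]


if __name__ == "__main__":
    # Assumed example: a server CPU with three times the capacity of its SmartNIC.
    targets = [Backend("cpu", 3), Backend("smartnic", 1)]
    print(dispatch(targets, 8))
```

Over any window whose length is a multiple of the total weight, each target receives a share of requests proportional to its weight, so updating the weights from fresh status reports shifts the load accordingly.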

Keywords

Accelerators, Load Balancer, P4, Software Defined Networking, Programmable Data Plane