Applied Mathematics Youth Seminar (Lunch Meeting): Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning
Time: 2024-09-25 11:45-13:00
Efficiently training large language models (LLMs) requires hybrid parallel methods, which integrate multiple communication collectives within distributed partitioned graphs. Overcoming communication bottlenecks is crucial and is often achieved by overlapping communication with computation. However, existing overlap methods tend to rely on either fine-grained kernel fusion or limited operation scheduling, which constrains performance optimization in heterogeneous training environments.
In this talk, we introduce Centauri, a framework that combines comprehensive communication partitioning with hierarchical scheduling schemes for optimized overlap. We propose a partition space comprising three inherent abstraction dimensions: primitive substitution, topology-aware group partitioning, and workload partitioning. To find an efficient overlap of communication and computation operators, we decompose the scheduling tasks in hybrid parallel training into three hierarchical tiers: operation, layer, and model. Through these techniques, Centauri effectively overlaps communication latency with computation and enhances hardware utilization.
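As a rough illustration of why workload partitioning can help, the sketch below uses a toy two-stage pipeline cost model: a communication is split into equal chunks, and the computation that depends on each chunk starts as soon as that chunk arrives. This is not Centauri's scheduler or cost model; the function name `pipelined_time`, its parameters, and the fixed per-chunk `launch_overhead` are assumptions made only for this example.

```python
def pipelined_time(comm: float, comp: float, chunks: int,
                   launch_overhead: float = 0.0) -> float:
    """Finish time of a toy two-stage pipeline (hypothetical cost model).

    comm / comp are the total communication and computation times;
    both are split evenly into `chunks` pieces. Each communication
    chunk pays an extra `launch_overhead`.
    """
    comm_chunk = comm / chunks + launch_overhead
    comp_chunk = comp / chunks
    comm_done = 0.0  # time at which the latest communication chunk finished
    comp_done = 0.0  # time at which the latest computation chunk finished
    for _ in range(chunks):
        # Communication chunks run back to back on their own stream.
        comm_done += comm_chunk
        # A computation chunk waits for its data and for the previous chunk.
        comp_done = max(comp_done, comm_done) + comp_chunk
    return comp_done


no_overlap = pipelined_time(comm=8.0, comp=8.0, chunks=1)
four_chunks = pipelined_time(comm=8.0, comp=8.0, chunks=4)
with_overhead = pipelined_time(comm=8.0, comp=8.0, chunks=4, launch_overhead=0.5)
print(no_overlap, four_chunks, with_overhead)
```

In this model, splitting into four chunks reduces the finish time from 16.0 to 10.0, while a per-chunk overhead of 0.5 raises it back to 12.0. Under these assumptions, finer partitioning is not always better, which is one reason the partitioning choice has to be scheduled rather than fixed.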
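The speaker is a Ph.D. student at the Academy for Advanced Interdisciplinary Studies, advised by 杨超 (Yang Chao). Her research interests include high-performance and distributed computing, large-scale machine learning systems, and distributed systems. The work presented in this talk received the ASPLOS 2024 Best Paper Award.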