Seminar 9: Decentralize to Generalize? On the Asymptotic Equivalence of Decentralized SGD and Average-direction SAM

Speaker: Tongtian Zhu, Zhejiang University, https://raiden-zhu.github.io/

Bio: Tongtian Zhu is a second-year PhD student in the Computer Science Department at Zhejiang University, supervised by Professors Mingli Song and Chun Chen. He also works closely with Dr. Fengxiang He and Professor Dacheng Tao. His current research focuses on the theoretical foundations of decentralized learning, and he is dedicated to leveraging elegant theoretical insights to develop fast and generalizable decentralized learning algorithms.

Abstract: Decentralized stochastic gradient descent (D-SGD) enables collaborative learning across massive numbers of devices without the control of a central server. However, existing theories claim that decentralization invariably undermines generalization. In this paper, we challenge this conventional belief and present a completely new perspective for understanding decentralized learning. We prove that D-SGD implicitly minimizes the loss function of an average-direction sharpness-aware minimization (SAM) algorithm under general non-convex, non-β-smooth settings. This surprising asymptotic equivalence reveals an intrinsic regularization-optimization trade-off and three advantages of decentralization: (1) D-SGD includes a free uncertainty evaluation mechanism that can improve posterior estimation; (2) D-SGD exhibits a gradient smoothing effect; and (3) the sharpness regularization effect of D-SGD does not decrease as the total batch size increases, which justifies the potential generalization benefit of D-SGD over centralized SGD (C-SGD) in large-batch scenarios.
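For context on the two algorithms named in the abstract: standard SAM minimizes the worst-case perturbed loss $\max_{\|\delta\|\le\rho} L(\theta+\delta)$, whereas an average-direction variant replaces the inner maximum with an expectation of the perturbed loss over a perturbation distribution; the precise distribution appearing in the equivalence result is given in the paper. Below is a minimal sketch of one D-SGD round on n workers, assuming a doubly stochastic mixing matrix W; it only illustrates the gossip-then-local-gradient structure of D-SGD, not the paper's implementation.

```python
import numpy as np

def dsgd_step(params, grads, W, lr):
    """One D-SGD round: gossip averaging followed by a local SGD step.

    params: (n, d) array; row i is worker i's current model parameters.
    grads:  (n, d) array; row i is worker i's local stochastic gradient.
    W:      (n, n) doubly stochastic mixing matrix defined by the topology.
    lr:     learning rate shared by all workers.
    """
    mixed = W @ params          # each worker averages its neighbors' parameters
    return mixed - lr * grads   # then takes a local stochastic gradient step

# Toy usage: 4 workers on a ring topology, 3-dimensional parameters.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
params = np.random.randn(4, 3)
grads = np.random.randn(4, 3)   # stand-in for per-worker stochastic gradients
params = dsgd_step(params, grads, W, lr=0.1)
```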
https://dlo-seminar.github.io/