Survey of edge–edge collaborative training for edge intelligence

WANG Rui; WANG Yan; YIN Pu; QI Jian-peng; SUN Ye-tao; LI Qian; ZHANG Yi-da; ZHANG Mei-kui

doi:10.13374/j.issn2095-9389.2022.09.26.004

Volume 45 Issue 8

Aug. 2023

Turn off MathJax

Article Contents

Article Navigation > Chinese Journal of Engineering > 2023 > 45(8): 1400-1416

WANG Rui, WANG Yan, YIN Pu, QI Jian-peng, SUN Ye-tao, LI Qian, ZHANG Yi-da, ZHANG Mei-kui. Survey of edge–edge collaborative training for edge intelligence[J]. Chinese Journal of Engineering, 2023, 45(8): 1400-1416. doi: 10.13374/j.issn2095-9389.2022.09.26.004

Citation:

WANG Rui, WANG Yan, YIN Pu, QI Jian-peng, SUN Ye-tao, LI Qian, ZHANG Yi-da, ZHANG Mei-kui. Survey of edge–edge collaborative training for edge intelligence[J]. Chinese Journal of Engineering, 2023, 45(8): 1400-1416. doi: 10.13374/j.issn2095-9389.2022.09.26.004

Citation:

PDF( 1128 KB)

Survey of edge–edge collaborative training for edge intelligence

doi: 10.13374/j.issn2095-9389.2022.09.26.004

1.
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
2.
Chinese PLA General Hospital, Beijing 100039, China

More Information

Corresponding author: WANG Rui, E-mail: wangrui@ustb.edu.cn; ZHANG Mei-kui, E-mail: zmk301@126.com
Received Date: 2022-09-26
Available Online: 2022-12-13
Publish Date: 2023-08-25

Abstract

Abstract

With the rapid arrival of the Internet of Everything era, massive data resources are generated on edge sides, causing problems such as large network load, high energy consumption, and privacy security in traditional distributed training based on cloud computing. Edge computing sinks computing power resources to the edge side, forming a collaborative computing system that integrates “cloud, edge, and end,” which can meet the basic needs of real-time operations, intelligence, security, and privacy protection. With the help of edge computing capabilities, edge intelligence effectively promotes the intelligent development of the edge side, which has become a popular topic. Through our research, we found that edge collaborative intelligence is currently in a stage of rapid development. At this stage, several deep learning models are combined with edge computing, and many edge collaborative intelligent processing solutions have exploded, such as distributed training in edge computing scenarios, federated learning, and distributed collaborative reasoning based on technologies such as model cutting and early exit. The combination of a shallow breadth learning system and virtualization technology allows for quick implementation of edge intelligence, which considerably improves service quality and user experience and makes services more intelligent. As a key link of edge intelligence, edge intelligence collaborative training aims to assist or implement the distributed training of machine learning models on the edge side. However, in an edge computing scenario, the distributed training of the model must coordinate several edge nodes, and many challenges remain. Therefore, by fully investigating the existing research foundation of edge intelligent collaborative training, we focus on the challenges and solutions of edge intelligent collaborative training in edge scenarios such as equipment heterogeneity, limited equipment resources, and unstable network environments. This paper introduces and summarizes the overall architecture and core modules of edge intelligent collaborative training. The overall architecture mainly focuses on the interaction framework between edge devices. In terms of whether there is a central server role, it can be divided into two categories: parameter server centralized architecture and fully decentralized parallel architecture. The core module of edge intelligent collaborative training mainly focuses on the problem of collaborative training of a large number of edge devices for neural network models to update parameters. In terms of the role of parallel computing in model training, it is divided into data parallelism and model parallelism. Finally, the many challenges and prospects of edge collaborative training are analyzed and summarized.
- cloud computing,
- edge intelligence,
- collaborative training,
- edge computing,
- machine learning,
- distributed training

FullText(HTML)

References(126)

References

[1]	Zhang X Z, Wang Y F, Lu S D, et al. OpenEI: an open framework for edge intelligence // 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). Dallas, 2019: 1840
[2]	王睿, 齊建鵬, 陳亮, 等. 面向邊緣智能的協同推理綜述. 計算機研究與發展, 2023, 60（2）: 398 Wang R, Qi J P, Chen L, et al. Survey of collaborative inference for edge intelligence. J Comput Res Dev, 2023, 60（2）: 398
[3]	Zhou Z, Chen X, Li E, et al. Edge intelligence: Paving the last Mile of artificial intelligence with edge computing. Proc IEEE, 2019, 107(8): 1738 doi: 10.1109/JPROC.2019.2918951
[4]	李肯立, 劉楚波. 邊緣智能: 現狀和展望. 大數據, 2019, 5(3):69 Li K L, Liu C B. Edge intelligence: State-of-the-art and expectations. Big Data Res, 2019, 5(3): 69
[5]	談海生, 郭得科, 張弛, 等. 云邊端協同智能邊緣計算的發展與挑戰. 中國計算機協會通訊, 2020(1):16 Tan H S, Guo D, Ke Z C, et al. Development and challenges of cloud edge collaborative intelligent edge computing. CCCF, 2020(1): 16
[6]	張星洲, 魯思迪, 施巍松. 邊緣智能中的協同計算技術研究. 人工智能, 2019, 6(5):55 Zhang X Z, Lu S D, Shi W S. Research on collaborative computing technology in edge intelligence. AI-View, 2019, 6(5): 55
[7]	王曉飛. 智慧邊緣計算: 萬物互聯到萬物賦能的橋梁. 人民論壇·學術前沿, 2020(9):6 Wang X F. Intelligent edge computing: From internet of everything to internet of everything empowered. Frontiers, 2020(9): 6
[8]	Fang A D, Cui L, Zhang Z W, et al. A parallel computing framework for cloud services // 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA). Dalian, 2020: 832
[9]	Lanka S, Aung Win T, Eshan S. A review on Edge computing and 5G in IOT: Architecture & Applications // 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA). Coimbatore, 2021: 532
[10]	Carrie M, David R, Michael S. The growth in connected IoT devices is expected to generate 79.4ZB of data in 2025, according to a new IDC forecast. (2019-06-18) [2022-09-26]. https://www.businesswire.com/news/home/20190618005012
[11]	Zwolenski M, Weatherill L. The digital universe rich data and the increasing value of the internet of things. J Telecommun Digital Economy, 2014, 2(3): 47.1
[12]	Jin H, Jia L, Zhou Z. Boosting edge intelligence with collaborative cross-edge analytics. IEEE Internet Things J, 2021, 8(4): 2444 doi: 10.1109/JIOT.2020.3034891
[13]	Jiang X L, Shokri-Ghadikolaei H, Fodor G, et al. Low-latency networking: Where latency lurks and how to tame it. Proc IEEE, 2019, 107(2): 280 doi: 10.1109/JPROC.2018.2863960
[14]	Xiao Y H, Jia Y Z, Liu C C, et al. Edge computing security: State of the art and challenges. Proc IEEE, 2019, 107(8): 1608 doi: 10.1109/JPROC.2019.2918437
[15]	黃韜, 劉江, 汪碩, 等. 未來網絡技術與發展趨勢綜述. 通信學報, 2021, 42(1):130 Huang T, Liu J, Wang S, et al. Survey of the future network technology and trend. J Commun, 2021, 42(1): 130
[16]	Jennings A, Copenhagen van R, Rusmin T. Aspects of Network Edge Intelligence. Maluku Technical Report, 2001
[17]	宋純賀, 曾鵬, 于海斌. 工業互聯網智能制造邊緣計算: 現狀與挑戰. 中興通訊技術, 2019, 25(3):50 Song C H, Zeng P, Yu H B. Industrial Internet intelligent manufacturing edge computing: State-of-the-art and challenges. ZTE Technol J, 2019, 25(3): 50
[18]	Risteska Stojkoska B L, Trivodaliev K V. A review of Internet of Things for smart home: Challenges and solutions. J Clean Prod, 2017, 140: 1454 doi: 10.1016/j.jclepro.2016.10.006
[19]	Varghese B, Wang N, Barbhuiya S, et al. Challenges and opportunities in edge computing // 2016 IEEE International Conference on Smart Cloud (SmartCloud). New York, 2016: 20
[20]	施巍松, 張星洲, 王一帆, 等. 邊緣計算: 現狀與展望. 計算機研究與發展, 2019, 56(1):69 Shi W S, Zhang X Z, Wang Y F, et al. Edge computing: State-of-the-art and future directions. J Comput Res Dev, 2019, 56(1): 69
[21]	Teerapittayanon S, McDanel B, Kung H T. Distributed deep neural networks over the cloud, the edge and end devices // 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). Atlanta, 2017: 328
[22]	Wang X F, Han Y W, Wang C Y, et al. In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning. IEEE Netw, 2019, 33(5): 156 doi: 10.1109/MNET.2019.1800286
[23]	Kang Y P, Hauswald J, Gao C, et al. Neurosurgeon. SIGOPS Oper Syst Rev, 2017, 51(2): 615 doi: 10.1145/3093315.3037698
[24]	Li E, Zhou Z, Chen X. Edge intelligence: On-demand deep learning model co-inference with device-edge synergy // Proceedings of the 2018 Workshop on Mobile Edge Communications. Budapest, 2018: 31
[25]	李逸楷, 張通, 陳俊龍. 面向邊緣計算應用的寬度孿生網絡. 自動化學報, 2020, 46(10):2060 Li Y K, Zhang T, Chen J L. Broad Siamese network for edge computing applications. Acta Autom Sin, 2020, 46(10): 2060
[26]	Al-Rakhami M, Alsahli M, Hassan M M, et al. Cost efficient edge intelligence framework using docker containers // 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech). Athens, 2018: 800
[27]	Al-Rakhami M, Gumaei A, Alsahli M, et al. A lightweight and cost effective edge intelligence architecture based on containerization technology. World Wide Web, 2020, 23(2): 1341 doi: 10.1007/s11280-019-00692-y
[28]	Zaharia M, Xin R S, Wendell P, et al. Apache spark. Commun ACM, 2016, 59(11): 56 doi: 10.1145/2934664
[29]	Abadi M, Barham P, Chen J M, et al. TensorFlow: A system for large-scale machine learning [J/OL]. ArXiv Preprint (2016-05-31) [2022-09-26]. https://arxiv.org/abs/1605.08695
[30]	Chen T Q, Li M, Li Y T, et al. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems [J/OL]. ArXiv Preprint (2015-12-03) [2022-09-26]. https://arxiv.org/abs/1512.01274
[31]	Jin A L, Xu W C, Guo S, et al. PS: A simple yet effective framework for fast training on parameter server. IEEE Trans Parallel Distributed Syst, 2022, 33(12): 4625 doi: 10.1109/TPDS.2022.3200518
[32]	Padmanandam K, Lingutla L. Practice of applied edge analytics in intelligent learning framework // 2020 21st International Arab Conference on Information Technology (ACIT). Giza, 2021: 1
[33]	Ross P, Luckow A. EdgeInsight: characterizing and modeling the performance of machine learning inference on the edge and cloud // 2019 IEEE International Conference on Big Data (Big Data). Los Angeles, 2020: 1897
[34]	施巍松, 孫輝, 曹杰, 等. 邊緣計算: 萬物互聯時代新型計算模型. 計算機研究與發展, 2017, 54(5):907 Shi W S, Sun H, Cao J, et al. Edge computing–An emerging computing model for the Internet of everything era. J Comput Res Dev, 2017, 54(5): 907
[35]	Srivastava A, Nguyen D, Aggarwal S, et al. Performance and memory trade-offs of deep learning object detection in fast streaming high-definition images // 2018 IEEE International Conference on Big Data (Big Data). Seattle, 2018: 3915
[36]	Sindhu C, Vyas D V, Pradyoth K. Sentiment analysis based product rating using textual reviews // 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA). Coimbatore, 2017: 727
[37]	Hosein P, Rahaman I, Nichols K, et al. Recommendations for long-term profit optimization // Proceedings of ImpactRS@ RecSys. Copenhagen, 2019
[38]	Sharma R, Biookaghazadeh S, Li B X, et al. Are existing knowledge transfer techniques effective for deep learning with edge devices? // 2018 IEEE International Conference on Edge Computing (EDGE). San Francisco, 2018: 42
[39]	Bonawitz K, Eichner H, Grieskamp W, et al. Towards federated learning at scale: System design // Proceedings of Machine Learning and Systems. Palo Alto, 2019, 1: 374
[40]	Kairouz P, McMahan H B, Avent B, et al. Advances and open problems in federated learning. FNT Machine Learning, 2021, 14(1-2): 1
[41]	McMahan H B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data [J/OL]. ArXiv Preprint (2017-02-28) [2022-09-26]. https://arxiv.org/abs/1602.05629
[42]	朱建明, 張沁楠, 高勝, 等. 基于區塊鏈的隱私保護可信聯邦學習模型. 計算機學報, 2021, 44(12):2464 Zhu J M, Zhang Q N, Gao S, et al. Privacy preserving and trustworthy federated learning model based on blockchain. Chin J Comput, 2021, 44(12): 2464
[43]	Wei S Y, Tong Y X, Zhou Z M, et al. Efficient and Fair Data Valuation for Horizontal Federated Learning. Berlin: Springer, 2020
[44]	Khan A, Thij M, Wilbik A. Communication-efficient vertical federated learning. Algorithms, 2022, 15(8): 273 doi: 10.3390/a15080273
[45]	Chen Y Q, Qin X, Wang J D, et al. FedHealth: A federated transfer learning framework for wearable healthcare. IEEE Intell Syst, 2020, 35(4): 83 doi: 10.1109/MIS.2020.2988604
[46]	Yang J, Zheng J, Zhang Z, et al. Security of federated learning for cloud-edge intelligence collaborative computing. Int J Intell Syst, 2022, 37(11): 9290 doi: 10.1002/int.22992
[47]	Zhang X J, Gu H L, Fan L X, et al. No free lunch theorem for security and utility in federated learning [J/OL]. ArXiv Preprint (2022-09-05) [2022-09-26].https://arxiv.org/abs/2203.05816
[48]	Deng S G, Zhao H L, Fang W J, et al. Edge intelligence: The confluence of edge computing and artificial intelligence. IEEE Internet Things J, 2020, 7(8): 7457 doi: 10.1109/JIOT.2020.2984887
[49]	Feng C, Han P C, Zhang X, et al. Computation offloading in mobile edge computing networks: A survey. J Netw Comput Appl, 2022, 202: 103366 doi: 10.1016/j.jnca.2022.103366
[50]	喬德文, 郭松濤, 何靜, 等. 邊緣智能: 研究進展及挑戰. 無線電通信技術, 2022, 48(1):34 Qiao D W, Guo S T, He J, et al. Edge intelligence: Research progress and challenges. Radio Commun Technol, 2022, 48(1): 34
[51]	Fortino G, Zhou M C, Hassan M M, et al. Pushing artificial intelligence to the edge: Emerging trends, issues and challenges. Eng Appl Artif Intell, 2021, 103: 104298 doi: 10.1016/j.engappai.2021.104298
[52]	Qiu X C, Fernández-Marqués J, Gusmão P, et al. ZeroFL: Efficient on-device training for federated learning with local sparsity [J/OL]. ArXiv Preprint (2022-08-04) [2022-09-26]. https://arxiv.org/abs/2208.02507
[53]	Long S Q, Long W F, Li Z T, et al. A game-based approach for cost-aware task assignment with QoS constraint in collaborative edge and cloud environments. IEEE Trans Parallel Distributed Syst, 2021, 32(7): 1629 doi: 10.1109/TPDS.2020.3041029
[54]	朱泓睿, 元國軍, 姚成吉, 等. 分布式深度學習訓練網絡綜述. 計算機研究與發展, 2021, 58(1):98 doi: 10.7544/issn1000-1239.2021.20190881 Zhu H R, Yuan G J, Yao C J, et al. Survey on network of distributed deep learning training. J Comput Res Dev, 2021, 58(1): 98 doi: 10.7544/issn1000-1239.2021.20190881
[55]	Rafique Z, Khalid H M, Muyeen S M. Communication systems in distributed generation: A bibliographical review and frameworks. IEEE Access, 2020, 8: 207226 doi: 10.1109/ACCESS.2020.3037196
[56]	Hsieh K, Harlap A, Vijaykumar N, et al. Gaia: Geo-distributed machine learning approaching LAN speeds // Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. New York, 2017: 629
[57]	Konečný J, McMahan H B, Yu F X, et al. Federated learning: Strategies for improving communication efficiency [J/OL]. ArXiv Preprint (2017-10-30) [2022-09-26]. https://arxiv.org/abs/1610.05492
[58]	Chen J M, Pan X H, Monga R, et al. Revisiting distributed synchronous SGD [J/OL]. ArXiv Preprint (2017-03-21) [2022-09-26]. https://arxiv.org/abs/1604.00981
[59]	Nishio T, Yonetani R. Client selection for federated learning with heterogeneous resources in mobile edge // ICC 2019–2019 IEEE International Conference on Communications (ICC). Shanghai, 2019: 1
[60]	Wang S Q, Tuor T, Salonidis T, et al. When edge meets learning: Adaptive control for resource-constrained distributed machine learning // IEEE INFOCOM 2018-IEEE Conference on Computer Communications. Honolulu, 2018: 63
[61]	Lian X R, Huang Y J, Li Y C, et al. Asynchronous parallel stochastic gradient for nonconvex optimization // Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, 2015: 2737
[62]	Zhang W, Gupta S, Lian X R, et al. Staleness-aware async-SGD for distributed deep learning [J/OL]. ArXiv Preprint (2014-04-05) [2022-09-26]. https://arxiv.org/abs/1511.05950
[63]	Lu X F, Liao Y Y, Lio P, et al. Privacy-preserving asynchronous federated learning mechanism for edge network computing. IEEE Access, 2020, 8: 48970 doi: 10.1109/ACCESS.2020.2978082
[64]	Chen Y J, Ning Y, Slawski M, et al. Asynchronous online federated learning for edge devices with non-IID data // 2020 IEEE International Conference on Big Data (Big Data). Atlanta, 2021: 15
[65]	Dutta S, Wang J Y, Joshi G. Slow and stale gradients can win the race. IEEE J Sel Areas Inf Theory, 2021, 2(3): 1012 doi: 10.1109/JSAIT.2021.3103770
[66]	Lu Y L, Huang X H, Zhang K, et al. Blockchain empowered asynchronous federated learning for secure data sharing in Internet of vehicles. IEEE Trans Veh Technol, 2020, 69(4): 4298 doi: 10.1109/TVT.2020.2973651
[67]	Wu W T, He L G, Lin W W, et al. SAFA: A semi-asynchronous protocol for fast federated learning with low overhead. IEEE Trans Comput, 2021, 70(5): 655 doi: 10.1109/TC.2020.2994391
[68]	Luehr N. Fast multi-GPU collectives with NCCL [J/OL]. NVIDIA Developer (2016-04-07) [2022-09-26]. https://developer.nvidia.com/blog/fast-multi-gpu-collectives-nccl
[69]	Lian X R, Zhang W, Zhang C, et al. Asynchronous decentralized parallel stochastic gradient descent [J/OL]. ArXiv Preprint (2018-09-25) [2022-09-26]. https://arxiv.org/abs/1710.06952
[70]	Lalitha A, Kilinc O C, Javidi T, et al. Peer-to-peer federated learning on graphs [J/OL]. ArXiv Preprint (2019-01-31) [2022-09-26].https://arxiv.org/abs/1901.11173
[71]	Blot M, Picard D, Cord M, et al. Gossip training for deep learning [J/OL]. ArXiv Preprint (2016-11-29) [2022-09-26]. https://arxiv.org/abs/1611.09726
[72]	Jin P H, Yuan Q C, Iandola F, et al. How to scale distributed deep learning? [J/OL]. ArXiv Preprint (2016-11-14) [2022-09-26]. https://arxiv.org/abs/1611.04581
[73]	Daily J, Vishnu A, Siegel C, et al. GossipGraD: Scalable Deep Learning using Gossip Communication based asynchronous gradient descent [J/OL]. ArXiv Preprint (2018-03-15) [2022-09-26]. https://arxiv.org/abs/1803.05880
[74]	Vanhaesebrouck P, Bellet A, Tommasi M. Decentralized collaborative learning of personalized models over networks // Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. Florida, 2017: 509
[75]	He C Y, Tan C H, Tang H L, et al. Central server free federated learning over single-sided trust social networks [J/OL]. ArXiv Preprint (2020-08-01) [2022-09-26]. https://arxiv.org/abs/1910.04956
[76]	Colin I, Bellet A, Salmon J, et al. Gossip dual averaging for decentralized optimization of pairwise functions[J/OL]. ArXiv Preprint (2016-06-08) [2022-09-26]. https://arxiv.org/abs/1606.02421
[77]	Nedi? A, Olshevsky A. Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Trans Autom Control, 2016, 61(12): 3936 doi: 10.1109/TAC.2016.2529285
[78]	Assran M, Loizou N, Ballas N, et al. Stochastic gradient push for distributed deep learning // Proceedings of the 36th International Conference on Machine Learning. California, 2019: 344
[79]	Koloskova A, Stich S, Jaggi M. Decentralized stochastic optimization and gossip algorithms with compressed communication // Proceedings of the 36th International Conference on Machine Learning. California, 2019: 3478
[80]	Hu C H, Jiang J Y, Wang Z. Decentralized federated learning: A segmented gossip approach [J/OL]. ArXiv Preprint (2019-08-21) [2022-09-26]. https://arxiv.org/abs/1908.07782
[81]	Ruder S. An overview of gradient descent optimization algorithms [J/OL]. ArXiv Preprint (2017-06-15) [2022-09-26]. https://arxiv.org/abs/1609.04747
[82]	Chahal K S, Grover M S, Dey K, et al. A hitchhiker’s guide on distributed training of deep neural networks. J Parallel Distributed Comput, 2020, 137: 65 doi: 10.1016/j.jpdc.2019.10.004
[83]	Chai Z, Ali A, Zawad S, et al. TiFL: A tier-based federated learning system // Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing. Stockholm, 2020: 125
[84]	Li X Y, Qu Z, Tang B, et al. Stragglers are not disaster: A hybrid federated learning algorithm with delayed gradients[J/OL]. ArXiv Preprint (2021-02-12) [2022-09-26]. https://arxiv.org/abs/2102.06329
[85]	Xu Z R, Yang Z, Xiong J J, et al. ELFISH: Resource-aware federated learning on heterogeneous edge devices[J/OL]. ArXiv Preprint (2021-03-01) [2022-09-26]. https://arxiv.org/abs/1912.01684
[86]	Agarwal A, Duchi J C. Distributed delayed stochastic optimization // Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada, 2011: 873
[87]	Sahu A N, Dutta A, Tiwari A, et al. On the convergence analysis of asynchronous SGD for solving consistent linear systems [J/OL]. ArXiv Preprint (2020-04-05) [2022-09-26]. https://arxiv.org/abs/2004.02163
[88]	Dean J, Corrado G S, Monga R, et al. Large scale distributed deep networks // Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, 2012: 1223
[89]	Zhang S X, Choromanska A, LeCun Y. Deep learning with elastic averaging SGD // Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, 2015: 685
[90]	Xie C, Koyejo S, Gupta I. Asynchronous federated optimization [J/OL]. ArXiv Preprint (2020-12-05) [2022-09-26]. https://arxiv.org/abs/1903.03934
[91]	Odena A. Faster asynchronous SGD [J/OL]. ArXiv Preprint (2016-01-15) [2022-09-26]. https://arxiv.org/abs/1601.04033
[92]	Chan W, Lane I. Distributed asynchronous optimization of convolutional neural networks // Proceedings of Fifteenth Annual Conference of the International Speech Communication Association. Singapore, 2014: 1073
[93]	Sutskever I, Martens J, Dahl G, et al. On the importance of initialization and momentum in deep learning // Proceedings of the 30th International Conference on International Conference on Machine Learning. Atlanta, 2013: 1139
[94]	Hakimi I, Barkai S, Gabel M, et al. Taming momentum in a distributed asynchronous environment [J/OL]. ArXiv Preprint (2020-10-14) [2022-09-26]. https://arxiv.org/abs/1907.11612
[95]	Chen M, Mao B C, Ma T Y. FedSA: A staleness-aware asynchronous federated learning algorithm with non-IID data. Future Gener Comput Syst, 2021, 120: 1 doi: 10.1016/j.future.2021.02.012
[96]	Li X, Huang K X, Yang W H, et al. On the convergence of FedAvg on non-IID data [J/OL]. ArXiv Preprint (2020-06-25) [2022-09-26]. https://arxiv.org/abs/1907.02189
[97]	Khaled A, Mishchenko K, Richtárik P. First analysis of local GD on heterogeneous data [J/OL]. ArXiv Preprint (2020-03-18) [2022-09-26]. https://arxiv.org/abs/1909.04715
[98]	Hsu T M H, Qi H, Brown M. Measuring the effects of non-identical data distribution for federated visual classification [J/OL]. ArXiv Preprint (2019-09-13) [2022-09-26]. https://arxiv.org/abs/1909.06335
[99]	Karimireddy S P, Kale S, Mohri M, et al. SCAFFOLD: Stochastic controlled averaging for on-device federated learning [J/OL]. ArXiv Preprint (2021-04-09) [2022-09-26]. https://arxiv.org/abs/1910.06378
[100]	Li T, Sahu A K, Zaheer M, et al. Federated optimization in heterogeneous networks [J/OL]. ArXiv Preprint (2020-04-21) [2022-09-26]. https://arxiv.org/abs/1812.06127
[101]	Wang J Y, Liu Q H, Liang H, et al. Tackling the objective inconsistency problem in heterogeneous federated optimization [J/OL]. ArXiv Preprint (2020-07-15) [2022-09-26]. https://arxiv.org/abs/2007.07481
[102]	Hsu T M H, Qi H, Brown M. Federated visual classification with real-world data distribution [J/OL]. ArXiv Preprint (2020-07-17) [2022-09-26]. https://arxiv.org/abs/2003.08082
[103]	Zhao Y, Li M, Lai L Z, et al. Federated learning with non-IID data [J/OL]. ArXiv Preprint (2022-07-21) [2022-09-26]. https://arxiv.org/abs/1806.00582
[104]	Yoshida N, Nishio T, Morikura M, et al. Hybrid-FL for wireless networks: Cooperative learning mechanism using non-IID data // ICC 2020–2020 IEEE International Conference on Communications (ICC). Dublin, 2020: 1
[105]	Shoham N, Avidor T, Keren A, et al. Overcoming forgetting in federated learning on non-IID data [J/OL]. ArXiv Preprint (2019-10-17) [2022-09-26]. https://arxiv.org/abs/1910.07796
[106]	Huang Y T, Chu L Y, Zhou Z R, et al. Personalized cross-silo federated learning on non-IID data. Proc AAAI Conf Artif Intell, 2021, 35(9): 7865
[107]	Wu Q, He K W, Chen X. Personalized federated learning for intelligent IoT applications: A cloud-edge based framework. IEEE Open J Comput Soc, 2020, 1: 35 doi: 10.1109/OJCS.2020.2993259
[108]	Günther S, Ruthotto L, Schroder J B, et al. Layer-parallel training of deep residual neural networks [J/OL]. ArXiv Preprint (2019-07-25) [2022-09-26]. https://arxiv.org/abs/1812.04352
[109]	Mayer R, Jacobsen H-A. Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools. ACM Comput Surv, 2020, 53(1): 1
[110]	Jia Z H, Zaharia M, Aiken A. Beyond data and model parallelism for deep neural networks [J/OL]. ArXiv Preprint (2018-07-14) [2022-09-26]. https://arxiv.org/abs/1807.05358
[111]	Harlap A, Narayanan D, Phanishayee A, et al. PipeDream: Fast and efficient pipeline parallel DNN training [J/OL]. ArXiv Preprint (2018-06-08) [2022-09-26]. https://arxiv.org/abs/1806.03377
[112]	Chen C C, Yang C L, Cheng H Y. Efficient and robust parallel DNN training through model parallelism on multi-GPU platform [J/OL]. ArXiv Preprint (2019-10-28) [2022-09-26]. https://arxiv.org/abs/1809.02839
[113]	Huang Y P, Cheng Y L, Bapna A, et al. GPipe: Efficient training of giant neural networks using pipeline parallelism [J/OL]. ArXiv Preprint (2019-07-25) [2022-09-26]. https://arxiv.org/abs/1811.06965
[114]	Mirhoseini A, Pham H, Le Q V, et al. Device placement optimization with reinforcement learning // Proceedings of the 34th International Conference on Machine Learning. Sydney, 2017: 2430
[115]	Shoeybi M, Patwary M, Puri R, et al. Megatron-LM: Training multi-billion parameter language models using model parallelism [J/OL]. ArXiv Preprint (2020-03-13) [2022-09-26]. https://arxiv.org/abs/1909.08053
[116]	Frankle J, Carbin M. The lottery ticket hypothesis: Finding sparse, trainable neural networks [J/OL]. ArXiv Preprint (2019-03-04) [2022-09-26]. https://arxiv.org/abs/1803.03635
[117]	Wang Z D, Liu X X, Huang L, et al. QSFM: Model pruning based on quantified similarity between feature maps for AI on edge. IEEE Internet Things J, 2022, 9(23): 24506 doi: 10.1109/JIOT.2022.3190873
[118]	Wang J, Zhang J G, Bao W D, et al. Not just privacy: Improving performance of private deep learning in mobile cloud // Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London, 2018: 2407
[119]	Zhang L F, Tan Z H, Song J B, et al. Scan: A scalable neural networks framework towards compact and efficient models // 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). Vancouver, 2019: 32
[120]	Gou J P, Yu B S, Maybank S J, et al. Knowledge distillation: A survey [J/OL]. ArXiv Preprint (2021-03-20) [2022-09-26]. https://arxiv.org/abs/2006.05525
[121]	Phuong M, Lampert C H. Towards understanding knowledge distillation [J/OL]. ArXiv Preprint (2021-03-27) [2022-09-26].https://arxiv.org/abs/2105.13093
[122]	Anil R, Pereyra G, Passos A, et al. Large scale distributed neural network training through online distillation [J/OL]. ArXiv Preprint (2020-08-20) [2022-09-26]. https://arxiv.org/abs/1804.03235
[123]	Jeong E, Oh S, Kim H, et al. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data [J/OL]. ArXiv Preprint (2018-11-28) [2022-09-26]. https://arxiv.org/abs/1811.11479
[124]	Shen T, Zhang J, Jia X K, et al. Federated mutual learning [J/OL]. ArXiv Preprint (2020-09-17) [2022-09-26]. https://arxiv.org/abs/2006.16765
[125]	Sattler F, Marban A, Rischke R, et al. Communication-efficient federated distillation [J/OL]. ArXiv Preprint (2020-12-01) [2022-09-26]. https://arxiv.org/abs/2012.00632
[126]	Ahn J H, Simeone O, Kang J. Wireless federated distillation for distributed edge learning with heterogeneous data // 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). Istanbul, 2019: 1