Design and implementation of an elastic self-organizing multi-cluster management system
Cyber Security and Data Governance
Xia Lingming, Zhou Jun, Zhao Feng
Future Network Research Center, Network Communication and Security Purple Mountain Laboratory, Nanjing 211111, China
Abstract: Cloud-native technologies such as Kubernetes, as deployed in industry, have limited carrying capacity, cannot meet higher availability requirements, and are prone to cloud-vendor lock-in. Strategies such as Eastern Data and Western Computing must be built on multi-cluster management technology, yet traditional cloud management platforms struggle to deploy and govern services for applications that span multiple clouds. To address these problems, this paper proposes the new concepts of software-defined self-organizing infrastructure management and idempotent hierarchical scheduling. It designs and implements an elastic infrastructure management architecture with the cluster as the smallest unit, which organizes multiple Kubernetes clusters into arbitrary topologies, such as centralized, decentralized, and tree, and schedules and manages applications across clouds. The scheme is tested on a tree-structured cluster topology and compared with other solutions. The results show that it can meet the need to organize and manage massive numbers of clusters in future distributed cloud scenarios, while keeping the latency of registering a new cluster within 1 s and the application scheduling latency within 200 ms.
Key words: self-organizing infrastructure; distributed cloud; idempotent hierarchical scheduling
CLC number: TP393 Document code: A DOI: 10.19358/j.issn.2097-1788.2023.12.014
Citation format: Xia Lingming, Zhou Jun, Zhao Feng. Design and implementation of an elastic self-organizing multi-cluster management system[J]. Cyber Security and Data Governance, 2023, 42(12): 84-89.
Introduction
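A single Kubernetes[1] cluster cannot meet edge, geographic, and resource-management requirements. In typical multi-cluster scenarios such as Eastern Data and Western Computing[2], one therefore has to solve cluster access control, cluster resource abstraction, permission management, application management, multi-cluster scheduling, service maintenance, multi-tenancy, and multi-cluster service discovery[3-5], which greatly increases the complexity and difficulty of a multi-cluster solution. In the community and in industry today, cluster topologies are predominantly a two-tier parent-child architecture: the parent cluster acts as the control cluster, and the remaining clusters are child clusters that carry the workloads. The four mainstream solutions are the Kubefed[6-7] federation scheme, Karmada[8], Clusternet[9], and Admiralty[10]. Kubefed and Karmada belong to the same category: they use Template, Override, and Propagation to define a workload's common configuration, cluster-specific configuration, and scheduling policy, respectively. Karmada evolved from Kubefed but supports richer plugin-based scheduling and features such as multi-cluster service (Multi cluster service), and it has become a CNCF incubating project. Both, however, support only a centralized two-tier architecture, so their scalability and carrying capacity face theoretical bottlenecks. Clusternet is a multi-cluster solution that follows the OCM model and has been accepted as a CNCF sandbox project; a child cluster joins the parent cluster at startup using a controlled token.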
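The Template, Override, and Propagation split used by Kubefed and Karmada can be illustrated with a minimal sketch. It is not the API of either project: the `Cluster` type, the functions `select_clusters`, `apply_override`, and `render`, the label-matching rule, and the sample workload values are all hypothetical and chosen only to show how a common configuration, a cluster-specific configuration, and a scheduling policy might combine in a parent cluster.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A child cluster known to the parent cluster (hypothetical type)."""
    name: str
    labels: dict = field(default_factory=dict)


def select_clusters(clusters, required_labels):
    """Propagation step: keep clusters whose labels match every required label."""
    return [
        c for c in clusters
        if all(c.labels.get(k) == v for k, v in required_labels.items())
    ]


def apply_override(template, override):
    """Override step: return a copy of template with override merged in recursively."""
    result = deepcopy(template)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_override(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def render(template, overrides, clusters, required_labels):
    """Build the per-cluster configuration for every selected cluster."""
    return {
        c.name: apply_override(template, overrides.get(c.name, {}))
        for c in select_clusters(clusters, required_labels)
    }


# Assumed sample data: one common template, one cluster-specific override.
template = {"image": "nginx:1.25", "replicas": 2, "resources": {"cpu": "500m"}}
overrides = {"edge-1": {"replicas": 1, "resources": {"cpu": "250m"}}}
clusters = [
    Cluster("cloud-1", {"region": "east"}),
    Cluster("edge-1", {"region": "east"}),
    Cluster("cloud-2", {"region": "west"}),
]

plan = render(template, overrides, clusters, {"region": "east"})
```

In this sketch the parent cluster is the only place where selection and merging happen, which mirrors the centralized two-tier layout described above; a deeper topology would need the same step repeated at each level.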
Article download: http://www.viuna.cn/resource/share/2000005882