We’ve scaled Kubernetes clusters to 7,500 nodes, producing a scalable infrastructure for large models like GPT-3, CLIP, and DALL·E, but also for rapid small-scale iterative research such as Scaling Laws for Neural Language Models.
Another reason we’ve switched to using alias-based IP addressing is that on our largest clusters, we could possibly have approximately 200,000 IP addresses in use at any one time.
Holy Cluster IP, batman! 200k IPs!
Small note: This article is from early 2021, so they’ve probably gone beyond that!
Holy Cluster IP, batman! 200k IPs!
Small note: This article is from early 2021, so they’ve probably gone beyond that!