Nvidia has made its KAI Scheduler, a Kubernetes-native graphics processing unit (GPU) scheduling tool, available as open source under the Apache 2.0 licence.
KAI Scheduler, which is part of the Nvidia Run:ai platform, is designed to manage artificial intelligence (AI) workloads on GPUs and central processing units (CPUs). According to Nvidia, KAI is able to manage fluctuating GPU demands and reduce wait times for compute access. It also offers resource guarantees or GPU allocation.
The GitHub repository for KAI Scheduler states that it supports the entire AI lifecycle, from small, interactive jobs that require minimal resources to large training and inference jobs, all in the same cluster. Nvidia said it ensures optimal resource allocation while maintaining resource fairness between the different applications that require access to GPUs.
The tool allows administrators of Kubernetes clusters to dynamically allocate GPU resources to workloads, and can run alongside other schedulers installed on a Kubernetes cluster.
“You might need only one GPU for interactive work (for example, for data exploration) and then suddenly require several GPUs for distributed training or multiple experiments,” Ronen Dar, vice-president of software systems at Nvidia, and Ekin Karabulut, an Nvidia data scientist, wrote in a blog post. “Traditional schedulers struggle with such variability.”
They said the KAI Scheduler continuously recalculates fair-share values, and adjusts quotas and limits in real time, automatically matching the current workload demands. According to Dar and Karabulut, this dynamic approach helps ensure efficient GPU allocation without constant manual intervention from administrators.
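The idea of recalculating fair-share values as demand changes can be illustrated with a generic max-min fairness calculation. This is only a sketch of the general technique, not KAI Scheduler's own algorithm: the function name fair_share, the queue names and the equal weighting of queues are all assumptions made for illustration.

```python
def fair_share(total_gpus: float, demands: dict[str, float]) -> dict[str, float]:
    """Max-min fair split of total_gpus across queues (hypothetical helper).

    Each queue is capped at its current demand; capacity a queue does not
    need is redistributed equally among the queues that still want more.
    """
    shares = {queue: 0.0 for queue in demands}
    unsatisfied = {queue for queue, demand in demands.items() if demand > 0}
    remaining = float(total_gpus)

    while unsatisfied and remaining > 1e-9:
        equal_slice = remaining / len(unsatisfied)
        for queue in sorted(unsatisfied):
            # Never grant more than the queue still asks for.
            grant = min(equal_slice, demands[queue] - shares[queue])
            shares[queue] += grant
            remaining -= grant
        unsatisfied = {
            queue for queue in unsatisfied
            if demands[queue] - shares[queue] > 1e-9
        }
    return shares
```

Calling the function again whenever demand changes gives the "continuous recalculation" effect: if one team's demand drops, the GPUs it no longer needs flow to the teams that are still waiting, with no administrator involved.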
They also said that for machine learning engineers, the scheduler reduces wait times by combining what they call “gang scheduling”, GPU sharing and a hierarchical queuing system that enables users to submit batches of jobs. The jobs are launched as soon as resources are available and in alignment with priorities and fairness, Dar and Karabulut wrote.
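Gang scheduling generally means that a group of related pods, such as the workers of a distributed training job, is started together or not at all. The sketch below shows that all-or-nothing rule under simplifying assumptions: the function gang_schedule, the node names and the greedy placement order are hypothetical and are not taken from KAI Scheduler's code.

```python
from typing import Optional


def gang_schedule(
    free_gpus: dict[str, int], pod_requests: list[int]
) -> Optional[dict[int, str]]:
    """Place every pod of a job, or none of them (hypothetical helper).

    free_gpus maps node name to free GPU count; pod_requests lists the GPUs
    each pod needs. Returns pod index -> node, or None if the gang cannot
    be placed as a whole. A greedy heuristic, so it may miss some feasible
    placements.
    """
    tentative = dict(free_gpus)  # work on a copy until the whole gang fits
    placement: dict[int, str] = {}

    # Place the largest pods first.
    order = sorted(range(len(pod_requests)), key=lambda i: -pod_requests[i])
    for pod_index in order:
        need = pod_requests[pod_index]
        node = next((n for n, free in tentative.items() if free >= need), None)
        if node is None:
            return None  # one pod does not fit, so nothing is committed
        tentative[node] -= need
        placement[pod_index] = node

    free_gpus.update(tentative)  # commit only after every pod has a node
    return placement
```

The point of the all-or-nothing commit is that a partially placed job cannot sit on GPUs while waiting for the rest of its workers, which is one way wait times for other jobs are kept down.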
To optimise for fluctuating demand for GPU and CPU resources, Dar and Karabulut said that KAI Scheduler uses what Nvidia calls bin packing and consolidation. They said this maximises compute utilisation by combating resource fragmentation: smaller tasks are packed into partially used GPUs and CPUs.
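Bin packing of this kind is often implemented as a "best fit" rule: a small task goes to the device with the least free capacity that can still hold it, so that emptier devices stay whole for larger jobs. The following is a minimal sketch of that rule, assuming fractional GPU requests; the function bin_pack and the GPU names are hypothetical, and KAI Scheduler's actual placement logic may differ.

```python
from typing import Optional


def bin_pack(free_fraction: dict[str, float], request: float) -> Optional[str]:
    """Best-fit placement of one task (hypothetical helper).

    free_fraction maps GPU name to the fraction of that GPU still free.
    The task goes to the fullest GPU that can still hold it.
    """
    candidates = [gpu for gpu, free in free_fraction.items() if free >= request]
    if not candidates:
        return None  # no single GPU has enough room
    chosen = min(candidates, key=lambda gpu: free_fraction[gpu])
    free_fraction[chosen] -= request
    return chosen
```

With one empty GPU and two partially used ones, small requests land on the partially used devices first, which leaves the empty GPU available for a task that needs all of it.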
Dar and Karabulut said it also addresses node fragmentation by reallocating tasks across nodes. The other technique used in KAI Scheduler is spreading workloads across nodes or GPUs and CPUs to minimise the per-node load and maximise resource availability per workload.
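Spreading is the opposite placement rule to bin packing: each new workload goes to the node with the most free capacity, which keeps per-node load low. A minimal sketch, assuming whole-GPU requests and using the hypothetical function name spread and made-up node names rather than anything from KAI Scheduler itself:

```python
from typing import Optional


def spread(free_gpus: dict[str, int], request: int) -> Optional[str]:
    """Place one workload on the least-loaded node (hypothetical helper).

    free_gpus maps node name to free GPU count; the node with the most
    free GPUs that can satisfy the request is chosen.
    """
    candidates = [node for node, free in free_gpus.items() if free >= request]
    if not candidates:
        return None  # no node can satisfy the request
    chosen = max(candidates, key=lambda node: free_gpus[node])
    free_gpus[chosen] -= request
    return chosen
```

The trade-off is that spreading leaves fewer completely empty nodes, so it suits clusters where headroom per workload matters more than keeping whole nodes free for large jobs.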
Nvidia said KAI Scheduler also deals with a problem that arises in shared clusters. According to Dar and Karabulut, some researchers secure more GPUs than necessary early in the day to ensure availability throughout. This practice, they said, can lead to underutilised resources, even when other teams still have unused quotas.
Nvidia said KAI Scheduler addresses this by enforcing resource guarantees. “This approach prevents resource hogging and promotes overall cluster efficiency,” Dar and Karabulut added.
KAI Scheduler provides what Nvidia calls a built-in podgrouper that automatically detects and connects with tools and frameworks such as Kubeflow, Ray, Argo and the Training Operator, which it said reduces configuration complexity and helps to speed up development.

