Overprovisioning

Overview

Fast startup times are important to providing a positive user experience with Pixel Streaming applications. However, several factors related to cloud infrastructure and application performance can cause long startup times, and users may lose interest and navigate away from the application before the streaming session even starts. To help address these factors, Scalable Pixel Streaming supports maintaining always-ready nodes, a feature known as overprovisioning.

Overprovisioning allocates more computational resources than the currently active streaming sessions require by keeping a set number of nodes prepared at all times. New sessions can use these nodes immediately, without waiting for resources to be provisioned on demand. This is a direct trade-off between speed and cost: startup times of new Pixel Streaming sessions are drastically reduced, at the expense of increased cloud infrastructure costs.

The major factors that contribute to Pixel Streaming application startup times are:

  • Scheduling latency: Time required by the Kubernetes cluster to schedule the container for a new instance of a Pixel Streaming application once it has been allocated. It is determined by the underlying infrastructure of the cloud platform and may vary based on resource availability.

  • Application startup time: Time required by your Pixel Streaming application to load and become ready for streaming once the container is running. It is mostly influenced by the application's graphical intensity, the size and number of assets it needs to load, the plugins it includes, and similar factors.

Scalable Pixel Streaming currently supports a node overprovisioning mechanism that addresses the scheduling latency component.

Understanding node overprovisioning

Node overprovisioning only applies to cloud platforms that use virtual machines to power their Kubernetes clusters. This includes AWS, Azure and GCP. Node overprovisioning is not necessary on cloud platforms that provide bare metal Kubernetes clusters (such as CoreWeave) since scheduling latency on these clusters is typically negligible.

Kubernetes clusters schedule workloads across a set of one or more worker nodes, each of which can be either a bare metal machine or a virtual machine (VM). On cloud platforms where worker nodes are VMs, the pool of available worker nodes will typically grow and shrink dynamically to accommodate the current workload size. This reduces costs by ensuring VMs are only provisioned as needed, but introduces significant delays when scheduling new Pixel Streaming applications due to the need to wait for a fresh VM to be created and configured. On most popular cloud platforms, new worker node VMs with GPUs take several minutes to be created and become ready for scheduling workloads, which can be unacceptable for certain Pixel Streaming use cases.

To help mitigate the scheduling latency inherent to clusters using VM worker nodes, the Kubernetes cluster itself can be overprovisioned at the node level. This involves configuring a fixed number of additional worker nodes that should be provisioned at all times. As the pool of available worker nodes grows and shrinks in response to changes in workload size, Kubernetes will maintain the specified buffer of overprovisioned nodes in addition to the nodes that are actually required for the current workload. This ensures new Pixel Streaming application instances can be scheduled to a worker node immediately, so long as the number of new streaming sessions being established at any given time does not exceed the available number of overprovisioned worker nodes.
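The effect of the buffer can be illustrated with a small simulation. The sketch below is a simplified model, not the SPS or Kubernetes scheduler: it assumes one streaming session per worker node, that a replacement node is requested the moment a node is claimed, and that every node takes the same time to provision. The function name and its parameters are hypothetical.

```python
import heapq


def simulate_wait_times(arrival_times, buffer_nodes, provisioning_seconds):
    """Return how long each new session waits for a worker node, in seconds.

    Assumptions (illustrative only): one session per node, a replacement
    node is requested as soon as a node is claimed, and provisioning always
    takes exactly `provisioning_seconds`.
    """
    # Times at which each spare node is (or will be) ready.
    # The whole buffer is assumed to be ready at time 0.
    ready_at = [0] * buffer_nodes
    heapq.heapify(ready_at)

    waits = []
    for t in sorted(arrival_times):
        if ready_at:
            # Claim the spare node that is ready soonest.
            node_ready = heapq.heappop(ready_at)
            start = max(t, node_ready)
            # The buffer is topped up: a replacement is requested now.
            heapq.heappush(ready_at, t + provisioning_seconds)
        else:
            # No buffer at all: the node is provisioned on demand.
            start = t + provisioning_seconds
        waits.append(start - t)
    return waits


if __name__ == "__main__":
    arrivals = [0, 60, 120, 180]  # one new session every 60 seconds
    for buffer_nodes in (0, 1, 2):
        print(buffer_nodes, simulate_wait_times(arrivals, buffer_nodes, 120))
```

In this model, with a 120-second provisioning time and one new session every 60 seconds, a buffer of 2 nodes gives every session a zero wait, whereas a buffer of 1 leaves every session after the first waiting 60 seconds.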

How much to overprovision?

Overprovisioning can drastically increase the costs of your infrastructure, so approach this decision carefully.

The extent to which you are comfortable incurring increased costs to improve stream startup times may vary significantly between different applications or even throughout the lifecycle of a single application. There are a number of potential factors to consider:

  • What is the audience for the application? What kind of stream startup times are users likely to accept? If there are multiple distinct groups of users (e.g. free users and paid subscribers), how do acceptable startup times vary between these groups?

  • Will the application receive different levels of user traffic at different times? Are these patterns consistent? If the application provides access to a live event, what will traffic patterns look like throughout the duration of the event?

  • Will the application be deployed across multiple geographic regions or multiple cloud platforms? If so, are different levels of overprovisioning appropriate for each region or platform, given differences in traffic patterns between geographic regions, differences in scheduling latency between cloud platforms, etc.?

You can consult best practices guides for both AWS Elastic Kubernetes Service (EKS) and Google Kubernetes Engine (GKE) for examples of calculating overprovisioning capacity based on estimated traffic rates. The general guidance is to measure scheduling latency and instance provisioning times on your selected cloud platform and calculate overprovisioning capacity based on the rate of new user connections you want to support.

Example calculations

Below are some examples to illustrate the main principles of calculating your overprovisioning needs:

  • If you anticipate one new user connection every 120 seconds, and your instance provisioning time is also 120 seconds, then you only need 1 overprovisioned node, as it will replenish at the same rate that the users will require it.

  • Following the same logic, if you anticipate one new user connection every 60 seconds and your instance provisioning time is 120 seconds, then you should overprovision 2 nodes to keep up with the demand.

  • And if your demand is very high, say 10 new user connections every second, then at the same instance provisioning time of 120 seconds, you will require 1200 overprovisioned nodes. Note that such high overprovisioning will drastically increase the costs of your infrastructure.
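The examples above all follow the same rule of thumb: the number of overprovisioned nodes is the number of new connections expected during one instance provisioning period, rounded up. A minimal sketch of that arithmetic is shown below. The function name and parameters are hypothetical, and the inputs are assumed to be whole numbers so that the rounding is exact.

```python
def overprovisioned_nodes(connections, interval_seconds, provisioning_seconds):
    """Estimate how many overprovisioned nodes are needed.

    `connections` new users are expected every `interval_seconds`, and a new
    node takes `provisioning_seconds` to become ready. All arguments are
    assumed to be non-negative integers (interval_seconds must be positive).
    """
    if connections < 0 or provisioning_seconds < 0 or interval_seconds <= 0:
        raise ValueError("rates and times must be non-negative, interval positive")
    # Connections arriving during one provisioning period, rounded up
    # using integer arithmetic to avoid floating-point rounding errors.
    total = connections * provisioning_seconds
    return -(-total // interval_seconds)


if __name__ == "__main__":
    print(overprovisioned_nodes(1, 120, 120))  # one user every 120 s
    print(overprovisioned_nodes(1, 60, 120))   # one user every 60 s
    print(overprovisioned_nodes(10, 1, 120))   # ten users every second
```

This reproduces the three figures above (1, 2 and 1200 nodes). It is only a starting estimate: it assumes a steady arrival rate, so bursty traffic may need a larger buffer.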

Configuring overprovisioning on SPS

Note that overprovisioning is capped at the maximum node pool size that is configured during SPS installation to prevent accidental uncapped charges.

Navigate to Settings in the side menu on your dashboard and find the Overprovisioning tab. Set the desired number of overprovisioned nodes and save the settings. The nodes will become available to all applications across the cluster and will be used by instances on a first-come, first-served basis.

Minimum node group size

Note that in your SPS settings, you can also change the minimum size of a node group, which is slightly different from overprovisioning. With overprovisioning, the framework immediately spins up a new node as soon as one of the "ready" pre-provisioned nodes is taken, which ensures that every new instance has a ready node waiting for it. The minimum size of the node group, by contrast, only reserves a certain number of nodes to be always running. If those nodes are taken up, no new nodes are created in advance to replace them, and further nodes are provisioned only on demand by instances.
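The difference between the two settings can be sketched as a simple model of how many nodes the cluster aims to keep running. This is an illustration of the behaviour described above, not SPS code: it assumes one instance per node, and the function names and parameters are hypothetical. The cap reflects the maximum node pool size configured during SPS installation.

```python
def target_node_count(nodes_in_use, max_pool_size, overprovisioned=0, min_group_size=0):
    """Nodes the cluster aims to keep running in this simplified model.

    Overprovisioning adds a buffer on top of the nodes in use, whereas the
    minimum group size is only a floor. Both are capped at max_pool_size.
    """
    wanted = max(nodes_in_use + overprovisioned, min_group_size)
    return min(wanted, max_pool_size)


def spare_nodes(nodes_in_use, max_pool_size, overprovisioned=0, min_group_size=0):
    """Ready nodes left over for new instances once the target is reached."""
    target = target_node_count(nodes_in_use, max_pool_size, overprovisioned, min_group_size)
    return max(target - nodes_in_use, 0)


if __name__ == "__main__":
    for in_use in range(5):
        with_buffer = spare_nodes(in_use, max_pool_size=10, overprovisioned=2)
        with_floor = spare_nodes(in_use, max_pool_size=10, min_group_size=2)
        print(in_use, with_buffer, with_floor)
```

In this model, an overprovisioning value of 2 keeps 2 spare nodes available however many nodes are in use (until the pool cap is reached), while a minimum group size of 2 leaves no spare nodes once 2 or more nodes are in use.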

Instance overprovisioning

Node overprovisioning only mitigates scheduling latency. For deployments where application startup time accounts for a large share of the overall spin-up time, such as graphically intensive Pixel Streaming applications, overprovisioning at the node level might not solve the problem. In some cases, the UE project itself can be optimised to reduce spin-up times. If this is not possible, the only way to reduce stream startup times to acceptable levels is to maintain a ready buffer of additional application instances that are already loaded and can receive connections immediately. We are currently working on bringing this feature to you.