Infrastructure Management

Sphere Engine services rely on an internal infrastructure of execution units. These units play a crucial role in executing code within a scalable, secure, and independent sandbox environment.

Depending on the type of service being used, these execution units can take the form of either checker machines, workspace machines, or a combination of both. Regardless of the type, each unit operates as an independent machine, offering resources to facilitate the execution of end-user code.

In the case of checker machines, these resources are allocated as "slots" used for batch executions (meaning: non-interactive code runs). On workspace machines, the resources are also organized into "slots", but these slots are specifically dedicated to supporting continuous, interactive work within the runtime environment.

A single checker machine provides 4 "slots" for batch executions.
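
Since each checker machine contributes 4 slots, translating an expected number of concurrent batch executions into a machine count is simple arithmetic. A minimal sketch in Python (the constant and helper names are illustrative, not part of any Sphere Engine API):

    import math

    SLOTS_PER_CHECKER_MACHINE = 4  # each checker machine provides 4 batch-execution slots

    def machines_needed(concurrent_submissions: int) -> int:
        """Number of checker machines required to run the given
        number of submissions concurrently."""
        return math.ceil(concurrent_submissions / SLOTS_PER_CHECKER_MACHINE)

    print(machines_needed(100))  # 25 machines provide 100 slots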

Infrastructure composition

By default, a customer's infrastructure consists of a fixed number of base slots assigned to their account. That number is constant and depends on the customer's subscription plan. There are separate pools of base slots:

  • a common pool for submissions in the Compilers and Problems modules,
  • (soon!) a pool for submissions in the Containers module,
  • a pool for workspaces in the Containers module.

This base part of the infrastructure typically handles the regular traffic of the customer's service. However, it may not be sufficient to smoothly handle intense events such as large coding competitions, exams, certifications, or other situations where many end users access the service simultaneously.

To handle such occasional events, the infrastructure of checker machines can be scaled dynamically, temporarily increasing the number of "slots" available for batch executions. For each module (i.e., Problems, Compilers, and (soon!) Containers), it is possible to have an independent private cluster of machines hosted by Sphere Engine.

No separate cluster is required for workspace machines. The Sphere Engine service ensures that there are always sufficient resources available for workspaces, unless the customer's hard limit has been reached.

How to get access to a dynamically scalable cluster?

If you wish to enhance your account's capabilities with a dynamically scalable infrastructure, please reach out to us through our open support channel at hello@sphere-engine.com or directly contact your dedicated account manager.

After the Sphere Engine team sets up a cluster (or clusters), you can manage it through a dedicated panel in the Sphere Engine Dashboard. Simply navigate to Menu > Infrastructure > Clusters.

Dynamic cluster parameters

As mentioned previously, each Sphere Engine module, i.e., Problems, Compilers, and (coming soon!) Containers, can be supported by a dynamic cluster. Such a cluster can scale the number of available checker machines both up and down. The first key parameter is the module the cluster serves: each dynamic cluster operates on behalf of exactly one module, such as Problems, Compilers, or (coming soon!) Containers.

The second key parameter is the scaling strategy, which comes in two flavors: "manual" or "automatic." With the manual strategy, customers use the clusters panel to set the number of checker machines according to their needs. With the automatic strategy, the infrastructure of checker machines scales on its own within a predefined range; in this case, the clusters panel allows configuring the number of hot standby machines and a latency parameter that determines how quickly new machines are started or stopped.
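
To make the two strategies and their parameters concrete, here is a hypothetical sketch of how a cluster configuration could be modeled in Python; the class and field names are illustrative and simply mirror the panel fields described below, not an actual Sphere Engine SDK:

    from dataclasses import dataclass
    from typing import Literal, Optional

    @dataclass
    class ClusterConfig:
        name: str                      # Sphere Engine module: "Compilers", "Problems", ...
        min_machines: int              # lower bound on dynamic checker machines
        max_machines: int              # upper bound on dynamic checker machines
        strategy: Literal["manual", "automatic"]
        desired: Optional[int] = None  # manual strategy: target machine count
        extra: Optional[int] = None    # automatic strategy: hot standby machines
        inertia: Optional[int] = None  # automatic strategy: scaling decision delay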

Parameters for a manual cluster

For a manual cluster, there is just one parameter to be managed by the customer. The remaining parameters are managed by the Sphere Engine team.

Parameter        Description
Name             the cluster name that corresponds to the Sphere Engine module
Min              the minimum number of dynamic checker machines
Max              the maximum number of dynamic checker machines
Desired          the desired number of dynamic checker machines that should be running
Active machines  the current number of running checker machines
Strategy         the scaling strategy of the cluster

The name of the editable field (Desired) is in bold.

Parameters for an automatic cluster

Parameter        Description
Name             the cluster name that corresponds to the Sphere Engine module
Min              the minimum number of dynamic checker machines
Max              the maximum number of dynamic checker machines
Active machines  the current number of running checker machines
Extra            the number of hot standby checker machines (see below)
Inertia          the delay in making decisions about increasing or decreasing the number of checker machines (see below)
Strategy         the scaling strategy of the cluster

The names of the editable fields (Extra and Inertia) are in bold.

The extra parameter defines the number of free machines expected to be ready to execute submissions at any moment. Sphere Engine will:

  • try to maintain the exact number of extra machines at all times,
  • launch new machines if the number of submissions in a given time increases,
  • turn the surplus machines off if the flow of submissions decreases.
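
One way to picture this behavior is as a control loop that keeps extra idle machines on top of whatever is currently busy. A minimal sketch under that assumption (the function and its arguments are illustrative, not Sphere Engine internals):

    def target_machine_count(busy_machines: int, extra: int,
                             min_machines: int, max_machines: int) -> int:
        """Aim for `extra` idle machines on top of the busy ones,
        clamped to the configured [min, max] range."""
        return max(min_machines, min(busy_machines + extra, max_machines))

    # With extra=2: 5 busy machines -> target of 7, so 2 spare machines
    # can absorb a sudden surge while further machines are being launched.
    print(target_machine_count(busy_machines=5, extra=2,
                               min_machines=0, max_machines=50))  # 7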

For regular traffic and to keep the infrastructure costs as low as possible, the recommended value of extra is 1. For high-profile events with thousands of participants and hundreds of submissions per minute, it's recommended to increase the value of extra to 2-5.

A higher value of the extra parameter means:

  • a greater number of submissions can be handled by the cluster at any given moment,
  • if the flow of submissions increases, new submissions can be handled immediately, giving Sphere Engine time to launch new extra machines,
  • queues are less likely to form,
  • infrastructure costs are higher.

The inertia parameter defines how many checks Sphere Engine should perform before launching new machines. An inertia of 0 means that Sphere Engine launches a new machine immediately after confirming that the number of concurrently executed submissions is high enough to use up the extra machines (see the extra parameter).

Sphere Engine performs these checks every few seconds. An inertia of 1 or more means that Sphere Engine performs that many additional checks, one per check interval, before deciding to launch a new machine.
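
In other words, inertia acts as a debounce counter over these periodic checks. A rough sketch of the decision logic (names are illustrative):

    def should_launch(consecutive_high_checks: int, inertia: int) -> bool:
        """Launch a new machine only after the load has stayed high for
        `inertia` checks beyond the first confirming one."""
        return consecutive_high_checks >= inertia + 1

    # inertia = 0: the first high-load check already triggers a launch.
    print(should_launch(consecutive_high_checks=1, inertia=0))  # True
    # inertia = 2: three consecutive high-load checks are required,
    # filtering out short spikes at the cost of a slower reaction.
    print(should_launch(consecutive_high_checks=2, inertia=2))  # False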

For regular traffic and to keep the infrastructure costs as low as possible, the recommended inertia value is 2-4. For high-profile events with thousands of participants and hundreds of submissions per minute, it's recommended to decrease the inertia value to 0-1.

A lower value of the inertia parameter means:

  • Sphere Engine will launch new machines more eagerly,
  • queues are less likely to form,
  • infrastructure costs are higher.

Examples

Let's examine an example where a customer has 16 base slots allocated within their subscription plan. These slots serve as a shared resource for the Compilers and Problems modules, forming a static part of the infrastructure that enables concurrent execution of up to 16 submissions.

However, the customer's service often encounters periods of high demand, such as recurring programming events (e.g., examination sessions), which require a significantly larger infrastructure for a short period of time. This increased demand affects both the Problems and Compilers modules of Sphere Engine, with the demand for "slots" in the Problems module exceeding the demand in the Compilers module.

To effectively manage such events, the customer has access to dynamically scalable clusters: one for the Problems module and another for the Compilers module. Each cluster can be configured independently, but for simplicity, let's assume both clusters use the same strategy, either manual or automatic.

Manual clusters

In the case of the manual strategy, for each programming event, the customer needs to adjust cluster parameters to align with the anticipated demand.

For an expected demand of approximately 100 concurrent submissions in the Compilers module and roughly 160 concurrent submissions in the Problems module, a reasonable configuration just before the event might resemble the following:

Name       Min  Max  Desired  Strategy
Compilers  0    50   23       manual
Problems   0    50   38       manual

This configuration allocates 23*4 = 92 dedicated slots for the Compilers module and 38*4 = 152 dedicated slots for the Problems module. In combination with the 16 base slots shared between both modules, the requirements are met.
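
The arithmetic behind this configuration can be double-checked with a few lines of Python (the numbers come from the example above; the names are illustrative):

    SLOTS_PER_MACHINE = 4
    BASE_SLOTS = 16  # shared between the Compilers and Problems modules

    compilers_dedicated = 23 * SLOTS_PER_MACHINE  # 92 dedicated slots
    problems_dedicated = 38 * SLOTS_PER_MACHINE   # 152 dedicated slots

    # Compilers needs ~100 slots -> 92 dedicated + 8 from the base pool;
    # Problems needs ~160 slots -> 152 dedicated + 8 from the base pool;
    # 8 + 8 = 16, exactly the shared base capacity.
    assert compilers_dedicated + problems_dedicated + BASE_SLOTS >= 100 + 160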

Automatic clusters

In the case of the automatic strategy, it may not be necessary to adjust cluster parameters manually. A general configuration that could be suitable is as follows:

Name       Min  Max  Extra  Inertia  Strategy
Compilers  0    50   2      1        automatic
Problems   0    50   3      1        automatic

With this configuration, the system will maintain the infrastructure prepared for submission surges:

  • for the Compilers module, it will aim to keep 2 * 4 = 8 additional dedicated slots,
  • for the Problems module, it will strive to maintain 3 * 4 = 12 additional dedicated slots.

The inertia parameter introduces a slight delay before each decision, which helps filter out short peaks. Nevertheless, with the value set to 1, the system remains highly responsive.


Alternatively, the cluster can be optimized to align with the predicted resource demand using the following configuration:

Name       Min  Max  Extra  Inertia  Strategy
Compilers  0    23   1      0        automatic
Problems   0    38   2      0        automatic

In this setup, the system ensures that the infrastructure remains prepared as follows:

  • for the Compilers module, it strives to maintain an additional 1 * 4 = 4 dedicated slots but will not exceed the maximum of 23 checker machines, which results in 23 * 4 = 92 dedicated slots in total,
  • for the Problems module, it aims to maintain an additional 2 * 4 = 8 dedicated slots but will not exceed the maximum of 38 checker machines, resulting in 38 * 4 = 152 dedicated slots in total.

The inertia parameter has been reduced to 0 to maximize responsiveness. This can be seen as a tradeoff for the lower values of the extra parameter, which has decreased from 2 to 1 for the Compilers module and from 3 to 2 for the Problems module.
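
The effect of the tighter Max bound can be illustrated by reusing the target-count sketch from the extra parameter discussion; once every machine is busy, the cap takes precedence over extra (values taken from the Compilers row above):

    def target_machine_count(busy_machines: int, extra: int,
                             min_machines: int, max_machines: int) -> int:
        # Same illustrative helper as in the earlier sketch.
        return max(min_machines, min(busy_machines + extra, max_machines))

    print(target_machine_count(10, 1, 0, 23))  # 11: one spare machine is kept ready
    print(target_machine_count(23, 1, 0, 23))  # 23: the Max cap overrides extra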