The argument for AWS Spot Instances

In a spot market, buyers and sellers exchange an asset for cash immediately, and the spot price is the price they agree on for that exchange. Apply this concept to the pool of unused resources in public clouds, and the resulting market lists resources at their lowest available price. In AWS, Spot Instances are the lowest price AWS will offer for a fixed amount of computing resources, with no guarantee of how long you get to keep them. That makes Spot Instances a cost-effective way to access compute. However, like all ‘bargains,’ there’s a catch. Before we delve into that, let’s first consider why teams might choose Spot Instances.

Explaining Spot Instances

  • Spot Instances are a type of Amazon Elastic Compute Cloud (EC2) instance that utilizes spare capacity within the AWS cloud.
  • These instances are available at significantly reduced rates compared to On-Demand prices.
  • AWS offers this excess capacity to users, allowing them to leverage it for their workloads.

Why use Spot

The primary reason teams opt for Spot Instances is cost. Keep in mind that while AWS Spot Instances reduce the price you pay for compute, they can also lead to higher expenses in other areas, as the downsides discussed later in this article show.
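To make that trade-off concrete, here is a minimal sketch of an "effective cost" calculation. The formula and every number in it are hypothetical, chosen only for illustration: it assumes some share of paid compute time is lost to interrupted work, and that handling interruptions adds an amortized hourly overhead.

```python
def effective_cost_per_useful_hour(hourly_price, rework_fraction=0.0, overhead_per_hour=0.0):
    """Cost of one hour of *useful* work (illustrative model, not an AWS formula).

    hourly_price      -- price paid per instance hour
    rework_fraction   -- share of paid time lost to interrupted work (0 <= f < 1)
    overhead_per_hour -- amortized hourly cost of tooling and engineering
    """
    if not 0 <= rework_fraction < 1:
        raise ValueError("rework_fraction must be in [0, 1)")
    return (hourly_price + overhead_per_hour) / (1 - rework_fraction)


# Hypothetical prices: On-Demand at $0.10/h, Spot at $0.03/h with 20% of the
# time lost to interruptions and $0.01/h of extra tooling overhead.
on_demand = effective_cost_per_useful_hour(0.10)
spot = effective_cost_per_useful_hour(0.03, rework_fraction=0.20, overhead_per_hour=0.01)

print(f"On-Demand: ${on_demand:.3f} per useful hour")
print(f"Spot:      ${spot:.3f} per useful hour")
```

With these made-up inputs Spot still comes out ahead, but the gap is smaller than the headline discount suggests, and it narrows further as interruptions or overhead grow.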

Good Spot Instance use cases

Spot Instances are well-suited for workloads that are flexible in terms of timing and can tolerate interruptions. Here are some common use cases:

  • Data Analysis: Spot Instances are ideal for running large-scale data analysis jobs where you can take advantage of available capacity during off-peak hours.
  • Batch Processing: Background tasks, batch jobs, and other non-time-sensitive workloads can benefit from Spot Instances.
  • Optional Tasks: If you have tasks that can be interrupted without causing critical issues, Spot Instances are a good fit.

Key differences between Spot Instances and On-Demand Instances

[Figure: key differences between Spot Instances and On-Demand Instances]

How does Spot work?

Spot works on a very simple principle, and AWS documents it clearly. To use Spot Instances, you create a Spot Instance request that includes the desired number of instances, the instance type, and the Availability Zone. If capacity is available, Amazon EC2 fulfills your request immediately. Otherwise, Amazon EC2 waits until your request can be fulfilled or until you cancel the request.
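As a rough sketch of what such a request looks like in code, the helper below assembles the parameters for boto3's `request_spot_instances` call. The helper names (`build_spot_request`, `submit_spot_request`), the AMI ID, and the example values are placeholders invented for illustration; only the parameter keys follow the EC2 API.

```python
def build_spot_request(instance_count, instance_type, availability_zone,
                       image_id, request_type="one-time", max_price=None):
    """Assemble keyword arguments for EC2's request_spot_instances call."""
    if request_type not in ("one-time", "persistent"):
        raise ValueError("request_type must be 'one-time' or 'persistent'")
    params = {
        "InstanceCount": instance_count,
        "Type": request_type,
        "LaunchSpecification": {
            "ImageId": image_id,
            "InstanceType": instance_type,
            "Placement": {"AvailabilityZone": availability_zone},
        },
    }
    if max_price is not None:
        # Optional: the most you are willing to pay per instance hour.
        params["SpotPrice"] = str(max_price)
    return params


def submit_spot_request(params, region):
    """Send the request. Needs the third-party boto3 package and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.request_spot_instances(**params)


# Placeholder values; the AMI ID below is not a real image.
params = build_spot_request(2, "m5.large", "us-east-1a", "ami-0123456789abcdef0")
print(params)
```

Keeping the builder separate from the API call makes the request easy to inspect or unit-test before anything is sent to AWS.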

The following illustration shows how Spot Instance requests work.

[Figure: how Spot Instance requests work]

Notice that the request type (one-time or persistent) determines whether the request is opened again when Amazon EC2 interrupts a Spot Instance or when you stop a Spot Instance. If the request is persistent, the request is opened again after your Spot Instance is interrupted. If you have a persistent request and stop your Spot Instance, the request will only reopen once you restart your Spot Instance. Essentially, you place a request, and if capacity is available, you receive the resource. If not, you continue to wait until the request can be fulfilled or you decide to cancel it.
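The difference between one-time and persistent requests can be sketched as a small state machine. This is a simplified model written for this article, not AWS code: the class and method names are invented, the state names only loosely follow the ones EC2 reports, and stopping an instance is modeled for persistent requests only.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"          # waiting for capacity
    ACTIVE = "active"      # fulfilled, instance running
    DISABLED = "disabled"  # persistent request whose instance you stopped
    CLOSED = "closed"      # finished, will not be fulfilled again


class SpotRequest:
    """Toy model of a Spot Instance request lifecycle."""

    def __init__(self, persistent):
        self.persistent = persistent
        self.state = State.OPEN

    def fulfill(self):
        # EC2 found capacity for an open request.
        if self.state is State.OPEN:
            self.state = State.ACTIVE

    def interrupt(self):
        # EC2 reclaimed the instance: persistent requests reopen, one-time ones end.
        if self.state is State.ACTIVE:
            self.state = State.OPEN if self.persistent else State.CLOSED

    def stop_instance(self):
        # You stopped the instance: the request stays parked until you start it again.
        if self.state is State.ACTIVE and self.persistent:
            self.state = State.DISABLED

    def start_instance(self):
        # Restarting the stopped instance reopens the request.
        if self.state is State.DISABLED:
            self.state = State.OPEN


one_time = SpotRequest(persistent=False)
one_time.fulfill()
one_time.interrupt()

persistent = SpotRequest(persistent=True)
persistent.fulfill()
persistent.interrupt()

print(one_time.state.value, persistent.state.value)
```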

Spot Quakes and the downside of using AWS Spot Instances

A few years back we touched on Spot Quakes, our affectionate name for the event in which you lose all your Spot Instances in one go and things get a little bumpy. When a Spot Instance is about to be taken away, you receive an event. This event is your 2-minute warning: shut down and evacuate. This inherent characteristic means Spot Instances are better suited to workloads that can handle interruptions gracefully.
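One way to catch that warning from inside the instance is to poll the instance metadata service for a pending `spot/instance-action`. The sketch below assumes IMDSv2 (token-based) access and a notice body of the form `{"action": ..., "time": ...}`; the function names are invented for this example, and the network call only works on an EC2 instance.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

TOKEN_URL = "http://169.254.169.254/latest/api/token"
ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def parse_instance_action(body):
    """Return (action, deadline) from the JSON body of an interruption notice."""
    doc = json.loads(body)
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], deadline.replace(tzinfo=timezone.utc)


def seconds_remaining(deadline, now):
    """Seconds left before the instance is reclaimed (never negative)."""
    return max(0.0, (deadline - now).total_seconds())


def fetch_instance_action(timeout=1.0):
    """Poll the metadata service once; returns None when no interruption is pending."""
    token_request = urllib.request.Request(
        TOKEN_URL, method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_request, timeout=timeout) as resp:
        token = resp.read().decode()
    request = urllib.request.Request(
        ACTION_URL, headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return parse_instance_action(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no notice has been issued
            return None
        raise


# Offline example using a sample notice body.
sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
action, deadline = parse_instance_action(sample)
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
print(action, seconds_remaining(deadline, now))
```

A real handler would call `fetch_instance_action` every few seconds and, once a notice appears, stop taking new work and flush state before the deadline.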

In addition, keep the following aspects of Spot Instances in mind:

  1. Price Volatility
    • Spot Instance prices fluctuate based on supply and demand. While they are generally much lower than On-Demand prices, they can rise when spare capacity is scarce.
    • If the Spot price rises above the maximum price you are willing to pay, your Spot Instances are interrupted. It’s essential to monitor prices and set an appropriate maximum price.
  2. Bid Strategy
    • When launching Spot Instances, you can specify a maximum price (the “bid” in the original Spot model). If you don’t, it defaults to the On-Demand price. If the Spot price rises above your maximum price, your instances may be interrupted.
    • Choosing the right strategy (e.g., a maximum at the On-Demand price or slightly above) is crucial to avoid frequent interruptions.
  3. Workload Compatibility
    • Not all workloads are suitable for Spot Instances. Real-time applications, databases, and mission-critical services may not tolerate interruptions.
    • Analyze your workload’s characteristics and determine if it aligns with Spot Instance behavior.
  4. Capacity Availability
    • While Spot Instances are usually available, there might be times when capacity is scarce due to high demand.
    • If your workload relies heavily on Spot Instances, consider diversifying across multiple instance types or regions.
  5. Stateful Workloads
    • Stateful applications (those that maintain internal state or data) may face challenges with Spot Instances.
    • If an instance is terminated, any unsaved data could be lost. Ensure your application handles state appropriately.
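For the stateful case in particular, a common mitigation is to checkpoint progress so an interrupted job resumes instead of restarting. The following is a minimal sketch under stated assumptions: the function names are invented, the "job" just sums numbers, and the checkpoint goes to a local file, whereas a real workload would write it to storage that outlives the instance.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_index": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then replace atomically to avoid torn writes."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_job(items, path, should_stop=lambda: False):
    """Process items from the last checkpoint; returns (state, finished)."""
    state = load_checkpoint(path)
    for i in range(state["next_index"], len(items)):
        if should_stop():  # e.g. an interruption notice was received
            return state, False
        state["total"] += items[i]
        state["next_index"] = i + 1
        save_checkpoint(path, state)
    return state, True


# Simulate an interruption after four items, then a resumed run.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
items = list(range(1, 11))
calls = {"n": 0}


def stop_after_four():
    calls["n"] += 1
    return calls["n"] > 4


first_state, first_done = run_job(items, checkpoint_path, stop_after_four)
final_state, final_done = run_job(items, checkpoint_path)
print(first_state, first_done)
print(final_state, final_done)
```

Checkpointing after every item keeps the example simple. In practice you would pick an interval that balances lost work against write overhead.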

How can we manage the variability or potential loss of AWS Spot Instances?

Initially, AWS offered Spot Blocks (Spot Instances with a defined duration), but unfortunately these are no longer available. AWS closed the feature to new customers in July 2021 and supported existing users only until the end of 2022, as its notice explained:

Update July 2021 – Spot Instances with a defined duration (also known as Spot blocks) are no longer available to new customers as of July 1, 2021. For customers that have previously used the feature, we will continue to support Spot Instances with a defined duration until December 31, 2022. If your workload is interruption tolerant, we recommend that you use Spot Instances without setting a defined duration. If your workload is not interruption tolerant we recommend that you use On-Demand instances for the required duration of your workload. For the most up to date information please see our documentation here.

This means you need to build or adopt a third-party solution that automates your Spot requests, helps combat price and capacity fluctuations, and orchestrates your infrastructure. Below are some common options:

Karpenter

  • Karpenter’s primary responsibility is to provision compute capacity for your Kubernetes clusters.
  • You define a NodePool configuration in Karpenter, specifying instance types, Availability Zones, and capacity types.
  • Karpenter can handle Spot Instance interruptions gracefully.
  • Starting from version 0.19.3, it is recommended to use Karpenter’s native interruption handling rather than a standalone Node Termination Handler.
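As an illustration of the NodePool configuration mentioned above, the sketch below builds a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. Treat the details as assumptions: the `karpenter.sh/v1` API version, the `nodeClassRef` fields, and the `default` EC2NodeClass name depend on the Karpenter release you run and should be checked against its documentation. The helper name and example values are invented.

```python
import json


def build_nodepool(name, instance_types, zones,
                   capacity_types=("spot", "on-demand"), node_class="default"):
    """Build a Karpenter NodePool manifest (schema assumed, verify for your version)."""
    return {
        "apiVersion": "karpenter.sh/v1",  # assumption: differs on older releases
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Assumption: an EC2NodeClass with this name already exists.
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    "requirements": [
                        {"key": "karpenter.sh/capacity-type",
                         "operator": "In", "values": list(capacity_types)},
                        {"key": "node.kubernetes.io/instance-type",
                         "operator": "In", "values": list(instance_types)},
                        {"key": "topology.kubernetes.io/zone",
                         "operator": "In", "values": list(zones)},
                    ],
                }
            }
        },
    }


# Example values only; choose types and zones that suit your workload.
manifest = build_nodepool(
    "spot-flexible",
    instance_types=["m5.large", "m5a.large", "m6i.large"],
    zones=["us-east-1a", "us-east-1b"],
)
print(json.dumps(manifest, indent=2))
```

Listing several instance types and zones gives the provisioner more Spot pools to draw from, which is the diversification advice from the previous section applied to Kubernetes.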

Spot Ocean

  • Spot Ocean automatically handles instance interruptions, helping your workloads meet service level agreements (SLAs).

kOps Toolbox Instance Selector

  • If you’re using kOps to manage your Kubernetes clusters, consider leveraging the kOps Toolbox Instance Selector.
  • This tool simplifies the creation of Instance Group configurations adhering to Spot Instances best practices.
  • It allows flexibility in choosing instance types and optimizes allocation strategies for efficient Spot usage.

What’s the best alternative?

If you’re wondering what else can be done, you are not alone. AWS, and specifically EKS, can create a massive amount of hard-to-remove waste, and Sysdig and Datadog have the data to prove it (check out the 2023 report).

Our approach leverages AWS Bare Metal and virtualization to provide a new type of computing engine inside your AWS account. Our latest product innovation, Elastic Machine Pool, removes the need to use Spot Instances, automates infrastructure, and removes the need to constantly change resource requests.
