
Cost-Effective and Flexible Provisioning

– Overview

Hello, I’m a lead developer at MeFriend.ai. We run our service on AWS EKS, and with a rapid increase in users it became essential to keep the service stable and avoid downtime. At the same time, we looked for ways to manage it flexibly and cost-effectively.

MeFriend.ai – Click Here!

For node provisioning, we use Karpenter, an open-source, high-performance auto-provisioning tool.

In addition to provisioning nodes, we scale pods with the Horizontal Pod Autoscaler (HPA). This enables a flexible and reliable structure in which both nodes and pods autoscale when they approach risky thresholds, ensuring stable operation.

Moreover, to optimize costs, we set the following provisioning principles:

  1. Spot Instances are always prioritized, but at least one pod must run on an On-Demand Instance.
  2. When unavoidable, On-Demand Instances are used as well.

Ideally, only one pod runs on an On-Demand Instance, and any additional pods are allocated to Spot Instances whenever possible. This policy strikes the right balance, ensuring cost efficiency while maintaining stability.

Now, let’s dive into how we can run our service in a flexible and cost-effective way.


– Steps

  • How are we going to make it?
  • Install Karpenter
  • Define Provisioning Rule
  • Install HPA
  • Results

– How are we going to make it?

For stability, at least one pod must run on an On-Demand Instance, while any additional pods should be allocated to Spot Instances as much as possible.

How can we achieve this?

Karpenter doesn’t offer a direct feature for this setup, so we approached the solution from a different perspective.

1. Separate deployments in Kubernetes for On-Demand and Spot Instances

First, in Kubernetes, we defined separate Deployments for On-Demand and Spot Instances. This setup ensures that each Deployment’s pods are provisioned onto the intended instance type.

With this configuration, by default one pod runs on an On-Demand Instance and another on a Spot Instance.
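
For illustration, here is a minimal sketch of the two Deployments. (The Deployment names match the HPA targets shown later in this post, but the labels, container image, and resource values are hypothetical; we pin the capacity type with a nodeSelector here, while the affinity/tolerations variant shown later achieves the same effect.)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mai-deployment-on-demand
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mai
      capacity: on-demand
  template:
    metadata:
      labels:
        app: mai            # shared label: one Service selects both Deployments
        capacity: on-demand # distinct label: keeps the two Deployments' selectors separate
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: on-demand  # schedule only onto On-Demand nodes
      containers:
      - name: mai
        image: mefriend/mai:latest  # hypothetical image
        resources:
          requests:   # HPA utilization is computed against these requests
            cpu: 500m
            memory: 512Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mai-deployment-spot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mai
      capacity: spot
  template:
    metadata:
      labels:
        app: mai
        capacity: spot
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot  # schedule only onto Spot nodes
      containers:
      - name: mai
        image: mefriend/mai:latest
        resources:
          requests:
            cpu: 500m
            memory: 512Mi

On-Demand / Spot Deployment Sketch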

2. Configure the HPA to scale the Spot Deployment more readily and more frequently.

To prioritize allocation to Spot Instances, we aimed for new pods to be created under the Spot Deployment rather than the On-Demand Deployment.

This was achieved by configuring the Horizontal Pod Autoscaler (HPA) to more readily scale pods under the Spot Deployment, ensuring new workloads are handled cost-effectively.


Horizontal Pod Autoscaler

When defining a Deployment in Kubernetes, the container’s resource requests must be specified, since the HPA calculates utilization as a percentage of the requested resources. For simplicity, we based autoscaling on metrics like CPU utilization and memory usage.

Pods from both the On-Demand Deployment and the Spot Deployment carry the same labels, so a single Service selects all of them and the load balancer distributes traffic evenly between the two Deployments.
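
For example, a single Service selecting the shared label (a sketch assuming the app: mai label from the Deployment sketch above) spreads traffic across pods of both Deployments:

apiVersion: v1
kind: Service
metadata:
  name: mai-service
spec:
  selector:
    app: mai          # matches pods from both the On-Demand and Spot Deployments
  ports:
  - port: 80
    targetPort: 8080  # hypothetical container port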

Since both Deployments have the same resource requests and receive similar traffic, their CPU and memory utilization increase at similar rates.

By setting a lower autoscaling threshold for the Spot Deployment’s HPA, we ensure that Spot Deployment scales first, creating new pods as needed. This approach helps us achieve our goal of prioritizing Spot Instances for cost-effective scaling.


Karpenter

HPA and Karpenter are similar in that both provide flexible autoscaling, yet they differ in important ways.

What is the difference?

The key difference is that HPA monitors the resource usage of pods, scaling them up when they exceed a defined threshold.

Karpenter, on the other hand, watches for pending pods that cannot be scheduled because no existing node has capacity for them, and provisions new nodes to host them.


In other words, Karpenter is responsible for creating the servers (nodes) on which pods run, while HPA manages the scaling of the pods within those servers.

  • Karpenter <-> Server (i.e. instance, node)
  • HPA <-> Pod

Understanding the principles, roles, and differences between these two services is essential for designing an effective autoscaling strategy.


– Install Karpenter

We manage all of our infrastructure setup as code (IaC, Infrastructure as Code). Among the available IaC tools, we use Terraform to build our infrastructure.

The following code is part of our Terraform configuration for setting up Karpenter in the AWS infrastructure.

(The code below only covers installing Karpenter with Helm; it omits IAM policies and other supporting configuration.)

resource "helm_release" "karpenter" {
  depends_on = [
    aws_iam_service_linked_role.spot,
    null_resource.kube_config_update,
    aws_iam_role_policy_attachment.attach_ec2_spot_service_linked_role_policy
  ]

  namespace        = "karpenter"
  create_namespace = true

  name             = "karpenter"
  repository       = "oci://public.ecr.aws/karpenter"
  repository_username = data.aws_ecrpublic_authorization_token.token.user_name
  repository_password = data.aws_ecrpublic_authorization_token.token.password
  chart            = "karpenter"
  version          = "v0.34.0"

  values = [
    <<-EOT
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
      interruptionQueue: ${module.karpenter.queue_name}
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: ${module.karpenter.iam_role_arn}
    EOT
  ]
}

There are several ways to install Karpenter in a cluster, and you can easily find Karpenter installation guides online. Typically, it’s installed using one of the following two methods:

  • Installing with Terraform
  • Installing with Helm Chart

Once Karpenter is installed successfully, you can verify that the Karpenter pods are running in your cluster.
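
For example, with kubectl:

kubectl get pods -n karpenter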


– Define Provisioning Rule

We manage provisioning-related policies using Helm charts.

Below are the Kubernetes YAML configurations related to Karpenter. Karpenter defines resources by separating them into node classes and node pools.

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: {{ .Values.ec2NodeClass.role }}
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: {{ .Values.ec2NodeClass.discoveryTag }}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: {{ .Values.ec2NodeClass.discoveryTag }}
  tags:
    karpenter.sh/discovery: {{ .Values.ec2NodeClass.discoveryTag }}
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 10Gi
        volumeType: gp3
        deleteOnTermination: true

Node Class

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  limits:
    cpu: 256
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values:
{{- range .Values.nodePool.default.instanceTypes }}
            - "{{ . }}"
{{- end }}
        - key: "karpenter.sh/capacity-type"
          operator: In
          values:
          - "on-demand"
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s

Node Pool
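
The pool above only allows the on-demand capacity type; a second pool for Spot is defined in the same way. Here is a minimal sketch (the pool name and the spot-only taint are assumptions chosen to match the toleration shown later, not our exact chart):

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot
spec:
  limits:
    cpu: 256
  template:
    spec:
      nodeClassRef:
        name: default
      taints:
        - key: "spot-only"   # keeps general workloads off Spot nodes;
          value: "true"      # matched by the Spot Deployment's toleration
          effect: NoSchedule
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values:
          - "spot"
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s

Spot Node Pool Sketch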

Additionally, here is the Kubernetes YAML file for HPA. By setting the threshold to 85% for the On-Demand Deployment and 70% for the Spot Deployment, we ensure that Spot Instances are prioritized and created first.

As mentioned earlier, HPA monitors pod resource usage, so a metrics server must also be installed in the cluster.
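
If it isn’t present yet, the metrics server can be installed with Helm, for example:

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm repo update
helm install metrics-server metrics-server/metrics-server --namespace kube-system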

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mai-on-demand-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mai-deployment-on-demand
  minReplicas: 1
  maxReplicas: 3
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 1
        periodSeconds: 30
    scaleDown:
      policies:
      - type: Pods
        value: 1
        periodSeconds: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 85
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 85
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mai-spot-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mai-deployment-spot
  minReplicas: 1
  maxReplicas: 10
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 1
        periodSeconds: 30
    scaleDown:
      policies:
      - type: Pods
        value: 1
        periodSeconds: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70

HPA

Additionally, to ensure that a Deployment is assigned to the desired type of node, you need to configure affinity and tolerations.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values:
          - "spot"
tolerations:
- key: "spot-only"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"

Spot Deployment Affinity / Tolerations Settings


– Install HPA

We don’t install the HPA manually; all infrastructure resources are installed through Terraform.

resource "helm_release" "provisioner" {
  depends_on = [
    aws_iam_service_linked_role.spot
  ]

  name             = "provisioner"
  chart            = "../../helm/provisioner"

  values = [
    <<-EOT
    ec2NodeClass:
      role: ${module.karpenter.node_iam_role_name}
      discoveryTag: ${module.eks.cluster_name}
    EOT
  ]
}

Managing infrastructure with Terraform offers the advantage of setting roles, tags, and even Spot Instance roles dynamically, allowing for a more streamlined and flexible setup.


– Results

To check whether your provisioning setup was installed successfully, verify that the following components are all in a running state:

  • Karpenter
  • Node Pool and Node Class
  • Metrics Server
  • HPAs
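
One way to check them, assuming the resource names used in this post:

kubectl get pods -n karpenter                         # Karpenter controller
kubectl get nodepools,ec2nodeclasses                  # Node Pool and Node Class
kubectl get deployment metrics-server -n kube-system  # Metrics Server
kubectl get hpa                                       # HPAs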


Now, we can run our service in a cost-effective and flexible manner.

However, to optimize costs and ensure fault tolerance, it’s also essential to monitor various factors—such as application bugs and whether the provisioned node types are appropriate for specific use cases.

Spending time to fine-tune parameters during actual operations is crucial for finding the optimal setup.

Thank you for reading this detailed post!

(The k8s resources are examples created for illustrative purposes and do not reflect our actual environment.)

If you’d like to see another of our cost-saving strategies, dive into my other post!

AWS Cost optimization by reducing image cost – Click Here!