Introduction

In today’s cloud-native ecosystem, deploying applications across multiple AWS Availability Zones (AZs) is a common practice to achieve high availability and fault tolerance. However, this approach can inadvertently increase Inter-AZ data transfer charges and strain your cloud budget. Kubernetes, and therefore Amazon Elastic Kubernetes Service (EKS), offers a mitigation called Topology Aware Routing (TAR): a feature that keeps Service traffic within the caller's AZ whenever it safely can. This reduces cross-zone data transfer costs and can also lower latency, because requests stay inside a single AZ instead of crossing to another one.

In this blog post, we’ll explore how Topology Aware Routing works in Amazon EKS, its benefits, implementation steps, and best practices to help you optimize both cost and performance in your Kubernetes clusters.

What is Topology Aware Routing?

Topology Aware Routing is a Kubernetes feature that makes Service routing aware of where clients and endpoints run. It was introduced under the name Topology Aware Hints (TAH), and hints are still its mechanism: the control plane marks each Service endpoint with the zone it should serve, and the node-level proxy uses those hints to keep traffic in the same zone, which on EKS means the same AZ. Routing is zone-aware only; there is no region-level preference.

By leveraging TAR, Kubernetes can:

  • Reduce cross-zone network traffic.
  • Lower data transfer costs.
  • Decrease network latency.
  • Improve overall application performance.

How Does Topology Aware Routing Work in Amazon EKS?

In a standard multi-AZ Amazon EKS cluster, when a pod sends a request to a Kubernetes Service, the default behavior is to route the request to any available endpoint, regardless of its location. With endpoints spread evenly over three AZs, roughly two out of every three requests therefore leave the caller's AZ, which increases Inter-AZ traffic and latency.

With Topology Aware Routing enabled, the process changes as follows:

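  1. Topology Labels: Each node carries the well-known label topology.kubernetes.io/zone (EKS sets it automatically), so the control plane knows which AZ every endpoint runs in.
  2. Service Annotation: A Service opts in to TAR through an annotation, which tells the EndpointSlice controller to compute routing hints for that Service's endpoints.
  3. Hint Allocation and Endpoint Selection: The EndpointSlice controller assigns each endpoint to a zone, in proportion to the allocatable CPU in each zone, and records this as a hint. kube-proxy on each node then routes only to endpoints hinted for its own zone, so a pod's requests stay in its AZ.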

Example Scenario

  • A pod in us-east-1a sends a request to a service.
  • TAR directs the request to an endpoint also in us-east-1a.
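  • If us-east-1a has too few endpoints, the EndpointSlice controller can hint some endpoints from other AZs to serve us-east-1a, or remove the hints altogether so traffic is spread across all AZs. Either way, availability takes priority over locality.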

Benefits of Using Topology Aware Routing in Amazon EKS

1. Cost Savings

By minimizing Inter-AZ data transfers, TAR can substantially reduce the associated charges. This is especially beneficial for applications with heavy intra-cluster communication, such as microservices architectures and distributed databases.

2. Improved Performance

Routing traffic within the same AZ decreases network latency, enhancing the responsiveness of your applications.

3. Enhanced Network Efficiency

Reducing cross-zone traffic alleviates the load on network links between AZs, leading to more efficient network utilization.

4. High Availability

TAR maintains high availability by falling back to endpoints in other AZs if none are available locally, ensuring that service disruptions are minimized.

Considerations and Limitations

While TAR offers substantial benefits, it’s important to be aware of certain considerations:

  • Scaling Dynamics: In environments using EC2 Spot Instances, Horizontal Pod Autoscaling (HPA), or Cluster Autoscaler, the distribution of pods across AZs can become uneven. If the controller cannot allocate endpoints so that every zone's expected overload stays within an acceptable threshold (roughly 20%), it removes the hints, and Kubernetes routes traffic across all AZs.
  • Configuration: On EKS, nodes already carry zone labels, so enabling TAR only takes a Service annotation. The ongoing effort lies in keeping endpoints balanced across AZs so the hints stay in effect.
  • Service Availability: If no endpoints are available in the same AZ, TAR will route traffic to other AZs to ensure service availability, potentially incurring Inter-AZ charges.
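The scaling caveat is easier to see with numbers. The worked example below is a simplified approximation of the controller's reasoning, not its exact algorithm. Assume N is the Service's endpoint count, C_z the allocatable CPU in zone z, A_z the number of endpoints running in z, and that traffic from each zone is proportional to its CPU (the controller's own assumption). The cluster is hypothetical: 12 endpoints split 4/4/4 after Spot interruptions, with 32 allocatable vCPUs in us-east-1a and 16 each in us-east-1b and us-east-1c.

```latex
\begin{aligned}
\text{desired}_z &= N \cdot \frac{C_z}{\sum_k C_k},
  &\text{overload}_z &= \frac{\text{desired}_z}{A_z} - 1 \\[4pt]
\text{desired}_{1a} &= 12 \cdot \tfrac{32}{64} = 6,
  &\text{desired}_{1b} &= \text{desired}_{1c} = 12 \cdot \tfrac{16}{64} = 3 \\[4pt]
\text{overload}_{1a} &= \tfrac{6}{4} - 1 = 0.5 = 50\%
\end{aligned}
```

A 50% overload is well past the roughly 20% tolerance, so the controller would have to hint endpoints from other AZs to us-east-1a or drop the hints entirely; in both cases some traffic crosses AZs. This happens even though the pod counts are equal, because balance is judged against each zone's CPU, not its pod count.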

How to Enable Topology Aware Routing in Amazon EKS

Step 1: Verify Kubernetes Version

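Ensure your EKS cluster is running Kubernetes 1.24 or later. Topology Aware Hints arrived as an alpha feature in 1.21 and have been enabled by default since 1.24; in 1.27 the feature was renamed Topology Aware Routing and gained a new Service annotation.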

Step 2: Enable Feature Gates (if necessary)

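On upstream Kubernetes 1.21 through 1.23, the TopologyAwareHints feature gate had to be enabled explicitly. Amazon EKS manages the control plane and does not let you change its feature gates, but the gate is enabled by default on every version from 1.24 onward, so no action is needed there.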

Step 3: Annotate Your Services

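Add the topology annotation to each Service whose traffic you want to keep within an AZ: service.kubernetes.io/topology-mode: Auto on Kubernetes 1.27 and later, or the older service.kubernetes.io/topology-aware-hints: auto on 1.24 through 1.26.

This tells Kubernetes to compute topology hints automatically and use them for routing.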
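For reference, a complete Service manifest with the annotation might look like the sketch below. The name my-service, the app: my-app selector, and the ports are hypothetical placeholders; only the annotation keys and values come from Kubernetes itself.

```yaml
# Hypothetical Service: name, selector, and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    # Kubernetes 1.27 and later
    service.kubernetes.io/topology-mode: Auto
    # On 1.24 through 1.26, use the older annotation instead:
    # service.kubernetes.io/topology-aware-hints: auto
spec:
  selector:
    app: my-app
  ports:
    - name: http
      protocol: TCP
      port: 80          # port clients connect to
      targetPort: 8080  # port the pods listen on
```

No other Service fields change; the annotation alone opts the Service in.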

Step 4: Ensure Nodes and Pods are Labeled

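Verify that your nodes have the correct topology labels:

  • Nodes should have the topology.kubernetes.io/zone label; EKS worker nodes receive it automatically. Every node must also report allocatable CPU, otherwise no hints are assigned.
  • Pods need no zone labels: the EndpointSlice controller derives each endpoint's zone from the node its pod is scheduled on.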
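As an illustration, the fields the controller relies on look roughly like this in a node's YAML (as returned by kubectl get node <name> -o yaml). The node name and CPU value are made-up examples, not output from a real cluster.

```yaml
# Illustrative excerpt of an EKS worker Node; values are hypothetical.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.ec2.internal
  labels:
    topology.kubernetes.io/region: us-east-1
    topology.kubernetes.io/zone: us-east-1a   # zone the controller assigns to this node's endpoints
status:
  allocatable:
    cpu: 3920m   # summed per zone to weight each zone's share of endpoints
```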

Step 5: Deploy and Monitor

After applying the changes, deploy your services, confirm that each Service's EndpointSlices carry zone hints, and track Inter-AZ data transfer over time to verify that cross-zone traffic actually drops.
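One way to check is kubectl get endpointslices -l kubernetes.io/service-name=my-service -o yaml. When TAR is active, each endpoint should carry a hints.forZones entry, as in the illustrative fragment below; the Service name, IPs, and node names are hypothetical. If the hints are missing, kubectl describe service often shows an event explaining why.

```yaml
# Illustrative EndpointSlice for the hypothetical my-service after hints are allocated.
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-service-abc12
  labels:
    kubernetes.io/service-name: my-service
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 8080
endpoints:
  - addresses: ["10.0.1.15"]
    conditions:
      ready: true
    nodeName: ip-10-0-1-23.ec2.internal
    zone: us-east-1a
    hints:
      forZones:
        - name: us-east-1a   # kube-proxy on us-east-1a nodes routes to this endpoint
  - addresses: ["10.0.2.27"]
    conditions:
      ready: true
    nodeName: ip-10-0-2-41.ec2.internal
    zone: us-east-1b
    hints:
      forZones:
        - name: us-east-1b   # served only to clients in us-east-1b
```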

Best Practices for Implementing TAR

1. Uniform Pod Distribution

Maintain an even distribution of pods across AZs to ensure local endpoints are available, maximizing the effectiveness of TAR.
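A common way to enforce this at scheduling time is a zone-level topology spread constraint. The sketch below assumes a hypothetical my-app Deployment backing the Service shown earlier, in a three-AZ cluster; the image reference is a placeholder.

```yaml
# Hypothetical Deployment: 9 replicas spread evenly over 3 AZs.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 9   # 3 per zone in a 3-AZ cluster
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                               # zone pod counts may differ by at most 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule         # keep pods Pending rather than skew zones
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: my-registry.example.com/my-app:1.0   # placeholder image
          ports:
            - containerPort: 8080
```

DoNotSchedule favors balance over immediate capacity; ScheduleAnyway is the softer alternative. The constraint is only evaluated when pods are scheduled, so scale-in or Spot interruptions can still leave zones uneven.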

2. Monitor Endpoint Capacity

Regularly check the capacity of service endpoints in each AZ to prevent falling below the threshold where TAR would be ignored.
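Monitoring tells you when capacity has dipped; a replica floor makes it less likely in the first place. The sketch below assumes the hypothetical my-app Deployment and sets minReplicas to 9, following the upstream guidance that TAR works best with at least three endpoints per zone. The CPU target and maximum are arbitrary example values.

```yaml
# Hypothetical HPA for my-app; minReplicas keeps several endpoints per AZ.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 9    # 3 AZs x 3 endpoints
  maxReplicas: 30   # example ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # example target
```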

3. Consider Autoscaling Impacts

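HPA controls how many replicas run, not where they run, so pair it with zone-level topology spread constraints. With Cluster Autoscaler, a common pattern is one node group per AZ with balancing of similar node groups enabled, so new capacity is added evenly across AZs.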
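A sketch of the relevant Cluster Autoscaler flags follows. It is an excerpt of a cluster-autoscaler Deployment, not a complete manifest, and it assumes one Auto Scaling group per AZ discovered by tag; <cluster-name> is a placeholder for your cluster's name.

```yaml
# Excerpt only: container command of a cluster-autoscaler Deployment on EKS.
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            # Discover node groups by ASG tags; replace <cluster-name>.
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
            # Keep similar node groups (e.g. one per AZ) at similar sizes.
            - --balance-similar-node-groups=true
```

Node groups are only treated as similar when they otherwise match (for example, the same instance types and labels apart from the zone), so keep the per-AZ groups identical.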

Use Cases for Topology Aware Routing

Microservices Architectures

Applications with numerous inter-service communications benefit from reduced latency and lower data transfer costs.

Latency-Sensitive Applications

Real-time applications, such as streaming services or online gaming platforms, can achieve better performance.

Additional Strategies to Reduce Inter-AZ Charges

While TAR is effective, consider combining it with other strategies:

  • Data Compression: Compress data before transfer to reduce bandwidth usage.
  • Caching: Implement caching mechanisms to minimize repeated data transfers.

Conclusion

Topology Aware Routing in Amazon EKS is a powerful feature that helps you optimize network traffic, reduce Inter-AZ data transfer charges, and improve application performance. By intelligently routing requests to endpoints within the same AZ, TAR minimizes unnecessary cross-zone communication.

However, successful implementation requires careful planning and monitoring, especially in dynamic environments with autoscaling or Spot Instances. By following best practices and considering the unique needs of your applications, you can leverage TAR to achieve both cost efficiency and high performance in your Kubernetes clusters. At NimbusStack, we can help you do exactly that!