How to monitor LKE in Akamai Cloud
Site24x7 monitors your Akamai Cloud Linode Kubernetes Engine (LKE) cluster availability in real time, continuously tracking status and outage events so platform teams can respond to cluster disruptions before workloads and services are impacted.
Use cases
Cluster health: The Summary tab shows the status and availability of your LKE clusters in real time, helping the platform and DevOps teams quickly detect when a cluster goes down and start recovery before workloads are affected.
Downtime visibility: Downtime count and duration provide a clear timeline of cluster outages, making it easier to connect issues to deployments, infrastructure changes, or node failures and identify the root cause.
SLA assurance: Availability data over any selected period helps teams assess whether cluster uptime meets reliability expectations, providing clear insights for internal SLA tracking and performance reviews.
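The SLA arithmetic behind this is simple: availability is uptime divided by the length of the reporting period. A minimal sketch, where the 43 minutes of downtime and the 99.9% target are made-up example values, not Site24x7 or Akamai defaults:

```python
def availability_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Return cluster availability over the period as a percentage."""
    uptime = period_minutes - downtime_minutes
    return round(uptime / period_minutes * 100, 3)

# A 30-day month with 43 minutes of recorded cluster downtime:
month = 30 * 24 * 60                       # 43,200 minutes in the period
avail = availability_percent(month, 43)
print(avail)                               # 99.9
print(avail >= 99.9)                       # True -- meets a 99.9% target
```

The same calculation applies to any selected period; only the period length and the reported downtime change.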
Setup and configuration
LKE resources are auto-discovered and monitored during the Akamai integration. To enable monitoring, follow the steps below:
- Navigate to Cloud > Akamai > Add Akamai Cloud Monitor. Follow the steps to add an Akamai monitor.
- While adding or editing an Akamai monitor, select LKE from the Service/Resource Types drop-down and click Save.
- Go to Cloud > Akamai Cloud, select the created Akamai monitor, and then click LKE.
LKE will be discovered during the next discovery cycle as per the discovery frequency you selected during Akamai monitor creation.
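To cross-check what Site24x7 should discover, you can list the clusters your Akamai account exposes via the Linode API (`GET /v4/lke/clusters`). The sketch below parses an abbreviated sample response rather than calling the API; the field names follow the public Linode API v4 schema, but the IDs, labels, and statuses are illustrative:

```python
import json

# Abbreviated, illustrative sample of a `GET /v4/lke/clusters` response,
# trimmed to the fields used below.
sample = json.loads("""
{
  "data": [
    {"id": 1234, "label": "prod-cluster", "region": "us-east", "status": "ready"},
    {"id": 5678, "label": "staging-cluster", "region": "eu-west", "status": "not_ready"}
  ],
  "results": 2
}
""")

# Print each cluster, flagging any that is not ready -- these are the
# clusters Site24x7 should pick up on its next discovery cycle.
for cluster in sample["data"]:
    flag = "" if cluster["status"] == "ready" else "  <-- check this cluster"
    print(f'{cluster["id"]}  {cluster["label"]} ({cluster["region"]}): {cluster["status"]}{flag}')
```

If a cluster appears here but not in Site24x7 after a discovery cycle, verify that LKE is selected under Service/Resource Types for the Akamai monitor.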
Data collection frequency
Performance metrics for your Akamai LKE clusters are collected based on the configured poll interval and updated in the Site24x7 portal every five minutes by default.
Supported metrics
Summary
The Summary tab tracks the overall availability status of each LKE cluster along with downtime incidents, downtime duration, and SLA compliance, giving your team an at-a-glance picture of whether your Kubernetes control plane and associated services are running without interruption.
For teams running production containerized workloads, this availability data is the first signal to act on when a cluster becomes unresponsive or enters a degraded state.
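Conceptually, the downtime count and duration shown on the Summary tab can be derived from a series of availability polls: each transition from up to down starts a new outage, and each down poll adds one poll interval to the total. A sketch with made-up sample data polled at the default five-minute interval:

```python
from datetime import timedelta

# Illustrative poll history for one LKE cluster (timestamp, status).
polls = [
    ("2024-05-01 10:00", "up"),   ("2024-05-01 10:05", "down"),
    ("2024-05-01 10:10", "down"), ("2024-05-01 10:15", "up"),
    ("2024-05-01 10:20", "up"),   ("2024-05-01 10:25", "down"),
    ("2024-05-01 10:30", "up"),
]

def downtime_summary(polls, interval_minutes=5):
    """Count distinct outages and total their duration, one interval per down poll."""
    outages, total = 0, timedelta()
    previous = "up"
    for _, status in polls:
        if status == "down":
            if previous != "down":
                outages += 1                       # an up-to-down transition starts a new outage
            total += timedelta(minutes=interval_minutes)
        previous = status
    return outages, total

count, duration = downtime_summary(polls)
print(count, duration)   # 2 0:15:00 -- two outages, 15 minutes of downtime
```

This is why the downtime timeline pairs naturally with deployment and infrastructure-change records: each counted outage marks a point in time you can correlate against them.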
