What are Kubernetes StatefulSets? A Complete Guide

One of the reasons Kubernetes has become the cornerstone of modern IT infrastructures is that it’s truly an all-in-one orchestration platform. It offers API objects, controllers, and resources that cater to most – if not all – deployment use cases.

For instance, its robust support for stateful applications is a rare find in the open-source container orchestration space. Unlike stateless applications, which are more straightforward to manage, stateful applications – like financial systems and collaborative platforms – require specific considerations to maintain data integrity and consistency.

In this article, we explore Kubernetes StatefulSets, the API objects/controllers that allow us to run and manage stateful applications in a Kubernetes cluster. We define them, share the best ways to create and manage them, and mention some of their limitations.

Understanding StatefulSets

When it comes to state management, applications are categorized into two types: stateful and stateless. Stateless applications, like web servers, are characterized by their ephemeral nature, which means that their state is not preserved across restarts or failures.

In contrast, stateful applications, like databases and message brokers, maintain persistent data that must be preserved even if the application restarts or fails. Kubernetes StatefulSets are purpose-built to orchestrate stateful applications in a Kubernetes cluster.

They ensure that each pod in a StatefulSet has a unique identity and maintains persistent storage. This is crucial for data consistency and application reliability in the event of pod disruptions.

Applications that require StatefulSets

The decision to use StatefulSets should be made after carefully considering the nature of the deployed applications. If any application requires persistent storage, a stable network identity, ordered deployment, and stateful scaling, then StatefulSets are the preferred (if not your only) choice. Here are a few examples:

Databases: Both relational and NoSQL databases store and manage critical data that must be preserved across restarts. Moreover, they often manage transactions, which require seamless continuity to maintain the integrity of data. These requirements warrant the deployment of databases in a StatefulSet setup.
Message brokers: Message brokers enable asynchronous communication between different components of a distributed architecture. Their primary responsibility is to ensure that every message gets delivered to the consumers, even in cases of crashes or failures. Using StatefulSets for message brokers guarantees that when an instance restarts, it retains all its queues and message logs so that data streaming can continue without any loss of messages.
Key-value stores: Systems like Redis and Consul that contain mission-critical key-value pairs can also benefit from StatefulSets. Consistent identities deliver data consistency and availability across instances.
Logging and monitoring tools: Applications like Elasticsearch, Zabbix, or New Relic, used for logging, monitoring, and visualization, can leverage StatefulSets for effective management of their stateful components.
Machine learning workloads: StatefulSets are ideal for deploying and scaling machine learning applications, distributed computing frameworks (like Apache Spark), and other resource-intensive workloads that need consistent storage and ordered deployment.

Components of StatefulSets

Here are the building blocks of a StatefulSet deployment in Kubernetes:

Headless service

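A headless service gives the pods in a StatefulSet stable network identities and lets them reach each other directly. Because it has no cluster IP, DNS resolves to the individual pods rather than to a single load-balanced address, and each pod gets its own DNS name of the form <pod-name>.<service-name>.<namespace>.svc.cluster.local (assuming the default cluster.local cluster domain). That name stays the same across restarts, which lets peers and clients keep addressing a specific replica.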

Note: it’s the user’s responsibility to configure a headless service in a StatefulSet configuration; Kubernetes doesn’t do it by default.

Pod identity

Pod identity is a unique identifier for each pod that remains the same across restarts and rescheduling. It combines the pod’s ordinal, its stable storage, and the network identity provided through the headless service. The ordinal represents a pod’s position within the StatefulSet. It starts from 0 (representing the first pod) and is incremented sequentially, and it is appended to the StatefulSet name to form the pod name (<statefulset-name>-<ordinal>).

Pod selector

Much like other controllers, StatefulSets use a pod selector to determine which pods they manage. The pod selector allows the controller to target pods based on the assigned labels. This ensures that only the relevant pods are affected by scaling or update operations.

Volume claim templates

Volume claim templates define the persistent storage requirements for StatefulSet pods, including the storage class, access modes, and size of the persistent volumes for each pod.
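
As a minimal sketch of such a template, the fragment below sets all three properties explicitly. The volume name "data", the storage class name "standard", and the 1Gi size are illustrative assumptions, not values from this article; use a StorageClass that actually exists in your cluster.

```yaml
# Fragment of a StatefulSet spec (not a complete manifest).
spec:
  volumeClaimTemplates:
  - metadata:
      name: data                        # hypothetical name; referenced by volumeMounts in the pod template
    spec:
      storageClassName: standard        # assumption: a StorageClass with this name exists
      accessModes: [ "ReadWriteOnce" ]  # mounted read-write by a single node
      resources:
        requests:
          storage: 1Gi                  # illustrative size
```

Kubernetes creates one PersistentVolumeClaim per pod from this template, named <template-name>-<pod-name>, so each replica keeps its own volume across restarts.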

Minimum ready seconds

The minReadySeconds field (.spec.minReadySeconds) specifies the minimum number of seconds for which a newly created pod must be running and ready, without any of its containers crashing, before it is considered available. It defaults to 0, meaning a pod counts as available as soon as it is ready. A higher value gives pods time to initialize and stabilize before a rollout moves on to the next pod.
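
For illustration, the field sits directly under the StatefulSet spec. The value of 10 seconds below is an arbitrary assumption; pick one that reflects how long your application needs to warm up.

```yaml
# Fragment of a StatefulSet spec (not a complete manifest).
spec:
  minReadySeconds: 10   # illustrative: a pod must stay ready for 10s before it counts as available
```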

StatefulSets vs Deployments

StatefulSets and Deployments are both controllers used to orchestrate workloads in Kubernetes, but they serve different use cases. StatefulSets are used to deploy stateful applications, whereas Deployments are primarily used to run stateless applications. Here’s a table summarizing the comparison between the two:

Aspect	StatefulSets	Deployments
Purpose	For stateful apps that require persistent storage and stable identities	For stateless applications that don’t need persistent identities or storage
Pod identity	Unique and stable across rescheduling	Dynamic, changes upon restart
Complexity	More complex and requires manual effort in setting up a headless service	Relatively easier and self-sufficient
Use cases	Databases, key-value stores, collaborative platforms	Web servers, microservices
Pod interchangeability	Not possible	Possible
Ordered deployment	Supported	Not supported

StatefulSets vs ReplicaSets

A ReplicaSet is primarily used with stateless applications to make sure that a specified number of identical pod replicas are always running. Here’s a table comparing ReplicaSet with StatefulSet.

Aspect	StatefulSets	ReplicaSets
Purpose	For stateful apps that need persistent storage and identity	For stateless applications, particularly when a specific number of identical replicas are required
Pod identity	Unique and stable across restarts	Replaceable identity
Complexity	More complicated to set up. Also includes extra manual work for configuring a headless service	Relatively easier to configure
Use cases	Databases, message brokers, machine learning workloads	Web servers, microservices
Pod interchangeability	Not possible	Possible
Ordered deployment	Supported	Not supported

Creating and managing StatefulSets

To learn how to set up a StatefulSet deployment, let’s look at a sample configuration file:


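apiVersion: v1
kind: Service
metadata:
  name: headless-svc
  labels:
    app: headless-app
spec:
  ports:
  - port: 80
    name: http
  clusterIP: None                # "None" makes this a headless service
  selector:
    app: headless-app
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  # NOTE: Kubernetes object names must be lowercase. A live API server
  # rejects mixed-case names like this one, so use a name such as
  # sample-statefulset in practice.
  name: sample-statefulSet
spec:
  serviceName: "headless-svc"    # must match the headless service above
  replicas: 4
  selector:
    matchLabels:
      app: headless-app          # must match template.metadata.labels
  template:
    metadata:
      labels:
        app: headless-app
    spec:
      containers:
      - name: headless-container
        image: registry.k8s.io/sample-image:0.11
        ports:
        - containerPort: 80
          name: http
        volumeMounts:
        - name: www-volume       # refers to the volume claim template below
          mountPath: /usr/share/headless
  volumeClaimTemplates:          # one PVC is created per pod from this template
  - metadata:
      name: www-volume
    spec:
      accessModes: [ "ReadWriteMany" ]
      resources:
        requests:
          storage: 500Mi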

The above YAML file starts by specifying the headless service at the top. We set the “clusterIP” field to “None” to indicate that it’s a headless service. We also define its name, selector, and label, which are referenced later in the configuration.

Next, we define our StatefulSet named “sample-statefulSet”. We associate it with the headless service using the “serviceName” parameter, set the desired replicas to 4, and specify the container configuration and pod labels within the “template” section.

Finally, we use “volumeClaimTemplates” to define the Persistent Volume Claims (PVCs) that will be used by our StatefulSet. This includes the spec of the PVC, the access mode, and the resources (500Mi of storage).

Applying the above configuration will start the headless service and create the StatefulSet. Here’s the command to do so: (Replace sample.yaml with the actual name of your YAML file)


kubectl apply -f sample.yaml

Successful application should display an output like this:


service/headless-svc created 
statefulset.apps/sample-statefulSet created

To view more information about the service, you can run this command:


kubectl get service headless-svc

Expect an output like the following:


NAME           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
headless-svc   ClusterIP   None         <none>        80/TCP    8s

To check the status of the StatefulSet, execute this:


kubectl get statefulset sample-statefulSet

The output should show you the number of replicas and age. For example:


NAME                 DESIRED   CURRENT   AGE
sample-statefulSet   4         2         12s

As the pods are created, you should also be able to see a sequential ordinal appended to their names. Run this command:


kubectl get pods -l app=headless-app

You should be able to see pods with names like sample-statefulSet-0, sample-statefulSet-1, sample-statefulSet-2, and sample-statefulSet-3 in the output. These names serve as sticky identities for these pods, and will persist for as long as the StatefulSet stays up.

Scaling a StatefulSet

Stateful applications often require manual intervention for scaling, mainly due to their reliance on persistent storage and identity. With that said, here are two ways you can scale a StatefulSet in a Kubernetes cluster:

Scale using kubectl

The kubectl utility makes it easy to scale a StatefulSet on the fly. Follow these steps to scale up:

Open a terminal window and run the following command. This will help you watch as new pods are created.
```
kubectl get pods -w -l app=headless-app 
```
In another terminal window, run the following command to scale:
```
kubectl scale sts sample-statefulSet --replicas=10 
```
If the scaling is successful, you should get this output:
```
statefulset.apps/sample-statefulSet scaled 
```
If you go back to the first terminal window, you should now be able to see pods being created. Make sure that the new pods have the correct sequential order appended to their names.

To scale down, follow these steps:

Open a terminal window and run the following command. This will help you watch as pods are terminated.
```
kubectl get pods -w -l app=headless-app 
```
In another terminal, run the following command to bring the replica count down to the desired amount:
```
 kubectl patch sts sample-statefulSet -p '{"spec":{"replicas":2}}' 
```
If the scaling operation is successful, you should get this output:
```
 statefulset.apps/sample-statefulSet patched 
```
Go back to the first terminal window to see pods switching to the “Terminating” status.

(Note: Scaling down will not work if any pod is in an unhealthy state. To scale down successfully, ensure that all the stateful pods are in a healthy and active state.)

Scale by modifying the configuration file

You can also scale a StatefulSet by modifying the manifest (YAML) file. Follow these steps:

Create a backup of the existing manifest file so that you can revert in case of any issues.
Open the manifest file in your preferred editor and change the value of the .spec.replicas parameter as needed. To scale up, specify a value higher than the current value. Conversely, to scale down, define a value lower than the current value.

Apply the new configuration:


kubectl apply -f sample.yaml

Updating StatefulSets

StatefulSet deployments can be updated automatically. The update strategy is specified by the “spec.updateStrategy” field of the StatefulSet configuration. Kubernetes supports the following strategies:

RollingUpdate

This is the default strategy to update pods in a StatefulSet. It updates the pods one at a time in reverse ordinal order (highest ordinal first), waiting for each updated pod to be running and ready before moving on to the next, while each pod keeps its identity and storage. To set this strategy explicitly, you can patch the StatefulSet as follows:


kubectl patch statefulset sample-statefulSet -p '{"spec":{"updateStrategy":{"type":"RollingUpdate"}}}'

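You can also add the “partition” parameter to the “RollingUpdate” strategy to perform a rolling upgrade in distinct phases. When the pod template changes, only pods with an ordinal greater than or equal to the partition value are updated; pods with lower ordinals stay on the previous version, even if they are deleted and recreated. This allows for a gradual and controlled transition. To configure this rollout plan, add the partition parameter as follows: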


kubectl patch statefulset sample-statefulSet -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'

You can lower the partition value step by step to widen the rollout, and set it to 0 once you want every pod updated.
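
The same partition setting can also be kept declaratively in the manifest instead of being patched in. The fragment below is a sketch that assumes the four-replica StatefulSet used in this article.

```yaml
# Fragment of a StatefulSet spec (not a complete manifest).
spec:
  replicas: 4
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # pods with ordinal 2 and 3 get the new template; 0 and 1 stay on the old one
```

Keeping the partition in the manifest means the phased rollout is visible in version control alongside the pod template change it gates.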

OnDelete

You can also set the update strategy of a StatefulSet to OnDelete. With this strategy, the StatefulSet controller does not automatically update pods when the pod template changes. It only creates pods from the new template after you manually delete the existing ones. For most practical use cases, this strategy is not recommended.
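
For completeness, a minimal sketch of how this strategy would be declared in the manifest:

```yaml
# Fragment of a StatefulSet spec (not a complete manifest).
spec:
  updateStrategy:
    type: OnDelete   # template changes take effect only for pods you delete yourself
```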

Deleting StatefulSets

To delete a StatefulSet, run this command:


 kubectl delete statefulset sample-statefulSet

By default, it performs a cascading delete, which ensures that the StatefulSet and all its pods are deleted. If you run the following command in another terminal while performing the delete, you can watch the pods terminate.


 kubectl get pods -w -l app=headless-app

StatefulSet best practices

Follow these best practices to get the most out of your StatefulSet deployments:

Create separate namespaces for each stateful application and its associated components, including ConfigMaps, Secrets, and Services. This isolation ensures that resources belonging to different applications are segregated from each other.
Exercise caution when scaling StatefulSets up or down. Avoid abrupt or simultaneous changes to preserve data integrity and application stability.
Formulate a refined update strategy that aligns with your business goals. Rolling upgrades are the default and recommended way to upgrade as they avoid downtime and risks of data loss. Consider using phased rollouts when making any significant updates.
Perform regular monitoring of your StatefulSet to track its performance and detect any security-related issues. You can leverage Site24x7’s Kubernetes monitoring tool for this purpose.
Use Horizontal Pod Autoscaling (HPA) to automatically adjust the number of stateful pod replicas based on CPU, memory usage, or any other application-specific metrics. Dynamic auto-scaling improves resource utilization and responsiveness to fluctuating workloads.
If you are using manual scaling strategies, make sure that all existing pods are in Ready state before scaling down a StatefulSet. This prevents scaling operations from failing or causing any service disruptions.
Use version control to manage StatefulSet configuration files. Version control helps you track changes and revert to older versions if needed.
Use built-in Kubernetes security features like role-based access control (RBAC), encryption at rest and in transit, and secure secret storage to protect your StatefulSet cluster from unauthorized access.
Develop a comprehensive rollback plan in case of unexpected issues during updates or scaling operations. The plan should outline the steps to revert to the previous state and minimize downtime.
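
As a rough sketch of the HPA recommendation above, the manifest below targets a StatefulSet with a CPU-based autoscaler. The object names, replica bounds, and the 70% utilization target are illustrative assumptions. The example also assumes that the StatefulSet carries a lowercase name (Kubernetes requires lowercase object names), that its containers declare CPU requests, and that a metrics source such as metrics-server is installed in the cluster.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-statefulset-hpa     # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: sample-statefulset       # assumption: the StatefulSet's (lowercase) name
  minReplicas: 2                   # illustrative bounds
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # scale out when average CPU use exceeds ~70% of requests
```

Because scale-down removes the highest-ordinal pods first and leaves their volume claims behind by default, it is worth checking that your application tolerates replicas disappearing before enabling automatic scaling.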

Limitations

StatefulSets are a powerful tool to orchestrate stateful applications within a Kubernetes cluster. However, there are a few limitations to consider when using them:

When you delete or scale down a StatefulSet, the PersistentVolumeClaims (and the volumes bound to them) that were created for its pods are not deleted by default; you have to remove them explicitly using kubectl or other tools. Kubernetes does this to guarantee data safety, but it can lead to unnecessary storage consumption if not managed properly.
Deleting a StatefulSet doesn’t guarantee that the pods will be terminated in a predictable order. If ordered termination is needed, one workaround is to scale the StatefulSet down to zero before deleting it. This works because when you scale it down, Kubernetes deletes the pods in reverse ordinal order.
StatefulSets require the user to manually configure a headless service to manage the network identity of the application pods. This makes it harder to set up and manage compared to other controllers.
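When you use rolling updates with the default pod management policy (i.e., OrderedReady), the StatefulSet can get into a broken state that requires manual intervention to repair. For example, if an update introduces a pod template that can never become running and ready (say, because of a bad image or a faulty configuration), the rollout stalls. Reverting the template alone is not enough; you also have to delete the pods that were created from the bad configuration so that the controller can recreate them from the reverted template.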
StatefulSets have limited scalability compared to Deployments. Although both can be configured to auto-scale using HPA, manual scaling of a StatefulSet can be slower and more complex because you have to maintain ordered deployment and persistent storage.
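
Two StatefulSet spec fields can soften some of these limitations. The fragment below is a sketch under assumptions: persistentVolumeClaimRetentionPolicy is only available in more recent Kubernetes releases, so check your cluster version, and the chosen values are illustrative rather than recommendations.

```yaml
# Fragment of a StatefulSet spec (not a complete manifest).
spec:
  podManagementPolicy: Parallel    # create/terminate pods in parallel when scaling (default: OrderedReady)
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete            # remove the PVCs when the StatefulSet itself is deleted
    whenScaled: Retain             # keep the PVCs of pods removed by a scale-down (default behavior)
```

The Parallel policy affects scaling operations only, not rolling updates, and it gives up the ordering guarantees, so it fits applications that do not depend on pods starting in sequence. Deleting PVCs automatically trades data safety for storage hygiene, so use it only where the data can be recreated.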

Conclusion

Despite their limitations, StatefulSets are a great way to deploy, manage, and scale stateful applications inside a Kubernetes cluster. They support automated updates, guarantee persistent storage and sticky identities, and enable predictable scaling. In this article, we discussed how StatefulSets work, learned how to configure, deploy, upgrade, and scale them, and explored some best practices and limitations. We hope you found it useful.
