Durable Workflows in Kubernetes
When you enable State Persistence, Quarkus Flow stores enough information in the database to restore paused or abruptly ended workflows, resuming execution from the last successfully completed task.
However, to safely resume a workflow, the engine needs a deterministic "worker identity" (WorkflowApplication ID) so it knows which instance owns the execution.
In Kubernetes, Pod names and IPs are ephemeral. If a WorkflowApplication ID is derived from a Pod name, a rolling update or node drain replaces that Pod with one that has a different name, so the old ID is never claimed again. Workflows paused under that ID are left "orphaned" in the database because no surviving worker claims that exact ID.
This guide explains how to solve this using Kubernetes Lease-based Coordination to maintain stable worker identities across pod disruptions.
1. The Architecture: Lease-Based Coordination
To decouple the workflow engine’s identity from the ephemeral Pod identity, Quarkus Flow binds the WorkflowApplication ID to a Kubernetes Lease name (e.g., flow-pool-member-durable-flow-00).
Unlike Pods, Leases (coordination.k8s.io/v1) are named Kubernetes resources whose names stay the same when the Pods holding them are replaced.
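To illustrate why this gives a stable identity, the sketch below derives member Lease names from the pool name and a slot index. It never uses the Pod name, so a replacement Pod can arrive at the same name. The `flow-pool-member-<pool>-<NN>` pattern with a two-digit, zero-padded index is inferred from the single example name above and is an assumption. `MemberLeaseNames` is a hypothetical helper, not part of the extension's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives member Lease names from a pool name.
// ASSUMPTION: names follow "flow-pool-member-<pool>-<NN>" with a two-digit,
// zero-padded index, inferred from "flow-pool-member-durable-flow-00".
final class MemberLeaseNames {
    private static final String PREFIX = "flow-pool-member-";

    private MemberLeaseNames() {}

    /** Name of the member Lease for one slot; independent of any Pod name. */
    static String memberLeaseName(String poolName, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        return PREFIX + poolName + "-" + String.format("%02d", index);
    }

    /** One member Lease name per replica: slots 0 .. replicas-1. */
    static List<String> forReplicas(String poolName, int replicas) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            names.add(memberLeaseName(poolName, i));
        }
        return names;
    }

    public static void main(String[] args) {
        // A 3-replica Deployment yields three stable identities.
        System.out.println(forReplicas("durable-flow", 3));
    }
}
```

Because the name depends only on the pool and the slot, any Pod that later claims slot `00` ends up with the same identity.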
The architecture works as follows:
- The Leader: One Pod in the deployment acts as the leader. It monitors the Deployment replica count and creates exactly as many empty "Member Leases" as there are replicas.
- The Members: Every Pod in the deployment attempts to acquire exactly one Member Lease.
- The Identity Binding: Once a Pod acquires a Lease, it sets its internal WorkflowApplication ID to that Lease’s name. It continuously sends heartbeats to renew the Lease.
- The Failover: If a Pod crashes, its heartbeats stop and its Lease expires. When a new Pod starts to replace the crashed one, it claims the abandoned Lease and adopts the exact same WorkflowApplication ID, allowing it to resume the orphaned workflows from the database.
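The four roles above can be modelled in a few dozen lines. This is a minimal in-memory sketch, not the extension's implementation. Leases live in a `Map` instead of the Kubernetes API, and time is passed in explicitly. A lease is treated as free once `now - renewTime` exceeds its duration, which is an assumption about the expiry rule. `LeasePoolModel`, `reconcile`, `acquire` and `renew` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of the leader/member protocol; not the extension's code.
// ASSUMPTIONS: leases live in a Map instead of the Kubernetes API, time is passed in
// explicitly (milliseconds), and a lease is free once now - renewTime > duration.
final class LeasePoolModel {
    /** Stand-in for a member Lease: current holder (Pod name) and last renewal. */
    static final class Lease {
        String holder;     // null while unclaimed
        long renewTimeMs;
    }

    private final String poolName;
    private final long durationMs;
    private final Map<String, Lease> leases = new LinkedHashMap<>();

    LeasePoolModel(String poolName, long durationMs) {
        this.poolName = poolName;
        this.durationMs = durationMs;
    }

    /** Leader step: keep exactly one member lease per Deployment replica. */
    void reconcile(int replicas) {
        for (int i = 0; i < replicas; i++) {
            leases.putIfAbsent(String.format("flow-pool-member-%s-%02d", poolName, i), new Lease());
        }
        // Drop surplus slots after a scale-down (simplified: no draining).
        leases.keySet().removeIf(name -> slotIndex(name) >= replicas);
    }

    private static int slotIndex(String leaseName) {
        return Integer.parseInt(leaseName.substring(leaseName.lastIndexOf('-') + 1));
    }

    private boolean isFree(Lease lease, long nowMs) {
        return lease.holder == null || nowMs - lease.renewTimeMs > durationMs;
    }

    /** Member step: the claimed lease's name becomes the Pod's WorkflowApplication ID. */
    Optional<String> acquire(String podName, long nowMs) {
        // A Pod that already holds a lease keeps it.
        for (Map.Entry<String, Lease> e : leases.entrySet()) {
            if (podName.equals(e.getValue().holder)) {
                e.getValue().renewTimeMs = nowMs;
                return Optional.of(e.getKey());
            }
        }
        // Otherwise claim the first unclaimed or expired lease.
        for (Map.Entry<String, Lease> e : leases.entrySet()) {
            Lease lease = e.getValue();
            if (isFree(lease, nowMs)) {
                lease.holder = podName;
                lease.renewTimeMs = nowMs;
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty(); // no identity, so the Pod should stay unready
    }

    /** Heartbeat: succeeds only while this Pod is still the recorded holder. */
    boolean renew(String leaseName, String podName, long nowMs) {
        Lease lease = leases.get(leaseName);
        if (lease == null || !podName.equals(lease.holder)) {
            return false;
        }
        lease.renewTimeMs = nowMs;
        return true;
    }

    public static void main(String[] args) {
        LeasePoolModel pool = new LeasePoolModel("durable-flow", 15_000);
        pool.reconcile(2);
        String idA = pool.acquire("pod-a", 0).orElseThrow();
        String idB = pool.acquire("pod-b", 0).orElseThrow();
        pool.renew(idA, "pod-a", 10_000); // pod-a keeps heartbeating
        // pod-b crashes; its replacement "pod-c" gets nothing until lease B expires...
        System.out.println(pool.acquire("pod-c", 10_000).isPresent());
        // ...then adopts pod-b's exact identity.
        System.out.println(pool.acquire("pod-c", 20_000).orElseThrow().equals(idB));
    }
}
```

The key property is in `acquire`. A replacement Pod never invents an ID. It can only take over a name the leader already created, so the orphaned workflows' owner ID comes back unchanged.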
2. Add the Durable Kubernetes Extension
To enable this architecture, add the following dependency to your pom.xml:
<dependency>
    <groupId>io.quarkiverse.flow</groupId>
    <artifactId>quarkus-flow-durable-kubernetes</artifactId>
</dependency>
You will likely also want the quarkus-kubernetes extension so that Quarkus can generate your Kubernetes deployment manifests.
3. Configure the Pool and Readiness Probes
Define the name of your Lease pool in application.properties.
Crucially, you should gate your Kubernetes Readiness Probe on the lease acquisition. If a Pod hasn’t acquired a lease, it doesn’t have an identity, and therefore shouldn’t receive network traffic or pull workflows from the database.
# The pool name used to generate the Lease resources
quarkus.flow.durable.kube.pool.name=durable-flow
# Gate the Pod's Readiness status on successfully acquiring a lease
quarkus.flow.durable.kube.health.readiness.require-lease=true
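With `require-lease=true`, the extension's own readiness check gates traffic. The model below only restates that rule in code. A Pod reports UP once it holds a member Lease, and the detail fields mirror the `leaseAcquired` and `leaseName` keys shown in the health output in section 5. `LeaseReadiness` is hypothetical and deliberately avoids the MicroProfile Health API so it runs on the JDK alone. Reporting UP unconditionally when the flag is off is an assumption about the flag's default behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of lease-gated readiness; not the extension's health check.
final class LeaseReadiness {
    /** Result of a readiness probe: overall status plus detail fields. */
    record Result(boolean up, Map<String, Object> data) {}

    private LeaseReadiness() {}

    /**
     * @param requireLease mirrors quarkus.flow.durable.kube.health.readiness.require-lease
     * @param leaseName    the member Lease this Pod holds, if any
     */
    static Result check(boolean requireLease, Optional<String> leaseName) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("leaseAcquired", leaseName.isPresent());
        leaseName.ifPresent(name -> data.put("leaseName", name));
        // Without a lease the Pod has no WorkflowApplication ID, so it stays unready.
        boolean up = !requireLease || leaseName.isPresent();
        return new Result(up, data);
    }

    public static void main(String[] args) {
        System.out.println(check(true, Optional.empty()));
        System.out.println(check(true, Optional.of("flow-pool-member-durable-flow-00")));
    }
}
```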
4. Configure Kubernetes Manifests
4.1 Expose Pod Identity (Downward API)
The lease mechanism uses the physical Pod name as the holderIdentity inside the Lease resource. You must expose this to the application using the Kubernetes Downward API.
If you write your own YAML manifests, ensure your container environment variables include:
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
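A missing variable is easy to overlook in hand-written manifests. One option is to fail fast at startup instead of running without a holder identity. The sketch reads the two variables defined above. `PodIdentity` is hypothetical, and this guide does not say whether the extension validates these variables itself.

```java
import java.util.Map;

// Hypothetical helper; the variable names match the Downward API manifest above.
final class PodIdentity {
    /** The Pod name is what ends up as the Lease holderIdentity. */
    record Identity(String podName, String namespace) {}

    private PodIdentity() {}

    /** Fails fast when the Downward API variables are missing or blank. */
    static Identity fromEnv(Map<String, String> env) {
        String name = env.get("POD_NAME");
        String namespace = env.get("POD_NAMESPACE");
        if (name == null || name.isBlank() || namespace == null || namespace.isBlank()) {
            throw new IllegalStateException(
                    "POD_NAME and POD_NAMESPACE must be injected via the Downward API");
        }
        return new Identity(name, namespace);
    }

    public static void main(String[] args) {
        // In a real Pod you would pass System.getenv(); a sample map keeps this runnable anywhere.
        Map<String, String> env = Map.of("POD_NAME", "my-flow-app-7c9d8-x2k4p", "POD_NAMESPACE", "default");
        System.out.println(fromEnv(env));
    }
}
```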
4.2 Configure RBAC Permissions
Your Pods need permission to manage Leases and to read Deployments (to count replicas). The rules below also grant read access to Pods and ReplicaSets.
If you use Quarkus to auto-generate your Kubernetes manifests, you can automatically generate the correct ServiceAccount, Role, and RoleBinding by adding these properties:
# Create and use a custom ServiceAccount
quarkus.kubernetes.rbac.service-accounts.durable-flow-sa.use-as-default=true
# Allow managing Leases
quarkus.kubernetes.rbac.roles.durable-flow-sa.policy-rules.0.api-groups=coordination.k8s.io
quarkus.kubernetes.rbac.roles.durable-flow-sa.policy-rules.0.resources=leases
quarkus.kubernetes.rbac.roles.durable-flow-sa.policy-rules.0.verbs=get,list,watch,create,update,patch,delete
# Allow reading Pods
quarkus.kubernetes.rbac.roles.durable-flow-sa.policy-rules.1.api-groups=
quarkus.kubernetes.rbac.roles.durable-flow-sa.policy-rules.1.resources=pods
quarkus.kubernetes.rbac.roles.durable-flow-sa.policy-rules.1.verbs=get,list,watch
# Allow reading Deployments and ReplicaSets
quarkus.kubernetes.rbac.roles.durable-flow-sa.policy-rules.2.api-groups=apps
quarkus.kubernetes.rbac.roles.durable-flow-sa.policy-rules.2.resources=deployments,replicasets
quarkus.kubernetes.rbac.roles.durable-flow-sa.policy-rules.2.verbs=get,list,watch
# Bind the Role to the ServiceAccount
quarkus.kubernetes.rbac.role-bindings.durable-flow-sa.subjects.durable-flow-sa.kind=ServiceAccount
quarkus.kubernetes.rbac.role-bindings.durable-flow-sa.role-name=durable-flow-sa
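Each policy rule is a group of properties that share one index (`policy-rules.0`, `.1`, `.2`). Adding or reordering rules by hand therefore means renumbering keys consistently. The hypothetical generator below keeps the indices in sync. Run with the three rules from this section, it prints the same policy-rules comments and properties shown above. It is a convenience sketch, not a Quarkus feature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for the indexed policy-rules properties shown above.
final class RbacProperties {
    /** One RBAC policy rule; apiGroup "" denotes the core API group (e.g. Pods). */
    record Rule(String comment, String apiGroup, List<String> resources, List<String> verbs) {}

    private RbacProperties() {}

    /** Emits a comment plus api-groups/resources/verbs lines, all sharing the rule's index. */
    static List<String> roleLines(String role, List<Rule> rules) {
        List<String> lines = new ArrayList<>();
        String base = "quarkus.kubernetes.rbac.roles." + role + ".policy-rules.";
        for (int i = 0; i < rules.size(); i++) {
            Rule r = rules.get(i);
            lines.add("# " + r.comment());
            lines.add(base + i + ".api-groups=" + r.apiGroup());
            lines.add(base + i + ".resources=" + String.join(",", r.resources()));
            lines.add(base + i + ".verbs=" + String.join(",", r.verbs()));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("Allow managing Leases", "coordination.k8s.io", List.of("leases"),
                        List.of("get", "list", "watch", "create", "update", "patch", "delete")),
                new Rule("Allow reading Pods", "", List.of("pods"), List.of("get", "list", "watch")),
                new Rule("Allow reading Deployments and ReplicaSets", "apps",
                        List.of("deployments", "replicasets"), List.of("get", "list", "watch")));
        roleLines("durable-flow-sa", rules).forEach(System.out::println);
    }
}
```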
5. Verify the Architecture in Production
Once deployed, you can use kubectl to verify that leader election and sharding are working correctly.
View the Leases
kubectl get lease -l io.quarkiverse.flow.durable.k8s/pool=durable-flow -o wide
You should see one leader lease, and exactly as many member leases as you have Pod replicas.
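If you want to automate this check, for example in a smoke test, it reduces to counting names. The sketch assumes member Leases use the `flow-pool-member-<pool>-` prefix from the example name in this guide. The leader Lease's name is not documented here, so every other Lease carrying the pool label is treated as the leader. `LeasePoolCheck` is hypothetical, and its input would come from a command such as `kubectl get lease -l io.quarkiverse.flow.durable.k8s/pool=durable-flow -o name`.

```java
import java.util.List;

// Hypothetical post-deployment check; not part of the extension.
final class LeasePoolCheck {
    private LeasePoolCheck() {}

    /** Expects one leader lease plus exactly one member lease per replica. */
    static boolean isHealthy(String poolName, List<String> names, int replicas) {
        // ASSUMPTION: member leases are prefixed "flow-pool-member-<pool>-".
        String memberPrefix = "flow-pool-member-" + poolName + "-";
        long members = names.stream()
                // `-o name` prints "lease.coordination.k8s.io/<name>"; keep only <name>.
                .map(n -> n.substring(n.lastIndexOf('/') + 1))
                .filter(n -> n.startsWith(memberPrefix))
                .count();
        long leaders = names.size() - members; // every other pool-labelled lease
        return leaders == 1 && members == replicas;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "lease.coordination.k8s.io/example-leader-lease", // hypothetical leader name
                "lease.coordination.k8s.io/flow-pool-member-durable-flow-00",
                "lease.coordination.k8s.io/flow-pool-member-durable-flow-01");
        System.out.println(isHealthy("durable-flow", names, 2));
    }
}
```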
Check Readiness
Execute a health check directly against a running Pod:
kubectl exec -it deploy/my-flow-app -- curl -s localhost:8080/q/health/ready
If configured correctly, the JSON output includes "leaseAcquired": true and a "leaseName" such as "flow-pool-member-durable-flow-00". The exact name depends on which Pod answers.
Test Failover
To see the durability in action, manually delete a Pod holding a lease:
# Find which Pod holds lease '00'
POD=$(kubectl get lease flow-pool-member-durable-flow-00 -o jsonpath='{.spec.holderIdentity}')
# Delete that Pod
kubectl delete pod "$POD"
# Watch the lease seamlessly transfer to the new replacement Pod
kubectl get lease flow-pool-member-durable-flow-00 -w
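The transfer is not necessarily instant. If the replacement Pod has to wait for the old Lease to expire, you can estimate the delay from the Lease's `spec.renewTime` and `spec.leaseDurationSeconds`, for example via `kubectl get lease flow-pool-member-durable-flow-00 -o jsonpath='{.spec.renewTime} {.spec.leaseDurationSeconds}'`. The sketch assumes the common convention that a lease is expired once `renewTime + leaseDurationSeconds` has passed. This guide does not say whether Quarkus Flow uses exactly that rule or releases the lease early on graceful shutdown. `LeaseExpiry` is hypothetical, and the timestamps are sample values.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch: estimates when a member Lease becomes claimable after its holder dies.
// ASSUMPTION: a lease is expired once renewTime + leaseDurationSeconds has passed.
final class LeaseExpiry {
    private LeaseExpiry() {}

    /** Instant after which the lease counts as expired. */
    static Instant expiresAt(String renewTime, int leaseDurationSeconds) {
        return Instant.parse(renewTime).plusSeconds(leaseDurationSeconds);
    }

    /** Time left before takeover is possible; zero if already expired. */
    static Duration remaining(String renewTime, int leaseDurationSeconds, Instant now) {
        Duration left = Duration.between(now, expiresAt(renewTime, leaseDurationSeconds));
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        // Sample values in the RFC 3339 format Kubernetes uses for renewTime.
        String renewTime = "2024-05-01T10:00:00.000000Z";
        Instant now = Instant.parse("2024-05-01T10:00:05Z");
        System.out.println(remaining(renewTime, 15, now)); // PT10S
    }
}
```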
See also
- Configure state persistence, the database layer that makes this durability possible.
- Quarkus: Generating RBAC Resources, the full documentation on Quarkus Kubernetes manifest generation.