
Blue-Green Deployments on Kubernetes with Spring Boot and Gateway

Learn blue-green deployments on Kubernetes with Spring Boot and Spring Cloud Gateway for zero-downtime releases and instant rollback.

I remember the exact moment I realized our deployment process had to change. It was 2:47 AM on a Wednesday, and I was staring at a rollback that took twenty minutes while our API gateway returned 503 errors to paying customers. The new version of our order-service had a memory leak that only surfaced under production traffic, and by the time Kubernetes finished the rolling update, half the pods were unhealthy. We lost revenue, we lost trust, and I promised myself I’d never let a bad deployment take down the system again.

That’s why I turned to blue-green deployments. The idea is simple: run two identical environments—call them blue and green—and keep only one live at a time. When you need to release, you deploy the new version to the inactive environment, validate it thoroughly, and then flip the traffic router in an instant. If something breaks, you flip back just as quickly. Zero downtime, instant rollback, no scrambling with rolling updates.

But the magic doesn’t happen by itself. You need a proper Kubernetes topology, a smart gateway, and a clear strategy for health checks, database migrations, and session affinity. In this article, I’ll walk you through a production-grade implementation using Spring Boot microservices, Kubernetes, and Spring Cloud Gateway. I’ll share the exact manifests, configuration, and code I use, plus the mistakes I made so you don’t have to repeat them.

Let’s start with the foundational block: the dual deployment model. You must treat blue and green as separate Kubernetes Deployments, each with its own Service (order-service-blue and order-service-green in the examples that follow). That way, the gateway can route traffic to either slot by simply changing the route’s target URI. Here’s a minimal example of a blue deployment for a Spring Boot application called order-service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service-blue
  labels:
    app: order-service
    slot: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
      slot: blue
  template:
    metadata:
      labels:
        app: order-service
        slot: blue
    spec:
      containers:
        - name: order-service
          image: myregistry/order-service:1.4.2
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080

Notice the slot: blue label. The green deployment will use slot: green and a different image tag. The most important part here is the readiness probe: Kubernetes will not send traffic to a pod until it passes. This gives your application time to warm up caches, connect to databases, and register itself in any service mesh. I once skipped the readiness probe and watched the gateway route traffic to a pod that was still loading its configuration—within seconds, all requests failed.

Now, how do you actually switch traffic? A single Service that selects only on app: order-service would load-balance across both slots, which is exactly what you don’t want. You could patch one Service’s selector from slot: blue to slot: green, but I prefer to keep one Service per slot and make the decision in an intelligent router. Spring Cloud Gateway suits this well because you can define routes programmatically or via configuration. The key is to have a route that points to either the blue Service or the green Service based on an external signal.

Here’s a simple way to do it using a custom RouteLocator:

@Bean
public RouteLocator customRouteLocator(RouteLocatorBuilder builder,
                                       @Value("${active.slot:blue}") String activeSlot) {
    // activeSlot is resolved once, when this bean is created; it defaults to blue.
    String backend = "http://order-service-" + activeSlot + ".production.svc.cluster.local:8080";
    return builder.routes()
        .route("order-service", r -> r.path("/orders/**").uri(backend))
        .build();
}

The active.slot property is read from a ConfigMap or environment variable. As written, the bean reads it once at startup, so changing the value only takes effect when the Gateway rebuilds its routes; the dynamic route repository near the end of this article handles that without a restart. Once the route points at green, new requests go to the green backend. What about in-flight requests? Each request is matched to a route when it arrives, so requests already being proxied finish against blue while new ones go to green. That’s the beauty of stateless routing.
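To make the flip itself concrete, here’s a minimal sketch in plain Java with no Spring involved. SlotSwitch and its method names are hypothetical, invented for illustration; only the hostname pattern comes from the route above. It shows the one property that matters: the backend is resolved per request, so a request that already holds a URI is untouched by a later flip.

```java
import java.net.URI;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: holds the live slot and resolves backend URIs per request. */
final class SlotSwitch {

    enum Slot { BLUE, GREEN }

    // AtomicReference makes the flip visible to all request threads at once.
    private final AtomicReference<Slot> active = new AtomicReference<>(Slot.BLUE);

    /** Resolves the backend for one request using the slot that is live right now. */
    URI backendFor(String service) {
        String slot = active.get().name().toLowerCase(Locale.ROOT);
        return URI.create("http://" + service + "-" + slot + ".production.svc.cluster.local:8080");
    }

    /** Flips the live slot and returns the previous one, so a rollback knows where to go. */
    Slot switchTo(Slot target) {
        return active.getAndSet(target);
    }

    public static void main(String[] args) {
        SlotSwitch slots = new SlotSwitch();
        URI inFlight = slots.backendFor("order-service");     // resolved before the flip
        Slot previous = slots.switchTo(Slot.GREEN);           // promote green
        System.out.println(inFlight);                         // still the blue URI
        System.out.println(slots.backendFor("order-service")); // new requests see green
        slots.switchTo(previous);                             // rollback is the same operation
        System.out.println(slots.backendFor("order-service"));
    }
}
```

Returning the previous slot from switchTo is a deliberate choice in this sketch: the caller never has to remember which slot was live before the promotion.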

Of course, health checks are not optional. Your gateway should only route to a slot if the backend is healthy. I use Spring Boot Actuator with a custom health indicator that reports the slot’s version and readiness. Then I poll these endpoints from a small orchestrator service that decides when to promote the green environment.

The orchestrator is a simple Spring Boot app with a scheduled task that calls /actuator/health on a representative green pod. If all checks pass for a configurable duration (say 30 seconds), it updates the ConfigMap that the Gateway watches. If any check fails, it rolls back by setting active.slot back to blue.
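The "healthy for a configurable duration" rule is easy to get subtly wrong, so here’s a small sketch of just that piece. PromotionGate is a hypothetical name, and passing the timestamp in explicitly is my assumption to keep it testable; the scheduled task would call record() with the result of each health check.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical promotion gate: green must stay healthy for a full window before promotion. */
final class PromotionGate {

    private final Duration requiredWindow;
    private Instant healthySince; // null until the first passing check after a failure

    PromotionGate(Duration requiredWindow) {
        this.requiredWindow = requiredWindow;
    }

    /**
     * Records one health-check result taken at {@code now}.
     * Returns true once checks have passed continuously for the required window.
     */
    boolean record(boolean healthy, Instant now) {
        if (!healthy) {
            healthySince = null; // any failure restarts the window
            return false;
        }
        if (healthySince == null) {
            healthySince = now;  // first passing check opens a new window
        }
        return Duration.between(healthySince, now).compareTo(requiredWindow) >= 0;
    }

    public static void main(String[] args) {
        PromotionGate gate = new PromotionGate(Duration.ofSeconds(30));
        Instant start = Instant.now();
        System.out.println(gate.record(true, start));                  // window just opened
        System.out.println(gate.record(true, start.plusSeconds(30)));  // full window elapsed
        System.out.println(gate.record(false, start.plusSeconds(35))); // failure resets the window
    }
}
```

A single failed check resets the clock rather than merely pausing it, which matches the rule above that any failure sends you back to blue.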

Here’s the core logic:

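@Component
public class BlueGreenOrchestrator {

    private final KubernetesClient kubernetesClient;
    private final RestTemplate restTemplate;

    public BlueGreenOrchestrator(KubernetesClient kubernetesClient, RestTemplate restTemplate) {
        this.kubernetesClient = kubernetesClient;
        this.restTemplate = restTemplate;
    }

    public boolean validateGreenSlot() {
        List<Pod> greenPods = kubernetesClient.pods().inNamespace("production")
            .withLabel("app", "order-service").withLabel("slot", "green").list().getItems();
        if (greenPods.isEmpty()) {
            return false; // no green pods yet, so there is nothing to promote
        }
        String healthUrl = "http://" + greenPods.get(0).getStatus().getPodIP() + ":8080/actuator/health";
        try {
            // A DOWN status comes back as a non-2xx response, which RestTemplate raises as an exception.
            ResponseEntity<Map> response = restTemplate.getForEntity(healthUrl, Map.class);
            Map<?, ?> body = response.getBody();
            return response.getStatusCode().is2xxSuccessful() && body != null && "UP".equals(body.get("status"));
        } catch (RestClientException e) {
            return false; // unreachable and unhealthy pods both count as a failed check
        }
    }

    public void promote() {
        // edit() applies the change to the ConfigMap's current state in the cluster.
        kubernetesClient.configMaps().inNamespace("production").withName("gateway-config")
            .edit(cm -> new ConfigMapBuilder(cm).addToData("active.slot", "green").build());
    }
}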

I built this after a painful night when a green deployment had a database migration that failed silently. The health check caught it because the migration failure caused the app to return an error status on the health endpoint. Without this validation, we would have promoted a broken version into production.

Now, what about database schema changes? This is the trickiest part of blue-green deployments. If your new version expects a different schema, you must ensure backward compatibility. The blue version must continue to work with the new schema because it’s still live during the transition. The standard pattern is: first, apply schema changes that are backward-compatible (e.g., add new columns with defaults, rename tables by adding views). Then, switch traffic. Finally, in a subsequent release, remove the old columns or tables.
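Here’s what the backward-compatible phase can look like from the green version’s side, as a hedged sketch in plain Java. The column names customer_name and customer_full_name and the OrderRowMapper class are made up for illustration, and a row is modeled as a simple Map rather than a real JDBC result. The point is that green tolerates rows written by blue, and blue still finds what it needs in rows written by green.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mapping code for the backward-compatible phase; column names are illustrative. */
final class OrderRowMapper {

    static final String OLD_COLUMN = "customer_name";      // the only column blue knows about
    static final String NEW_COLUMN = "customer_full_name"; // added by the new release

    /** Green reads the new column and falls back to the old one for rows blue wrote. */
    static String readCustomerName(Map<String, Object> row) {
        Object value = row.get(NEW_COLUMN);
        if (value == null) {
            value = row.get(OLD_COLUMN);
        }
        return value == null ? null : value.toString();
    }

    /** Green writes both columns so blue still sees complete rows after a rollback. */
    static Map<String, Object> writeCustomerName(String name) {
        Map<String, Object> row = new HashMap<>();
        row.put(OLD_COLUMN, name);
        row.put(NEW_COLUMN, name);
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> writtenByBlue = new HashMap<>();
        writtenByBlue.put(OLD_COLUMN, "Ada Lovelace");
        System.out.println(readCustomerName(writtenByBlue));                     // falls back to the old column
        System.out.println(writeCustomerName("Grace Hopper").get(OLD_COLUMN));   // blue can still read this row
    }
}
```

Only after blue is retired for good does the follow-up release drop the fallback and the old column.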

Another concern is sticky sessions. Blue-green works best with stateless services. If your application relies on HTTP sessions stored in memory, flipping the gateway will break those sessions. The fix is to externalize session state to Redis or a database. I learned this the hard way when users started getting logged out after a promotion. Now every service in our architecture uses a centralized session store.

You might ask: why not use canary deployments instead? Canary releases are excellent for gradual rollouts and A/B testing, but they don’t give you the same instant rollback guarantee. With blue-green, you can revert in less than a second by switching the route back. With canaries, you have to wait for traffic to drain from the old pods. Each strategy has its place. For major version upgrades involving breaking changes, I always choose blue-green.

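Let me show you the skeleton of an integration test built on Testcontainers. It starts the Gateway in a container with a given active slot and checks which backend answers. The blue and green backends themselves (for example, WireMock stubs reachable under the Service hostnames the route expects) are left out to keep the listing short, so treat this as a starting point rather than a drop-in test:

@Testcontainers
class BlueGreenRouteTest {

    // Image name as used in my setup; substitute the image you build for your own gateway.
    private static final String GATEWAY_IMAGE = "springcloud/gateway:4.0.0";

    private final RestTemplate restTemplate = new RestTemplate();

    private GenericContainer<?> gatewayFor(String slot) {
        return new GenericContainer<>(GATEWAY_IMAGE)
            .withEnv("ACTIVE_SLOT", slot)
            .withExposedPorts(8080);
    }

    private String fetchOrder(GenericContainer<?> gateway) {
        String url = "http://" + gateway.getHost() + ":" + gateway.getMappedPort(8080) + "/orders/1";
        return restTemplate.getForEntity(url, String.class).getBody();
    }

    @Test
    void shouldRouteToBlueInitially() {
        try (GenericContainer<?> gateway = gatewayFor("blue")) {
            gateway.start();
            assertThat(fetchOrder(gateway)).contains("blue");
        }
    }

    @Test
    void shouldRouteToGreenAfterPromotion() {
        // Environment variables are fixed once a container starts, so promotion is simulated
        // with a fresh container; in the cluster you'd update the ConfigMap instead.
        try (GenericContainer<?> gateway = gatewayFor("green")) {
            gateway.start();
            assertThat(fetchOrder(gateway)).contains("green");
        }
    }
}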

Testing the deployment pipeline before it hits production saves you from the worst kinds of surprises. I schedule these tests in CI before every release.

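One final piece of advice: never hardcode the active slot. Use a ConfigMap that can be updated without restarting the Gateway. Spring Cloud Gateway can refresh its routes at runtime, either through Spring Cloud’s refresh mechanism (@RefreshScope) or by implementing a RouteDefinitionRepository. The Gateway caches its route table and rebuilds it from the repository whenever a RefreshRoutesEvent is published, so pair the repository with a watch or scheduled task that publishes that event when the ConfigMap changes. I prefer the repository approach because it fits the Gateway’s reactive model: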

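@Component
public class ConfigMapRouteDefinitionRepository implements RouteDefinitionRepository {

    private final KubernetesClient client;

    public ConfigMapRouteDefinitionRepository(KubernetesClient client) {
        this.client = client;
    }

    @Override
    public Flux<RouteDefinition> getRouteDefinitions() {
        // Each ConfigMap entry maps a route id to a backend URI. The fabric8 call blocks,
        // so defer it until subscription instead of running it at assembly time.
        return Flux.defer(() -> Flux.fromIterable(client.configMaps().inNamespace("production")
                .withName("gateway-routes").get().getData().entrySet()))
            .map(entry -> {
                RouteDefinition def = new RouteDefinition();
                def.setId(entry.getKey());
                def.setUri(URI.create(entry.getValue()));
                // Assumed convention: route id "orders" serves /orders/**. Without a predicate, a route matches every request.
                def.setPredicates(List.of(new PredicateDefinition("Path=/" + entry.getKey() + "/**")));
                return def;
            });
    }

    // Routes live in the ConfigMap, so writes through the Gateway API are rejected.
    @Override
    public Mono<Void> save(Mono<RouteDefinition> route) {
        return Mono.error(new UnsupportedOperationException("routes are managed in the ConfigMap"));
    }
    @Override
    public Mono<Void> delete(Mono<String> routeId) {
        return Mono.error(new UnsupportedOperationException("routes are managed in the ConfigMap"));
    }
}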

This approach scales to any number of microservices. When I add a new service, I just add an entry to the ConfigMap, and the Gateway picks it up on the next route refresh, without a restart.

After implementing this pattern, our deployment windows shrank from forty-five minutes to under ten seconds of actual traffic switch. The last major release we did—upgrading the order-service from Spring Boot 2.7 to 3.3, including a database schema change—went flawlessly. We flipped traffic at 2 PM on a Tuesday. Nobody noticed. That’s the feeling every engineer should target: boring, reliable, zero-downtime deployments.

Now I’m curious: what’s the biggest deployment failure you’ve had? For me, it was that 2:47 AM memory leak. I fixed it by adopting blue-green, and I’ve never looked back. If this article helped you see a path to safer releases, hit like, share it with a colleague who’s still doing rolling updates, and leave a comment with your own war story. Let’s make production deployments a non-event for everyone.

