I remember the night my team pushed a minor update to our order service and watched half the requests fail because the new API contract broke an internal dependency. Users saw “500” errors for two minutes before we could roll back. That was the moment I decided rolling updates were no longer acceptable for any service under my watch. We needed a deployment strategy that gave us surgical control, instant rollback, and zero impact on traffic. This article walks through exactly how I built that system using Spring Boot, Kubernetes, Istio, and Spring Cloud Gateway.
Why This Idea Came to Mind
The incident happened at 2 AM. A simple field rename in a JSON response caused a downstream service to crash. Rolling updates keep two versions alive simultaneously, which means during the transition, some requests hit the old version and some hit the new. If there’s a breaking change, you risk partial failures. I needed a way to deploy the new version, verify it completely in isolation, and then switch traffic atomically. Blue-green deployment became my answer. And to make it production-ready, I combined Kubernetes for infrastructure, Istio for traffic management, and Spring Cloud Gateway for intelligently steering requests.
The Problem with Standard Rolling Updates
Have you ever deployed a new version and seen database connection errors appearing on the old pods? That’s because rolling updates don’t isolate environments. When two versions run under the same Kubernetes Service, they share the load. If version 2 changes the database schema, old pods break. If version 2 introduces a bug in a new REST endpoint, some of your users experience it. Rolling back requires another rolling update, which takes time. Meanwhile, your users are already affected.
Blue-green deployment solves the traffic side of this. You keep the stable version (blue) running unchanged. You spin up the new version (green) in a separate set of pods. You validate green thoroughly with internal traffic. Only when you’re confident do you flip the router to send all traffic to green. If something goes wrong, you flip back in seconds. A shared database schema still needs care, which I cover later in this article.
The Architecture: Three Layers of Control
The system I built uses three layers: the application layer (Spring Cloud Gateway), the service mesh layer (Istio), and the orchestration layer (Kubernetes). Each layer adds a different kind of intelligence. Let me explain how they work together.
At the top, external traffic enters through a load balancer and reaches Spring Cloud Gateway. The Gateway can inspect HTTP headers, cookies, or user attributes to decide where to route. For example, internal testers get sent to the green environment while external users continue hitting blue. This is useful for gradual rollout and A/B testing.
Below the gateway, Istio’s VirtualService handles the actual traffic split. Istio gives you weighted routing: send 100% of traffic to blue, then shift 10% to green, then 50%, then 100%. Each change takes effect within seconds, as soon as the new configuration reaches the sidecar proxies, and no pods restart. And because Istio works at the mesh level, it also manages retries, timeouts, and circuit breaking.
Kubernetes does the heavy lifting of managing pods. I create two separate Deployments—one for blue, one for green—each with its own label and version. They exist simultaneously but only one receives live traffic at any moment.
Building the Spring Boot Microservice
Every service needs health endpoints that Kubernetes can probe. Spring Boot Actuator provides this out of the box. I configure management.endpoint.health.probes.enabled=true and let Spring expose /actuator/health/readiness and /actuator/health/liveness.
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus
  endpoint:
    health:
      probes:
        enabled: true
The readiness probe tells Kubernetes when the pod is ready to serve traffic. I customize it to check if the database connection pool is warmed up and any caches are loaded. Once the probe passes, the pod is added to the Service’s endpoints.
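Here’s a snippet of the readiness check in code. A custom HealthIndicator is not part of the readiness group by default, so I also set management.endpoint.health.group.readiness.include=readinessState,database (the id database is derived from the bean name) to make it gate readiness: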
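import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class DatabaseHealthIndicator implements HealthIndicator {

    private final DataSource dataSource;

    // Constructor injection keeps the dependency explicit and easy to test.
    public DatabaseHealthIndicator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Health health() {
        // Borrow a pooled connection and validate it with a 1-second timeout.
        try (Connection conn = dataSource.getConnection()) {
            if (conn.isValid(1)) {
                return Health.up().build();
            }
        } catch (SQLException e) {
            return Health.down().withDetail("error", e.getMessage()).build();
        }
        return Health.down().build();
    }
}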
Why is this important? Because during a blue-green switch, the new pods must be fully healthy before they receive traffic. The readiness probe is the gatekeeper.
Deploying the Blue and Green Environments
Each environment gets a Kubernetes Deployment and a Service. But here’s the key: I use labels to differentiate them, and I only expose one Service at a time to the outside world. The blue deployment has label version: blue, green has version: green. Both have the same app label, say app: order-service.
But I don’t use a single Service with a selector that matches both. Instead, I create two Services: one named order-service-blue and one order-service-green. This allows Istio to route traffic selectively.
apiVersion: v1
kind: Service
metadata:
  name: order-service-blue
spec:
  selector:
    app: order-service
    version: blue
  ports:
    - port: 8080
I could perform the switch by editing a Service’s selector, but I never touch the Services at all. Istio’s VirtualService points to both of them and uses weights to decide where traffic goes.
Configuring Istio VirtualService and DestinationRule
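The Istio VirtualService is the core of the traffic switch. The hosts entry is the name that clients call, so that name has to resolve inside the mesh (for example through a Service called order-service) or be bound to a gateway. Here’s a typical configuration: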
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service-blue
            port:
              number: 8080
          weight: 100
        - destination:
            host: order-service-green
            port:
              number: 8080
          weight: 0
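Initially, 100% goes to blue and 0% to green. To test green internally without touching real users, I add a match rule based on a header: any request with header X-Canary: internal goes to green, while all others stay on blue. Once the internal checks pass, I start moving real traffic by changing the weights to 90‑10, then 50‑50, and so on. I keep the two weights summing to 100 so they read as percentages.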
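To make the decision order concrete, here is a minimal sketch in plain Java of how a header match combined with a weighted split picks a destination. It is a model for reasoning, not Istio's implementation: the class CanaryRouting, its method names, and the exact-case header lookup are assumptions made for illustration.

```java
import java.util.Map;

/** Illustrative model of the routing decision described above; not Istio code. */
final class CanaryRouting {

    /** Percentage weights for blue and green; this sketch requires them to sum to 100. */
    static final class Weights {
        final int blue;
        final int green;

        Weights(int blue, int green) {
            if (blue < 0 || green < 0 || blue + green != 100) {
                throw new IllegalArgumentException("weights must be non-negative and sum to 100");
            }
            this.blue = blue;
            this.green = green;
        }
    }

    /**
     * Header match wins over the weighted split, like a match rule placed
     * before the default route. roll is a number in [0, 100); a real caller
     * would pass a random value. The header lookup is case-sensitive here,
     * which is a simplification of real HTTP header handling.
     */
    static String route(Map<String, String> headers, Weights weights, int roll) {
        if (roll < 0 || roll >= 100) {
            throw new IllegalArgumentException("roll must be in [0, 100)");
        }
        if ("internal".equals(headers.get("X-Canary"))) {
            return "order-service-green";
        }
        return roll < weights.blue ? "order-service-blue" : "order-service-green";
    }
}
```

With weights of 90 and 10, rolls 0 to 89 land on blue and rolls 90 to 99 land on green, while a request carrying X-Canary: internal reaches green regardless of the weights.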
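The DestinationRule defines how traffic to a destination behaves once it has been routed. I can set connection pool sizes, outlier detection, or TLS. Because the VirtualService above routes to two separate hosts rather than to subsets of a single host, each host gets its own DestinationRule. Here is the one for green; blue gets an identical rule with its own host name. The numbers are illustrative starting points, not tuned values.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service-green
spec:
  host: order-service-green
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 100
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s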
Now I have full control. I can shift traffic gradually, monitor for errors, and if something goes wrong, I just set the green weight back to 0 and the blue weight back to 100.
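The shift-monitor-rollback loop can be sketched in a few lines of plain Java. This is only an outline under stated assumptions: ProgressiveShift and its healthyAt callback are hypothetical names, and a real script would apply the VirtualService and query metrics where the comments indicate.

```java
import java.util.List;
import java.util.function.IntPredicate;

/** Illustrative outline of a gradual shift with rollback; names are hypothetical. */
final class ProgressiveShift {

    /**
     * Walks through the green weights in order. After each step, healthyAt is
     * asked whether green looks healthy at that weight. Returns the final green
     * weight: the last step if every check passes, or 0 if any check fails.
     */
    static int run(List<Integer> greenSteps, IntPredicate healthyAt) {
        int current = 0;
        for (int step : greenSteps) {
            current = step; // a real script would apply the new weights here
            if (!healthyAt.test(current)) {
                return 0;   // roll back: all traffic returns to blue
            }
        }
        return current;
    }
}
```

Calling ProgressiveShift.run(List.of(10, 50, 100), w -> true) ends at 100, while a check that fails at any step returns 0, which is the rollback case.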
Spring Cloud Gateway: Intelligent Traffic Steering
Spring Cloud Gateway adds application-level intelligence. I can inspect JWT tokens, user roles, or session cookies to route internal users to the green environment without affecting external users.
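import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;

@Bean
public RouteLocator customRouteLocator(RouteLocatorBuilder builder) {
    return builder.routes()
        // Canary route first: requests carrying X-Canary: enabled go to green.
        // Route ids must be unique, so each route gets its own id.
        .route("order-service-green", r -> r
            .header("X-Canary", "enabled")
            .uri("lb://order-service-green"))
        // Catch-all route: every route needs a predicate, so match all paths.
        .route("order-service-blue", r -> r
            .path("/**")
            .uri("lb://order-service-blue"))
        .build();
}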
But wait: if Spring Cloud Gateway is the ingress point, which layer should own percentage-based shifts? Gateway can do them with its built-in Weight route predicate, but then the split lives in application configuration and covers only traffic that passes through the gateway. That’s why I use Istio for the actual traffic split, and Gateway for additional intelligence. The gateway routes to Istio’s ingress gateway, which then applies the VirtualService.
Alternatively, I can keep the split in Spring Cloud Gateway by putting the blue and green routes in the same weight group. But I prefer Istio because it’s more robust and gives me observability.
Database Schema Compatibility: The Tricky Part
The hardest part of zero-downtime deployments is database changes. If the green version adds a new column or changes an existing one, the blue version must continue to work on the same schema. This means every migration must be backward-compatible.
I follow these rules:
- Only add columns (not remove or rename).
- New columns have default values or allow NULL.
- Code changes must handle both old and new shapes of data.
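- Use Flyway with separate migration scripts for each version. Additive changes run before green takes traffic; destructive changes (dropping or renaming columns) run only after the switch is fully completed and blue is retired.
In practice, I run the additive migration before the traffic switch, early enough that green can be validated against the new schema. But what about the blue pods? If the migration adds a column, blue pods must tolerate it, either by ignoring extra columns in the result set or by having code that reads only known fields. JPA entities help here because Hibernate selects only the mapped columns, so blue never sees a column it doesn’t know about. Hibernate’s @DynamicInsert and @DynamicUpdate also help by leaving null or unchanged columns out of the generated SQL, so database defaults apply.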
Here’s an example of a migration that adds a column safely:
ALTER TABLE orders ADD COLUMN IF NOT EXISTS discount_code VARCHAR(50) DEFAULT NULL;
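This is safe because IF NOT EXISTS prevents errors if the column already exists, and the default NULL means existing code won’t break. Note that ADD COLUMN IF NOT EXISTS is not available in every database; PostgreSQL supports it, so check yours before relying on it.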
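The rule that code must handle both old and new shapes of data can be sketched as a tolerant row mapper. This is a minimal illustration in plain Java, assuming rows arrive as a map of column names to values; OrderRowMapper and Order are hypothetical names, not classes from the service.

```java
import java.util.Map;
import java.util.Optional;

/** Illustrative mapper that accepts rows with or without the new column. */
final class OrderRowMapper {

    static final class Order {
        private final long id;
        private final String discountCode; // null when absent or NULL in the row

        Order(long id, String discountCode) {
            this.id = id;
            this.discountCode = discountCode;
        }

        long id() {
            return id;
        }

        Optional<String> discountCode() {
            return Optional.ofNullable(discountCode);
        }
    }

    /** row maps column names to values, as a JDBC helper might produce. */
    static Order map(Map<String, Object> row) {
        long id = ((Number) row.get("id")).longValue();
        // A missing column (old schema) and a NULL value (new schema, old row)
        // both come back as null from Map.get, so one code path covers both.
        Object code = row.get("discount_code");
        return new Order(id, (String) code);
    }
}
```

The same mapper works before and after the migration, which is what lets blue and green share one schema during the switch.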
Automating Rollback Triggers
Kubernetes probes alone are not enough here. A liveness probe checks a health endpoint on each pod, and when it fails, Kubernetes restarts that pod. That doesn’t help if the green release itself starts returning 500s or its latency spikes, because a restarted pod runs the same faulty code. Instead, I use a custom metric in Prometheus to trigger an automated rollback via a webhook.
For example, I monitor the error rate on the green service. If it exceeds 1% for more than 30 seconds, a script updates the Istio VirtualService to send 100% traffic back to blue. This rollback happens in seconds.
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service-blue
            port:
              number: 8080
          weight: 100
        - destination:
            host: order-service-green
            port:
              number: 8080
          weight: 0
EOF
The change takes effect as soon as the sidecar proxies pick up the new configuration. No waiting for pod restarts.
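The trigger condition (error rate above 1% for more than 30 seconds) can be modelled in plain Java. This is a sketch for reasoning about the rule, assuming samples are fed in from periodic metric scrapes; RollbackTrigger is a hypothetical class, and in a real setup the same condition would more likely live in an alerting rule than in application code.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative trigger: fires when the error rate stays above a threshold for a hold period. */
final class RollbackTrigger {
    private final double threshold;
    private final Duration hold;
    private Instant breachStart; // null while the error rate is healthy

    RollbackTrigger(double threshold, Duration hold) {
        this.threshold = threshold;
        this.hold = hold;
    }

    /**
     * Feeds one sample. Returns true once the rate has been above the
     * threshold for longer than the hold period without recovering.
     */
    boolean observe(Instant now, long errors, long requests) {
        double rate = requests == 0 ? 0.0 : (double) errors / requests;
        if (rate <= threshold) {
            breachStart = null; // recovered, so reset the clock
            return false;
        }
        if (breachStart == null) {
            breachStart = now;  // first sample above the threshold
        }
        return Duration.between(breachStart, now).compareTo(hold) > 0;
    }
}
```

When observe returns true, the calling script would apply the VirtualService that sends 100% of traffic back to blue. Resetting on recovery means a brief spike does not trigger a rollback.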
Observability: Knowing When to Flip
You can’t manage zero-downtime deployments without metrics. I expose Prometheus metrics from Spring Boot (Micrometer) and use Grafana dashboards to compare error rates, latencies, and request volumes between blue and green.
I also set up Kiali, which integrates with Istio, to visualize traffic flows. When I shift 10% traffic to green, I watch the traffic graph and the traces to confirm green is handling requests without errors.
An important question: how do you know green is ready? I run a set of automated tests inside the cluster, from a test pod (for example a Kubernetes Job) or a sidecar that sends test requests with the X-Canary header. If all tests pass, I proceed to full switch.
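A single in-cluster check with the X-Canary header might look like the sketch below, using the JDK's built-in HTTP client. The class name CanarySmokeTest, the default base URL, the readiness path used as the test target, and the header value internal are all assumptions for illustration; a real suite would call the service's business endpoints.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Illustrative smoke test that tags its request so it is routed to green. */
final class CanarySmokeTest {

    /** Builds a GET request carrying the canary header. */
    static HttpRequest canaryRequest(String baseUrl, String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("X-Canary", "internal")
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
    }

    /** Treats any 2xx status as a pass. */
    static boolean passes(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical in-cluster address; pass the real one as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://order-service:8080";
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<Void> response = client.send(
                canaryRequest(baseUrl, "/actuator/health/readiness"),
                HttpResponse.BodyHandlers.discarding());
        // A non-zero exit code lets a pipeline stop before the full switch.
        if (!passes(response.statusCode())) {
            System.exit(1);
        }
    }
}
```

Exiting with a non-zero status on failure makes the check easy to wire into a pipeline step that gates the switch.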
Personal Touch: The First Time I Did This
The first time I performed a blue-green switch in production, I was terrified. I had the rollback script open in another terminal. I shifted 100% to green. For a moment, everything was silent. Then the Grafana dashboard showed a flat line—no errors, no latency increase. I exhaled. Since then, I’ve done dozens of switches without a single user-facing incident. That confidence is invaluable.
Conclusion: Your Turn
If you’re still using rolling updates for critical services, stop. Blue-green deployment with Istio and Spring Cloud Gateway is simpler to implement than you think, and the peace of mind is worth the effort. Start with a non-critical service, set up the health probes, configure the VirtualService, and test the switch many times in staging. When you’re ready, deploy your first zero-downtime release.
I’d love to hear about your experience. If this article helped you, please like it, share it with your team, and leave a comment below with your thoughts or questions. What other deployment strategies have you tried? Let’s keep the conversation going.