Complete Guide: Implementing Distributed Tracing in Microservices with Spring Cloud Sleuth and Zipkin

Learn how to implement distributed tracing in microservices using Spring Cloud Sleuth, Zipkin & OpenTelemetry. Complete guide with examples.

Aug 15, 2025

I recently faced a challenge in our production environment that made me rethink our observability approach. We had a customer complaint about slow order processing, but with eight microservices involved in the workflow, pinpointing the bottleneck felt like searching for a needle in a haystack. That’s when I realized we needed proper distributed tracing. Let me share how I implemented it using Spring Cloud Sleuth, Zipkin, and OpenTelemetry.

In microservices architectures, traditional debugging methods fall short. A single request might hop through multiple services, each with its own logs and metrics. Without correlation between these events, identifying why a checkout process takes five seconds becomes guesswork. How can we see the full journey without manual effort?

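Distributed tracing solves this by creating a connected timeline of events. Each request gets a unique trace ID, and every operation along the way (an incoming HTTP request, a database query, a call to another service) becomes a span with its own span ID, start time, and duration. Each span also records its parent span’s ID, so the spans form a tree showing exactly where time is spent. Context propagation ties the services together: each service passes the trace ID and its current span ID to the next one in HTTP or message headers. By default, Sleuth uses the B3 headers, such as X-B3-TraceId.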
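The propagation step is easier to see without a framework. The following is a minimal sketch, not how Sleuth is implemented. It assumes B3 multi-header propagation and Java 17. X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId and X-B3-Sampled are the real B3 header names; the TraceContext record is hypothetical. The caller injects its context into outgoing headers. The callee extracts that context and starts a child span that keeps the same trace ID.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, library-free model of one span's identity; real tracers do this for you.
record TraceContext(String traceId, String spanId, String parentSpanId, boolean sampled) {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Random 64-bit ID rendered as 16 lowercase hex characters
    static String newId() {
        byte[] bytes = new byte[8];
        RANDOM.nextBytes(bytes);
        return HexFormat.of().formatHex(bytes);
    }

    // The first service on the request path starts the trace; here the root span ID doubles as the trace ID
    static TraceContext newRoot(boolean sampled) {
        String id = newId();
        return new TraceContext(id, id, null, sampled);
    }

    // A child keeps the trace ID, gets a fresh span ID, and remembers its parent
    TraceContext child() {
        return new TraceContext(traceId, newId(), spanId, sampled);
    }

    // Caller side: write the current context into outgoing B3 headers
    Map<String, String> inject() {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", parentSpanId);
        }
        headers.put("X-B3-Sampled", sampled ? "1" : "0");
        return headers;
    }

    // Callee side: continue the caller's trace with a child span, or start a new trace if none arrived
    static TraceContext extractAndContinue(Map<String, String> incoming) {
        // HTTP header names are case-insensitive
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.putAll(incoming);
        String traceId = headers.get("X-B3-TraceId");
        String spanId = headers.get("X-B3-SpanId");
        if (traceId == null || spanId == null) {
            return newRoot(true); // simplification: a real tracer would consult its sampler here
        }
        // Simplification: a missing X-B3-Sampled is treated as "not sampled" (real B3 treats it as undecided)
        TraceContext caller = new TraceContext(traceId, spanId,
                headers.get("X-B3-ParentSpanId"), "1".equals(headers.get("X-B3-Sampled")));
        return caller.child();
    }
}
```

Real tracers differ in the details. Brave, which Sleuth builds on, can for example let the client and server sides of one call share a single span ID.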

Let’s set up a practical example. Imagine an order processing flow with three Spring Boot services (order, inventory, and payment). Here is the order service’s entry point:

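// OrderController.java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
public class OrderController {
    private final OrderService orderService; // hypothetical application service (not shown)

    public OrderController(OrderService orderService) { this.orderService = orderService; }

    @PostMapping("/orders")
    public ResponseEntity<Order> createOrder(@RequestBody OrderRequest request) {
        Order order = orderService.create(request); // business logic lives in the service
        return ResponseEntity.ok(order);
    }
}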

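To enable tracing on Spring Boot 2.x (Spring Cloud 2021.0.x or earlier), add these dependencies to each service’s pom.xml. Their versions come from the spring-cloud-dependencies BOM, so none are declared here: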

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>

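With just this, Sleuth automatically instruments:

- incoming web requests
- outgoing calls made with RestTemplate or WebClient, as long as they are Spring-managed beans (a RestTemplate created with new RestTemplate() inside a method is not instrumented)
- supported messaging such as Kafka and RabbitMQ

JDBC tracing in Sleuth 3.1 additionally requires datasource-proxy or p6spy on the classpath. Sleuth also adds the application name (from spring.application.name), trace ID, and span ID to your logs. For the first span of a trace, the span ID equals the trace ID: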

2023-05-15 INFO [order-service,8e35b7f1a2c83b2d,8e35b7f1a2c83b2d] Creating order
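Because every service writes the same trace ID, correlating logs across services becomes a text search. The following is a hedged sketch with some assumptions. It expects the [application,traceId,spanId] prefix shown above, optionally followed by a fourth field, which older Sleuth versions add. The TraceLogGrouper class is hypothetical. It groups log lines from several services by trace ID:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups log lines by the trace ID in a Sleuth-style [app,traceId,spanId] prefix
final class TraceLogGrouper {
    // App name, 16- or 32-hex-character trace ID, 16-hex span ID, optional extra field
    private static final Pattern PREFIX = Pattern.compile(
            "\\[([^,\\]]*),([0-9a-f]{16}(?:[0-9a-f]{16})?),([0-9a-f]{16})(?:,[^\\]]*)?\\]");

    // Returns trace ID -> lines in input order; lines without a trace prefix (e.g. "[app,,]") are skipped
    static Map<String, List<String>> groupByTraceId(List<String> lines) {
        Map<String, List<String>> byTrace = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = PREFIX.matcher(line);
            if (m.find()) {
                byTrace.computeIfAbsent(m.group(2), k -> new ArrayList<>()).add(line);
            }
        }
        return byTrace;
    }
}
```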

But logs alone aren’t enough; we also need visualization. That’s where Zipkin comes in. Run it via Docker. The image keeps traces in memory by default, which is fine for local development:

docker run -d -p 9411:9411 openzipkin/zipkin
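To confirm that Zipkin is reachable before wiring up the services, you can post a span to its v2 HTTP API yourself. This is a rough sketch of the kind of payload a tracer's reporter sends, not Sleuth's code. It assumes Zipkin runs at http://localhost:9411 and Java 11+. POST /api/v2/spans with a JSON array of spans is Zipkin's ingestion endpoint. The ZipkinSmokeTest class, IDs, and names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;

// Hypothetical smoke test that reports one span directly to Zipkin
final class ZipkinSmokeTest {
    // One span in Zipkin's v2 JSON model; timestamp (epoch) and duration are in microseconds.
    // Values are inserted without JSON escaping, so keep them to plain identifiers.
    static String spanJson(String traceId, String spanId, String name, String serviceName,
                           long timestampMicros, long durationMicros) {
        return String.format(Locale.ROOT,
                "[{\"traceId\":\"%s\",\"id\":\"%s\",\"name\":\"%s\",\"timestamp\":%d,\"duration\":%d,"
                        + "\"localEndpoint\":{\"serviceName\":\"%s\"}}]",
                traceId, spanId, name, timestampMicros, durationMicros, serviceName);
    }

    // POSTs a JSON array of spans to Zipkin's v2 endpoint and returns the HTTP status code
    static int send(String zipkinBaseUrl, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(zipkinBaseUrl + "/api/v2/spans"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding())
                .statusCode();
    }

    public static void main(String[] args) throws Exception {
        long nowMicros = System.currentTimeMillis() * 1_000;
        // Any 16-hex-character IDs will do
        String json = spanJson("5af7183fb1d4cf5f", "5af7183fb1d4cf5f", "smoke-test",
                "order-service", nowMicros, 150_000); // a 150 ms span
        System.out.println("Zipkin responded with HTTP " + send("http://localhost:9411", json));
    }
}
```

Zipkin normally answers 202 Accepted. The span should then appear under order-service in the UI at http://localhost:9411.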

Configure your services to send traces to Zipkin:

# application.yml
spring:
  application:
    name: order-service # service name shown in logs and in Zipkin
  zipkin:
    base-url: http://localhost:9411
  sleuth:
    sampler:
      probability: 1.0 # Sample all traces for dev
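A probability of 1.0 keeps every trace, and 0.1 would keep roughly one in ten. The sampling decision is made once, by the service that starts the trace. It then travels with the context (the X-B3-Sampled header in B3), so downstream services do not drop parts of a kept trace. Below is a minimal sketch of that rule. It uses a hypothetical RootProbabilitySampler class rather than Sleuth's own sampler.

```java
import java.util.Random;

// Hypothetical sampler illustrating "decide at the root, honor the decision downstream"
final class RootProbabilitySampler {
    private final double probability; // 1.0 keeps every trace, 0.1 keeps roughly one in ten
    private final Random random;

    RootProbabilitySampler(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.probability = probability;
        this.random = random;
    }

    /**
     * @param upstreamSampled the decision propagated by the caller, or null if this service starts the trace
     */
    boolean isSampled(Boolean upstreamSampled) {
        if (upstreamSampled != null) {
            return upstreamSampled; // never override the caller's decision
        }
        return random.nextDouble() < probability; // nextDouble() is in [0, 1), so 1.0 always samples
    }
}
```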

Now when you make requests, you’ll see full trace diagrams in Zipkin’s UI. Each span shows service names, timings, and errors. Ever wondered exactly how much time that database query takes within the overall request?
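Zipkin's timeline answers that question visually, and the arithmetic behind it is simple to sketch. A span's own ("self") time is its duration minus the time covered by its direct children. Overlapping children, such as parallel calls, are counted only once. The SpanTiming record and SelfTime class are hypothetical. Times are in microseconds, as Zipkin reports them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical span model: start and duration in microseconds
record SpanTiming(String id, String parentId, String name, long start, long duration) {
    long end() { return start + duration; }
}

final class SelfTime {
    // Duration of the span minus the union of its direct children's intervals
    static long selfTime(SpanTiming span, List<SpanTiming> all) {
        List<SpanTiming> children = new ArrayList<>();
        for (SpanTiming s : all) {
            if (span.id().equals(s.parentId())) children.add(s);
        }
        children.sort(Comparator.comparingLong(SpanTiming::start));

        long covered = 0;
        long curStart = Long.MIN_VALUE, curEnd = Long.MIN_VALUE;
        for (SpanTiming c : children) {
            long start = Math.max(c.start(), span.start()); // clip children to the parent's window
            long end = Math.min(c.end(), span.end());
            if (end <= start) continue;
            if (start > curEnd) {                // disjoint from the current merged interval
                if (curEnd > curStart) covered += curEnd - curStart;
                curStart = start;
                curEnd = end;
            } else {
                curEnd = Math.max(curEnd, end);  // overlapping: extend the merged interval
            }
        }
        if (curEnd > curStart) covered += curEnd - curStart;
        return span.duration() - covered;
    }
}
```

Ranking spans by self time is often a quicker route to the real bottleneck than reading raw durations. A slow parent is frequently just waiting on a slow child.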

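While Sleuth works great on Spring Boot 2.x, the industry is moving toward OpenTelemetry, and Sleuth itself does not support Spring Boot 3. Its tracing core moved into Micrometer Tracing, which can bridge to OpenTelemetry and still export to Zipkin. Migrating involves three steps:

- upgrade to Spring Boot 3
- swap the dependencies
- rename the properties: in Spring Boot 3.x the Zipkin URL moves to management.zipkin.tracing.endpoint (the full http://localhost:9411/api/v2/spans URL), and the sampling rate moves to management.tracing.sampling.probability

The new dependencies are: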

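<!-- Spring Boot 3: replace the Sleuth dependencies with Micrometer Tracing + OpenTelemetry -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId> <!-- provides the tracing auto-configuration -->
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-zipkin</artifactId>
</dependency>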
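Check propagation during a gradual migration. Spring Boot 3 with Micrometer Tracing emits the W3C traceparent header by default, while Sleuth services expect B3 headers. Migrated and non-migrated services may therefore not join each other's traces until you align the format with the management.tracing.propagation.* properties. The sketch below shows, without any library, what a traceparent value contains; the TraceParent record is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the W3C Trace Context header: version-traceId-parentId-flags
record TraceParent(String traceId, String parentId, boolean sampled) {
    // This sketch only accepts version 00: 32-hex trace ID, 16-hex parent span ID, 2-hex flags
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final String ZERO_TRACE = "0".repeat(32);
    private static final String ZERO_SPAN = "0".repeat(16);

    // Serialize as a version 00 header value; flag bit 0x01 means "sampled"
    String toHeader() {
        return "00-" + traceId + "-" + parentId + "-" + (sampled ? "01" : "00");
    }

    // Returns null for malformed values, which a receiver should ignore (starting a new trace instead)
    static TraceParent parse(String header) {
        if (header == null) return null;
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) return null;
        if (m.group(1).equals(ZERO_TRACE) || m.group(2).equals(ZERO_SPAN)) return null; // all-zero IDs are invalid
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) != 0;
        return new TraceParent(m.group(1), m.group(2), sampled);
    }
}
```

For example, TraceParent.parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01") yields a sampled context. That value is the example header used in the W3C Trace Context specification.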

For custom instrumentation, create spans manually:

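// PaymentService.java (Micrometer Tracing API; Sleuth 3.1's org.springframework.cloud.sleuth API is equivalent)
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    private final Tracer tracer;
    public PaymentService(Tracer tracer) { this.tracer = tracer; }

    public void processPayment(Order order) {
        // Child of the current span (e.g. the incoming HTTP request); reported once ended
        Span span = tracer.nextSpan().name("validate-credit");
        try (Tracer.SpanInScope ws = tracer.withSpan(span.start())) {
            // Validation logic
            span.tag("card.last4", order.getCardLastFour());
        } catch (RuntimeException e) {
            span.error(e); // marks the span as failed in Zipkin
            throw e;
        } finally {
            span.end();
        }
    }
}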

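In production, sample selectively to manage costs. With Sleuth, you can replace the dev probability setting with a rate limit that caps how many new traces each service instance starts per second:

# application.yml (production profile)
spring:
  sleuth:
    sampler:
      rate: 100 # At most 100 new traces per second per instance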
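A rate limit bounds tracing volume per instance no matter how much traffic arrives, whereas a probability keeps a fixed fraction. To illustrate the idea, here is a simplified fixed-window sampler. The FixedWindowRateSampler class is hypothetical and is not Sleuth's implementation. The injectable clock exists only so the behavior can be tested.

```java
import java.util.function.LongSupplier;

// Hypothetical fixed-window limiter: at most `tracesPerSecond` new traces start in each one-second window
final class FixedWindowRateSampler {
    private static final long ONE_SECOND_NANOS = 1_000_000_000L;

    private final int tracesPerSecond;
    private final LongSupplier nanoClock; // e.g. System::nanoTime; injectable for tests
    private long windowStart;
    private int usedInWindow;

    FixedWindowRateSampler(int tracesPerSecond, LongSupplier nanoClock) {
        this.tracesPerSecond = tracesPerSecond;
        this.nanoClock = nanoClock;
        this.windowStart = nanoClock.getAsLong();
    }

    // Consulted only for new root traces; propagated decisions bypass the sampler
    synchronized boolean isSampled() {
        long now = nanoClock.getAsLong();
        if (now - windowStart >= ONE_SECOND_NANOS) { // a new one-second window begins
            windowStart = now;
            usedInWindow = 0;
        }
        if (usedInWindow < tracesPerSecond) {
            usedInWindow++;
            return true;
        }
        return false;
    }
}
```

To mirror the setting above, construct it as new FixedWindowRateSampler(100, System::nanoTime). Consult it only when a request arrives without an upstream sampling decision.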

When we implemented this, we discovered our inventory service was making synchronous calls to a legacy system, adding 300ms to checkout times. Without tracing, we’d never have spotted this.

Here are key lessons from our implementation:

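Correlate traces with logs: Sleuth and Micrometer Tracing already put traceId and spanId into the SLF4J MDC, so reference them in your log pattern (for example %X{traceId}) rather than calling MDC.put("traceId", ...) by hand
Add custom span tags for business context (user_id, order_type)
Use sampling wisely: 100% in dev, but throttle in production
Monitor the overhead of trace collection itself, both in the services and in Zipkin’s storage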

What surprises might your system reveal when you see the full picture? I encourage you to try this setup - start with one service and expand. The visibility gains are worth the effort. If you found this useful, share it with your team or leave a comment about your tracing experiences!
