
Spring Boot Multi-Level Caching with Caffeine and Redis for Low-Latency Microservices

Learn Spring Boot multi-level caching with Caffeine and Redis to cut latency, prevent cache stampedes, and scale microservices faster.


I remember the day clearly. Our microservice was struggling under a flash sale. The database connections maxed out, Redis latency doubled, and users saw spinning spinners for what felt like an eternity. That’s when I realized: single-layer caching wasn’t enough. We needed a smarter approach. So I built a multi-level cache with Caffeine and Redis inside Spring Boot. Let me show you how.

Think of it this way. Your application has two pockets. One is your personal wallet – small, fast, always on you. That’s Caffeine, an in-process cache living inside your JVM. The other is a digital wallet shared with your team – bigger, accessible from any device, but requires a network call. That’s Redis. When you need cash, you check your personal wallet first. If it’s empty, you check the digital wallet. If that’s also empty, you go to the bank – your database. This two‑tier lookup cuts latency from milliseconds to nanoseconds for hot data.

But why not rely on Redis alone? Have you ever seen a spike in Redis traffic during a cache miss storm? Every node in your cluster requests the same data from Redis simultaneously, creating a bottleneck. A local L1 cache absorbs the first requests, so Redis handles only the misses that truly can’t be served locally. The result: lower median latency, fewer Redis connections, and a more resilient system.

Let’s start with the tools. I use Spring Boot 3.2, Caffeine 3.1.8, and Spring Data Redis with Lettuce. The key is to replace Spring Boot’s auto-configured cache manager with our own – defining a CacheManager bean is enough to make Boot back off. Here’s how I configure the beans.

I define a CaffeineCache for L1 with a maximum size and TTL. For a product catalog cache, I set maximumSize=500 and expireAfterWrite=5 minutes. That keeps hot products fast while allowing stale entries to expire quickly.

@Bean
public CacheManager cacheManager(RedisConnectionFactory redisConnectionFactory) {
    // L1: Caffeine
    Caffeine<Object, Object> caffeine = Caffeine.newBuilder()
        .maximumSize(500)
        .expireAfterWrite(5, TimeUnit.MINUTES)
        .recordStats();

    CaffeineCache l1Cache = new CaffeineCache("products", caffeine.build());

    // L2: Redis
    RedisCacheConfiguration redisConfig = RedisCacheConfiguration.defaultCacheConfig()
        .entryTtl(Duration.ofHours(1))
        .disableCachingNullValues();

    RedisCacheManager l2CacheManager = RedisCacheManager.builder(redisConnectionFactory)
        .cacheDefaults(redisConfig)
        .build();

    // Composite: check L1 first, then L2
    return new MultiLevelCacheManager(l1Cache, l2CacheManager);
}

This MultiLevelCacheManager is the heart of the solution. I wrote it to override the standard Cache lookup. When a @Cacheable method is called, the manager first looks up the key in Caffeine. If found, it returns immediately. If not, it queries Redis, and if Redis has the value, it promotes that value into Caffeine for the next call. Only if both miss does the actual method execute.

public class MultiLevelCacheManager extends AbstractCacheManager {
    private final Cache l1;
    private final CacheManager l2Manager;

    public MultiLevelCacheManager(Cache l1, CacheManager l2Manager) {
        this.l1 = l1;
        this.l2Manager = l2Manager;
    }

    @Override
    protected Collection<? extends Cache> loadCaches() {
        return Collections.singleton(l1);
    }

    @Override
    public Cache getCache(String name) {
        Cache l2Cache = l2Manager.getCache(name);
        return new CompositeCache(name, l1, l2Cache);
    }
}
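The CompositeCache itself deserves a closer look. Here’s a minimal sketch implementing Spring’s org.springframework.cache.Cache interface: reads check L1 first and promote L2 hits, while writes and evictions touch both levels. It’s a starting point, not a hardened implementation – production code would also guard against concurrent loads.

```java
import java.util.concurrent.Callable;
import org.springframework.cache.Cache;

// Minimal two-tier cache: L1 (Caffeine, in-process) backed by L2 (Redis).
public class CompositeCache implements Cache {
    private final String name;
    private final Cache l1;
    private final Cache l2;

    public CompositeCache(String name, Cache l1, Cache l2) {
        this.name = name;
        this.l1 = l1;
        this.l2 = l2;
    }

    @Override
    public ValueWrapper get(Object key) {
        ValueWrapper local = l1.get(key);
        if (local != null) {
            return local;                    // L1 hit: no network call
        }
        ValueWrapper remote = l2.get(key);
        if (remote != null) {
            l1.put(key, remote.get());       // promote into L1 for the next call
        }
        return remote;
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T get(Object key, Class<T> type) {
        ValueWrapper wrapper = get(key);
        return wrapper == null ? null : (T) wrapper.get();
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T get(Object key, Callable<T> valueLoader) {
        ValueWrapper wrapper = get(key);
        if (wrapper != null) {
            return (T) wrapper.get();
        }
        try {
            T value = valueLoader.call();    // both levels missed: run the method
            put(key, value);
            return value;
        } catch (Exception e) {
            throw new ValueRetrievalException(key, valueLoader, e);
        }
    }

    @Override
    public void put(Object key, Object value) {
        l1.put(key, value);                  // keep both levels in sync
        l2.put(key, value);
    }

    @Override
    public void evict(Object key) {
        l1.evict(key);
        l2.evict(key);
    }

    @Override
    public void clear() {
        l1.clear();
        l2.clear();
    }

    @Override
    public String getName() { return name; }

    @Override
    public Object getNativeCache() { return l1.getNativeCache(); }
}
```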

How does the composite cache handle writes? When you use @CachePut, you must update both levels. I call l1.put(key, value) then l2.put(key, value). For @CacheEvict, I clear both. This keeps the two layers consistent. But what about invalidation across nodes? If another service updates a product, it sends a Redis publish/subscribe message. I listen for that message on all nodes and evict the corresponding key from Caffeine. That way, stale local copies are removed within milliseconds of a remote update.
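That invalidation listener can be sketched like this. The channel name cache:invalidate and the cacheName:key message format are conventions I’m assuming for the example, and I’m assuming the CaffeineCache is exposed as its own bean; note the eviction deliberately targets only the local L1, since the publisher has already updated Redis.

```java
// Evict stale L1 entries when another node publishes an invalidation message.
@Bean
public RedisMessageListenerContainer l1InvalidationListener(
        RedisConnectionFactory connectionFactory, CaffeineCache l1Cache) {
    RedisMessageListenerContainer container = new RedisMessageListenerContainer();
    container.setConnectionFactory(connectionFactory);
    container.addMessageListener((message, pattern) -> {
        // Expected payload: "cacheName:key", e.g. "products:123"
        String[] parts = new String(message.getBody()).split(":", 2);
        if (parts.length == 2 && l1Cache.getName().equals(parts[0])) {
            l1Cache.evict(parts[1]);  // drop only the local copy; Redis is current
        }
    }, new ChannelTopic("cache:invalidate"));
    return container;
}
```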

Now, let’s talk about cache stampedes. Imagine 100 concurrent requests for a product that’s expired from both caches. Without protection, all 100 hit the database simultaneously. Caffeine has a built‑in mitigation: expireAfterWrite combined with refreshAfterWrite (which requires a LoadingCache built with a CacheLoader). I set refreshAfterWrite=4 minutes and expireAfterWrite=5 minutes. That means after 4 minutes, the entry is stale but still present. The first request triggers an asynchronous reload, and subsequent requests get the stale value until the reload completes. The database sees only one query.

LoadingCache<Long, Product> products = Caffeine.newBuilder()
    .maximumSize(500)
    .expireAfterWrite(5, TimeUnit.MINUTES)
    .refreshAfterWrite(4, TimeUnit.MINUTES)
    .build(key -> loadProductFromDatabase(key));

I also add probabilistic early expiration for Redis. Before an entry’s TTL expires, a small percentage of requests get a ‘refresh’ signal. This spreads the load evenly. I use a custom CacheLoader that checks the remaining TTL and randomly forces a refresh when the TTL is below a threshold.
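The core of that check is simple enough to show standalone. In this sketch the refresh probability grows linearly as the remaining TTL falls below the threshold – the linear ramp is my choice for illustration; other schemes, such as XFetch’s exponential variant, work too.

```java
import java.util.concurrent.ThreadLocalRandom;

// Probabilistic early refresh: once the remaining TTL drops below a threshold,
// each request triggers a refresh with probability
// (threshold - remaining) / threshold, spreading reloads out over time.
public class EarlyRefresh {

    // Deterministic variant for testing: "roll" is a random number in [0, 1).
    static boolean shouldRefresh(long remainingTtlMs, long thresholdMs, double roll) {
        if (remainingTtlMs >= thresholdMs) {
            return false;                  // plenty of TTL left, no refresh
        }
        double p = (double) (thresholdMs - remainingTtlMs) / thresholdMs;
        return roll < p;                   // probability ramps up as TTL shrinks
    }

    static boolean shouldRefresh(long remainingTtlMs, long thresholdMs) {
        return shouldRefresh(remainingTtlMs, thresholdMs,
            ThreadLocalRandom.current().nextDouble());
    }

    public static void main(String[] args) {
        // 70s remaining, 60s threshold: above threshold, never refresh
        System.out.println(shouldRefresh(70_000, 60_000, 0.5)); // false
        // 10s remaining: p ≈ 0.83, so a roll of 0.5 triggers a refresh
        System.out.println(shouldRefresh(10_000, 60_000, 0.5)); // true
    }
}
```

In the Redis loader, you would call shouldRefresh with the key’s remaining TTL (via redisTemplate.getExpire) and kick off an asynchronous reload when it returns true.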

One common mistake: serialization. Redis stores bytes, so your objects must be Serializable (for the default JDK serializer) or you must configure a Jackson-based serializer such as GenericJackson2JsonRedisSerializer. I use Jackson with an ObjectMapper configured to handle Java 8 dates and custom types. Otherwise, you’ll get cryptic deserialization errors under load.
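Here’s one way to wire that in, building on the redisConfig from earlier. One caveat worth knowing: GenericJackson2JsonRedisSerializer’s no-arg constructor activates default typing for you, but when you pass your own ObjectMapper you have to do it yourself, or deserialization falls back to LinkedHashMap.

```java
// ObjectMapper with Java 8 date/time support and embedded @class type info.
// allowIfBaseType(Object.class) is permissive; restrict it to your own
// packages in production.
ObjectMapper mapper = new ObjectMapper()
    .registerModule(new JavaTimeModule())
    .activateDefaultTyping(
        BasicPolymorphicTypeValidator.builder()
            .allowIfBaseType(Object.class)
            .build(),
        ObjectMapper.DefaultTyping.NON_FINAL,
        JsonTypeInfo.As.PROPERTY);

RedisCacheConfiguration redisConfig = RedisCacheConfiguration.defaultCacheConfig()
    .entryTtl(Duration.ofHours(1))
    .disableCachingNullValues()
    .serializeValuesWith(RedisSerializationContext.SerializationPair
        .fromSerializer(new GenericJackson2JsonRedisSerializer(mapper)));
```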

Another pitfall is key collisions. Always prefix Redis keys with the application name and cache region. I use "myapp:products:123". This avoids clashes when multiple apps share the same Redis instance.
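RedisCacheConfiguration has a hook for exactly this; "myapp" here stands in for whatever your application is called.

```java
// Prefix every Redis key as "myapp:<cacheName>:", e.g. "myapp:products:123"
RedisCacheConfiguration prefixed = RedisCacheConfiguration.defaultCacheConfig()
    .computePrefixWith(cacheName -> "myapp:" + cacheName + ":");
```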

Testing a multi‑level cache requires a real Redis. I use Testcontainers to spin up a Redis container in integration tests. Here’s a simple test that verifies the cache chain.

@Test
void testCachePromotion() {
    productService.getProduct(1L);  // loads from DB → stores in Redis → stores in Caffeine
    productService.getProduct(1L);  // hits Caffeine (no DB call)

    // Evict from Redis only (simulate remote invalidate)
    redisTemplate.delete("myapp:products:1");

    productService.getProduct(1L);  // Caffeine still has it → no DB call
}

The test passes because Caffeine’s local entry persists until its own TTL expires. This resilience is exactly what you want in production.
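For completeness, here’s the Testcontainers scaffolding I’m assuming around that test. The class name is illustrative, and the spring.data.redis.* properties apply to Spring Boot 3.x (earlier versions use spring.redis.*).

```java
// Spins up a throwaway Redis and points Spring Data Redis at it.
@SpringBootTest
@Testcontainers
class MultiLevelCacheIT {

    @Container
    static final GenericContainer<?> redis =
        new GenericContainer<>("redis:7-alpine").withExposedPorts(6379);

    @DynamicPropertySource
    static void redisProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.data.redis.host", redis::getHost);
        registry.add("spring.data.redis.port", () -> redis.getMappedPort(6379));
    }

    // the cache-promotion test from above lives here
}
```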

Monitoring is essential. I expose Caffeine’s recordStats() via Micrometer to Prometheus. Metrics like cache.gets (tagged hit or miss), cache.evictions, and cache.eviction.weight help me tune the L1 size. I also track Redis latency with Spring Data Redis’s built‑in metrics. When I see the L1 hit rate dropping below 80%, I increase maximumSize. When Redis CPU spikes, I know I need to expand the L1 or adjust TTLs.
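Wiring this up is one call with Micrometer’s Caffeine binder, assuming an autowired MeterRegistry (named meterRegistry here) is in scope where you build the cache.

```java
// Bind Caffeine's stats to Micrometer; requires recordStats() on the builder.
com.github.benmanes.caffeine.cache.Cache<Object, Object> nativeCache =
    Caffeine.newBuilder()
        .maximumSize(500)
        .expireAfterWrite(5, TimeUnit.MINUTES)
        .recordStats()
        .build();

CaffeineCacheMetrics.monitor(meterRegistry, nativeCache, "products");
CaffeineCache l1Cache = new CaffeineCache("products", nativeCache);
```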

What about when you want to bypass caching for admin updates? I created a separate CacheEvict method annotated with @CacheEvict(value = "products", key = "#id") that clears both levels explicitly. For bulk invalidations, I use @Caching(evict = {...}).
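In annotation form, that looks roughly like this – ProductAdminService and the second cache region name are illustrative, not from the original config.

```java
@Service
public class ProductAdminService {

    // Evicts one product from both levels via the composite cache
    @CacheEvict(value = "products", key = "#id")
    public void invalidateProduct(Long id) {
        // intentionally empty: the annotation performs the eviction
    }

    // Bulk invalidation across several cache regions
    @Caching(evict = {
        @CacheEvict(value = "products", allEntries = true),
        @CacheEvict(value = "productDetails", allEntries = true)
    })
    public void invalidateAllProductCaches() {
    }
}
```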

Finally, the business case. In our Black Friday scenario, adding Caffeine reduced Redis operations by 80% and cut p99 latency from 50ms to 2ms. The database SLOs were never breached again. And the best part? The change was transparent to the rest of the codebase – just a configuration tweak and a custom cache manager.

If you’ve ever felt the pain of a cold cache or a Redis avalanche, this approach is your lifeline. Start small: pick one hot endpoint, wrap it with a two‑tier cache, and measure the difference. You’ll see the results in your latency graphs.

Now I’d love to hear from you. Have you tried multi-level caching? What problems did you encounter? Drop a comment below – your story might help someone else avoid the same pitfalls. If this article helped you understand the magic of L1+L2 caches, give it a like and share it with your team. And don’t forget to subscribe for more production‑ready Java tips.




