
How to Build a Two-Tier Cache in Spring Boot with Caffeine, Redis, and Pub/Sub

Learn how to build a two-tier cache in Spring Boot using Caffeine, Redis, and Pub/Sub to cut latency and fix cache consistency.


I’ve spent the last few months wrestling with a beast that many developers know too well: the cache consistency problem. Our microservice was serving product recommendations to millions of users daily, and we had three instances running behind a load balancer. Each instance held its own in‑memory cache using Caffeine – fast, yes, but isolated. When instance A updated a product’s popularity score, instances B and C continued to serve stale data until their local TTL expired. Users saw inconsistent recommendations, and our NPS took a hit. That’s when I realised: a single‑layer cache forces you to choose between speed and consistency. I wanted both. So I built a two‑tier caching architecture that combines the blistering speed of an L1 Caffeine cache with the shared consistency of an L2 Redis cache, all orchestrated by Spring’s cache abstraction. Was it worth it? The response times dropped by 40%, and the inconsistency complaints vanished. Let me show you exactly how I did it.

But first, ask yourself: have you ever deployed a service with multiple replicas and watched your local caches drift apart? If you have, you already know the pain – and the opportunity.

The core idea is simple: every read first hits the L1 Caffeine cache inside the same JVM. If it’s there (a cache hit), we return the value in under 0.1 milliseconds. If not, we go to the L2 Redis store, which is shared across all instances. A Redis lookup takes 1–5 milliseconds – still far faster than a database query. If it’s also a miss there, we fetch from the database (10–100 ms), populate both L2 and L1, and return. This layered approach gives you the best of both worlds: hot data lives in L1 and is accessed at near‑zero latency; cold data is served from Redis; and the database is only hit when absolutely necessary.
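
Here's a minimal sketch of that read path. The class and field names (ReadPathSketch, l1Cache, redisTemplate, dbLoader) are illustrative, not the exact production code:

import com.github.benmanes.caffeine.cache.Cache;
import org.springframework.data.redis.core.RedisTemplate;
import java.time.Duration;
import java.util.concurrent.Callable;

public class ReadPathSketch {

    private final Cache<String, Object> l1Cache;                // Caffeine, per-JVM
    private final RedisTemplate<String, Object> redisTemplate;  // shared L2

    public ReadPathSketch(Cache<String, Object> l1Cache,
                          RedisTemplate<String, Object> redisTemplate) {
        this.l1Cache = l1Cache;
        this.redisTemplate = redisTemplate;
    }

    public Object lookup(String key, Callable<Object> dbLoader) throws Exception {
        Object value = l1Cache.getIfPresent(key);      // ~0.1 ms, same JVM
        if (value != null) {
            return value;                              // L1 hit
        }
        value = redisTemplate.opsForValue().get(key);  // ~1-5 ms, shared across instances
        if (value != null) {
            l1Cache.put(key, value);                   // promote to L1 for the next read
            return value;                              // L2 hit
        }
        value = dbLoader.call();                       // ~10-100 ms, last resort
        redisTemplate.opsForValue().set(key, value, Duration.ofMinutes(10)); // fill L2
        l1Cache.put(key, value);                       // fill L1
        return value;
    }
}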

I started by setting up the dependencies. My pom.xml includes spring-boot-starter-cache, spring-boot-starter-data-redis, Caffeine itself, and Micrometer for metrics. The configuration properties map to YAML settings for cache sizes, TTLs, and the Redis connection. I use Lettuce as the Redis client with a connection pool to handle concurrency.
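
For reference, the relevant pom.xml entries look roughly like this; exact versions come from the Spring Boot BOM, so none are pinned here:

<!-- Sketch of the relevant dependencies; versions are managed by the Spring Boot BOM -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- Lettuce connection pooling additionally needs commons-pool2 -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-pool2</artifactId>
</dependency>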

Next, I created a custom CacheManager that delegates to both Caffeine and Redis. Spring’s AbstractCacheManager is the base. Inside, I maintain a CaffeineCache and a RedisCache for each cache region. The real magic happens in the get() method of my custom MultiTierCache implementation. When a value is requested, I first check the Caffeine cache. If it’s present, I return it immediately. If not, I try Redis. On a Redis hit, I store the value in Caffeine (asynchronously, using a ScheduledExecutor to avoid blocking) and return it. On a double miss, I compute the value via the original Callable (which Spring passes to the cache), store it in both tiers, and return.
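
Here's a condensed sketch of that get() method. MultiTierCache implements Spring's org.springframework.cache.Cache; the constructor, the remaining interface methods, and serialization details are elided:

import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledExecutorService;
import org.springframework.cache.Cache;

public class MultiTierCache implements Cache {

    private final com.github.benmanes.caffeine.cache.Cache<Object, Object> l1;
    private final Cache l2; // the RedisCache for this region
    private final ScheduledExecutorService executor;

    // constructor and the other Cache methods omitted for brevity

    @Override
    @SuppressWarnings("unchecked")
    public <T> T get(Object key, Callable<T> valueLoader) {
        // 1. L1: same-JVM Caffeine lookup.
        Object value = l1.getIfPresent(key);
        if (value != null) {
            return (T) value;
        }
        // 2. L2: shared Redis lookup.
        ValueWrapper wrapper = l2.get(key);
        if (wrapper != null) {
            Object redisValue = wrapper.get();
            // Populate L1 off the request thread so the caller isn't blocked.
            executor.execute(() -> l1.put(key, redisValue));
            return (T) redisValue;
        }
        // 3. Double miss: compute via the loader Spring passes in, fill both tiers.
        try {
            T loaded = valueLoader.call();
            l2.put(key, loaded);
            l1.put(key, loaded);
            return loaded;
        } catch (Exception e) {
            throw new ValueRetrievalException(key, valueLoader, e);
        }
    }
}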

But this alone doesn’t solve the consistency problem across nodes. Imagine instance A updates a product’s data. It evicts the entry from its own Caffeine cache and writes the new value to Redis. Instances B and C have no idea – they still have the old value in their Caffeine caches. To fix this, I implemented a cache invalidation broadcasting mechanism using Redis Pub/Sub. When any instance performs a put or evict, it publishes a CacheInvalidationEvent to a dedicated Redis channel (e.g., cache-invalidation-events). All instances subscribe to this channel. On receiving an event, the local listener removes the corresponding entry from the Caffeine cache. This ensures that within a few milliseconds, every instance’s L1 cache is consistent with the L2 store.

Here’s a snippet of the invalidation listener:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class CacheInvalidationListener {

    private static final Logger log = LoggerFactory.getLogger(CacheInvalidationListener.class);

    private final CacheManager cacheManager;

    public CacheInvalidationListener(CacheManager cacheManager) {
        this.cacheManager = cacheManager;
    }

    // Fired for CacheInvalidationEvent instances republished as Spring application
    // events by the Redis Pub/Sub subscriber (see the bridge sketch below).
    @EventListener
    public void handleInvalidationEvent(CacheInvalidationEvent event) {
        Cache cache = cacheManager.getCache(event.getCacheName());
        if (cache != null) {
            // Evict the local L1 entry so the next read falls through to Redis.
            cache.evict(event.getCacheKey());
            log.info("L1 cache invalidated for key {} in cache {}", event.getCacheKey(), event.getCacheName());
        }
    }
}

I also had to handle the case where the invalidation event arrives for the same instance that published it – to prevent unnecessary local evictions. I added an originNodeId field to the event and check it before evicting.
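
To make that concrete, here's a sketch of the event and the Pub/Sub bridge that feeds the @EventListener above. The JSON transport, the cache.node-id property, and the class names are assumptions; the publishing side is a single redisTemplate.convertAndSend("cache-invalidation-events", event) call on every put or evict:

import java.io.IOException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.data.redis.connection.Message;
import org.springframework.data.redis.connection.MessageListener;
import org.springframework.stereotype.Component;

class CacheInvalidationEvent {

    private String cacheName;
    private String cacheKey;
    private String originNodeId; // identifies the publishing node

    // constructors and setters omitted for brevity (Jackson needs a no-arg constructor)
    public String getCacheName() { return cacheName; }
    public String getCacheKey() { return cacheKey; }
    public String getOriginNodeId() { return originNodeId; }
}

@Component
public class CacheInvalidationSubscriber implements MessageListener {

    private static final Logger log = LoggerFactory.getLogger(CacheInvalidationSubscriber.class);

    private final ObjectMapper objectMapper = new ObjectMapper();
    private final ApplicationEventPublisher eventPublisher;
    private final String localNodeId; // e.g. a UUID generated at startup

    public CacheInvalidationSubscriber(ApplicationEventPublisher eventPublisher,
                                       @Value("${cache.node-id}") String localNodeId) {
        this.eventPublisher = eventPublisher;
        this.localNodeId = localNodeId;
    }

    // Registered against the cache-invalidation-events ChannelTopic through a
    // RedisMessageListenerContainer bean (registration code not shown).
    @Override
    public void onMessage(Message message, byte[] pattern) {
        try {
            CacheInvalidationEvent event =
                    objectMapper.readValue(message.getBody(), CacheInvalidationEvent.class);
            if (localNodeId.equals(event.getOriginNodeId())) {
                return; // we published this ourselves; skip the redundant eviction
            }
            eventPublisher.publishEvent(event); // hands off to the @EventListener above
        } catch (IOException e) {
            log.warn("Could not deserialize cache invalidation message", e);
        }
    }
}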

Now, what about cache stampedes and thundering herds? When many concurrent requests miss the cache simultaneously, they can all hit the database at once. I solved this by using Caffeine’s LoadingCache with a refreshAfterWrite policy and a synchronous eviction listener. But for the Redis tier, I applied a simple lock around the compute operation (using Redis’ SET NX with a short TTL). This ensures only one instance hits the database for a given key during the first miss.
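
Here's roughly what that Redis-side lock looks like. setIfAbsent maps to SET key value NX PX <ttl>; the key naming, the 5-second lock TTL, and the retry strategy are all assumptions to tune:

import java.time.Duration;
import java.util.concurrent.Callable;
import org.springframework.data.redis.core.StringRedisTemplate;

public class StampedeGuardSketch {

    private final StringRedisTemplate redis;
    private final String nodeId; // lock-owner token

    public StampedeGuardSketch(StringRedisTemplate redis, String nodeId) {
        this.redis = redis;
        this.nodeId = nodeId;
    }

    public String loadWithLock(String key, Callable<String> dbLoader) throws Exception {
        String lockKey = "lock:" + key;
        // SET lockKey nodeId NX PX 5000 -- true means we own the lock.
        Boolean acquired = redis.opsForValue().setIfAbsent(lockKey, nodeId, Duration.ofSeconds(5));
        if (Boolean.TRUE.equals(acquired)) {
            try {
                return dbLoader.call(); // only the lock holder touches the database
            } finally {
                redis.delete(lockKey); // an owner check before delete is omitted for brevity
            }
        }
        // Another instance is already loading: back off briefly, then re-read Redis.
        // A production version would retry in a loop or fall back to the loader.
        Thread.sleep(50);
        return redis.opsForValue().get(key);
    }
}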

Monitoring is critical. I added Micrometer counters for L1 hits, L2 hits, and cache misses. Exposing them via Prometheus gave us a real‑time dashboard showing the ratio – typically around 85% L1 hits for our hot keys. When the L1 hit rate drops, we know the TTL is too short or the eviction policy too aggressive.
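
The counters themselves are a few lines of Micrometer; the metric and tag names below are my own convention:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

public class CacheMetrics {

    private final Counter l1Hits;
    private final Counter l2Hits;
    private final Counter misses;

    public CacheMetrics(MeterRegistry registry, String cacheName) {
        this.l1Hits = Counter.builder("cache.multitier.hits")
                .tag("cache", cacheName).tag("tier", "l1").register(registry);
        this.l2Hits = Counter.builder("cache.multitier.hits")
                .tag("cache", cacheName).tag("tier", "l2").register(registry);
        this.misses = Counter.builder("cache.multitier.misses")
                .tag("cache", cacheName).register(registry);
    }

    // Call these from the matching branches of MultiTierCache.get().
    public void l1Hit() { l1Hits.increment(); }
    public void l2Hit() { l2Hits.increment(); }
    public void miss() { misses.increment(); }
}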

Let’s talk about eviction policies. For Caffeine, I used maximumSize with expireAfterWrite of 30 seconds. This keeps the L1 footprint small and ensures stale data is evicted quickly even if an invalidation message is ever missed. For Redis, I set a TTL of 10 minutes. This matches our business requirement: product recommendations rarely change that fast, but we don’t want Redis holding too much stale data. I also set Redis’ maxmemory-policy to allkeys-lru to handle memory pressure.
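
In code, the L1 policy is a handful of builder calls (the 10,000 maximum size is an assumed figure; the 30-second TTL is the one described above):

import java.time.Duration;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// L1 policy: bounded size plus a short write TTL; 10_000 entries is an assumed figure.
Cache<Object, Object> l1 = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofSeconds(30))
        .recordStats() // exposes hit/miss statistics for the Micrometer counters
        .build();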

Now a question for you: how do you choose the right TTL values? Start by analyzing your access patterns. If data changes every second, your L1 TTL should be under a second – but then you might as well skip caching. For us, the sweet spot was 30 seconds for L1 and 10 minutes for L2. Experiment and measure.

One personal touch: I named my cache regions like product_recommendations, user_sessions, and pricing_tables. Each region can have its own configuration. I built a CacheRegionProperties class that holds per‑region L1 size and TTL, and populated it from a YAML map. This made it trivial to fine‑tune each region without redeploying.
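
A sketch of that per-region properties class; the YAML shape it binds to is shown in the comment, and all names are my own convention:

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.springframework.boot.context.properties.ConfigurationProperties;

// Binds per-region settings from YAML, e.g.:
//
//   cache:
//     regions:
//       product_recommendations:
//         l1-max-size: 10000
//         l1-ttl: 30s
//         l2-ttl: 10m
//
// Register the class with @EnableConfigurationProperties or @ConfigurationPropertiesScan.
@ConfigurationProperties(prefix = "cache")
public class CacheRegionProperties {

    private Map<String, Region> regions = new HashMap<>();

    public Map<String, Region> getRegions() { return regions; }
    public void setRegions(Map<String, Region> regions) { this.regions = regions; }

    public static class Region {
        private long l1MaxSize = 10_000;
        private Duration l1Ttl = Duration.ofSeconds(30);
        private Duration l2Ttl = Duration.ofMinutes(10);
        // getters and setters omitted for brevity
    }
}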

The final piece is integration testing. I wrote a @SpringBootTest that starts a Redis test container and verifies the multi‑tier behavior. I simulate a miss, see the L2 populated, then verify the next read hits L1. Then I publish an invalidation event from another “instance” and confirm the L1 entry is evicted. This gave me the confidence to push to production.
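
The skeleton of that test, using Testcontainers. The container image, the property names (spring.data.redis.* on Boot 3, spring.redis.* on Boot 2), and the assertions are illustrative:

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

import static org.assertj.core.api.Assertions.assertThat;

@Testcontainers
@SpringBootTest
class MultiTierCacheIntegrationTest {

    @Container
    static GenericContainer<?> redis =
            new GenericContainer<>(DockerImageName.parse("redis:7-alpine")).withExposedPorts(6379);

    @DynamicPropertySource
    static void redisProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.data.redis.host", redis::getHost);
        registry.add("spring.data.redis.port", () -> redis.getMappedPort(6379));
    }

    @Autowired
    CacheManager cacheManager;

    @Test
    void doubleMissFillsBothTiersAndNextReadHitsL1() {
        Cache cache = cacheManager.getCache("product_recommendations");

        // First read: double miss, so the loader runs and both tiers are filled.
        String loaded = cache.get("product-42", () -> "computed-value");
        assertThat(loaded).isEqualTo("computed-value");

        // Second read: should be served from L1; assert on your hit counters here.
        assertThat(cache.get("product-42").get()).isEqualTo("computed-value");
    }
}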

So what did I gain? Our average response time for product recommendations dropped from 12 ms to under 2 ms. The inconsistency complaints vanished. And the system scaled effortlessly as we added more instances – because the invalidation channel kept every L1 cache in sync.

If you’ve ever struggled with cache drift or performance trade‑offs, this multi‑level approach will change the way you think about caching. Give it a try in your next Spring Boot project. And if you found this useful, hit the like button, share it with your team, and leave a comment with your own cache war stories. I’d love to hear how you solved the consistency puzzle.

