29 · Cache, Message Queue, and Event System Selection

The thesis in one line: cache is not a database, a queue is not magic, and an event system is not "just add Kafka." They solve three pressures: read hotspots, write spikes, and cross-boundary collaboration. Before choosing, ask whether you are reducing latency, smoothing peaks, or decoupling state progress.


🧰 Technology Stack Selection Track, Chapter 3 · One thing to practice

Chapter 28 covered truth sources and read models. This chapter covers the most common middle layers around them: cache, message queue, and event system. They can save a system, and they can also make it harder to reason about.


Opening: these three are often mixed up

Many diagrams show:

   App -> Redis -> MQ -> Kafka -> Worker

Then people say the architecture is advanced. Ask instead:

  • Is Redis storing disposable cache or business state that must not be lost?
  • Is the queue smoothing a spike or coordinating services?
  • Is Kafka carrying commands, or events that describe facts?
  • What happens on duplicate, out-of-order, failed, or backlogged messages?

Architectural judgment: the value of a middle layer is not its name. It is the quality attribute it changes: latency, throughput, availability, coupling, consistency, recovery cost.


1. Cache: accelerate, do not usurp truth

Cache solves read hotspots:

   request -> app -> cache hit -> return
                  \-> miss -> primary DB -> fill cache -> return
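
The read path above can be sketched as a small cache-aside helper. This is a minimal in-process illustration, not a production cache: the class name `CacheAside`, the `load_from_primary` callback, and the TTL default are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside reads: the cache is rebuildable, the loader is the truth."""

    def __init__(self, load_from_primary: Callable[[str], str], ttl_seconds: float = 60.0):
        self._load = load_from_primary   # hypothetical call into the primary DB
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # cache hit: return without touching the DB
        value = self._load(key)          # miss or expired: read the primary store
        self._entries[key] = (now + self._ttl, value)  # fill cache for later reads
        return value

    def invalidate(self, key: str) -> None:
        """Active invalidation, e.g. called after the primary row is updated."""
        self._entries.pop(key, None)

    def clear(self) -> None:
        """Simulates losing the whole cache; reads keep working, only slower."""
        self._entries.clear()
```

Calling `clear()` and reading again should return the same values with more primary-store reads, which is the "slower, not wrong" property this section argues for.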

Three common mistakes:

Mistake | Result | Better approach
Treat cache as source of truth | Losing cache loses data | Primary store is truth; cache is rebuildable
No invalidation strategy | Users see stale or wrong values | TTL, active invalidation, versions
Hot key stampede on miss | Primary DB gets crushed | Null cache, request coalescing, rate limits, warmup
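
Two of the stampede defenses named above, null caching and request coalescing, can be sketched together. This is a minimal single-process illustration under stated assumptions: `CoalescingCache` and `load_from_primary` are hypothetical names, TTL and eviction are left out for brevity, and a distributed cache would need a different coordination mechanism than a thread lock.

```python
import threading
from typing import Callable, Dict, Optional

_MISSING = object()  # sentinel cached for keys the primary store does not have


class CoalescingCache:
    """On a miss, only one caller per key loads from the primary store."""

    def __init__(self, load_from_primary: Callable[[str], Optional[str]]):
        self._load = load_from_primary            # hypothetical primary-DB lookup
        self._values: Dict[str, object] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()            # protects the lock table itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Optional[str]:
        if key not in self._values:
            with self._lock_for(key):             # concurrent misses queue up here
                if key not in self._values:       # re-check: another caller may have filled it
                    loaded = self._load(key)
                    # Null caching: remember "not found" so repeated lookups
                    # for absent keys do not hit the primary store each time.
                    self._values[key] = _MISSING if loaded is None else loaded
        value = self._values[key]
        return None if value is _MISSING else value
```

With eight threads missing on the same hot key at once, the loader should run once and the other seven callers should reuse its result.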

Types of cache:

Cache type | Good for | Watch out
Local cache | Config, dictionaries, rarely-changing data | Multiple instances can diverge
Distributed cache | Hot objects, sessions, counters, rate limits | Network cost, capacity, eviction
CDN | Images, videos, static resources, public pages | Invalidation delay and edge consistency

If the cache disappears, the system should get slower, not wrong. If it becomes wrong, you stored business truth in the cache.


2. Queue: turn "do now" into "do reliably later"

Queues smooth peaks:

   spike -> admission/rate limit -> queue -> workers consume at safe rate -> DB

Good fits:

  • Ticket notification after seat lock.
  • Coupons, SMS, email after order success.
  • Parsing, chunking, indexing after document upload.
  • Video transcoding after upload.

But queues turn a synchronous world into an asynchronous one:

New problem | Required answer
Duplicate messages | Is the consumer idempotent?
Message loss | How do producer, storage, and ack work?
Ordering | Do you need order per business key?
Backlog | What does the user see? How do you degrade?
Dead letter | Where do failed messages go? Who repairs them?

A queue does not simply "add reliability." It trades request latency for asynchronous consistency and recovery work.
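
The duplicate and dead-letter questions in the table can be made concrete with a small consumer sketch. It uses an in-memory `deque` as a stand-in for a real broker. `Message`, `IdempotentConsumer`, and `max_attempts` are hypothetical names chosen for the example. A real system would likely keep the processed-ID set in durable storage rather than in memory.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Set


@dataclass
class Message:
    message_id: str      # stable ID assigned by the producer
    payload: str
    attempts: int = 0


class IdempotentConsumer:
    """Skips duplicate deliveries, retries failures, parks poison messages."""

    def __init__(self, handler: Callable[[str], None], max_attempts: int = 3):
        self._handler = handler
        self._max_attempts = max_attempts
        self._processed_ids: Set[str] = set()   # assumption: durable in production
        self.dead_letters: List[Message] = []   # failed messages awaiting repair

    def drain(self, queue: Deque[Message]) -> None:
        while queue:
            msg = queue.popleft()
            if msg.message_id in self._processed_ids:
                continue                         # duplicate delivery: do nothing
            try:
                self._handler(msg.payload)
            except Exception:
                msg.attempts += 1
                if msg.attempts >= self._max_attempts:
                    self.dead_letters.append(msg)   # give up, keep for inspection
                else:
                    queue.append(msg)               # retry later, behind other work
                continue
            self._processed_ids.add(msg.message_id)
```

Note the design choice in the retry branch: re-appending a failed message to the back of the queue keeps other work flowing but gives up strict ordering, which is exactly the trade the "Ordering" row asks you to decide on.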


3. Events: say what happened, not what others must do

Commands and events differ:

Type | Meaning | Example | Who owns the result
Command | Please do something | CreateOrder, SendEmail | Receiver must perform or fail
Event | Something already happened | OrderPaid, TicketLocked | Subscribers react as needed

Events propagate business facts:

   Order service: OrderPaid
      |-- Inventory: confirm deduction
      |-- Notification: send message
      |-- Data platform: update reports
      |-- Risk: record behavior

This decouples the publisher from downstream consumers. Costs:

  • Event schemas become contracts.
  • Downstream failure cannot simply roll back the fact.
  • Too many small events flood the system; overly broad events hide meaning.
  • Tracing is mandatory as the flow spreads.
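
The fan-out and its second cost (a downstream failure cannot roll back the fact) can be sketched with a tiny in-process bus. This is an illustration only: `EventBus`, its methods, and the fields on `OrderPaid` are assumptions for the example, and a real event system would deliver asynchronously through a broker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class OrderPaid:
    """An event: past tense, immutable, describes a fact that already happened."""
    order_id: str
    amount_cents: int


class EventBus:
    def __init__(self) -> None:
        self._subscribers: Dict[type, List[Tuple[str, Callable]]] = {}

    def subscribe(self, event_type: type, name: str, handler: Callable) -> None:
        self._subscribers.setdefault(event_type, []).append((name, handler))

    def publish(self, event: object) -> List[str]:
        """Deliver to every subscriber; return the names of those that failed.

        A failing subscriber neither blocks the others nor undoes the fact.
        The failure must be retried or repaired downstream.
        """
        failed: List[str] = []
        for name, handler in self._subscribers.get(type(event), []):
            try:
                handler(event)
            except Exception:
                failed.append(name)
        return failed
```

The publisher only knows the event type. It does not know that inventory, notification, or reporting exist, which is where the decoupling comes from.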

4. Understand products by semantics first

Type | Common examples | Feels like | Good for
Task queue | RabbitMQ, Celery, Sidekiq | Work assignment | Background jobs, email, image processing
Log/event stream | Kafka, Pulsar | Replayable fact log | Event bus, data sync, audit, stream processing
Lightweight messaging/stream | Redis Streams, NATS | Fast internal channel | Small/mid async workflows, low-latency messaging
Managed cloud queue | SQS, Pub/Sub | Reliable queue with less ops | Cloud-native teams avoiding self-ops

Ask:

  1. Do messages need replay?
  2. Do you need complex routing and delivery guarantees?
  3. Can the team operate the cluster?
  4. Is this message part of audit truth?

5. Outbox: do not split DB commit from event publish

Classic failure:

   1. Order write succeeds
   2. OrderCreated event publish fails
   result: DB has the order, downstream never hears about it

Outbox pattern:

   one local transaction:
   write business row + write outbox row
          |
          v
   relay scans outbox -> publishes message -> marks delivered

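It adds a table, a relay, retries, and an idempotency requirement on consumers (a relay that crashes after publishing but before marking the row delivered will publish again), but it keeps fact writing and event publishing controllable. This is core Chapter 11 consistency engineering.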
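
The outbox flow can be sketched with Python's built-in `sqlite3` module. This is a minimal single-database illustration under stated assumptions: the table layout, `create_order`, `relay_once`, and the `publish` callback are hypothetical, and a production relay would also need batching, backoff, and concurrency control.

```python
import json
import sqlite3
from typing import Callable


def init_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS orders (
            id TEXT PRIMARY KEY,
            total_cents INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            delivered INTEGER NOT NULL DEFAULT 0
        );
    """)


def create_order(conn: sqlite3.Connection, order_id: str, total_cents: int) -> None:
    # One local transaction: the business row and the outbox row
    # commit together or roll back together.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id, "total_cents": total_cents})),
        )


def relay_once(conn: sqlite3.Connection, publish: Callable[[str, dict], None]) -> int:
    """Scan undelivered rows in order, publish each, then mark it delivered."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, event_type, payload in rows:
        try:
            publish(event_type, json.loads(payload))
        except Exception:
            break  # leave the row undelivered; the next scan retries it
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered
```

Because the row is marked only after a successful publish, a broker outage leaves the event waiting in the table instead of losing it. The same ordering of steps is what makes delivery at-least-once rather than exactly-once.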


🎯 Quick check

🤔 Which statement about cache best matches this chapter?
  • A. More cache is always better; avoid the database whenever possible
  • B. If cache disappears, the system should get slower, not wrong; the primary store remains the source of truth
  • C. Cache can replace all database transactions
🤔 An order service writes an order, then must notify inventory, notification, and reporting systems. What should you watch out for?
  • A. Event names not being pretty enough
  • B. The database write and message publish are not atomic; use patterns such as Outbox to avoid DB truth with missing messages
  • C. Kafka naturally gives exactly-once business correctness

Chapter summary

  • Cache solves read hotspots: it should be rebuildable and not become source of truth.
  • Queues solve spikes and background work: they introduce async consistency, backlog, and recovery.
  • Events propagate facts: "what happened," not "please do this."
  • Choose semantics before products: task queue, event stream, lightweight messaging, managed queue solve different problems.
  • Outbox is a consistency basic for crossing service boundaries safely.

Next: caches, queues, and events handle pressure behind services. Chapter 30 looks at how services speak directly: REST, gRPC, GraphQL, Webhook, and event APIs.

