· Technology  · 5 min read

Event-Driven Architecture (Without the Hype): When Queues Help, When They Hurt, and How Agencies Ship It Safely

Event-driven systems can unlock scale and resilience, or create invisible failure modes. Here’s the practical way to use queues, topics, and consumers in real products.

The Honest Definition

Event-driven architecture (EDA) means your system communicates by publishing events (“something happened”) that other parts of the system react to asynchronously.

Instead of:

  • Service A calls Service B and waits.

You do:

  • Service A publishes OrderPlaced
  • Service B consumes it when it can
  • The system stays responsive even if parts are slow

EDA isn’t “microservices.” It’s a coordination style.
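To make the contrast concrete, here is a minimal in-process sketch of "publish now, consume later". The EventBus class and its method names are hypothetical and exist only for illustration. A real broker adds persistence, acknowledgements, and network hops.

```typescript
// Minimal in-process event bus. EventBus and OrderPlaced are illustrative names.
type OrderPlaced = { type: "OrderPlaced"; orderId: string };
type Handler = (event: OrderPlaced) => void;

class EventBus {
  private handlers: Handler[] = [];
  private queue: OrderPlaced[] = [];

  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }

  // Publishing only records the fact; it does not wait for any consumer.
  publish(event: OrderPlaced): void {
    this.queue.push(event);
  }

  // Consumers drain the queue later, whenever they have capacity.
  drain(): number {
    let delivered = 0;
    while (this.queue.length > 0) {
      const event = this.queue.shift()!;
      for (const handler of this.handlers) handler(event);
      delivered++;
    }
    return delivered;
  }

  pending(): number {
    return this.queue.length;
  }
}

const bus = new EventBus();
const emailed: string[] = [];
bus.subscribe((e) => emailed.push(e.orderId));

bus.publish({ type: "OrderPlaced", orderId: "o-1" }); // returns immediately
const pendingBeforeDrain = bus.pending();             // one event waiting, nothing consumed yet
bus.drain();                                          // the consumer catches up later
```

The producer never calls the email handler directly. That decoupling is the whole point, and it is also why a stalled consumer goes unnoticed unless you measure the pending count.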

When You Should Use EDA (and when you shouldn’t)

Use event-driven architecture when:

  • you can tolerate eventual consistency
  • you need to absorb traffic spikes
  • downstream work is best effort (emails, analytics, indexing)
  • you want teams to release independently

Avoid it when:

  • the user must see an immediate, consistent result
  • you can’t make consumers idempotent, so duplicate deliveries would cause real harm
  • you don’t have monitoring for queues/consumers yet

The one-sentence rule

If you can’t explain an event in one sentence (who did what, and what changed), it’s probably not an event. It’s a command in disguise.

Why Teams Reach For It

You usually reach for events because one of these becomes painful:

  • Latency: synchronous calls chain together and users wait.
  • Reliability: one flaky dependency breaks the whole request.
  • Throughput: you need to smooth spikes (bursts) without melting.
  • Team boundaries: different domains need to evolve independently.

Queues and topics can help. But they also change your failure modes.

The Tradeoff Nobody Says Out Loud

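With synchronous calls, failures are obvious: the caller gets an error or a timeout. With events, failures are often silent: the publish succeeds even if no consumer ever processes the message.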

Your system can be “up” and still be wrong.

That’s why EDA is less about the broker and more about the rules around it.

The Building Blocks (In Plain English)

Events

An event is a fact: InvoicePaid, UserCreated, ShipmentDispatched.

Good events:

  • are past tense (“happened”)
  • are immutable
  • include enough context to be useful, but not everything
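One way to encode those three properties is a small typed envelope. This is a sketch under assumptions: the field names (id, type, occurredAt, payload) are a common convention, not a standard, and the InvoicePaid payload is invented for illustration.

```typescript
// Hypothetical envelope; field names are assumptions, not a standard.
interface DomainEvent<TType extends string, TPayload> {
  readonly id: string;         // unique per event, useful for de-duplication
  readonly type: TType;        // past-tense fact, e.g. "InvoicePaid"
  readonly occurredAt: string; // ISO 8601 timestamp
  readonly payload: Readonly<TPayload>;
}

// Enough context to act on, without copying the whole invoice record.
type InvoicePaid = DomainEvent<
  "InvoicePaid",
  { invoiceId: string; amountCents: number; currency: string }
>;

function invoicePaid(
  id: string,
  invoiceId: string,
  amountCents: number,
  currency: string,
  occurredAt: Date
): InvoicePaid {
  // Object.freeze makes the immutability hold at runtime, not only in the types.
  return Object.freeze({
    id,
    type: "InvoicePaid" as const,
    occurredAt: occurredAt.toISOString(),
    payload: Object.freeze({ invoiceId, amountCents, currency }),
  });
}

const paid = invoicePaid("evt-1", "inv-42", 1999, "EUR", new Date("2024-01-01T00:00:00Z"));
```

Consumers that need more detail can look it up by invoiceId, which keeps the event small and the contract stable.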

Producers

A producer emits events. It should not care who consumes them.

Consumers

Consumers react. They must handle:

  • duplicates
  • out-of-order delivery
  • retries
  • partial failure
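Out-of-order delivery is the item teams most often overlook. A minimal sketch of one defence, assuming each event carries a per-entity version number that only increases (the ShipmentEvent shape and the in-memory Map are hypothetical stand-ins for your own schema and store):

```typescript
// Hypothetical shape: each event carries a per-shipment, increasing version.
type ShipmentEvent = { shipmentId: string; version: number; status: string };

// In-memory stand-in for a table keyed by shipment id.
const latest = new Map<string, ShipmentEvent>();

// Returns true if the event was applied, false if it was stale or a duplicate.
function applyShipmentEvent(event: ShipmentEvent): boolean {
  const current = latest.get(event.shipmentId);
  if (current !== undefined && event.version <= current.version) {
    return false; // older or repeated event: ignore it rather than overwrite newer state
  }
  latest.set(event.shipmentId, event);
  return true;
}

// "dispatched" (v2) arrives before "packed" (v1).
applyShipmentEvent({ shipmentId: "s-1", version: 2, status: "dispatched" });
const staleApplied = applyShipmentEvent({ shipmentId: "s-1", version: 1, status: "packed" });
```

The late "packed" event is dropped, so the shipment stays "dispatched". The same comparison also absorbs exact duplicates.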

Broker / Queue / Topic

This is the infrastructure that stores and routes messages: Kafka, RabbitMQ, SQS, SNS, and so on. The tech matters, but less than people think.

Four Rules That Prevent Distributed Chaos

1. Make consumers idempotent

Idempotent means: processing the same event twice doesn’t cause harm.

In practice:

  • use a unique event id
  • store “already processed” state, or
  • write operations that are naturally safe to repeat

If you skip this, retries become a bug factory.

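// Example: idempotent consumer (pseudo-code; `db` is a hypothetical data-access layer)
async function handleOrderPlaced(event: { id: string; orderId: string }) {
  await db.transaction(async (tx) => {
    // Check inside the transaction so two concurrent deliveries cannot both pass it.
    // A unique constraint on processedEvents.id is the final guard against races.
    if (await tx.processedEvents.exists(event.id)) return;

    await tx.orders.markAsPlaced(event.orderId);
    await tx.processedEvents.insert({ id: event.id, processedAt: new Date() });
  });
}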
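The third option in the list above, operations that are naturally safe to repeat, needs no bookkeeping table at all. A small sketch with hypothetical stock objects, comparing a counter with a set keyed by order id:

```typescript
// Not idempotent: every redelivery bumps the counter again.
function reserveByCounter(stock: { reserved: number }): void {
  stock.reserved += 1;
}

// Naturally idempotent: adding the same order id to a set twice changes nothing.
function reserveByOrderId(stock: { reservedFor: Set<string> }, orderId: string): void {
  stock.reservedFor.add(orderId);
}

const counterStock = { reserved: 0 };
const setStock = { reservedFor: new Set<string>() };

// Simulate the same OrderPlaced event being delivered twice.
for (let delivery = 0; delivery < 2; delivery++) {
  reserveByCounter(counterStock);
  reserveByOrderId(setStock, "o-1");
}

const counted = counterStock.reserved;    // 2: one order, two reservations
const distinct = setStock.reservedFor.size; // 1: the duplicate was harmless
```

In a database the same idea shows up as an upsert keyed by the business id, or as setting an absolute value instead of applying a delta.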

2. Treat “at least once” as default

Most brokers deliver messages at least once by default: if a consumer crashes or its acknowledgement is lost, the message is delivered again. So you must assume duplicates.

If you need exactly-once semantics, you’ll pay for it in complexity.
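Here is a toy simulation of where duplicates come from. It is not any real broker's API: deliverAtLeastOnce is a hypothetical helper that redelivers a message until the handler acknowledges it by returning true.

```typescript
type Message = { id: string; body: string };

// Delivers each message until the handler acknowledges it (returns true).
// Returns the total number of delivery attempts.
function deliverAtLeastOnce(
  messages: Message[],
  handler: (m: Message) => boolean,
  maxAttempts = 5
): number {
  let attempts = 0;
  for (const message of messages) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      attempts++;
      if (handler(message)) break; // acked: the broker can forget the message
      // No ack: the broker cannot tell whether the work happened, so it redelivers.
    }
  }
  return attempts;
}

const sideEffects: string[] = [];
let ackLostOnce = false;

const totalAttempts = deliverAtLeastOnce([{ id: "m-1", body: "charge card" }], (m) => {
  sideEffects.push(m.id); // the work itself succeeds...
  if (!ackLostOnce) {
    ackLostOnce = true;
    return false;         // ...but the first acknowledgement is lost
  }
  return true;
});
```

One message produced two deliveries and two side effects. Nothing was misconfigured; this is the normal behaviour that idempotent handlers exist to absorb.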

3. Keep events stable (version them)

Events are contracts. If you change fields casually, you break downstream systems quietly.

Use:

  • versioned event names, or
  • schema evolution rules, or
  • backward-compatible changes only (add optional fields; never rename or remove existing ones)
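One way to combine versioning with a single handler code path is to "upcast" old events to the newest shape at the edge of the consumer. The two schemas below are hypothetical, and the assumption that v1 totals were whole USD amounts is invented for the sketch.

```typescript
// Hypothetical schemas: v2 renames "total" to "totalCents" and adds a currency.
type OrderPlacedV1 = { version: 1; orderId: string; total: number };
type OrderPlacedV2 = { version: 2; orderId: string; totalCents: number; currency: string };
type AnyOrderPlaced = OrderPlacedV1 | OrderPlacedV2;

// Upcast old versions to the newest shape so handlers deal with one schema only.
function toLatest(event: AnyOrderPlaced): OrderPlacedV2 {
  switch (event.version) {
    case 1:
      // Assumption for this sketch: v1 totals were expressed in USD.
      return {
        version: 2,
        orderId: event.orderId,
        totalCents: Math.round(event.total * 100),
        currency: "USD",
      };
    case 2:
      return event;
  }
}

const upgraded = toLatest({ version: 1, orderId: "o-7", total: 12.5 });
```

Producers can then move to v2 on their own schedule, while events still sitting in queues or logs as v1 remain readable.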

4. Observability is not optional

If events are the bloodstream, you need a pulse.

Minimum:

  • consumer lag metrics
  • dead-letter queue (DLQ) volume
  • retry counts
  • a way to trace one business action across services
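The first two metrics reduce to simple arithmetic once you collect them. The sketch below assumes a metrics job that produces a snapshot per consumer; the QueueSnapshot fields and the threshold values are hypothetical and should come from your own broker and traffic.

```typescript
// Hypothetical snapshot of what a metrics job might collect per consumer.
type QueueSnapshot = {
  latestOffset: number;       // last position written by producers
  committedOffset: number;    // last position the consumer has finished
  dlqDepth: number;           // messages currently in the dead-letter queue
  oldestPendingAgeMs: number; // age of the oldest unprocessed message
};

type Thresholds = { maxLag: number; maxDlqDepth: number; maxAgeMs: number };

// Returns a human-readable alert for every threshold that is exceeded.
function evaluate(s: QueueSnapshot, t: Thresholds): string[] {
  const alerts: string[] = [];
  const lag = s.latestOffset - s.committedOffset;
  if (lag > t.maxLag) alerts.push(`consumer lag ${lag} exceeds ${t.maxLag}`);
  if (s.dlqDepth > t.maxDlqDepth) alerts.push(`DLQ depth ${s.dlqDepth} exceeds ${t.maxDlqDepth}`);
  if (s.oldestPendingAgeMs > t.maxAgeMs) alerts.push("oldest pending message is too old");
  return alerts;
}

const alerts = evaluate(
  { latestOffset: 10_500, committedOffset: 9_000, dlqDepth: 0, oldestPendingAgeMs: 30_000 },
  { maxLag: 1_000, maxDlqDepth: 0, maxAgeMs: 10 * 60 * 1000 }
);
```

Here only the lag alert fires (1,500 messages behind against a limit of 1,000). The age check matters for low-traffic queues, where lag stays small even when a consumer is completely stuck.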

When EDA Is A Great Idea

  • Sending emails, notifications, webhooks
  • Analytics pipelines and auditing
  • Media processing (images/video), document parsing
  • Syncing data between bounded contexts
  • Handling spikes (queues as shock absorbers)

If the user can wait a few seconds, async often wins.

When EDA Is A Bad Idea

  • You need immediate consistency across multiple operations
  • You don’t have monitoring maturity yet
  • You can’t tolerate duplicated side effects
  • Your team is already struggling with basic deploys/testing

Sometimes the best architecture is “a single service with a job queue”.

Kafka vs RabbitMQ vs SQS (The Practical Take)

This is intentionally high-level because the best choice depends on constraints, not vibes.

  • Kafka: great when you need high throughput + durable logs + multiple consumers + replay
  • RabbitMQ: great for flexible routing patterns and classic work queues
  • SQS/SNS (cloud-managed): great when you want operational simplicity and are okay with managed limits/tradeoffs

The broker is rarely the main problem. The main problem is designing the workflow around it.

Tool     | Best for                              | Watch out for
Kafka    | high throughput, replayable event log | ops complexity, schema discipline needed
RabbitMQ | work queues, routing patterns         | scaling patterns vary, message semantics matter
SQS/SNS  | managed simplicity on AWS             | limits/quotas, visibility timeout + retries need care

A Small Agency Anecdote (Short)

We’ve seen a pattern repeat: a team adds a queue to “fix performance,” but they don’t add idempotency or DLQs.

The system looks fast until a consumer crashes. Then messages pile up, retries multiply, and what should’ve been a 5-minute incident turns into a weekend.

The fix was never “switch brokers”. It was adding the boring pieces:

  • idempotent handlers
  • DLQ + alerting
  • one traceable workflow

After that, the queue actually delivered what it promised.
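The DLQ piece is small enough to sketch. This is a simplified, synchronous illustration with hypothetical names (processWithDlq, DeadLetter). Managed brokers usually offer a built-in equivalent, and a production version would add backoff between attempts.

```typescript
type Job = { id: string; payload: string };
type DeadLetter = { job: Job; attempts: number; lastError: string };

// Tries the handler up to maxAttempts times. After repeated failure the job is
// parked in the DLQ instead of blocking the messages queued behind it.
function processWithDlq(
  job: Job,
  handler: (job: Job) => void,
  dlq: DeadLetter[],
  maxAttempts = 3
): boolean {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(job);
      return true;
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  dlq.push({ job, attempts: maxAttempts, lastError });
  return false;
}

const dlq: DeadLetter[] = [];
const handled: string[] = [];
const handler = (job: Job): void => {
  if (job.payload === "") throw new Error("malformed payload");
  handled.push(job.id);
};

processWithDlq({ id: "j-1", payload: "" }, handler, dlq);   // poison message: ends up in the DLQ
processWithDlq({ id: "j-2", payload: "ok" }, handler, dlq); // still gets processed
```

The poison message stops retrying after three attempts and the healthy one behind it goes through. Alerting on the DLQ's length is what turns that parked message into a 5-minute incident instead of a weekend.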

A Decision Framework You Can Use Today

If you’re considering event-driven architecture, answer these:

  1. Can we accept eventual consistency for this workflow?
  2. What happens if the consumer is down for 2 hours?
  3. What happens if the same event is processed twice?
  4. How will we detect stuck messages in under 10 minutes?
  5. What is our rollback plan if the event schema breaks?

If you can’t answer 3 and 4 confidently, don’t scale the architecture yet. Scale the fundamentals.

Conclusion

Event-driven architecture isn’t “advanced”. It’s just a different set of defaults:

  • async by default
  • retries are normal
  • duplicates are normal
  • monitoring is mandatory

Used carefully, it makes systems more resilient and teams faster. Used casually, it creates invisible problems.

FAQ

Is event-driven architecture the same as microservices?

No. EDA is a communication style. You can do EDA in a monolith, and you can do microservices without events.

What’s the most common EDA failure?

Non-idempotent consumers. Duplicates + retries turn into double side effects.

Do I need Kafka to do EDA?

No. Many teams succeed with a simple job queue first, then graduate when scale and replay needs are real.
