Event-Driven Architecture (Without the Hype): When Queues Help, When They Hurt, and How Agencies Ship It Safely
Event-driven systems can unlock scale and resilience, or create invisible failure modes. Here’s the practical way to use queues, topics, and consumers in real products.
The Honest Definition
Event-driven architecture (EDA) means your system communicates by publishing events (“something happened”) that other parts of the system react to asynchronously.
Instead of:
- Service A calls Service B and waits.
You do:
- Service A publishes OrderPlaced
- Service B consumes it when it can
- The system stays responsive even if parts are slow
EDA isn’t “microservices.” It’s a coordination style.
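To make the “coordination style” point concrete, here is a minimal in-process sketch: no broker, no network, no separate services. The `EventBus` class and the `OrderPlaced` shape are hypothetical, not any library’s API. The publisher hands off the event and returns immediately; the subscriber runs a moment later.

```typescript
// Hypothetical event shape for this sketch.
type OrderPlaced = { type: "OrderPlaced"; orderId: string };

type Handler<E> = (event: E) => Promise<void>;

// A minimal in-process event bus (illustrative, not a real library).
class EventBus<E extends { type: string }> {
  private handlers = new Map<string, Handler<E>[]>();

  subscribe(type: E["type"], handler: Handler<E>): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // Fire-and-forget: handlers are scheduled on the microtask queue,
  // so publish() returns before any of them runs.
  publish(event: E): void {
    for (const handler of this.handlers.get(event.type) ?? []) {
      Promise.resolve()
        .then(() => handler(event))
        .catch((err) => console.error("handler failed", err));
    }
  }
}

// Service A publishes; Service B reacts when it gets to it.
const bus = new EventBus<OrderPlaced>();
const shipped: string[] = [];
bus.subscribe("OrderPlaced", async (event) => {
  shipped.push(event.orderId);
});
bus.publish({ type: "OrderPlaced", orderId: "o-1" });
```

Right after `publish()` returns, `shipped` is still empty; the handler runs once the current synchronous code finishes. A real broker adds durability and cross-process delivery on top of the same publish/subscribe shape.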
When You Should Use EDA (and when you shouldn’t)
Use event-driven architecture when:
- you can tolerate eventual consistency
- you need to absorb traffic spikes
- downstream work is best effort (emails, analytics, indexing)
- you want teams to release independently
Avoid it when:
- the user must see an immediate, consistent result
- you can’t tolerate duplicate side effects and can’t make handlers idempotent
- you don’t have monitoring for queues/consumers yet
The one-sentence rule
If you can’t explain an event in one sentence (who did what, and what changed), it’s probably not an event. It’s a command in disguise.
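One way to apply the rule is to keep facts and instructions as separate types. In this sketch (all names hypothetical), `UserRegistered` passes the one-sentence test: user u1 registered with this email. `SendWelcomeEmail` describes nothing that happened; it asks a specific handler to act, so it is a command. A consumer can still react to the event by issuing that command locally.

```typescript
// Hypothetical message types for illustration.

// An event: a past-tense fact that fits in one sentence.
type UserRegistered = {
  type: "UserRegistered";
  userId: string;
  email: string;
  occurredAt: string; // ISO-8601 timestamp
};

// A command: an instruction to one handler, which may refuse or fail.
type SendWelcomeEmail = {
  type: "SendWelcomeEmail";
  to: string;
};

type UserMessage = UserRegistered | SendWelcomeEmail;

// The one-sentence test, written down: events read as "X happened",
// commands read as "please do Y".
function summarize(message: UserMessage): string {
  if (message.type === "UserRegistered") {
    return `User ${message.userId} registered with ${message.email}`;
  }
  return `Please send a welcome email to ${message.to}`;
}

// The email service reacts to the fact by deciding to act on it.
function onUserRegistered(event: UserRegistered): SendWelcomeEmail {
  return { type: "SendWelcomeEmail", to: event.email };
}
```

The producer of `UserRegistered` never needs to know that an email service exists; that decision lives with the consumer.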
Why Teams Reach For It
You usually reach for events because one of these becomes painful:
- Latency: synchronous calls chain together and users wait.
- Reliability: one flaky dependency breaks the whole request.
- Throughput: you need to smooth spikes (bursts) without melting.
- Team boundaries: different domains need to evolve independently.
Queues and topics can help. But they also change your failure modes.
The Tradeoff Nobody Says Out Loud
With synchronous calls, failures are obvious. With events, failures are often silent.
Your system can be “up” and still be wrong.
That’s why EDA is less about the broker and more about the rules around it.
The Building Blocks (In Plain English)
Events
An event is a fact: InvoicePaid, UserCreated, ShipmentDispatched.
Good events:
- are past tense (“happened”)
- are immutable
- include enough context to be useful, but not everything
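A common way to get those properties is a small envelope around every event. The field names (`id`, `type`, `occurredAt`, `payload`) and the `createEvent` helper are assumptions for this sketch, not a standard schema. `readonly` enforces immutability at compile time and `Object.freeze` at runtime; the unique `id` is what consumers later use to detect duplicates.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical event envelope: once created, a fact never changes.
interface EventEnvelope<T extends string, P> {
  readonly id: string; // unique per event, used for deduplication
  readonly type: T; // past tense: "InvoicePaid", not "PayInvoice"
  readonly occurredAt: string; // when it happened (ISO-8601)
  readonly payload: Readonly<P>; // enough context to act on, not everything
}

function createEvent<T extends string, P extends object>(
  type: T,
  payload: P,
): EventEnvelope<T, P> {
  return Object.freeze({
    id: randomUUID(),
    type,
    occurredAt: new Date().toISOString(),
    payload: Object.freeze({ ...payload }), // shallow freeze of a copy
  });
}

const paid = createEvent("InvoicePaid", { invoiceId: "inv-42", amountCents: 1999 });
```

`Object.freeze` is shallow, so this freezes the envelope and the top level of the payload; nested objects would need their own freeze if you rely on runtime immutability.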
Producers
A producer emits events. It should not care who consumes them.
Consumers
Consumers react. They must handle:
- duplicates
- out-of-order delivery
- retries
- partial failure
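Duplicates and retries are handled by idempotency (rule 1 below); out-of-order delivery needs its own guard. One common approach, sketched here under the assumption that the producer stamps each event with a per-entity, monotonically increasing `version`, is to ignore anything not newer than what the consumer has already applied. `StatusChanged` and `applyStatusChange` are hypothetical names, and the `Map` stands in for a real read-model table.

```typescript
// Assumption: the producer increments `version` for every change to an order.
type StatusChanged = { orderId: string; version: number; status: string };

type OrderView = { status: string; version: number };

// In-memory stand-in for a read-model table keyed by order id.
const orders = new Map<string, OrderView>();

// Apply an event only if it is newer than what we already have.
// Returns false when the event is stale (arrived late) or a duplicate.
function applyStatusChange(event: StatusChanged): boolean {
  const current = orders.get(event.orderId);
  if (current && event.version <= current.version) {
    return false; // older or same version: skip
  }
  orders.set(event.orderId, { status: event.status, version: event.version });
  return true;
}

// Delivered out of order: v2 arrives before v1.
applyStatusChange({ orderId: "o-1", version: 2, status: "shipped" });
applyStatusChange({ orderId: "o-1", version: 1, status: "paid" }); // ignored
```

The final state stays `shipped` even though `paid` arrived last. Skipping stale events is only safe when the latest state is all you need; if every intermediate step matters (an audit trail, for example), you would buffer and reorder instead.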
Broker / Queue / Topic
This is the infrastructure that carries events: Kafka, RabbitMQ, SQS, SNS, and so on. The tech matters, but less than people think.
Four Rules That Prevent Distributed Chaos
1. Make consumers idempotent
Idempotent means: processing the same event twice doesn’t cause harm.
In practice:
- use a unique event id
- store “already processed” state, or
- write operations that are naturally safe to repeat
If you skip this, retries become a bug factory.
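The third option is often the cheapest. Compare two ways a hypothetical loyalty-points consumer could apply a `PointsAwarded` event: incrementing a counter changes the result on every redelivery, while recording the award under its event id and deriving the total is safe to repeat. Both functions are illustrative, with in-memory maps standing in for a database.

```typescript
// Hypothetical event for this sketch.
type PointsAwarded = { id: string; userId: string; points: number };

// Not safe to repeat: every redelivery adds the points again.
const balances = new Map<string, number>();
function applyByIncrement(e: PointsAwarded): void {
  balances.set(e.userId, (balances.get(e.userId) ?? 0) + e.points);
}

// Safe to repeat: the award is keyed by event id, so a redelivery
// overwrites the same entry with the same value.
const awards = new Map<string, PointsAwarded>();
function applyByUpsert(e: PointsAwarded): void {
  awards.set(e.id, e);
}

// The balance is derived from the recorded awards.
function balanceOf(userId: string): number {
  let total = 0;
  for (const award of awards.values()) {
    if (award.userId === userId) total += award.points;
  }
  return total;
}

const award: PointsAwarded = { id: "evt-1", userId: "u-1", points: 50 };
applyByIncrement(award);
applyByIncrement(award); // duplicate delivery
applyByUpsert(award);
applyByUpsert(award); // duplicate delivery
```

After the duplicate, `balances` reports 100 points while `balanceOf("u-1")` still reports 50.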
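```typescript
// Example: idempotent consumer (pseudo-code; `db` and its tables are placeholders)
async function handleOrderPlaced(event: { id: string; orderId: string }) {
  // Fast path: skip events that were already processed.
  if (await db.processedEvents.exists(event.id)) return;

  await db.transaction(async (tx) => {
    await tx.orders.markAsPlaced(event.orderId);
    // Recorded in the same transaction as the side effect. A unique constraint on
    // processedEvents.id makes a concurrent duplicate fail here and roll back,
    // so the order is never marked twice.
    await tx.processedEvents.insert({ id: event.id, processedAt: new Date() });
  });
}
```
2. Treat “at least once” as default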
Most brokers and queues guarantee at-least-once delivery, not exactly-once, so you must assume duplicates.
If you need exactly-once semantics, you’ll pay for it in complexity.
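Duplicates show up even when nothing is obviously broken: a consumer can finish its work and then crash or time out before the acknowledgment reaches the broker, so the broker delivers the message again. The toy loop below simulates exactly that with a hypothetical `ackFails` callback; `deliverAtLeastOnce` is not modeled on any particular broker’s API.

```typescript
type QueuedMessage = { id: string; body: string };

// Toy at-least-once delivery loop (hypothetical, not a real broker API).
// A message leaves the queue only once its ack succeeds.
function deliverAtLeastOnce(
  messages: QueuedMessage[],
  handler: (message: QueuedMessage) => void,
  ackFails: (message: QueuedMessage, attempt: number) => boolean,
): void {
  const pending = [...messages];
  const attempts = new Map<string, number>();
  while (pending.length > 0) {
    const message = pending.shift()!;
    const attempt = (attempts.get(message.id) ?? 0) + 1;
    attempts.set(message.id, attempt);
    handler(message); // the side effect has already happened here...
    if (ackFails(message, attempt)) {
      pending.push(message); // ...but no ack arrived, so it is redelivered
    }
  }
}

const handled: string[] = [];
deliverAtLeastOnce(
  [{ id: "m-1", body: "OrderPlaced" }],
  (message) => {
    handled.push(message.id);
  },
  (_message, attempt) => attempt === 1, // simulate losing the first ack
);
```

`handled` ends up as `["m-1", "m-1"]`: one message, two side effects. That is the case the idempotent handler above is built for.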
3. Keep events stable (version them)
Events are contracts. If you change fields casually, you break downstream systems quietly.
Use:
- versioned event names, or
- schema evolution rules, or
- backward-compatible (additive) changes only
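Schema evolution rules often come down to an “upcaster”: the consumer converts older versions to the current shape at the edge, so the rest of the handler only ever sees one version. A sketch, assuming a hypothetical `CustomerCreated` event whose v2 split `name` into `firstName` and `lastName`; the `upcast` function and the split-on-first-space rule are illustrative choices.

```typescript
// Hypothetical v1 and v2 shapes of the same event.
type CustomerCreatedV1 = { type: "CustomerCreated"; version: 1; customerId: string; name: string };
type CustomerCreatedV2 = {
  type: "CustomerCreated";
  version: 2;
  customerId: string;
  firstName: string;
  lastName: string;
};

// Convert any known version to the latest one the handler understands.
function upcast(event: CustomerCreatedV1 | CustomerCreatedV2): CustomerCreatedV2 {
  if (event.version === 2) return event;
  // v1 stored a single name; split on the first space as a best-effort default.
  const [firstName, ...rest] = event.name.split(" ");
  return {
    type: "CustomerCreated",
    version: 2,
    customerId: event.customerId,
    firstName: firstName ?? "",
    lastName: rest.join(" "),
  };
}
```

Splitting a field is a breaking change, which is why the version number travels with the event; purely additive changes such as a new optional field usually don’t need an upcaster at all.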
4. Observability is not optional
If events are the bloodstream, you need a pulse.
Minimum:
- consumer lag metrics
- dead-letter queue (DLQ) volume
- retry counts
- a way to trace one business action across services
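Consumer lag is the gap between what has been produced and what a consumer has processed. On log-based brokers such as Kafka it is typically computed per partition as the latest offset minus the consumer group’s committed offset. The sketch below does that arithmetic on plain numbers and flags partitions over a threshold; fetching the offsets is broker-specific and left out, and `checkLag` and `maxLag` are hypothetical names.

```typescript
type PartitionOffsets = {
  partition: number;
  latestOffset: number; // next offset the producer will write
  committedOffset: number; // next offset the consumer group will read
};

type LagReport = { totalLag: number; lagging: number[] };

// Lag per partition = messages produced but not yet processed.
function checkLag(offsets: PartitionOffsets[], maxLag: number): LagReport {
  let totalLag = 0;
  const lagging: number[] = [];
  for (const o of offsets) {
    const lag = Math.max(0, o.latestOffset - o.committedOffset);
    totalLag += lag;
    if (lag > maxLag) lagging.push(o.partition);
  }
  return { totalLag, lagging };
}

const report = checkLag(
  [
    { partition: 0, latestOffset: 1200, committedOffset: 1195 },
    { partition: 1, latestOffset: 900, committedOffset: 300 },
  ],
  100,
);
```

Here partition 1 is 600 messages behind and gets flagged, while partition 0 (5 behind) does not. Lag that keeps growing across several checks is usually a stronger signal than any single reading.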
When EDA Is A Great Idea
- Sending emails, notifications, webhooks
- Analytics pipelines and auditing
- Media processing (images/video), document parsing
- Syncing data between bounded contexts
- Handling spikes (queues as shock absorbers)
If the user can wait a few seconds, async often wins.
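“Queues as shock absorbers” means the queue accepts a burst immediately while workers drain it at a pace the downstream can handle. A minimal sketch with a fixed concurrency `limit`; `drainWithLimit` is a hypothetical helper, and the 5 ms timer stands in for slow work such as resizing an image.

```typescript
// Process all jobs, never running more than `limit` at the same time.
async function drainWithLimit<T>(
  jobs: T[],
  limit: number,
  work: (job: T) => Promise<void>,
): Promise<void> {
  const queue = [...jobs]; // the burst lands here instantly
  const workers = Array.from({ length: limit }, async () => {
    // Each worker pulls the next job until the queue is empty.
    while (queue.length > 0) {
      const job = queue.shift()!;
      await work(job);
    }
  });
  await Promise.all(workers);
}

let inFlight = 0;
let maxInFlight = 0;
const done: number[] = [];
const burst = Array.from({ length: 20 }, (_, i) => i);

const finished = drainWithLimit(burst, 3, async (job) => {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5)); // simulated slow work
  inFlight--;
  done.push(job);
});
```

All 20 jobs complete, but no more than 3 ever run at once. In a real system the burst would sit in a durable broker rather than an in-memory array, so it survives a restart; this version only shows the flow-control idea.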
When EDA Is A Bad Idea
- You need immediate consistency across multiple operations
- You don’t have monitoring maturity yet
- You can’t tolerate duplicated side effects
- Your team is already struggling with basic deploys/testing
Sometimes the best architecture is “a single service with a job queue”.
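That “single service with a job queue” can be as small as this: jobs are retried a bounded number of times, and anything that keeps failing moves to a dead-letter list instead of being retried forever. A sketch only; `JobQueue`, `maxAttempts`, and the in-memory arrays are hypothetical stand-ins for a persistent queue and a real DLQ.

```typescript
type Job = { id: string; payload: string; attempts: number };

// Hypothetical in-process job queue with bounded retries and a dead-letter list.
class JobQueue {
  private jobs: Job[] = [];
  private readonly maxAttempts: number;
  readonly deadLetters: Job[] = [];

  constructor(maxAttempts: number) {
    this.maxAttempts = maxAttempts;
  }

  enqueue(id: string, payload: string): void {
    this.jobs.push({ id, payload, attempts: 0 });
  }

  // Process jobs one at a time; a failed job goes back on the queue until
  // it reaches maxAttempts, then moves to the dead-letter list.
  async run(handler: (job: Job) => Promise<void>): Promise<void> {
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      job.attempts++;
      try {
        await handler(job);
      } catch {
        if (job.attempts >= this.maxAttempts) {
          this.deadLetters.push(job); // alert when this list grows
        } else {
          this.jobs.push(job); // retry later
        }
      }
    }
  }
}

const queue = new JobQueue(3);
queue.enqueue("j-1", "resize:cat.png");
queue.enqueue("j-2", "resize:corrupt.png");

const processed: string[] = [];
const result = queue.run(async (job) => {
  if (job.payload.includes("corrupt")) throw new Error("cannot decode image");
  processed.push(job.id);
});
```

`j-1` succeeds on the first try; `j-2` fails three times and lands in `deadLetters`, which is the list you would alert on.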
Kafka vs RabbitMQ vs SQS (The Practical Take)
This is intentionally high-level because the best choice depends on constraints, not vibes.
- Kafka: great when you need high throughput + durable logs + multiple consumers + replay
- RabbitMQ: great for flexible routing patterns and classic work queues
- SQS/SNS (cloud-managed): great when you want operational simplicity and are okay with managed limits/tradeoffs
The broker is rarely the main problem. The main problem is designing the workflow around it.
| Tool | Best for | Watch out for |
|---|---|---|
| Kafka | high throughput, replayable event log | ops complexity, schema discipline needed |
| RabbitMQ | work queues, routing patterns | scaling patterns vary, message semantics matter |
| SQS/SNS | managed simplicity on AWS | limits/quotas, visibility timeout + retries need care |
A Small Agency Anecdote (Short)
We’ve seen a pattern repeat: a team adds a queue to “fix performance,” but they don’t add idempotency or DLQs.
The system looks fast until a consumer crashes. Then messages pile up, retries multiply, and what should’ve been a 5-minute incident turns into a weekend.
The fix was never “switch brokers”. It was adding the boring pieces:
- idempotent handlers
- DLQ + alerting
- one traceable workflow
After that, the queue actually delivered what it promised.
A Decision Framework You Can Use Today
If you’re considering event-driven architecture, answer these:
1. Can we accept eventual consistency for this workflow?
2. What happens if the consumer is down for 2 hours?
3. What happens if the same event is processed twice?
4. How will we detect stuck messages in under 10 minutes?
5. What is our rollback plan if the event schema breaks?
If you can’t answer 3 and 4 confidently, don’t scale the architecture yet. Scale the fundamentals.
Conclusion
Event-driven architecture isn’t “advanced”. It’s just a different set of defaults:
- async by default
- retries are normal
- duplicates are normal
- monitoring is mandatory
Used carefully, it makes systems more resilient and teams faster. Used casually, it creates invisible problems.
FAQ
Is event-driven architecture the same as microservices?
No. EDA is a communication style. You can do EDA in a monolith, and you can do microservices without events.
What’s the most common EDA failure?
Non-idempotent consumers. Duplicates + retries turn into double side effects.
Do I need Kafka to do EDA?
No. Many teams succeed with a simple job queue first, then graduate when scale and replay needs are real.