Systems Integration·5 min read·8 May 2024

API-First vs Event-Driven Integration: A Practical Decision Framework

Most integration decisions default to API because it's familiar. Event-driven gets chosen when someone has heard it's more scalable. Both defaults lead to the wrong architecture for the wrong reasons.

The choice between APIs and event streams is one of those architectural decisions that looks tactical in the moment and turns out to be structural for years. Systems built on REST APIs develop patterns of tight coupling that make them hard to extend without coordination. Systems built on event streams develop patterns of asynchronous complexity that make simple queries harder than they should be. Neither pattern is universally right. Picking one because it's what your team knows, or because a conference talk made the other one sound impressive, is how you build yourself into a corner.

The decision has a logic. Understanding that logic means understanding what each pattern actually does and what it costs.

How API Integration Works

API-first integration is synchronous and request-response. System A needs something from System B — data, a confirmation, a result — so it calls System B's API and waits. System B processes the request and returns a response. System A continues with that response in hand.

The model is simple to reason about because it maps to how humans think about asking for things: you ask, you wait, you get an answer. It's easy to debug because the call stack is visible — when something goes wrong, you can trace exactly which request failed and what the downstream system returned. It works well for low-frequency operations and for any situation where the outcome of the call needs to inform what happens next.

The costs are equally clear. The caller depends on the callee being available. If System B is down or slow, System A waits or fails. Fan-out is expensive — if one operation in System A needs to notify five downstream systems, System A now makes five API calls and its latency is the sum of their response times. And adding a new system that needs to know about something that happens in System A means modifying System A to add another call. The graph of dependencies grows in System A's codebase.
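The request-response shape can be sketched in a few lines. This is a minimal illustration, not a real client: `inventory_service` and `place_order` are hypothetical names, and the network round trip is simulated with a sleep.

```python
import time

# Hypothetical downstream service: an inventory check with a simulated
# network round trip. In a real system this would be an HTTP call.
def inventory_service(sku: str) -> dict:
    time.sleep(0.01)  # stand-in for network latency
    return {"sku": sku, "in_stock": True}

def place_order(sku: str) -> str:
    # Synchronous call: place_order blocks until the inventory service
    # answers, and what happens next depends on that answer.
    result = inventory_service(sku)
    if not result["in_stock"]:
        return "rejected"
    return "accepted"

print(place_order("ABC-123"))  # the caller has a result before continuing
```

The blocking call is the whole point and the whole cost: the caller gets an answer it can act on immediately, but it also inherits the callee's latency and availability.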

How Event-Driven Integration Works

Event-driven integration is asynchronous and decoupled. When something happens in System A — an order is placed, a user signs up, a record changes — System A publishes an event to a message broker. Systems B, C, and D subscribe to relevant events and react independently. System A doesn't know who's listening. It doesn't wait for anyone. It publishes and continues.

The coupling is inverted. Rather than System A knowing about all its downstream consumers, the consumers know about the events they care about and subscribe accordingly. Adding a new consumer means deploying a new subscriber — no change to System A. The broker absorbs the gap when a downstream system is unavailable; events queue up and the subscriber processes them when it recovers. For high-frequency operations where multiple systems react to the same thing, event-driven handles fan-out cleanly.

The costs are different in character but equally real. Asynchronous systems are harder to debug because the execution is not sequential. An event published by System A might be processed by System B seconds later, on a different server, in a different context. Tracing a failure requires correlating events across multiple logs. And there is no immediate result — if System A needs to know whether System B processed something correctly, you've now built a request-response interaction on top of an asynchronous system, which is more complex than just using an API.
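The inverted coupling is easiest to see in code. The sketch below uses a toy in-memory broker so it runs standalone; a real system would use Kafka, RabbitMQ, or similar, and delivery would be asynchronous rather than the immediate dispatch shown here. All names are illustrative.

```python
from collections import defaultdict
from typing import Callable

# Toy in-memory broker. Real brokers deliver asynchronously and persist
# events; this one dispatches immediately, purely to show the shape.
class Broker:
    def __init__(self):
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher neither knows who is listening nor waits for them.
        for handler in self._subscribers[topic]:
            handler(event)

broker = Broker()
processed = []

# Consumers subscribe independently. Adding a third consumer is another
# subscribe call in its own codebase — no change to the publisher.
broker.subscribe("user.signed_up", lambda e: processed.append(("email", e["user_id"])))
broker.subscribe("user.signed_up", lambda e: processed.append(("analytics", e["user_id"])))

# System A publishes once and continues.
broker.publish("user.signed_up", {"user_id": 42})
print(processed)  # [('email', 42), ('analytics', 42)]
```

Note what the publisher's code contains: one `publish` call and no knowledge of its consumers. That is the decoupling the pattern buys.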

The Criteria That Actually Matter

The choice is not about scale or modernity. It's about the nature of the operation and the relationship between the systems involved.

  • Does the caller need an immediate result? Use an API. Payment processing requires knowing whether the payment succeeded before telling the user. Authentication requires knowing whether the credentials are valid before granting access. Real-time lookups — pricing, inventory, eligibility checks — require an answer synchronously. Events don't fit here: there is no natural way to wait for a downstream system's response and act on it without building request-response machinery that cancels out the pattern's benefits.
  • Is the operation fire-and-forget? Use events. Sending a welcome email after a user signs up doesn't need a synchronous response. Updating a search index after a record changes doesn't need one. Syncing data to an analytics system doesn't need one. These operations have a result, but the caller doesn't need to know about it before continuing.
  • Do multiple systems need to react to the same thing? Use events. An order being placed should trigger fulfilment, send a confirmation email, update inventory, and record the event in analytics. Chaining five API calls from the order service means the order service needs to know about all five consumers, its latency grows with each one, and a failure in any downstream call complicates the transaction semantics. One event, five independent subscribers, is cleaner in every dimension.
  • Is the downstream system sometimes unavailable? Use events. If the system you're calling has maintenance windows, has deployment downtime, or is simply unreliable, synchronous API calls will fail during those windows. Events absorb the gap — the broker holds the events, the subscriber processes them when it comes back up, and the publisher never blocked.

The Failure Modes Worth Knowing

API fan-out is one of the most common architectural mistakes in integration work. An operation occurs — a record is saved, a transaction completes — and the handling code makes sequential API calls to five downstream systems. The operation's total latency is now the sum of five round trips. If any downstream system is slow, the whole operation is slow. If any downstream system is down, the whole operation fails, even though the core work succeeded. This pattern should almost always be replaced with an event, published once, consumed by as many subscribers as needed.
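The fan-out mistake and its replacement can be put side by side. This is a schematic sketch with stub functions standing in for real API calls and a real broker; the point is the shape of the calling code, not the plumbing.

```python
calls = []

# Stub for a blocking downstream API call; each real call would add a
# full network round trip to the operation's latency.
def make_api_call(service_name):
    def call(order):
        calls.append(service_name)
    return call

# Anti-pattern: the order service calls every downstream system in turn.
# Total latency is the sum of five round trips, and a failure in any one
# call aborts the rest, even though the order itself was saved.
def save_order_fanout(order):
    for svc in ("fulfilment", "email", "inventory", "analytics", "billing"):
        make_api_call(svc)(order)

# Replacement: publish one event. Consumers subscribe elsewhere; the
# order service's cost is constant no matter how many there are.
events = []

def publish(topic, payload):
    events.append((topic, payload))

def save_order_with_event(order):
    publish("order.placed", order)

save_order_fanout({"id": 1})
save_order_with_event({"id": 1})
print(len(calls), len(events))  # 5 1
```

The refactored version also removes the dependency list from the order service entirely: adding a sixth consumer touches only the consumer's code.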

The inverse mistake is equally damaging: using events when you need a result. This manifests as systems that publish a command event and then poll another topic waiting for a response event — essentially building request-response semantics on top of an event bus. The result is asynchronous complexity with none of the decoupling benefits, because the producer is now implicitly coupled to the consumer's processing time. If you need a result synchronously, use an API.

Using Both

Healthy architectures have both patterns, assigned to the operations they suit. Operational queries and commands that need an immediate result use APIs. Side effects, notifications, cache invalidations, and cross-system synchronisation use events. The mistake is not picking one over the other — it's picking one and applying it everywhere because it's simpler to have a single rule.

The architecture that ages well is the one where each integration touchpoint was chosen based on what that specific interaction actually requires, not based on what the team knew how to build or what sounded like the right answer in a planning meeting. That judgement is worth applying carefully at the start, because reversing it later — untangling a system of tightly coupled API calls into an event-driven architecture, or unwinding an event-driven system where synchronous results were always needed — is expensive work that compounds with every system added on top of the original decision.

Written by ATHING

We design and build data infrastructure, automation pipelines, and AI systems for organisations that need them to work.
