3 September 2024

Orchestration vs Choreography: Two Patterns, One Hard Choice

Both patterns coordinate distributed work. They fail in very different ways, and most teams only discover which one they chose when something goes wrong at 11pm on a Friday.

The distinction is about control. Orchestration has a central conductor that knows the entire workflow and tells each service what to do and when. Choreography has no conductor — each service listens for events, does its work, and emits events for the next service to react to. The overall workflow exists, but it's implicit and distributed rather than defined in one place.

The same work can be coordinated either way. What differs is where the intelligence lives, where the failure points are, and what it costs to understand what's happening at any given moment.

Orchestration: Visibility at the Cost of Coupling

In an orchestrated system, the orchestrator — Airflow, Prefect, Dagster, Temporal — holds a complete model of the workflow. It knows step one must complete before step two begins. It knows that steps three and four can run in parallel. It knows that if step two fails, step one should not be retried. That knowledge is expressed explicitly, in code, in one place.
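As a rough illustration of what "explicit, in code, in one place" can look like, here is a minimal sketch using only Python's standard-library graphlib module. It is not how Airflow, Prefect, Dagster, or Temporal are implemented; the step names and the run_workflow helper are hypothetical, chosen to mirror the four steps described above.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow definition: each step maps to the steps it depends on.
WORKFLOW = {
    "step_one": set(),
    "step_two": {"step_one"},
    "step_three": {"step_two"},  # three and four both wait only on two,
    "step_four": {"step_two"},   # so they become ready at the same time
}


def run_workflow(workflow, actions):
    """Run steps in dependency order and return the batches that were ready together."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        for step in ready:
            actions[step]()  # a real orchestrator could dispatch these in parallel
            sorter.done(step)
        batches.append(ready)
    return batches


executed = []
actions = {name: (lambda n=name: executed.append(n)) for name in WORKFLOW}
batches = run_workflow(WORKFLOW, actions)
print(batches)
```

The whole ordering lives in the WORKFLOW dictionary. Changing a dependency means editing that one definition rather than any of the steps.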

This gives you something genuinely valuable: observability. You can look at the orchestrator's UI and see exactly where a workflow is, which steps succeeded, which failed, what the error was, and how long each step took. When something goes wrong, you have one place to look. Retrying a failed step is usually a button click. Debugging a complex multi-step failure means reading one log in one place, not correlating events across five services.
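The per-step record behind that kind of UI can be sketched in a few lines. This is an assumption-laden toy, not any real orchestrator's data model: run_steps, the history dictionary, and the extract/load/notify steps are all hypothetical. It records status, error, and duration for each step, and a second call with the same history resumes from the failed step.

```python
import time


def run_steps(steps, history=None):
    """Run steps in order, recording status, error and duration for each one.

    Steps already marked "succeeded" in history are skipped, so calling this
    again with the same history retries from the step that failed.
    """
    history = {} if history is None else history
    for name, func in steps:
        if history.get(name, {}).get("status") == "succeeded":
            continue  # already done on an earlier attempt
        started = time.perf_counter()
        try:
            func()
            record = {"status": "succeeded", "error": None}
        except Exception as exc:
            record = {"status": "failed", "error": str(exc)}
        record["seconds"] = time.perf_counter() - started
        history[name] = record
        if record["status"] == "failed":
            break  # later steps are never started
    return history


calls = {"extract": 0, "load": 0, "notify": 0}


def extract():
    calls["extract"] += 1


def load():
    calls["load"] += 1
    if calls["load"] == 1:  # simulate a transient failure on the first attempt
        raise RuntimeError("warehouse unavailable")


def notify():
    calls["notify"] += 1


steps = [("extract", extract), ("load", load), ("notify", notify)]
history = run_steps(steps)
first_run = {name: rec["status"] for name, rec in history.items()}
history = run_steps(steps, history)  # the "retry" button
second_run = {name: rec["status"] for name, rec in history.items()}
```

After the first run the history shows extract succeeded and load failed with its error message. The retry re-runs only load and then notify.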

The cost is coupling. The orchestrator becomes the centre of gravity for every workflow it manages. Every service that participates in an orchestrated workflow has an implicit dependency on the orchestrator — not a service-to-service dependency, but a dependency on the orchestration layer itself. When the orchestrator changes — when you migrate platforms, when you reorganise workflows, when the orchestration framework releases a breaking change — every workflow is potentially affected. The more workflows you have, the heavier that dependency becomes.

The orchestrator can also become an operational bottleneck. A single orchestrator managing hundreds of concurrent workflows is a single point of failure for a significant portion of your automation estate. High availability configurations help, but they add complexity. The simplicity of having everything in one place comes with the risk of everything being in one place.

Choreography: Decoupling at the Cost of Visibility

In a choreographed system, no component knows the full picture. A payment service processes a payment and emits a "payment completed" event. An inventory service is listening for that event and decrements stock. An email service is also listening and sends a confirmation. Neither the payment service nor the inventory service knows the other exists. They only know their inputs and their outputs.
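The payment example can be sketched with a small in-process event bus. A production system would use a message broker rather than this hypothetical EventBus class, and the event name, payload fields, and handler functions here are illustrative assumptions.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process publish/subscribe bus standing in for a real broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()
stock = {"widget": 5}
sent_emails = []


def process_payment(order):
    # Payment service: does its work, then emits an event. It has no
    # reference to the inventory or email handlers.
    bus.publish("payment_completed", order)


def decrement_stock(order):
    # Inventory service: reacts to the event, unaware of who produced it.
    stock[order["sku"]] -= order["quantity"]


def send_confirmation(order):
    # Email service: a second, independent consumer of the same event.
    sent_emails.append(order["email"])


bus.subscribe("payment_completed", decrement_stock)
bus.subscribe("payment_completed", send_confirmation)

process_payment(
    {"order_id": "o-1", "sku": "widget", "quantity": 2, "email": "a@example.com"}
)
```

Nothing in this code states the overall workflow. It exists only as the sum of the subscriptions.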

This decoupling is powerful. You can add a new consumer to the event stream without touching any existing service. You can replace the inventory service with a different implementation without the payment service knowing anything changed. Independent services can be deployed, scaled, and updated on their own schedules. For organisations running genuinely independent services at high throughput, this flexibility is worth a great deal.

What you give up is the ability to easily answer the question: what is the current state of this order? In an orchestrated system, that question has a clear answer — the orchestrator knows. In a choreographed system, the answer requires correlating events across multiple services, each of which only recorded its own portion of the story. Distributed tracing tools help. Correlation IDs help. But they require investment to implement well, and they recover visibility that you voluntarily gave up.
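To make the correlation step concrete, here is a minimal sketch that answers "what is the current state of this order?" by merging per-service logs on a shared correlation ID. The log structure, field names, and the order_timeline helper are assumptions for illustration; real services would ship these records to a tracing or logging backend.

```python
# Hypothetical logs, one per service. Each service recorded only its own events.
payment_log = [
    {"correlation_id": "o-1", "event": "payment_completed", "ts": 1},
    {"correlation_id": "o-2", "event": "payment_completed", "ts": 2},
]
inventory_log = [
    {"correlation_id": "o-1", "event": "stock_decremented", "ts": 3},
]
email_log = [
    {"correlation_id": "o-1", "event": "confirmation_sent", "ts": 4},
]


def order_timeline(correlation_id, *logs):
    """Merge every service's log and return one order's events in time order."""
    events = [
        entry
        for log in logs
        for entry in log
        if entry["correlation_id"] == correlation_id
    ]
    return [entry["event"] for entry in sorted(events, key=lambda e: e["ts"])]


timeline_o1 = order_timeline("o-1", payment_log, inventory_log, email_log)
timeline_o2 = order_timeline("o-2", payment_log, inventory_log, email_log)
print(timeline_o1)
print(timeline_o2)
```

This only works if every service propagates the same ID and records a comparable timestamp, which is the investment the paragraph above refers to.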

Debugging a choreographed workflow failure means finding the event that was produced but not consumed, or the consumer that failed silently, or the downstream service that received an out-of-order event and produced incorrect output. Without careful investment in observability tooling, this is significantly harder than reading an orchestrator log.
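One way to hunt for an event that was produced but never consumed is to declare which follow-up events each trigger should eventually cause, then scan for gaps. The sketch below assumes events carry a correlation ID. The EXPECTED_FOLLOW_UPS table and find_unconsumed function are hypothetical, and it does not attempt to detect out-of-order delivery.

```python
from collections import defaultdict

# Assumed contract: after a trigger event, these follow-ups should eventually appear.
EXPECTED_FOLLOW_UPS = {
    "payment_completed": {"stock_decremented", "confirmation_sent"},
}


def find_unconsumed(events):
    """Return {correlation_id: [missing follow-up events]} for incomplete flows."""
    seen = defaultdict(set)
    for entry in events:
        seen[entry["correlation_id"]].add(entry["event"])

    missing = {}
    for correlation_id, names in seen.items():
        gaps = set()
        for trigger, follow_ups in EXPECTED_FOLLOW_UPS.items():
            if trigger in names:
                gaps |= follow_ups - names
        if gaps:
            missing[correlation_id] = sorted(gaps)
    return missing


events = [
    {"correlation_id": "o-1", "event": "payment_completed"},
    {"correlation_id": "o-1", "event": "stock_decremented"},
    {"correlation_id": "o-1", "event": "confirmation_sent"},
    {"correlation_id": "o-2", "event": "payment_completed"},
    {"correlation_id": "o-2", "event": "confirmation_sent"},
]
stalled = find_unconsumed(events)
print(stalled)
```

In practice a check like this would also need a time window, since a follow-up that has not arrived yet is not necessarily lost.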

When Each Pattern Is Right

Orchestration earns its place when workflows are complex, well-defined, and need to be human-readable. Financial transaction pipelines, multi-step data processing jobs, compliance workflows with approval gates — these benefit from having an explicit, auditable record of every step and its outcome. If the business needs to ask "what happened to this record on Tuesday and why," an orchestrated workflow has a clear answer.

Orchestration is also the right choice when workflows change frequently. Adding a new step, reordering dependencies, changing retry behaviour — in an orchestrated system, these are changes to a workflow definition. In a choreographed system, they require coordinating changes across multiple independent services.

Choreography is right when you're integrating genuinely independent systems that should not know about each other. A new analytics consumer that wants to react to order events should not require a change to the order processing service. An event bus with independent consumers is the clean solution there, and choreography is the natural model.
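The analytics case can be sketched as follows, again with a toy in-process bus in place of a real event bus. The subscribe and publish helpers, the "order_placed" event, and the handlers are hypothetical names. The point is that place_order is written once and never edited when the analytics consumer arrives.

```python
from collections import defaultdict

subscribers = defaultdict(list)


def subscribe(event_type, handler):
    subscribers[event_type].append(handler)


def publish(event_type, payload):
    for handler in subscribers[event_type]:
        handler(payload)


def place_order(order_id, total):
    # Order processing service: unchanged throughout this example.
    publish("order_placed", {"order_id": order_id, "total": total})


# Original consumer.
fulfilled = []
subscribe("order_placed", lambda event: fulfilled.append(event["order_id"]))
place_order("o-1", 40.0)

# Later, a new analytics consumer is added purely by subscribing.
revenue = {"total": 0.0}


def track_revenue(event):
    revenue["total"] += event["total"]


subscribe("order_placed", track_revenue)
place_order("o-2", 60.0)
```

The analytics consumer sees only events published after it subscribed, so any backfill of earlier orders would need to be handled separately.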

It's also the right model at high throughput where a centralised orchestrator would become a bottleneck. Systems processing millions of events per hour don't want every event to pass through a central conductor.

The Mistake Most Teams Make

Teams reach for choreography because it sounds modern and because the decoupling story is genuinely appealing. Services that don't know about each other are easy to test, easy to deploy independently, and easy to replace. The architecture diagrams look clean.

Then something breaks and they spend three days correlating events across six systems to find out that a message was dropped because a consumer was temporarily down during a deployment. They add correlation IDs. They add distributed tracing. They build a service to aggregate event states so they can answer "what is the current status of this order." By the end, they have roughly the same visibility they would have had with an orchestrator, but they built the observability layer themselves, under pressure, after the fact.

Start with orchestration unless you have a specific reason not to. The visibility comes built in, the debugging story is straightforward, and the trade-offs are well understood. Move to choreography where the use case demands it (high throughput, genuine service independence, additive consumer patterns), with a clear plan for how you'll maintain observability across the distributed flow.

Written by ATHING

We design and build data infrastructure, automation pipelines, and AI systems for organisations that need them to work.
