Custom Application Dev·5 min read·9 January 2024

The Operational Data Store: Bridging the Gap Between Transactional and Analytical Systems

Operational systems and analytical systems are optimised for fundamentally different workloads, and trying to run one type of query against a system designed for the other eventually produces a crisis. Usually it's a slow-motion crisis — queries that take longer and longer, application response times that degrade during business hours, on-call incidents that are traced back to an analytical query that locked the production database at the wrong moment.

The tension is structural. Your application database is built for transactional workloads: fast reads and writes of individual records, high concurrency, strict consistency guarantees, row-oriented storage optimised for fetching a customer record by ID. Your data warehouse is built for the opposite: columnar storage, sequential scans across billions of rows, aggregations at scale. Run a complex analytical query against an OLTP database and you're asking the wrong tool to do work it wasn't designed for. Run real-time operational reads against a warehouse and you find that the data is hours old, that latency is too high for an interactive application, and that the query model doesn't match what you need.

The operational data store — ODS — is the architectural pattern for handling what falls between them.

What the ODS Is

An ODS is a database optimised for mixed workloads: it holds near-real-time data from one or more source systems, lightly transformed, queryable at both the record level and the aggregate level. It is not a replacement for the data warehouse, which handles historical depth and complex analytical queries across large time ranges. It is not a replacement for the application database, which handles transactional integrity and application state. It is the layer between them that serves the use cases that neither handles well.

The key properties that distinguish an ODS: it is current, typically holding hours to days of data rather than years. It is integrated, combining data from multiple source systems that don't communicate directly with each other. And it is operationally accessible — low latency enough that applications and dashboards can query it without the staleness and latency constraints of the warehouse.

The Use Cases It Solves

The clearest indicator that you need an ODS is when an operational team needs to see current state across multiple systems and neither the source systems nor the warehouse gives them a clean answer.

Customer service is the canonical example. A support agent needs to see the current state of a customer account — their recent orders, open support tickets, billing status, and active subscriptions — in a single view. The data lives in four different source systems. Querying each one directly would require four separate integrations and a data join that doesn't happen at the application layer. Going to the warehouse would mean looking at data that's potentially hours old, which is a problem when the customer on the phone made a change twenty minutes ago. The ODS integrates those four sources in near-real-time and serves the combined view with low enough latency for an interactive application.

The same pattern applies to operational reporting that combines data from multiple systems, real-time inventory or capacity views that don't belong in either the source systems or the warehouse, and monitoring dashboards that need current state rather than historical trends.

What It Is Not

The ODS gets misused most often when teams try to use it to replace analytics infrastructure. It is not a data warehouse. Data warehouses hold years of historical data and are designed to answer complex analytical questions across large time ranges. An ODS holds current-state data — typically days to weeks — and is designed to answer operational questions about what is happening now. Trying to answer "what did our revenue look like by region over the last three years?" from an ODS is using the wrong tool. The query will be slow, the history won't be there, and the ODS wasn't designed to handle that workload efficiently.

Keep the ODS focused on its purpose: current state, operational use cases, integrated view across sources. The warehouse owns the historical depth and the complex analytics.

Implementation Patterns

The simplest ODS implementation is a separate relational database — Postgres or MySQL — fed by change data capture from the source systems. CDC captures every change event from the source databases in near-real-time and applies them to the ODS, keeping it current without polling. This is the right approach for most use cases where latency of seconds to minutes is acceptable.

For lower-latency requirements — sub-second data currency — a streaming approach makes more sense: Kafka or a similar message broker capturing change events, with a serving database consuming the stream and materialising the current state. The infrastructure is more complex and the operational overhead is higher, but the latency is proportionally lower.

Managed options like Materialize sit closer to the streaming end, offering view-based serving over streaming data with a SQL interface that looks familiar to engineers used to working with relational databases. For teams without the streaming infrastructure expertise, managed services reduce the operational burden at the cost of some flexibility.

When You Don't Need One

The ODS is a solution to a real pain point. It is not an architectural best practice that every system should adopt. If your analytical queries are genuinely performing well against your application database and application performance is unaffected, you don't need an ODS. If your warehouse data is fresh enough for all of your operational use cases, you don't need an ODS. If your operational questions can be answered by individual source systems without requiring a cross-system join, you don't need an ODS.

Adding architectural complexity before you have the problem it solves is a reliable way to create maintenance burden without value. The ODS earns its place when you have the specific tension it's designed to resolve: operational use cases requiring near-real-time, integrated data that neither your source systems nor your warehouse serves cleanly. If you have that problem, the ODS is the right answer. If you don't have it yet, build the simpler system and add complexity when the need is real.

Written by ATHING

We design and build data infrastructure, automation pipelines, and AI systems for organisations that need them to work.

Talk to Us