Feature Stores: The Missing Layer Between Your Data Team and Your ML Team
In most organisations with both a data engineering team and an ML team, the same underlying data gets transformed twice. Data engineers build transformation pipelines that serve analytics and reporting. Data scientists build feature pipelines that serve model training. Both pipelines read from the same source tables. Both produce aggregations and derived signals from the same raw events. And they do it separately, with separate code, maintained by separate people, with subtly different logic.
This duplication isn't just inefficient. It is actively dangerous for production ML. When the feature pipeline used for training and the feature pipeline used for serving diverge — even slightly — models fail in ways that are difficult to diagnose and expensive to fix. The gap between the two is where production ML reliability breaks down most consistently.
What a Feature Store Is
A feature store is a system that computes, stores, and serves the features that ML models consume. Features in this context are the transformed, aggregated, or derived data representations that a model actually trains on — not the raw events, but the processed signals derived from them. A customer's average order value over the last 90 days. The number of failed login attempts in the past hour. A product's click-through rate for a given user segment.
A feature store has two distinct components that serve fundamentally different purposes. The offline store holds historical feature values used for model training and experimentation. It needs to support point-in-time correct lookups — when you train a model, you need to know what the feature values were at the time of each training label, not what they are now. Backfilling a training dataset with current feature values is a form of data leakage: it produces overly optimistic offline performance that then degrades once the model is deployed.
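The point-in-time lookup can be sketched as a small function. This is a simplified illustration, not any particular feature store's API: given a feature's change history, it returns the value that was in effect at the label's timestamp rather than the current one.

```python
from bisect import bisect_right
from datetime import datetime

def point_in_time_value(history, as_of):
    """Return the latest feature value observed at or before `as_of`.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    i = bisect_right(timestamps, as_of)
    if i == 0:
        return None  # the feature did not exist yet at label time
    return history[i - 1][1]

# A 90-day average order value as it evolved over time.
history = [
    (datetime(2024, 1, 1), 40.0),
    (datetime(2024, 2, 1), 55.0),
    (datetime(2024, 3, 1), 70.0),
]

# A training label observed on 15 February must see 55.0; joining
# against the current value (70.0) would leak future information.
print(point_in_time_value(history, datetime(2024, 2, 15)))  # → 55.0
```

Production offline stores implement the same semantics as an "as-of" join across millions of rows, but the correctness requirement is exactly this one.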
The online store holds the current feature values needed for low-latency inference. When a model needs to score a record in real time, it queries the online store for the latest feature values for that entity. The online store is optimised for single-record lookups with millisecond latency, not for the bulk retrieval patterns the offline store handles.
The key architectural constraint is that both stores are populated from the same feature definitions. The computation logic is written once. The offline and online representations are kept in sync by the feature store infrastructure. This is the property that solves training-serving skew.
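In miniature, "written once, serving both contexts" looks like the following. The function name and the dict-as-online-store are illustrative assumptions; a real feature store would materialise the same definition into a warehouse table and a key-value store.

```python
from datetime import datetime, timedelta

def avg_order_value_90d(orders, as_of):
    """Average order amount over the 90 days ending at `as_of`.

    `orders` is an iterable of (timestamp, amount) pairs. This one
    function is the feature definition: both paths below call it,
    so they cannot drift apart.
    """
    cutoff = as_of - timedelta(days=90)
    window = [amount for ts, amount in orders if cutoff < ts <= as_of]
    return sum(window) / len(window) if window else 0.0

orders = [
    (datetime(2024, 3, 1), 100.0),
    (datetime(2024, 5, 1), 50.0),
]

# Offline path: backfill a historical value for a training row.
training_value = avg_order_value_90d(orders, as_of=datetime(2024, 5, 2))

# Online path: refresh the serving store for low-latency lookups,
# keyed by entity, using the same computation.
online_store = {}
online_store[("customer", 42)] = avg_order_value_90d(
    orders, as_of=datetime(2024, 5, 2)
)
```

Any change to the definition flows to both stores on the next materialisation run, which is the consistency guarantee the surrounding infrastructure exists to enforce.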
The Two Core Problems It Solves
Training-Serving Consistency
Training-serving skew is the most common silent failure mode in production ML. A model trains on features computed one way and is then served with features computed differently. The differences are often small — a timezone assumption, a rounding behaviour, a null handling edge case — but they compound across features and degrade model performance in ways that look like model drift when the cause is actually infrastructure inconsistency.
A feature store makes training-serving skew structurally difficult to introduce. There is one feature definition. It serves both contexts. The offline materialisation and the online serving path are generated from the same specification. If you change the definition, both stores update from the same code. You cannot accidentally compute a 30-day trailing average with business-day logic in training and calendar-day logic in serving when both use the same defined computation.
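To make the business-day versus calendar-day example concrete, here is a hypothetical "30-day trailing window" implemented twice. Both functions are invented for illustration; the point is that two apparently equivalent definitions select different sets of raw events, which is exactly the kind of divergence a single shared definition rules out.

```python
from datetime import date, timedelta

# Forty consecutive daily events, weekends included.
events = [(date(2024, 6, 3) + timedelta(days=i), 1.0) for i in range(40)]

def calendar_window(events, as_of, days=30):
    """Training-side logic: 30 calendar days."""
    cutoff = as_of - timedelta(days=days)
    return [v for d, v in events if cutoff < d <= as_of]

def business_window(events, as_of, days=30):
    """Serving-side logic: walk back 30 weekdays instead."""
    d, remaining = as_of, days
    while remaining > 0:
        d -= timedelta(days=1)
        if d.weekday() < 5:  # Monday to Friday
            remaining -= 1
    return [v for dd, v in events if d < dd <= as_of]

as_of = date(2024, 7, 12)
# The same nominal "30-day window" covers different numbers of events:
print(len(calendar_window(events, as_of)))  # → 30
print(len(business_window(events, as_of)))  # → 40
```

A trailing average computed over these two windows differs on any non-uniform data, and nothing in either implementation is individually wrong, which is why this class of skew goes undetected for so long.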
Feature Reuse Across Teams and Models
Without a feature store, every new model project starts with a feature engineering phase that often partially duplicates work already done elsewhere in the organisation. One team computed customer lifetime value for a churn model six months ago. Another team is about to compute a very similar signal for a propensity model, unaware that it already exists, or aware but unable to reuse it because the implementation is buried in a notebook attached to the previous project.
A feature store with a feature registry makes features discoverable. Teams can search for features by entity type, by data source, by the team that created them. When a feature already exists that meets your requirements, you consume it rather than rebuilding it. The compounding effect across a mature ML platform — where dozens of features are shared across multiple models — reduces the cost of new model development significantly and improves consistency across the model portfolio.
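A feature registry can be pictured as queryable metadata over feature definitions. This is a hypothetical in-memory sketch; real feature stores persist this metadata and expose it through a catalogue UI or CLI, and the field names here are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FeatureSpec:
    name: str
    entity: str
    source: str
    owner: str

class FeatureRegistry:
    def __init__(self):
        self._features = []

    def register(self, spec):
        self._features.append(spec)

    def search(self, entity=None, owner=None):
        """Filter registered features by entity type and/or owning team."""
        return [
            f for f in self._features
            if (entity is None or f.entity == entity)
            and (owner is None or f.owner == owner)
        ]

registry = FeatureRegistry()
registry.register(FeatureSpec("customer_ltv_180d", "customer", "orders", "churn-team"))
registry.register(FeatureSpec("failed_logins_1h", "user", "auth_events", "fraud-team"))

# A new propensity-model project discovers the existing lifetime-value
# feature by entity type instead of rebuilding it from a notebook.
matches = registry.search(entity="customer")
print([f.name for f in matches])  # → ['customer_ltv_180d']
```

The mechanics are trivial; the organisational value is that the lookup happens at all, before a second team re-implements the signal.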
Who Actually Needs One
A feature store is not a universal requirement. For a single model in production, or for teams doing primarily offline analysis with no real-time serving requirements, the operational overhead of running feature store infrastructure outweighs the benefits. The value scales with the number of models in production and the severity of the training-serving consistency problem.
The indicators that a feature store is warranted: multiple models in production that share entity types and therefore share potential features, active training-serving skew issues causing reliability problems, or significant duplication of feature engineering work across different model projects. If any of those are present at meaningful scale, a feature store pays for itself.
Build vs Buy
The managed feature store ecosystem is mature. Feast is open source and integrates with most existing data stacks. Tecton is a managed commercial platform built by the team that originally built Uber's Michelangelo feature store. Both AWS SageMaker Feature Store and Vertex AI Feature Store are available as first-party services on their respective clouds. All of them have solved the hard infrastructure problems — point-in-time correct retrieval, online-offline synchronisation, feature serving latency — at a level that would take a significant internal engineering effort to replicate from scratch.
The relevant decision is not whether to build a feature store from scratch. It is which managed solution integrates most cleanly with your existing data stack — your warehouse, your orchestration layer, your model training infrastructure. Evaluate on integration, not on infrastructure capability, because the infrastructure is largely commoditised at this point.
What It Does and Doesn't Fix
A feature store does not improve model accuracy. The features going into a model are the same features that existed before. What changes is the reliability guarantee around them. A model built on features served by a feature store will behave in production the way it behaved in evaluation, because the features are computed consistently. A model built on ad-hoc feature pipelines might not, and diagnosing the gap between offline and online performance in a system without a feature store is a significant forensic undertaking.
In a production ML system, reliable correctness is a different and more important property than marginal accuracy. Most organisations invest heavily in improving model performance and underinvest in the infrastructure that ensures that performance persists after deployment. A feature store addresses the latter. It is the layer between your data team and your ML team that stops the same problem from being solved twice, differently, with the worse version ending up in production.
Written by ATHING
We design and build data infrastructure, automation pipelines, and AI systems for organisations that need them to work.
Talk to Us