ATHING
Deep Cuts.
Technical writing on data engineering, AI infrastructure, automation, and strategy. No surface-level takes. No sponsored opinions. Just the stuff that actually matters when you're building.
Data Architecture
Idempotency in Data Pipelines: The Property That Separates Reliable Systems from Fragile Ones
Most pipeline failures aren't caused by bad code — they're caused by code that wasn't designed to run twice. Idempotency is the property that makes reruns safe, and most teams underestimate how hard it is to get right.
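The core idea is that a rerun must leave the target in the same state as a single run. A minimal sketch, using SQLite's `ON CONFLICT` upsert as a stand-in for whatever keyed-write mechanism your target store offers; the table and column names here are invented for illustration:

```python
import sqlite3

def load_daily_totals(conn, rows):
    """Idempotent load: a keyed upsert means re-running the same
    batch leaves the table exactly as one run would have."""
    conn.executemany(
        # ON CONFLICT turns a blind INSERT into an upsert, so a retry
        # or duplicate delivery cannot create duplicate rows.
        "INSERT INTO daily_totals (day, account, total) VALUES (?, ?, ?) "
        "ON CONFLICT (day, account) DO UPDATE SET total = excluded.total",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_totals (day TEXT, account TEXT, total REAL, "
    "PRIMARY KEY (day, account))"
)
batch = [("2024-06-01", "acme", 120.0), ("2024-06-01", "globex", 75.5)]
load_daily_totals(conn, batch)
load_daily_totals(conn, batch)  # simulated rerun: same input, no duplicates
count = conn.execute("SELECT COUNT(*) FROM daily_totals").fetchone()[0]
```

The hard part in real pipelines is choosing the key: without a natural key that uniquely identifies each output row, there is nothing for the upsert to conflict on.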
Designing for Backfill: The Capability Your Pipeline Needs Before It Goes Live
Designing for Backfill: The Capability Your Pipeline Needs Before It Goes Live
Every data pipeline will eventually need to reprocess historical data. The ones that weren't designed for it make that day expensive, slow, and stressful.
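"Designed for backfill" usually comes down to one property: the processing date is an explicit parameter, not an implicit "now". A hedged sketch with hypothetical function names; reprocessing history is then just the daily entry point in a loop:

```python
from datetime import date, timedelta

def run_partition(day: date) -> str:
    """Process exactly one day's partition. A stand-in for a real
    transform; what matters is that the date is passed in, so the
    same code serves both the scheduled run and a backfill."""
    return f"processed {day.isoformat()}"

def backfill(start: date, end: date) -> list:
    """Reprocess a historical range by replaying the daily entry point."""
    out = []
    d = start
    while d <= end:
        out.append(run_partition(d))
        d += timedelta(days=1)
    return out

results = backfill(date(2024, 1, 1), date(2024, 1, 3))
```

Pipelines that call `now()` inside the transform cannot be replayed this way; that is the design decision that makes the eventual backfill cheap or expensive.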
Lakehouse vs Data Warehouse: When the Classic Architecture Isn't Enough
The data warehouse isn't dead. But for a growing number of organisations, it was never the right answer in the first place. Understanding when to move beyond it is one of the most important architectural decisions a data team makes.
Intelligent Automation
Orchestration vs Choreography: Two Patterns, One Hard Choice
Both patterns coordinate distributed work. They fail in very different ways, and most teams only discover which one they chose when something goes wrong.
Event-Driven vs Scheduled: Choosing the Right Trigger Model for Your Workflows
Scheduling is familiar. Event-driven is powerful. Choosing the wrong one for your context creates problems that compound over time and are surprisingly expensive to undo.
AI & Machine Learning
Why Most ML Models Never Reach Production
The model is rarely the problem. The infrastructure around it — pipelines, feature consistency, deployment tooling, monitoring — is where most AI projects quietly die.
RAG vs Fine-Tuning: A Practical Guide to Choosing the Right Approach
Both techniques make language models more useful for your specific domain. They solve different problems, and reaching for the wrong one wastes months and produces worse results.
Feature Stores: The Missing Layer Between Your Data Team and Your ML Team
Without a feature store, your data scientists recompute the same features in isolation. Your models train on data that doesn't match what they'll see in production. Both problems are expensive.
Business Intelligence
The Self-Serve BI Trap: Why Most Implementations Quietly Fail
Self-serve analytics sounds like the answer to every data team's capacity problem. In practice, most rollouts produce dashboards nobody trusts and questions nobody can answer.
The Metrics Layer: Why Your Business Logic Doesn't Belong in Your BI Tool
When the definition of 'revenue' lives inside a Looker model, a Tableau workbook, and a Metabase query — and they all disagree — you don't have a tooling problem. You have a metrics problem.
Systems Integration
The Strangler Fig Pattern: Migrating Legacy Systems Without the Big Bang
Big bang migrations fail more often than they succeed. The strangler fig pattern lets you replace a legacy system incrementally — without a single high-risk cutover date that keeps everyone awake.
API-First vs Event-Driven Integration: A Practical Decision Framework
APIs and event streams both move data between systems. The choice between them shapes your architecture for years and determines how your systems behave under failure.
Data Governance & Quality
Data Contracts: The Interface Between Teams That Nobody Wrote Down
When an upstream team changes a column name, the downstream pipeline breaks. Data contracts are how you prevent that — and how you assign accountability when it happens anyway.
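A data contract is just the expected schema written down and checked at the boundary. A minimal sketch, with an invented contract for a hypothetical `orders` feed; the point is that an upstream rename surfaces as a named violation instead of a broken pipeline downstream:

```python
# Hypothetical contract for an upstream 'orders' feed: columns and types
CONTRACT = {"order_id": int, "customer": str, "amount": float}

def check_contract(record: dict, contract: dict) -> list:
    """Return a list of violations instead of letting bad records
    flow silently into downstream transforms."""
    problems = []
    for col, typ in contract.items():
        if col not in record:
            problems.append(f"missing column: {col}")
        elif not isinstance(record[col], typ):
            problems.append(
                f"{col}: expected {typ.__name__}, "
                f"got {type(record[col]).__name__}"
            )
    return problems

# An upstream rename ('customer' -> 'customer_name') is caught at the boundary
bad = {"order_id": 1, "customer_name": "acme", "amount": 9.99}
violations = check_contract(bad, CONTRACT)
```

In practice the contract lives in version control and the check runs in CI or at ingestion, so the accountability question ("who changed the schema, and did they know?") has an answer.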
Data Quality Checks Belong in the Pipeline, Not the Dashboard
Finding a data quality issue in a dashboard means the bad data is already in production, downstream systems have already consumed it, and the damage is already done.
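Moving the check into the pipeline means running it between transform and load, and failing the run before anything is written. A hedged sketch with invented rules and field names; the mechanism, not the specific checks, is the point:

```python
def validate(rows):
    """Run quality checks between transform and load; raising here
    fails the pipeline run before bad data reaches the warehouse
    or any dashboard built on top of it."""
    for i, r in enumerate(rows):
        if r["amount"] < 0:
            raise ValueError(f"row {i}: negative amount {r['amount']}")
        if r["currency"] not in {"USD", "EUR", "GBP"}:
            raise ValueError(f"row {i}: unknown currency {r['currency']!r}")
    return rows

clean = validate([{"amount": 10.0, "currency": "USD"}])
try:
    validate([{"amount": -5.0, "currency": "USD"}])
    blocked = False
except ValueError:
    blocked = True  # the bad batch never reaches the load step
```

A failed run is an incident for the data team; a wrong number on a dashboard is an incident for the whole business.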
Custom Application Development
Build vs Buy for Data-Intensive Applications: A Framework for the Decision
Off-the-shelf tools work until they don't. Custom builds are expensive until they aren't. The decision is more nuanced than most teams make it, and the consequences last for years.
The Operational Data Store: Bridging the Gap Between Transactional and Analytical Systems
Your OLTP database can't handle analytical queries without slowing down. Your data warehouse can't handle real-time operational reads. The ODS sits between them — and most teams skip it until they can't.
Data Strategy Consulting
Your Data Strategy Needs a Cost Model, Not Just a Roadmap
A data roadmap tells you what you want to build. A cost model tells you what it will actually take. Most organisations have one and not the other, and the one they're missing is usually the one that matters.
The Data Maturity Model: Where Most Organisations Actually Are
Most companies rate themselves higher on the data maturity curve than the evidence supports. The gap between self-assessment and reality is where data budgets disappear.