Blog — DoOperator Research

Essays

4 posts

Accessible takes on why rigorous experimentation matters — for researchers, builders, and anyone trying to understand what actually works.

Essay · Causal Inference

Correlation Was Never the Problem

"Correlation is not causation" is one of the most-repeated phrases in empirical research. It is also, as usually understood, a dramatic understatement of the actual difficulty. The real challenge is not distinguishing correlation from causation — it is identifying which causal story is correct when several are consistent with the same data.

DoOperator Research · May 29, 2026

Essay · Causal Inference

The Illusion of Control: Why Most A/B Tests Mislead More Than They Inform

Organizations run thousands of A/B tests every year and congratulate themselves on being data-driven. Most of those tests are statistically invalid. Here is why — and what rigorous experimentation actually requires.

DoOperator Research · May 27, 2026

Essay · Causal Inference

What N-of-1 Trials Get Right That Population Studies Get Wrong

Randomized trials on populations measure average effects in heterogeneous groups. N-of-1 trials measure what actually happens to one specific person. For individual decision-making, the latter is usually more relevant.

DoOperator Research · May 26, 2026

Essay · Causal Inference

The Replication Crisis Is Your Problem Too

Most findings from nutrition, psychology, and medicine that shaped your beliefs about health and behavior have not replicated. This is not a footnote — it changes what you should believe and how you should act.

DoOperator Research · May 24, 2026

Deep Dives

10 posts

Technical synthesis posts covering the methods landscape across causal inference, experimental design, and decision-making.

Deep Dive · Evolutionary Methods

Evolutionary Strategies: Gradient-Free Optimization That Scales

DoOperator Research · May 22, 2026

Deep Dive · Industry Experiments

Running Experiments at Scale: Lessons From Google, Microsoft, and Netflix

DoOperator Research · May 21, 2026

Deep Dive · Reinforcement Learning

Reinforcement Learning for Decision-Making: From MDPs to Real-World Deployment

DoOperator Research · May 20, 2026

Deep Dive · Online Learning

Online Learning: Prediction and Decision-Making When the World Changes

DoOperator Research · May 19, 2026

Deep Dive · Sequential Decisions

Bandits and Sequential Decisions: When to Explore and When to Exploit

DoOperator Research · May 18, 2026

Deep Dive · Treatment Effect Heterogeneity

Beyond Average Treatment Effects: Finding Who Benefits From Your Interventions

DoOperator Research · May 17, 2026

Deep Dive · Statistical Foundations

The Statistical Foundations of Rigorous Experimentation

DoOperator Research · May 16, 2026

Deep Dive · Causal Estimation

Causal Estimation: Choosing the Right Method When You Can't Randomize

DoOperator Research · May 15, 2026

Deep Dive · Experiment Design

Experiment Design That Actually Works: From Power Calculations to Guardrail Metrics

DoOperator Research · May 14, 2026

Deep Dive · Causal Inference

Causal Inference: From Correlation to Causation in Five Steps

DoOperator Research · May 13, 2026

Practitioner's Guides

11 posts

Concise, applied guides to specific methods and concepts — what they are, when to use them, and what the evidence says.

Why Your Organization's Experiments Are Probably Confounded

Most organizational experiments are confounded. Here is how to tell — and what to do about it.

DoOperator · June 4, 2026

Practitioner's Guide · Evolutionary Methods

A Practitioner's Guide to Evolutionary Methods

You've spent three weeks tuning a 12-layer policy network for a continuous control task. The reward signal is sparse—your agent gets a non-zero reward maybe once every 200 steps. Policy gradients are producing gradient estimates with variance so high that your loss curve looks li

DoOperator Research · May 11, 2026

Practitioner's Guide · Industry Experiments

A Practitioner's Guide to Industry Experiments

You've just launched a new feature on your platform. Your product team is confident it will increase engagement. The A/B test shows a statistically significant 0.3% lift in daily active users (p = 0.04, N = 500,000). Your VP wants to ship it tomorrow. But you've been burned befor

DoOperator Research · May 10, 2026

Practitioner's Guide · Reinforcement Learning

A Practitioner's Guide to Reinforcement Learning

You're the CTO of a robotics startup. Your warehouse robot has logged 10,000 hours of pick-and-place trajectories, stored as state-action-reward-next_state tuples on a NAS drive. Your competitor is deploying a new fleet next quarter. You need a policy that outperforms your curren

DoOperator Research · May 9, 2026

Practitioner's Guide · Online Learning

A Practitioner's Guide to Online Learning

You're the head of experimentation at a large e-commerce platform. Your team has just launched a new recommendation algorithm, but you can't run a standard A/B test because the algorithm needs to learn from user behavior in real time—and the optimal recommendation changes as user

DoOperator Research · May 8, 2026

Practitioner's Guide · Sequential Decisions

A Practitioner's Guide to Sequential Decision-Making

Your team has been running A/B tests for six months on a recommendation system. Each week, you launch a new feature variant, monitor the p-value dashboard, and stop when it crosses 0.05. Your boss wants to know: how many of your "significant" results are real? The answer, from Jo

DoOperator Research · May 7, 2026

Practitioner's Guide · Treatment Effect Heterogeneity

A Practitioner's Guide to Heterogeneous Treatment Effects

You're the head of clinical analytics at a health system. Your team has just rolled out a new remote monitoring program for heart failure patients. The average treatment effect is positive—readmissions dropped 8% overall. Your CEO wants to expand it system-wide. But you have a na

DoOperator Research · May 6, 2026

Practitioner's Guide · Statistical Foundations

A Practitioner's Guide to Statistical Foundations

You've just finished collecting data from a randomized experiment with 1,200 participants. Your treatment effect estimate looks promising—a 3.2 percentage point reduction in the primary outcome. But when you compute the 95% confidence interval using the standard Wald formula, you

DoOperator Research · May 5, 2026

Practitioner's Guide · Causal Estimation

A Practitioner's Guide to Causal Estimation

You're a product leader at a health-tech company. Your team just ran an A/B test on a new onboarding flow: 10,000 users randomized, treatment gets the new flow, control gets the old one. The treatment group shows a 12% relative lift in 7-day retention. Your CEO wants to know: sho

DoOperator Research · May 4, 2026

Practitioner's Guide · Experiment Design

A Practitioner's Guide to Experiment Design

You're a product manager at a major e-commerce platform. Your team wants to test a new recommendation algorithm. Engineering can deploy it to 5% of users. Your analytics team runs the A/B test, gets a statistically significant 0.3% revenue lift (p=0.04, n=500,000), and recommends

DoOperator Research · May 3, 2026

Practitioner's Guide · Causal Inference

A Practitioner's Guide to Causal Inference

Your product team just ran an A/B test. The treatment group saw a 12% lift in conversion. You're about to ship it. But your data scientist says: "The lift only appeared in the last three days of the experiment, and those days had a holiday promotion running for the control group

DoOperator Research · May 2, 2026

Monthly Digests

1 post

Curated roundups of new papers in causal inference, experimentation, and decision science from arXiv and beyond.

Monthly Digest

test

DoOperator Research · May 1, 2026