
DoOperator Research

Causal inference, experimental design, and decision-making under uncertainty

A curated knowledge base covering the mathematical foundations of causal reasoning, experiment design, reinforcement learning, evolutionary methods, and online learning.

137 canonical papers · 109 wiki summaries · 10 research areas

Experiment Design · Causal Inference · Causal Estimation · Treatment Effect Heterogeneity · Sequential Decisions · Reinforcement Learning · Statistical Foundations · Industry Experiments · Evolutionary Methods · Online Learning

Curated corpus

Foundational papers

Causal Inference

A General Approach to Causal Mediation Analysis

Kosuke Imai, Luke Keele, Dustin Tingley · 2010 · Psychological Methods · 3,587 citations

Provides a canonical overview or reference point for the relevant DoOperator research area.

Treatment Effect Heterogeneity

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Stefan Wager, Susan Athey · 2017 · Journal of the American Statistical Association · 2,737 citations

Provides a canonical overview or reference point for the relevant DoOperator research area.

Causal Inference

Causal inference in statistics: An overview

Judea Pearl · 2009 · Statistics Surveys · 2,309 citations

Provides a canonical overview or reference point for the relevant DoOperator research area.

Treatment Effect Heterogeneity

Metalearners for Estimating Heterogeneous Treatment Effects using Machine Learning

Sören R. Künzel, Jasjeet S. Sekhon, Peter J. Bickel +1 more · 2019 · Proceedings of the National Academy of Sciences · 1,243 citations

Provides a canonical overview or reference point for the relevant DoOperator research area.

Causal Inference

On Causal Inference in the Presence of Interference

Eric J. Tchetgen Tchetgen, Tyler J. VanderWeele · 2012 · Statistical Methods in Medical Research · 500 citations

Provides a canonical overview or reference point for the relevant DoOperator research area.

Experiment Design

Online controlled experiments at large scale

Ron Kohavi, Alex Deng, Brian Frasca +3 more · 2013 · Knowledge Discovery and Data Mining · 424 citations

This paper addresses the challenge of scaling online controlled experiments (A/B tests) from isolated, one-off studies to an organization-wide decision-making engine that can run hundreds of concurrent experiments across millions of users. The core problem is that as organizations grow and adopt agile development, they need to evaluate many product ideas simultaneously — often hundreds per quarter — but traditional experimental designs and statistical methods break down under the combinatorial complexity of overlapping experiments, massive data volumes, and the need for rapid, trustworthy decisions. Existing approaches to product evaluation (focus groups, expert reviews, observational data analysis) fail to establish causal relationships reliably, while single-experiment A/B testing frameworks cannot handle the operational demands of concurrent experimentation at web scale. The paper synthesizes lessons from Microsoft's Bing, where over 200 concurrent experiments run daily across ~100 million monthly active users, and provides a framework for building the cultural, engineering, and statistical infrastructure needed to make experimentation a core organizational capability rather than a niche analytical tool.

Experiment Design

Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology

Nicholas Larsen, Alex Deng, Jiheng Zhang +2 more · 2024 · The American Statistician · 70 citations

Provides a canonical overview or reference point for the relevant DoOperator research area.

Sequential Decisions

Designing Optimal Dynamic Treatment Regimes: A Causal Reinforcement Learning Approach

Junzhe Zhang · 2020 · International Conference on Machine Learning · 16 citations

Policy optimization in treatment settings can be invalid if it ignores how treatments were assigned and how confounders evolve.

Experiment Design

Controlled experiments on the web: survey and practical guide

Ron Kohavi, Roger Longbotham, Dan Sommerfield +1 more · 2009 · Data Mining and Knowledge Discovery

Provides a canonical overview or reference point for the relevant DoOperator research area.

Causal Inference

Causal Inference: What If

Miguel A. Hernán, James M. Robins · 2020 · Chapman & Hall/CRC

Provides a canonical overview or reference point for the relevant DoOperator research area.

Causal Estimation

Double/Debiased Machine Learning for Treatment and Causal Parameters

Victor Chernozhukov, Denis Chetverikov, Mert Demirer +4 more · 2016

This paper addresses the fundamental challenge of performing valid inference on a low-dimensional causal or structural parameter (e.g., an average treatment effect, a regression coefficient in a partially linear model, or a policy parameter) when the nuisance functions—such as outcome regressions, propensity scores, or instrument propensity models—must be estimated using high-dimensional or flexible machine learning methods. Classical semiparametric theory assumes that nuisance parameters can be estimated at sufficiently fast rates (typically root-n) and that the parameter space has low complexity (e.g., Donsker properties hold). In modern settings with many covariates, ML methods like lasso, random forests, boosting, and neural nets are natural choices for nuisance estimation, but they introduce two sources of bias that destroy root-n consistency and valid inference when naively plugged into estimating equations: (1) **regularization bias** from shrinkage or penalization that prevents the nuisance estimator from converging at root-n rates, and (2) **overfitting bias** from using the same data to estimate both the nuisance functions and the parameter of interest. The paper shows that combining Neyman-orthogonal moment conditions with cross-fitting (sample splitting) eliminates both biases, yielding estimators that are root-n consistent, asymptotically normal, and admit valid confidence intervals under remarkably weak conditions on the nuisance estimators.
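The combination of an orthogonal score with cross-fitting described above can be sketched for the partially linear model y = θ·d + g(X) + ε. This is a minimal illustration under stated assumptions, not the paper's reference implementation: the ridge nuisance learner, the function names (`fit_predict_ridge`, `dml_plr`, `simulate`), and the simulated data are all hypothetical choices made here for the example.

```python
import numpy as np

def fit_predict_ridge(X_train, y_train, X_test, alpha=1.0):
    """Hypothetical nuisance learner: ridge regression with an unpenalized intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    B = np.column_stack([np.ones(len(X_test)), X_test])
    penalty = alpha * np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    coef = np.linalg.solve(A.T @ A + penalty, A.T @ y_train)
    return B @ coef

def dml_plr(y, d, X, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + e."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Nuisances are fit on the other folds only, then evaluated out of fold.
        y_res[test] = y[test] - fit_predict_ridge(X[train], y[train], X[test])
        d_res[test] = d[test] - fit_predict_ridge(X[train], d[train], X[test])
    # Residual-on-residual regression solves the orthogonal moment condition.
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = d_res * (y_res - theta * d_res)
    J = np.mean(d_res ** 2)
    se = np.sqrt(np.mean(psi ** 2) / J ** 2 / n)
    return theta, se

def simulate(n=2000, theta=0.5, seed=1):
    """Simulated data (an assumption of this sketch) with confounding through X."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
    y = theta * d + X[:, 0] - X[:, 2] + rng.normal(size=n)
    return y, d, X

theta_hat, se = dml_plr(*simulate())
print(f"theta_hat = {theta_hat:.3f}, 95% CI half-width = {1.96 * se:.3f}")
```

Any learner with out-of-fold predictions could stand in for the ridge step; the cross-fitting loop and the residual-on-residual estimate are the parts that correspond to the summary above.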

Causal Estimation

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Stefan Wager, Susan Athey · 2015

This paper addresses the challenge of estimating heterogeneous treatment effects (HTE) — how the causal effect of a treatment varies across individuals with different observed characteristics — in settings with many covariates and complex interactions. Classical nonparametric methods for HTE estimation, such as nearest-neighbor matching, kernel methods, and series estimation, perform well with few covariates but break down as the dimensionality increases due to the curse of dimensionality. Meanwhile, machine learning methods like random forests excel at high-dimensional prediction but lack the inferential guarantees needed for causal inference: researchers need confidence intervals and hypothesis tests for treatment effects, not just point predictions. The paper bridges this gap by developing causal forests — a modification of Breiman's random forests — that provide consistent, asymptotically normal estimates of heterogeneous treatment effects under unconfoundedness, along with valid confidence intervals. This enables researchers to explore treatment effect heterogeneity without pre-specifying subgroups, while avoiding the pitfalls of data dredging and false discovery.


From the blog

Latest research notes

Why Your Organization's Experiments Are Probably Confounded

Most organizational experiments are confounded. Here is how to tell — and what to do about it.

June 4, 2026

Causal Inference

Correlation Was Never the Problem

"Correlation is not causation" is one of the most-repeated phrases in empirical research. It is also, as usually understood, a dramatic understatement of the actual difficulty. The real challenge is not distinguishing correlation from causation — it is identifying which causal story is correct when several are consistent with the same data.

May 29, 2026

Causal Inference

The Illusion of Control: Why Most A/B Tests Mislead More Than They Inform

Organizations run thousands of A/B tests every year and congratulate themselves on being data-driven. Most of those tests are statistically invalid. Here is why — and what rigorous experimentation actually requires.

May 27, 2026


Steady Practice Applied Science Series

Literature surveys

SP-1

The Science of Habit Formation

Automaticity, cues, repetition, and identity in habit formation. Covers habit measurement, context change, discontinuity and relapse, and the distinction between habit formation and maintenance.

SP-3

Sleep Science for Personal Practice

From sleep architecture and the two-process model to evidence-based interventions and consumer tracker accuracy. Covers chronotype, individual sleep need, clinical red flags, and N=1 experiment protocols.

SP-9

N=1 Experimentation and Personal Science

Why population research cannot predict individual responses, and how to design valid self-experiments. Covers crossover design, statistical methods, common threats to validity, and practical experiment planning.


Research areas

Problems we investigate

Causal Inference

Identification and Estimation of Causal Effects

We study the conditions under which causal quantities are identifiable from observational and interventional data. Our work extends the do-calculus (Pearl, 2000) to settings with partial compliance, unmeasured confounders, and time-varying treatments. We are particularly interested in the gap between identification theory — which establishes what is estimable in principle — and efficient estimation in finite samples using doubly-robust and semiparametric methods.

Pearl (2009) · Richardson & Robins (2014) · Chernozhukov et al. (2018) · 730 papers

Experimental Design

Optimal and Adaptive Experimental Design

Classical optimal design theory (Kiefer & Wolfowitz, 1960; Wald, 1947) characterizes designs that minimize the variance of estimators under fixed sample budgets. We extend this framework to sequential and adaptive settings where the allocation of experimental conditions can respond to accumulating data. Questions of interest include: optimal stopping rules, the tradeoff between exploration and exploitation in multi-armed designs, and the design of experiments robust to model misspecification.

Kiefer & Wolfowitz (1960) · Chernoff (1959) · Russo et al. (2018) · 586 papers

Sequential Decision-Making

Bandits, POMDPs, and Offline Reinforcement Learning

We investigate sequential decision problems where an agent must act under uncertainty about both the environment state and the causal structure of outcomes. This spans Bayesian bandit algorithms (Thompson sampling, information-directed sampling), model-based reinforcement learning in partially observed environments, and offline RL from fixed datasets — where distributional shift between behavior and evaluation policies creates fundamental challenges for policy evaluation and improvement.

Thompson (1933) · Russo & Van Roy (2016) · Levine et al. (2020) · 1270 papers
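As a concrete instance of the Bayesian bandit algorithms mentioned above, Thompson sampling for Bernoulli rewards can be sketched in a few lines. This is an illustrative sketch only: the uniform Beta(1, 1) prior, the function name `thompson_bernoulli`, and the arm means used in the usage line are assumptions introduced here.

```python
import numpy as np

def thompson_bernoulli(true_means, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling; returns pull counts and total reward."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    successes = np.ones(k)  # Beta posterior alpha, starting from a Beta(1, 1) prior
    failures = np.ones(k)   # Beta posterior beta
    pulls = np.zeros(k, dtype=int)
    total_reward = 0.0
    for _ in range(n_rounds):
        # Draw one plausible mean per arm from its posterior and act greedily on the draw.
        samples = rng.beta(successes, failures)
        arm = int(np.argmax(samples))
        reward = float(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward

pulls, total_reward = thompson_bernoulli([0.2, 0.5, 0.8])
print("pulls per arm:", pulls, "total reward:", total_reward)
```

Exploration here comes entirely from posterior randomness: arms with few observations have wide posteriors and are occasionally sampled as best, with no separate exploration parameter to tune.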

Heterogeneous Treatment Effects

Estimation and Inference for Conditional Average Treatment Effects

When treatment effects vary across individuals or contexts, population-average estimates are insufficient for decision-making. We study nonparametric and semiparametric methods for estimating conditional average treatment effects (CATEs), including meta-learner frameworks (T-, S-, X-, R-learners), honest causal forests, and doubly-robust AIPW estimators. A central concern is valid, assumption-light confidence intervals for effect heterogeneity in moderate-dimensional covariate spaces.

Wager & Athey (2018) · Nie & Wager (2021) · Kennedy (2023) · 765 papers
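Of the meta-learners listed above, the T-learner is the simplest to write down: fit one outcome model per treatment arm and take the difference of their predictions. The sketch below assumes linear base learners and a simulated randomized experiment; the helper names (`fit_linear`, `predict_linear`, `t_learner_cate`, `simulate_rct`) are hypothetical and chosen for this example.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept (stand-in for any base learner)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def t_learner_cate(X, w, y, X_new):
    """T-learner: separate outcome models for treated and control units."""
    mu1 = fit_linear(X[w == 1], y[w == 1])
    mu0 = fit_linear(X[w == 0], y[w == 0])
    return predict_linear(mu1, X_new) - predict_linear(mu0, X_new)

def simulate_rct(n=4000, seed=0):
    """Simulated randomized trial (an assumption of this sketch) with CATE 1 + 2*x2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    w = rng.integers(0, 2, size=n)
    tau = 1.0 + 2.0 * X[:, 1]
    y = X[:, 0] + w * tau + rng.normal(size=n)
    return X, w, y, tau

X, w, y, tau = simulate_rct()
tau_hat = t_learner_cate(X, w, y, X)
print("mean absolute CATE error:", np.mean(np.abs(tau_hat - tau)))
```

The sketch gives point estimates only. It does not produce the confidence intervals that the paragraph above identifies as the central concern, which is where honest forests and doubly-robust constructions come in.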

Partial Pooling

Hierarchical Models and Empirical Bayes Estimation

Individual-level experiments suffer from low power; pooling across units risks masking heterogeneity. We study Bayesian hierarchical models and empirical Bayes procedures that borrow strength across experimental units without imposing homogeneity. This includes Stein-type shrinkage estimators, posterior predictive checks for exchangeability, and computationally tractable approximate inference methods for large-scale hierarchical structures.

Efron & Morris (1977) · Gelman & Hill (2007) · Kucukelbir et al. (2017) · 673 papers
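The idea of borrowing strength without imposing homogeneity can be illustrated with a positive-part James-Stein estimator that shrinks per-unit estimates toward their grand mean. This is a minimal sketch that assumes each unit's estimate is normal with a known, common sampling variance `sigma2`; the function name `james_stein_grand_mean` and the simulated units are assumptions made for the example.

```python
import numpy as np

def james_stein_grand_mean(estimates, sigma2):
    """Positive-part James-Stein shrinkage of unit estimates toward their grand mean.

    Assumes estimates[i] ~ Normal(theta_i, sigma2) with sigma2 known and shared.
    """
    z = np.asarray(estimates, dtype=float)
    k = z.size
    if k < 4:
        raise ValueError("shrinkage toward the grand mean needs at least 4 units")
    grand = z.mean()
    spread = np.sum((z - grand) ** 2)
    if spread == 0.0:
        return np.full(k, grand)
    # The shrinkage factor is close to 0 when the units look alike, close to 1 when they differ.
    shrink = max(0.0, 1.0 - (k - 3) * sigma2 / spread)
    return grand + shrink * (z - grand)

rng = np.random.default_rng(0)
theta = rng.normal(0.0, 0.5, size=50)        # true unit-level effects (simulated)
z = theta + rng.normal(0.0, 1.0, size=50)    # noisy per-unit estimates
js = james_stein_grand_mean(z, sigma2=1.0)
print("raw SSE:", np.sum((z - theta) ** 2), "shrunk SSE:", np.sum((js - theta) ** 2))
```

The amount of pooling is estimated from the data: when observed spread across units is no larger than sampling noise would explain, the estimates collapse to the grand mean, and when units clearly differ they are left nearly untouched.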

Causal Representation Learning

Learning Causal Structure from Data

Structure learning — recovering a directed acyclic graph (DAG) from observational or interventional data — is computationally hard in general and statistically challenging under latent confounding. We are interested in score-based and constraint-based discovery algorithms, identifiability conditions for linear non-Gaussian and nonlinear models, and the emerging intersection of causal structure learning with deep generative models.

Spirtes et al. (2000) · Shimizu et al. (2006) · Schölkopf et al. (2021) · 730 papers

Evolutionary Methods

Gradient-Free Optimization via Evolutionary Strategies

When the objective is non-differentiable, stochastic, or evaluated through a black-box simulator, gradient-based methods fail. Evolutionary strategies — particularly CMA-ES and natural evolution strategies — provide principled gradient-free alternatives with strong theoretical convergence properties. We are interested in the intersection of evolutionary methods with neural architecture search, quality-diversity algorithms that maintain behavioral repertoires, and population-based training as a hyperparameter optimization primitive.

Hansen & Ostermeier (2001) · Salimans et al. (2017) · Mouret & Clune (2015) · 558 papers
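To make the gradient-free idea concrete, the sketch below implements a basic evolution strategy that estimates a search gradient from antithetic Gaussian perturbations, in the spirit of the evolution strategies literature cited above. It is not CMA-ES (there is no covariance adaptation), and the function name `evolution_strategy`, the hyperparameter defaults, and the non-differentiable L1 test objective are assumptions introduced for illustration.

```python
import numpy as np

def evolution_strategy(objective, x0, sigma=0.1, lr=0.05, pop=50, iters=200, seed=0):
    """Minimize a black-box objective using antithetic Gaussian perturbations."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        eps = rng.normal(size=(pop, x.size))
        # Evaluate mirrored perturbations; only function values are needed.
        f_plus = np.array([objective(x + sigma * e) for e in eps])
        f_minus = np.array([objective(x - sigma * e) for e in eps])
        # Monte Carlo estimate of the gradient of the Gaussian-smoothed objective.
        grad = ((f_plus - f_minus)[:, None] * eps).mean(axis=0) / (2.0 * sigma)
        x -= lr * grad
    return x

target = np.array([1.0, -2.0, 0.5])
l1_loss = lambda x: np.sum(np.abs(x - target))  # non-differentiable at the optimum
x_final = evolution_strategy(l1_loss, np.zeros(3))
print("final point:", x_final)
```

Smoothing with Gaussian noise is what lets the update make progress on an objective with kinks: the estimator targets the gradient of the smoothed surrogate rather than of the objective itself.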

Online Learning

Prediction, Regret, and Adaptive Decision-Making

Online learning formalizes decision-making as a sequential game: at each round an agent selects an action, observes a loss, and updates its policy. The central quantity is regret — the gap between the agent's cumulative loss and the best fixed strategy in hindsight. We study Follow-the-Regularized-Leader and mirror descent as unified frameworks for deriving optimal algorithms, EXP3 and Hedge for adversarial settings, and adaptive gradient methods (AdaGrad, Adam) as instances of online-to-batch conversion with per-coordinate learning rates.

Shalev-Shwartz (2012) · Hazan (2016) · Cesa-Bianchi & Lugosi (2006) · 384 papers
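The regret definition above can be checked numerically with Hedge (exponential weights) in the full-information setting. The sketch assumes losses in [0, 1] and uses the standard tuning η = sqrt(8 ln K / T), for which the expected regret is at most sqrt(T ln K / 2); the function name `hedge` and the random loss matrix are assumptions made for the example.

```python
import numpy as np

def hedge(losses, eta):
    """Run Hedge on a (T, K) loss matrix with entries in [0, 1].

    Returns the algorithm's expected cumulative loss and its regret against
    the best fixed action in hindsight.
    """
    T, K = losses.shape
    log_w = np.zeros(K)  # log-weights, kept in log space for numerical stability
    alg_loss = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        alg_loss += p @ losses[t]   # expected loss of playing an action drawn from p
        log_w -= eta * losses[t]    # multiplicative-weights update
    best_fixed = losses.sum(axis=0).min()
    return alg_loss, alg_loss - best_fixed

rng = np.random.default_rng(0)
T, K = 1000, 5
losses = rng.random((T, K))
eta = np.sqrt(8.0 * np.log(K) / T)
alg_loss, regret = hedge(losses, eta)
print("regret:", regret, "bound:", np.sqrt(T * np.log(K) / 2.0))
```

The bound holds for any loss sequence, including adversarial ones, so the random losses here are only a convenient input and not a requirement of the guarantee.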

Open problems

Questions we don't yet have answers to

Q1
Under what conditions is a CATE estimator asymptotically efficient, and does honest sample-splitting pay a meaningful price in finite samples?
Q2
Can Thompson sampling be extended to non-stationary reward processes with unknown drift, while maintaining sub-linear regret guarantees?
Q3
What is the minimax-optimal adaptive design for a crossover experiment when washout duration is unknown and carryover effects are plausible?
Q4
Is there a general semiparametric efficiency bound for causal quantities defined by the do-calculus in models with instrumental variables and unmeasured confounders?
Q5
When does offline RL with pessimistic value estimates yield policies that are safe to deploy, and how tight are existing coverage assumptions?
Q6
Can CMA-ES or natural evolution strategies achieve competitive sample efficiency with policy gradient methods on problems where the reward is non-differentiable or only partially observed?
Q7
Is there a unified regret bound that smoothly interpolates between the stochastic and adversarial bandit settings, without requiring prior knowledge of which regime applies?

