Understanding Interaction Effects in Clinical Trials

Clinical Trial Scenario: AntiHyp Drug Study

A clinical trial is studying a new antihypertensive medication (AntiHyp) across different patient groups. Researchers want to understand whether the drug works differently for male versus female patients.

Variables:

How We Encode the Categorical Variable

We code patient group as a number: Female = -1, Male = +1.

This ±1 encoding centers the variable at zero (exactly so when the two groups are the same size). The benefit: β₀ becomes the response at Treatment = 0 averaged equally over the two groups (not just the response for one reference group), and the math for interpreting interactions stays clean. You'll see two clusters of points in the plots below, one for each group.
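As a minimal sketch of what the coding changes, the snippet below fits the same noise-free, made-up data twice: once with a 0/1 coding and once with the ±1 coding. The data values, the 0 to 10 treatment scale, and the helper name `fit_main_effects` are assumptions for illustration, not part of the study.

```python
import numpy as np

# Hypothetical noise-free data (illustration only): two parallel lines.
# Female: y = 2 + 1.0 * treatment, Male: y = 6 + 1.0 * treatment
treatment = np.tile(np.linspace(0.0, 10.0, 6), 2)   # assumed dose scale 0..10
is_male = np.repeat([0.0, 1.0], 6)                  # first 6 rows Female, last 6 Male
y = 2.0 + 4.0 * is_male + 1.0 * treatment

def fit_main_effects(group):
    """Least-squares fit of Y = b0 + b1*Treatment + b2*Group."""
    X = np.column_stack([np.ones_like(treatment), treatment, group])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

coef_01 = fit_main_effects(is_male)               # Female = 0, Male = 1
coef_pm1 = fit_main_effects(2.0 * is_male - 1.0)  # Female = -1, Male = +1

print("0/1 coding :", np.round(coef_01, 3))
print("±1 coding  :", np.round(coef_pm1, 3))
```

Under the 0/1 coding the fitted intercept is the Female intercept (2) and β₂ is the full gap between the lines (4). Under the ±1 coding the intercept is the midpoint of the two group intercepts (4) and β₂ is half the gap (2).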

Let's look at the raw data. Do these groups seem to respond differently to treatment?

We want to build a model to predict blood pressure reduction:

$$Y = \beta_0 + \beta_1 \text{Treatment} + \beta_2 \text{Group} + \text{???}$$

What should go in place of ??? to capture how treatment effect might differ by group?

First Attempt: Ignoring Interactions

Let's start simple and fit a model with just the main effects:

$$Y = \beta_0 + \beta_1 \text{Treatment} + \beta_2 \text{Group}$$

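This model assumes treatment works equally well for both groups. The only difference between groups is a vertical shift: with the ±1 coding, the Male line sits β₂ above the average line and the Female line β₂ below it, so the two parallel lines are 2β₂ apart.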

Model Fit

Residuals

Look at the residuals. If the model fit well, they would scatter randomly around zero. Instead, notice how the colors separate: Female residuals trend in one direction as Treatment increases, while Male residuals trend in the other. The model is systematically wrong. It's missing something.
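The residual pattern can be reproduced numerically. The sketch below simulates data in which the two groups have different treatment slopes, fits only the main effects, and then measures how the residuals trend against Treatment within each group. The sample size, true coefficients, noise level, and dose scale are all assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # patients per group (assumed)

# Hypothetical truth: Female line y = 2 + 1*T, Male line y = 6 + 3*T
group = np.repeat([-1.0, 1.0], n)                 # Female = -1, Male = +1
treatment = rng.uniform(0.0, 10.0, size=2 * n)    # assumed dose scale
y = 4.0 + 2.0 * treatment + 2.0 * group + 1.0 * treatment * group
y = y + rng.normal(0.0, 1.0, size=2 * n)

# Main-effects model: Y = b0 + b1*Treatment + b2*Group
X = np.column_stack([np.ones(2 * n), treatment, group])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

# Slope of the residuals against Treatment, separately by group
female = group == -1.0
slope_f = np.polyfit(treatment[female], residuals[female], 1)[0]
slope_m = np.polyfit(treatment[~female], residuals[~female], 1)[0]
print(f"residual slope, Female: {slope_f:+.2f}")
print(f"residual slope, Male:   {slope_m:+.2f}")
```

A well-specified model would give residual slopes near zero in both groups. Here they come out with opposite signs, because the single shared slope is too steep for one group and too shallow for the other.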

Second Attempt: Adding an "Interaction" Term

A colleague suggests: "We need to account for how Treatment and Group interact. Let's add a term that combines them: (Treatment + Group)."

What do you think: will this capture the interaction effect?

$$Y = \beta_0 + \beta_1 \text{Treatment} + \beta_2 \text{Group} + \beta_3(\text{Treatment} + \text{Group})$$

Model Fit

Residuals

The lines are still parallel. The residuals still show the same pattern. Adding the variables together didn't help. Why?

The Problem: Linear Dependence

We can rearrange the additive model:

$$\begin{aligned} Y &= \beta_0 + \beta_1 \text{Treatment} + \beta_2 \text{Group} + \beta_3(\text{Treatment} + \text{Group}) \\ &= \beta_0 + (\beta_1 + \beta_3)\text{Treatment} + (\beta_2 + \beta_3)\text{Group} \end{aligned}$$

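The "interaction" term just gets absorbed into the existing coefficients! The model is mathematically equivalent to the first one: it produces exactly the same fitted values, and β₁, β₂, and β₃ can no longer be estimated separately, only the sums β₁ + β₃ and β₂ + β₃. No new information is added.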

In matrix terms: the column for (Treatment + Group) is a linear combination of existing columns, making the design matrix rank-deficient.
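The rank deficiency is easy to check numerically. The following sketch builds both design matrices for a small made-up design (six treatment levels per group, an assumption for illustration) and compares their ranks and fitted values.

```python
import numpy as np

# Hypothetical design: 6 treatment levels in each group (illustration only)
treatment = np.tile(np.linspace(0.0, 10.0, 6), 2)
group = np.repeat([-1.0, 1.0], 6)                 # Female = -1, Male = +1
ones = np.ones_like(treatment)

X_main = np.column_stack([ones, treatment, group])
X_sum = np.column_stack([ones, treatment, group, treatment + group])

rank_main = np.linalg.matrix_rank(X_main)
rank_sum = np.linalg.matrix_rank(X_sum)
print("main-effects design:", X_main.shape[1], "columns, rank", rank_main)
print("with (T + G) column:", X_sum.shape[1], "columns, rank", rank_sum)

# A made-up response with different slopes per group:
# Female y = 2 + 1*T, Male y = 6 + 3*T
y = 2.0 + 4.0 * (group == 1.0) + treatment * (2.0 + group)

# lstsq returns a minimum-norm solution for the rank-deficient design,
# so both fits project y onto the same column space.
fit_main = X_main @ np.linalg.lstsq(X_main, y, rcond=None)[0]
fit_sum = X_sum @ np.linalg.lstsq(X_sum, y, rcond=None)[0]
same_fit = np.allclose(fit_main, fit_sum)
print("identical fitted values:", same_fit)
```

The four-column design still has rank 3, so the extra column spans nothing new and the fitted values match those of the main-effects model.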

The Key Insight: Multiplication

What if instead of adding Treatment and Group, we multiply them?

$$Y = \beta_0 + \beta_1 \text{Treatment} + \beta_2 \text{Group} + \beta_3(\text{Treatment} \times \text{Group})$$

Model Fit

Residuals

Now the lines diverge! The residuals scatter randomly—no more systematic pattern by group. The model finally captures the different treatment effects.

Why Multiplication Works

The slope of Y with respect to Treatment now depends on the group:

$$\frac{\partial Y}{\partial \text{Treatment}} = \beta_1 + \beta_3 \cdot \text{Group}$$
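
Substituting the two codes from the ±1 encoding makes the group-specific slopes explicit:

$$\left.\frac{\partial Y}{\partial \text{Treatment}}\right|_{\text{Female}} = \beta_1 - \beta_3, \qquad \left.\frac{\partial Y}{\partial \text{Treatment}}\right|_{\text{Male}} = \beta_1 + \beta_3$$

So β₁ is the average of the two slopes and the slopes differ by 2β₃. Setting β₃ = 0 recovers the parallel lines of the main-effects model.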

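Multiplication creates a new piece of information that cannot be replicated by adjusting other coefficients. The product column is linearly independent of the intercept, Treatment, and Group columns (as long as Treatment takes at least two distinct values within each group), so β₃ can be estimated on its own.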
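A short sketch of the multiplicative fit, under assumed values: the data are simulated with hypothetical coefficients (β₀ = 4, β₁ = 2, β₂ = 2, β₃ = 1), and the sample size, noise level, and dose scale are likewise illustrative choices rather than figures from the trial.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100  # patients per group (assumed)

group = np.repeat([-1.0, 1.0], n)                 # Female = -1, Male = +1
treatment = rng.uniform(0.0, 10.0, size=2 * n)    # assumed dose scale
# Hypothetical truth: b0 = 4, b1 = 2, b2 = 2, b3 = 1
y = 4.0 + 2.0 * treatment + 2.0 * group + 1.0 * treatment * group
y = y + rng.normal(0.0, 1.0, size=2 * n)

# Design matrix with the product column Treatment * Group
X = np.column_stack([np.ones(2 * n), treatment, group, treatment * group])
rank = np.linalg.matrix_rank(X)                   # full column rank expected
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

slope_female = b1 - b3    # slope when Group = -1
slope_male = b1 + b3      # slope when Group = +1
print("design rank:", rank)
print(f"estimated slopes  Female: {slope_female:.2f}  Male: {slope_male:.2f}")
```

With these settings the design has full rank 4, and the estimated slopes land close to the simulated values of 1 (Female) and 3 (Male).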

Explore: Build Your Own Interaction

Now it's your turn. Adjust the sliders to set the true interaction strength and noise level, then generate data to see how well the multiplicative model captures it.
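
For readers working from the text alone, the same experiment can be sketched in code, with function arguments standing in for the sliders. The function name `simulate_and_fit`, its parameters, and the fixed true coefficients are hypothetical choices for illustration, not the page's actual implementation.

```python
import numpy as np

def simulate_and_fit(interaction=1.0, noise_sd=1.0, n_per_group=100, seed=0):
    """Simulate data with a chosen true interaction and fit the multiplicative model.

    All parameter names and the other true coefficients are illustrative
    assumptions. Returns (coefficients [b0, b1, b2, b3], R^2).
    """
    rng = np.random.default_rng(seed)
    group = np.repeat([-1.0, 1.0], n_per_group)            # Female = -1, Male = +1
    treatment = rng.uniform(0.0, 10.0, size=group.size)    # assumed dose scale
    y = 4.0 + 2.0 * treatment + 2.0 * group + interaction * treatment * group
    y = y + rng.normal(0.0, noise_sd, size=group.size)

    X = np.column_stack([np.ones(group.size), treatment, group, treatment * group])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Try a few (interaction strength, noise level) settings
for strength, noise in [(0.0, 1.0), (1.0, 1.0), (1.0, 5.0)]:
    coef, r2 = simulate_and_fit(interaction=strength, noise_sd=noise)
    print(f"true b3={strength:.1f}, noise={noise:.1f} -> est b3={coef[3]:+.2f}, R^2={r2:.3f}")
```

Raising the noise level should widen the gap between the estimated and true β₃ and lower R², while the estimate stays centered on the true value.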

Model Fit

Residuals

Model Fit Statistics (Multiplicative Model)