Choosing the Right Statistical Test

The choice depends on the type of variable measured and the nature of the populations compared.

Decision tree

Situation | Recommended test | Usage | Example
Binary variable (yes/no): conversion rate, enrolment rate | Proportion difference test (Z-test or chi-squared) | Compare two rates: Exposed vs Non-Exposed Population | Enrolment rate: 6.5% vs 4.0%
Continuous variable, normal distribution: average basket, revenue per customer | Student’s t-test (two independent samples) | Compare two means on normal distributions | Average basket: €85 vs €72
Continuous variable, non-normal distribution: frequency, LTV | Mann-Whitney test (non-parametric) | Compare two medians without normality assumption | Purchases/year: median 3 vs 2
Control variables to integrate (customer profile, history) | Linear or logistic regression with control variables | Isolate Reelevant’s effect while controlling for other factors | Effect of exposure controlling for tenure and segment
Observational data without possible randomisation | Propensity Score Matching (PSM) or Difference-in-Differences (DiD) | Quasi-experimentation to approximate a causal effect | Comparison of Exposed/Non-Exposed with matched profiles

Essential Formulas

Test 1 — Proportion Difference Test (Z-test)

The most common test for binary metrics such as enrolment rates, conversion rates, or reactivation rates. Test statistic:
Z = (p̂_E − p̂_C) / √[ p̂(1 − p̂) × (1/n_E + 1/n_C) ]

Where:
  p̂_E = observed rate in the Exposed Population
  p̂_C = observed rate in the Non-Exposed Population
  p̂   = pooled rate = (x_E + x_C) / (n_E + n_C)
  n_E  = size of the Exposed Population
  n_C  = size of the Non-Exposed Population

Thresholds and interpretation (two-sided test):
  • |Z| > 1.96: p-value < 0.05 → Uplift statistically significant at 95%
  • |Z| > 2.58: p-value < 0.01 → Uplift statistically significant at 99%
  • 95% CI: Uplift ± 1.96 × standard error of the Uplift (computed from the unpooled rates p̂_E and p̂_C)

Worked example

Metric | Exposed Population | Non-Exposed Population
Size | 50,000 | 50,000
Enrolment rate | 6.5% (3,250) | 4.0% (2,000)
Absolute Uplift | +2.5 pts

Step-by-step calculation:
p̂_pooled = (3,250 + 2,000) / (50,000 + 50,000) = 5.25%
Standard error = √[ 0.0525 × 0.9475 × (1/50,000 + 1/50,000) ] = 0.001411
Z = (0.065 − 0.040) / 0.001411 = 17.72
Significant at 99.9%. 95% CI: [+2.22 pts; +2.78 pts]
A Z of 17.72 means the observed difference lies more than 17 standard errors away from zero. If there were no true effect, the probability of observing a difference this large by chance alone would be far below 1 in 1,000,000. Provided assignment was random, the Uplift can be attributed to the exposure.
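
The step-by-step calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `two_proportion_z_test` is an illustrative choice, and it assumes the confidence interval is built on the unpooled standard error, which is what reproduces the interval quoted in the worked example.

```python
from math import erfc, sqrt
from statistics import NormalDist


def two_proportion_z_test(x_e, n_e, x_c, n_c, confidence=0.95):
    """Two-sided Z-test on a difference in rates, plus a CI on the Uplift."""
    p_e, p_c = x_e / n_e, x_c / n_c
    uplift = p_e - p_c

    # Pooled rate and pooled standard error: used for the test statistic.
    p_pool = (x_e + x_c) / (n_e + n_c)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_e + 1 / n_c))
    z = uplift / se_pooled

    # Two-sided p-value; erfc stays accurate for very large |Z|.
    p_value = erfc(abs(z) / sqrt(2))

    # Unpooled standard error: used for the confidence interval (assumption).
    se_unpooled = sqrt(p_e * (1 - p_e) / n_e + p_c * (1 - p_c) / n_c)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (uplift - z_crit * se_unpooled, uplift + z_crit * se_unpooled)

    return {"uplift": uplift, "z": z, "p_value": p_value, "ci": ci}


# Figures from the worked example: 3,250 / 50,000 vs 2,000 / 50,000.
result = two_proportion_z_test(x_e=3250, n_e=50_000, x_c=2000, n_c=50_000)
print(f"Uplift = {result['uplift']:+.2%}")
print(f"Z = {result['z']:.2f}, p-value = {result['p_value']:.3g}")
print(f"95% CI = [{result['ci'][0]:+.2%}; {result['ci'][1]:+.2%}]")
```

Passing raw counts rather than rounded rates avoids rounding drift in the standard error.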

Test 2 — Student’s t-test for Continuous Metrics

Used to compare means: average basket, annual spend, observed LTV. Test statistic (unequal-variance form, also known as Welch's t-test):
t = (x̄_E − x̄_C) / √( s²_E/n_E + s²_C/n_C )

Where:
  x̄_E = mean of the Exposed Population
  x̄_C = mean of the Non-Exposed Population
  s²_E = variance of the Exposed Population
  s²_C = variance of the Non-Exposed Population

Thresholds and interpretation:
  • |t| > 1.96: Uplift significant at 95% (large samples)
  • 95% CI: (x̄_E − x̄_C) ± t_critical × standard error

Worked example — impact on average basket

Group | Mean | Std. dev. | N | t-value | p-value
Exposed | €87.50 | €32 | 10,000 | - | -
Non-Exposed | €74.20 | €30 | 10,000 | - | -
Uplift | +€13.30 | - | - | 30.3 | < 0.001
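
The t statistic can be recomputed directly from the summary figures in the table. The sketch below is illustrative, not a reference implementation: the function name `t_test_from_summary` is hypothetical, and it assumes samples large enough for the normal approximation (critical value 1.96) to stand in for the exact t distribution.

```python
from math import erfc, sqrt


def t_test_from_summary(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Unequal-variance t statistic computed from group summaries."""
    diff = mean_e - mean_c
    # Standard error of the difference: sqrt(s²_E/n_E + s²_C/n_C).
    se = sqrt(sd_e**2 / n_e + sd_c**2 / n_c)
    t = diff / se
    # Large-sample assumption: two-sided p-value from the normal distribution.
    p_value = erfc(abs(t) / sqrt(2))
    # 95% CI with the large-sample critical value 1.96.
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    return t, p_value, ci


# Figures from the worked example on average basket.
t, p_value, ci = t_test_from_summary(87.50, 32, 10_000, 74.20, 30, 10_000)
print(f"t = {t:.2f}, p-value = {p_value:.3g}")
print(f"95% CI on the Uplift = [€{ci[0]:.2f}; €{ci[1]:.2f}]")
```

With raw observations rather than summaries, a statistics library that provides the exact t distribution would be the more usual choice.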

Uplift Modelling — Going Further

Uplift modelling (or causal machine learning) estimates the individual treatment effect, not just the average effect. It identifies four customer profiles:

Profile | Behaviour without exposure | Behaviour with exposure | Recommended action
Persuadables (true responders) | Does not convert | Converts | Target: high priority
Sure things (always buyers) | Converts | Converts | Do not expose: waste of resources
Lost causes (never buyers) | Does not convert | Does not convert | Do not expose: no effect
Sleeping dogs (negative effect) | Converts | Does not convert | Exclude: counter-productive

In practice, Uplift modelling requires large volumes and data science expertise. For most Reelevant Use Cases, the classic A/B test framework is sufficient and much simpler to implement.
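
To make the four profiles concrete, the sketch below maps two predicted conversion probabilities per customer to a profile. Everything here is hypothetical: it assumes some model already supplies a probability without exposure and one with exposure, and the thresholds `min_effect` and `high_base` are arbitrary illustration values, not recommended settings.

```python
def classify_profile(p_without, p_with, min_effect=0.05, high_base=0.5):
    """Map predicted conversion probabilities to one of the four profiles.

    p_without / p_with: predicted probability of converting without and
    with exposure (assumed to come from a separate, hypothetical model).
    """
    individual_uplift = p_with - p_without

    if individual_uplift >= min_effect:
        return "persuadable"    # exposure raises conversion: target
    if individual_uplift <= -min_effect:
        return "sleeping dog"   # exposure lowers conversion: exclude
    if p_without >= high_base:
        return "sure thing"     # converts either way: do not expose
    return "lost cause"         # converts neither way: do not expose


# Hypothetical customers as (p_without, p_with) pairs.
customers = {"A": (0.10, 0.40), "B": (0.80, 0.82), "C": (0.05, 0.06), "D": (0.60, 0.30)}
for name, (p0, p1) in customers.items():
    print(name, classify_profile(p0, p1))
```

In a real setting the hard part is estimating the two probabilities reliably, which is where the large volumes and data science expertise come in.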

Reading and Communicating Confidence Intervals

Scenario 1 — Positive and significant

95% CI: [+1.8%; +3.2%] The effect is positive and statistically established. At 95% confidence, the Uplift is at least +1.8 pts (the lower bound of the interval). The result is actionable: publish it and proceed to valorisation.

Scenario 2 — Inconclusive

95% CI: [−0.2%; +2.8%] The interval crosses zero. A null effect cannot be excluded. The result is not conclusive. Increase sample size or extend the test.

Scenario 3 — Negative and significant

95% CI: [−2.1%; −0.3%] The effect is negative and statistically significant. The Reelevant Content had a counter-productive effect on this segment. Stop exposure on this segment and review the Content.
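
The three scenarios reduce to checking where the interval sits relative to zero. A minimal sketch, with a hypothetical helper name:

```python
def read_confidence_interval(lower, upper):
    """Classify a confidence interval on the Uplift (bounds in points)."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > 0:
        return "positive and significant"
    if upper < 0:
        return "negative and significant"
    # The interval contains zero: a null effect cannot be excluded.
    return "inconclusive"


# The three intervals discussed in the scenarios.
for bounds in [(1.8, 3.2), (-0.2, 2.8), (-2.1, -0.3)]:
    print(bounds, "->", read_confidence_interval(*bounds))
```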

Common Interpretation Mistakes

  • Confusing statistical significance with practical importance: An Uplift of +0.1% can be statistically significant yet be of no business interest.
  • Stopping the test as soon as a positive result appears (peeking): This artificially inflates the false-positive rate.
  • Running multiple tests without correction (Bonferroni or Benjamini-Hochberg): At a 5% threshold, about 1 false positive is expected out of 20 tests even when no real effect exists.
  • Not verifying test assumptions: The t-test assumes approximately normal data, or samples large enough for the sample means to be approximately normal.
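
The multiple-testing point can be illustrated with the two corrections named above. This is a sketch in plain Python; the p-values are invented for illustration and the function names are not taken from any particular library.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= max_rank
    return rejected


# Chance of at least one false positive across 20 independent tests at 5%.
family_wise_rate = 1 - (1 - 0.05) ** 20
print(f"P(at least one false positive) = {family_wise_rate:.1%}")

p_values = [0.001, 0.012, 0.025, 0.041, 0.20]  # hypothetical results
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

Bonferroni is the more conservative of the two; Benjamini-Hochberg tolerates a controlled share of false discoveries in exchange for more power.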

Validation Checklist Before Publishing Results

Experimental design

  • Assignment was random (on-the-fly draw or holdout defined before measurement)
  • Both populations were comparable before exposure (equivalence test performed)
  • The observation window was defined a priori
  • No other lever differentiated the two populations

Statistical analysis

  • The statistical test is appropriate for the variable type
  • p-value < 0.05 (or predefined threshold)
  • Confidence interval calculated and communicated
  • Sufficient sample size (power ≥ 80%)
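
The power criterion can be checked before launch with the standard sample-size approximation for comparing two rates. The sketch below is illustrative: the 4.0% baseline comes from the worked example, while the +0.5 pt minimum detectable Uplift is a hypothetical target chosen for the demonstration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_exposed, alpha=0.05, power=0.80):
    """Approximate size per group for a two-sided test on two rates."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    variance_sum = p_control * (1 - p_control) + p_exposed * (1 - p_exposed)
    effect = p_exposed - p_control
    return ceil((z_alpha + z_power) ** 2 * variance_sum / effect**2)


# Baseline 4.0%; hypothetical minimum detectable Uplift of +0.5 pt.
n_required = sample_size_per_group(p_control=0.040, p_exposed=0.045)
print(f"Required size per group: {n_required:,}")
```

Because the required size grows with the inverse square of the effect, halving the Uplift to be detected roughly quadruples the sample needed.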

Valorisation

  • The incremental unit value is sourced and defensible
  • The incremental count is calculated from the observed Uplift
  • No double-counting with other metrics
  • The temporal projection is reasoned and bounded
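
As an illustration of the valorisation step, the sketch below derives an incremental count from the observed Uplift and carries the confidence-interval bounds through to the value. It assumes the incremental count is the Uplift multiplied by the size of the Exposed Population; the unit value of €40 is purely hypothetical and would need to be sourced.

```python
def incremental_value(uplift, n_exposed, unit_value):
    """Return (incremental count, incremental value) for a given Uplift."""
    incremental_count = uplift * n_exposed
    return incremental_count, incremental_count * unit_value


UNIT_VALUE_EUR = 40.0   # hypothetical value of one incremental enrolment
N_EXPOSED = 50_000      # Exposed Population from the worked example

# Observed Uplift and 95% CI bounds from the worked example.
scenarios = [("CI lower bound", 0.0222), ("observed", 0.025), ("CI upper bound", 0.0278)]
for label, uplift in scenarios:
    count, value = incremental_value(uplift, N_EXPOSED, UNIT_VALUE_EUR)
    print(f"{label}: {count:,.0f} incremental enrolments, €{value:,.0f}")
```

Reporting the value as a range rather than a single figure keeps the observed Uplift distinct from any projection built on it.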

Communication

  • Methodological limitations are explicitly mentioned
  • The observed Uplift is distinguished from the projected value
  • The period and segment are clearly defined
  • Results are reproducible (documentation available)