Statistical Model - Reelevant Documentation

Choosing the Right Statistical Test

The choice depends on the type of variable measured and the nature of the populations compared.

Decision tree

Situation	Recommended test	Usage	Example
Binary variable (yes/no): conversion rate, enrolment rate	Proportion difference test (Z-test or chi-squared)	Compare two rates: Exposed vs Non-Exposed Population	Enrolment rate: 6.5% vs 4.0%
Continuous variable, normal distribution: average basket, revenue per customer	Student’s t-test (two independent samples)	Compare two means on normal distributions	Average basket: €85 vs €72
Continuous variable, non-normal distribution: frequency, LTV	Mann-Whitney test (non-parametric)	Compare two distributions (usually summarised by their medians) without normality assumption	Purchases/year: median 3 vs 2
Control variables to integrate (customer profile, history)	Linear or logistic regression with control variables	Isolate Reelevant’s effect while controlling for other factors	Effect of exposure controlling for tenure and segment
Observational data without possible randomisation	Propensity Score Matching (PSM) or Difference-in-Differences (DiD)	Quasi-experimentation to approximate a causal effect	Comparison of Exposed/Non-Exposed with matched profiles
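
The decision table can be read as a short sequence of checks. The sketch below is one possible encoding, not a Reelevant API: the function name, the argument names, and the order in which the rows are tested are assumptions made for illustration, since the table itself does not rank its rows.

```python
def recommend_test(variable_type: str,
                   normal: bool = True,
                   randomised: bool = True,
                   has_controls: bool = False) -> str:
    """Return the test suggested by the decision table.

    Assumption: design questions (randomisation, control variables) are
    checked before the variable type. The table does not impose this order.
    """
    if not randomised:
        return "Propensity Score Matching or Difference-in-Differences"
    if has_controls:
        return "Linear or logistic regression with control variables"
    if variable_type == "binary":
        return "Proportion difference test (Z-test or chi-squared)"
    if variable_type == "continuous":
        return "Student's t-test" if normal else "Mann-Whitney test"
    raise ValueError(f"unknown variable type: {variable_type!r}")


# Enrolment rate is a binary metric measured on a randomised holdout.
print(recommend_test("binary"))
# Purchase frequency is continuous and typically skewed.
print(recommend_test("continuous", normal=False))
```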

Essential Formulas

Test 1 — Proportion Difference Test (Z-test)

The most common test for binary metrics such as enrolment rates, conversion rates, or reactivation rates. Test statistic:

Z = (p̂_E − p̂_C) / √[ p̂(1 − p̂) × (1/n_E + 1/n_C) ]

Where:
  p̂_E = observed rate in the Exposed Population
  p̂_C = observed rate in the Non-Exposed Population
  p̂   = pooled rate = (x_E + x_C) / (n_E + n_C)
  n_E  = size of the Exposed Population
  n_C  = size of the Non-Exposed Population

Threshold	Interpretation
|Z| > 1.96	p-value < 0.05 → Uplift statistically significant at 95%
|Z| > 2.58	p-value < 0.01 → Uplift statistically significant at 99%
95% CI	Uplift ± 1.96 × standard error of the Uplift

Worked example

Metric	Exposed Population	Non-Exposed Population
Size	50,000	50,000
Enrolment rate	6.5% (3,250)	4.0% (2,000)
Absolute Uplift	+2.5 pts

Step-by-step calculation:

p̂_pooled = (3,250 + 2,000) / (50,000 + 50,000) = 5.25%
Standard error = √[ 0.0525 × 0.9475 × (1/50,000 + 1/50,000) ] = 0.001411
Z = (0.065 − 0.040) / 0.001411 = 17.72

→ Significant at 99.9%. 95% CI: [+2.22%; +2.78%]

A Z of 17.72 means the observed difference lies almost 18 standard errors away from zero. If exposure had no effect, a gap this large would occur by chance far less often than 1 time in 1,000,000. Random sampling variation is therefore a very unlikely explanation for this Uplift.
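
The calculation above can be reproduced from the raw counts with the Python standard library. This is a minimal sketch rather than a reference implementation: the function name and the returned keys are hypothetical, and the confidence interval reuses the pooled standard error (some texts use the unpooled one, which gives almost the same bounds at these sample sizes).

```python
from math import erfc, sqrt
from statistics import NormalDist


def two_proportion_z_test(x_e: int, n_e: int, x_c: int, n_c: int,
                          confidence: float = 0.95) -> dict:
    """Pooled Z-test for the difference between two proportions."""
    p_e = x_e / n_e          # observed rate, Exposed Population
    p_c = x_c / n_c          # observed rate, Non-Exposed Population
    uplift = p_e - p_c       # absolute Uplift

    # Pooled rate and standard error under the hypothesis of no difference.
    p_pool = (x_e + x_c) / (n_e + n_c)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_e + 1 / n_c))
    z = uplift / se

    # Two-sided p-value; erfc stays accurate for very large |Z|.
    p_value = erfc(abs(z) / sqrt(2))

    # Interval: Uplift ± z_critical × standard error (pooled, by assumption).
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (uplift - z_crit * se, uplift + z_crit * se)
    return {"uplift": uplift, "se": se, "z": z, "p_value": p_value, "ci": ci}


# Counts taken from the worked example.
result = two_proportion_z_test(x_e=3250, n_e=50_000, x_c=2000, n_c=50_000)
print(result)
```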

Test 2 — Student’s t-test for Continuous Metrics

Used to compare means: average basket, annual spend, observed LTV. Test statistic (unequal-variance form, also known as Welch’s t-test):

t = (x̄_E − x̄_C) / √( s²_E/n_E + s²_C/n_C )

Where:
  x̄_E = mean of the Exposed Population
  x̄_C = mean of the Non-Exposed Population
  s²_E = variance of the Exposed Population
  s²_C = variance of the Non-Exposed Population

Threshold	Interpretation
|t| > 1.96	Uplift significant at 95% (large samples)
95% CI	(x̄_E − x̄_C) ± t_critical × standard error

Worked example — impact on average basket

Group	Mean	Std. dev.	N	t-value	p-value
Exposed	€87.50	€32	10,000
Non-Exposed	€74.20	€30	10,000
Uplift	+€13.30			30.3	< 0.001
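
The t-value in this table can be recomputed from the summary statistics alone. The sketch below assumes large samples, so the p-value and the 95% interval are taken from the normal distribution with a critical value of 1.96; the function name is hypothetical.

```python
from math import erfc, sqrt


def t_test_from_summary(mean_e: float, sd_e: float, n_e: int,
                        mean_c: float, sd_c: float, n_c: int):
    """t statistic for two independent samples, from means and std. devs."""
    uplift = mean_e - mean_c
    # Standard error: sqrt(s²_E / n_E + s²_C / n_C).
    se = sqrt(sd_e ** 2 / n_e + sd_c ** 2 / n_c)
    t = uplift / se
    # Large-sample assumption: normal approximation for the two-sided p-value.
    p_value = erfc(abs(t) / sqrt(2))
    # 95% interval with t_critical approximated by 1.96 (large samples).
    ci = (uplift - 1.96 * se, uplift + 1.96 * se)
    return t, p_value, ci


# Figures taken from the average basket example.
t, p_value, ci = t_test_from_summary(87.50, 32, 10_000, 74.20, 30, 10_000)
print(f"t = {t:.2f}, p-value = {p_value:.3g}, 95% CI = {ci}")
```

For small samples the normal approximation is no longer adequate and the critical value should come from the t distribution with the appropriate degrees of freedom.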

Uplift Modelling — Going Further

Uplift modelling (or causal machine learning) estimates the individual treatment effect, not just the average effect. It identifies four customer profiles:

Profile	Behaviour without exposure	Behaviour with exposure	Recommended action
Persuadables (true responders)	Does not convert	Converts	TARGET — high priority
Sure things (always buyers)	Converts	Converts	Do not expose — waste of resources
Lost causes (never buyers)	Does not convert	Does not convert	Do not expose — no effect
Sleeping dogs (negative effect)	Converts	Does not convert	Exclude — counter-productive

In practice, Uplift modelling requires large volumes and data science expertise. For most Reelevant Use Cases, the classic A/B test framework is sufficient and much simpler to implement.
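
The logic of the four profiles can be written down directly. Note that both behaviours are never observed for the same customer (a customer is either exposed or not), which is why an Uplift model has to estimate them. The sketch below only encodes the table, with hypothetical function names, and is not an Uplift model.

```python
def classify_profile(converts_without: bool, converts_with: bool) -> str:
    """Map a customer's two potential behaviours to one of the four profiles."""
    profiles = {
        (False, True): "Persuadable",
        (True, True): "Sure thing",
        (False, False): "Lost cause",
        (True, False): "Sleeping dog",
    }
    return profiles[(converts_without, converts_with)]


def should_target(converts_without: bool, converts_with: bool) -> bool:
    """Only Persuadables gain from exposure according to the table."""
    return classify_profile(converts_without, converts_with) == "Persuadable"


print(classify_profile(converts_without=False, converts_with=True))
print(should_target(converts_without=True, converts_with=False))
```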

Reading and Communicating Confidence Intervals

Scenario 1 — Positive and significant

95% CI: [+1.8%; +3.2%] The effect is positive and statistically established. The lower bound, +1.8 pts, is the smallest Uplift compatible with the data at 95% confidence; it is a conservative estimate, not a guarantee. → Actionable result: publish and valorise.

Scenario 2 — Inconclusive

95% CI: [−0.2%; +2.8%] The interval crosses zero. A null effect cannot be excluded. The result is not conclusive. → Increase sample size or extend the test.

Scenario 3 — Negative and significant

95% CI: [−2.1%; −0.3%] The effect is negative and statistically significant. The Reelevant Content had a counter-productive effect on this segment. → Stop exposure on this segment and review the Content.
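
The three scenarios reduce to one question: where does the interval sit relative to zero? A minimal sketch, with a hypothetical function name, applied to the three intervals quoted above:

```python
def read_confidence_interval(lower: float, upper: float) -> str:
    """Classify a confidence interval on the Uplift into one of three scenarios."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > 0:
        return "positive and significant"
    if upper < 0:
        return "negative and significant"
    # The interval contains zero: a null effect cannot be excluded.
    return "inconclusive"


for bounds in [(1.8, 3.2), (-0.2, 2.8), (-2.1, -0.3)]:
    print(bounds, "->", read_confidence_interval(*bounds))
```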

Common Interpretation Mistakes

Confusing statistical significance with practical importance: An Uplift of +0.1% can be statistically significant but have no business interest.
Stopping the test as soon as a positive result appears (peeking): This artificially inflates the false-positive rate.
Running multiple tests without correction (Bonferroni or Benjamini-Hochberg): At a 5% threshold, about 1 in 20 tests of a non-existent effect comes out as a false positive by construction.
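Not verifying test assumptions: The t-test assumes that the sample means are approximately normally distributed. This holds on large samples, but small or heavily skewed samples call for a non-parametric test such as Mann-Whitney.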
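
To make the multiple-testing point concrete, the sketch below implements the two corrections named above using only the standard library. The function names are hypothetical and the p-values are invented for illustration; they do not come from any Reelevant test.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of false positives when no tested effect is real."""
    return n_tests * alpha


def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Control the false discovery rate with the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.012, 0.025, 0.038, 0.20]
print(expected_false_positives(20))
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

Bonferroni is the stricter of the two: it limits the chance of any false positive, while Benjamini-Hochberg limits the share of false positives among the results declared significant.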

Validation Checklist Before Publishing Results

Experimental design

Assignment was random (on-the-fly draw or holdout defined before measurement)
Both populations were comparable before exposure (equivalence test performed)
The observation window was defined a priori
No other lever differentiated the two populations

Statistical analysis

The statistical test is appropriate for the variable type
p-value < 0.05 (or predefined threshold)
Confidence interval calculated and communicated
Sufficient sample size (power ≥ 80%)
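
The power criterion can be checked before launching a test. The sketch below uses the usual normal-approximation formula for comparing two proportions and assumes two groups of equal size and a two-sided test; the function name and defaults are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p_c: float, p_e: float,
                                alpha: float = 0.05,
                                power: float = 0.80) -> int:
    """Customers needed per group to detect p_e vs p_c (equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)            # quantile for the target power
    p_bar = (p_c + p_e) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_c * (1 - p_c) + p_e * (1 - p_e))) ** 2
    return ceil(numerator / (p_e - p_c) ** 2)


# Rates from the enrolment example: 4.0% without exposure, 6.5% with exposure.
n_per_group = sample_size_two_proportions(0.040, 0.065)
print(n_per_group)
```

With these rates the requirement is on the order of 1,250 customers per group; smaller expected Uplifts raise it quickly, because the required size grows with the inverse square of the difference.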

Valorisation

The incremental unit value is sourced and defensible
The incremental count is calculated from the observed Uplift
No double-counting with other metrics
The temporal projection is reasoned and bounded

Communication

Methodological limitations are explicitly mentioned
The observed Uplift is distinguished from the projected value
The period and segment are clearly defined
Results are reproducible (documentation available)
