The 8 Steps of a Rigorous Incremental Measurement
Ask a business question
Before any numbers, define exactly what you want to prove. A good question is observable, measurable, and tied to a real customer behaviour.
Examples of good questions:
- “Did I convert one-timers into repeat buyers?”
- “Did I increase loyalty programme enrolment among non-members?”
- “Do people exposed to Reelevant Content have a higher purchase frequency than Non-Exposed people?”
Identify the target population
The scope must be clear. With Reelevant, the observed population is everyone who opens email communications containing a Reelevant block.
| Criterion | Definition | Example |
|---|---|---|
| Observed population | Everyone who opens emails with a Reelevant block | Opt-in customers who opened at least one email with Reelevant Content during the period |
Split into Exposed and Non-Exposed Populations
This is the core of the method. Divide customers into two groups: those who see the personalised Reelevant blocks (Exposed Population, 90%) and those who see the version without personalisation (Non-Exposed Population, 10%). This split must be random to be valid.
| Population | Share | What they see |
|---|---|---|
| Exposed Population | 90% | Email with personalised Reelevant blocks |
| Non-Exposed Population | 10% | Same email, without Reelevant blocks — the reference baseline |
Sample size requirements
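The sizes you need depend on the base rate and on the smallest Uplift you want to detect. The sizes you can reach depend on the volume of exposed individuals. Set the two terms below before the test starts so that it can produce statistically significant results.
| Term | Meaning |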
|---|---|
| Base rate | The behaviour rate observed WITHOUT Reelevant (e.g. 4% second-purchase rate) |
| Target Uplift | The minimum gain you want to be able to detect (e.g. +1.5 points) |
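To see how the base rate and the target Uplift interact with population sizes, here is a minimal sketch that estimates the chance of detecting the target Uplift. It assumes a two-sided two-proportion z-test under the normal approximation, which is one common choice and not necessarily the method Reelevant uses. The function name `detection_power` and the 5% significance level default are illustrative. The sizes passed in are the minimums from the Reference Thresholds table.

```python
from math import sqrt
from statistics import NormalDist


def detection_power(base_rate, target_uplift, n_non_exposed, n_exposed, alpha=0.05):
    """Approximate probability of detecting `target_uplift` (as a fraction).

    Assumption: two-sided two-proportion z-test, normal approximation,
    unpooled standard error under the alternative hypothesis.
    """
    p_non_exposed = base_rate
    p_exposed = base_rate + target_uplift
    standard_error = sqrt(
        p_non_exposed * (1 - p_non_exposed) / n_non_exposed
        + p_exposed * (1 - p_exposed) / n_exposed
    )
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of a significant result in the wrong direction.
    return NormalDist().cdf(target_uplift / standard_error - z_critical)


# Base rate 4%, target Uplift +1.5 points, 1,700 Non-Exposed and 15,000 Exposed.
power = detection_power(0.04, 0.015, n_non_exposed=1_700, n_exposed=15_000)
print(f"Chance of detecting the target Uplift: {power:.0%}")
```

Under these assumptions the result lands a little above 80%, a conventional target for test power. Smaller populations or a smaller target Uplift lower this figure.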
Verify that both populations are comparable
Before drawing conclusions, confirm that both populations have the same starting profile. Otherwise, a difference in results could come from customer profiles, not from Reelevant.
| What to check | How | Alert signal |
|---|---|---|
| Historical average basket | Compare means of both populations | Gap > 5% |
| Past purchase frequency | Compare distributions | Profiles too different |
| Customer tenure | Compare averages | One population significantly “younger” |
| Geography | Check regional distribution | Over-representation of a region |
With Reelevant, the at-open random assignment is designed to make both populations equivalent (see the section “How Reelevant’s Assignment Works”). This check is an additional safety measure.
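As an illustration of the average-basket check, the sketch below computes the relative gap between the two population means and raises the alert when it exceeds 5%. The basket values and the helper name `relative_gap` are hypothetical. Using the Non-Exposed Population as the reference for the percentage is an assumption.

```python
from statistics import mean


def relative_gap(exposed_values, non_exposed_values):
    """Relative gap between the two means, with Non-Exposed as the reference."""
    reference = mean(non_exposed_values)
    return abs(mean(exposed_values) - reference) / reference


# Hypothetical historical average baskets in euros, one value per customer.
exposed_baskets = [52.0, 61.5, 48.0, 70.0, 55.5]
non_exposed_baskets = [50.0, 63.0, 47.5, 68.0, 58.0]

gap = relative_gap(exposed_baskets, non_exposed_baskets)
alert = gap > 0.05  # alert signal from the table: gap above 5%
print(f"Average basket gap: {gap:.1%}, alert: {alert}")
```

The same helper can be reused for tenure or past purchase frequency by passing the corresponding per-customer values.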
Choose the observation window
How long do you observe behaviours after the send? Too short: you miss late effects. Too long: other marketing actions contaminate the measurement.
| Objective measured | Recommended window | Rationale |
|---|---|---|
| First purchase (prospect) | 7–14 days | Purchase decision is fast for this segment |
| Loyalty programme enrolment | 14–30 days | Decision takes slightly longer |
| Dormant customer reactivation | 30–60 days | Customer needs time to return |
| Purchase frequency | 90–180 days | Requires observing multiple purchase cycles |
| LTV and long-term retention | 6–12 months | Effect is measured over duration |
Calculate the Uplift
The Uplift is the behavioural difference between the Exposed Population and the Non-Exposed Population. It is the key figure that says “thanks to Reelevant, X more happened.”
| Term | Meaning |
|---|---|
| Absolute Uplift | Difference in percentage points (+5.4 pts here) |
| Relative Uplift | Gain expressed as % relative to the Non-Exposed Population (+35% here) |
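The two definitions can be sketched as a short calculation. The rates below are illustrative: they reuse the 4% base rate and the +1.5 point target from the sample size examples, not the rates behind the figures quoted in the table, which are not given here. The function name `uplift` is hypothetical.

```python
def uplift(exposed_rate, non_exposed_rate):
    """Return (absolute Uplift in points, relative Uplift in %)."""
    difference = exposed_rate - non_exposed_rate
    absolute_points = difference * 100
    relative_percent = difference / non_exposed_rate * 100
    return absolute_points, relative_percent


# Illustrative rates: 5.5% for the Exposed Population, 4% for the Non-Exposed Population.
absolute_points, relative_percent = uplift(exposed_rate=0.055, non_exposed_rate=0.04)
print(f"Absolute Uplift: {absolute_points:+.1f} pts")
print(f"Relative Uplift: {relative_percent:+.1f}%")
```

The relative figure always looks larger than the absolute one when the base rate is low, so report both together.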
How to know if the Uplift is real or just chance?
A statistical test calculates the probability of seeing a gap at least this large if Reelevant had no real effect. If this probability is below 5% (p-value < 0.05), the result is considered statistically significant.
Analogy: If you flip a coin 10 times and get 7 heads, it might be chance. If you flip it 10,000 times and get 70% heads, chance is no longer a plausible explanation: the coin is biased. The same principle applies here.
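A minimal sketch of such a test, assuming a two-sided two-proportion z-test with a pooled standard error. This is a standard choice for comparing two rates, but the source does not say which test Reelevant applies. The conversion counts and the function name `two_proportion_p_value` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_exposed, n_exposed, conv_non_exposed, n_non_exposed):
    """Two-sided p-value for the gap between two conversion rates (z-test)."""
    rate_exposed = conv_exposed / n_exposed
    rate_non_exposed = conv_non_exposed / n_non_exposed
    # Pooled rate: the single rate both groups would share if there were no effect.
    pooled = (conv_exposed + conv_non_exposed) / (n_exposed + n_non_exposed)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_exposed + 1 / n_non_exposed))
    z = (rate_exposed - rate_non_exposed) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: 825 of 15,000 Exposed (5.5%) vs 68 of 1,700 Non-Exposed (4%).
p_value = two_proportion_p_value(825, 15_000, 68, 1_700)
significant = p_value < 0.05
print(f"p-value: {p_value:.4f}, significant: {significant}")
```

With identical rates in both populations the function returns a p-value of 1, meaning the data give no evidence of a difference.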
Translate Uplift into euros
The behavioural Uplift (more clicks, more purchases, less churn) must be converted into financial value. This step makes results meaningful for leadership and the CFO.
| CRM objective | Unit value to use | Example |
|---|---|---|
| Loyalty enrolment | Annual spend(member) − Annual spend(non-member) | €215 − €130 = €85/member |
| Reactivation | Average revenue over 12 months post-return | €180 per reactivated customer |
| Churn reduction | Average annual revenue × estimated remaining lifetime | €150 × 2 years = €300 |
| One-timer → second purchase | Projected CLV × estimated retention rate | €72 per converted customer |
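One way to sketch the conversion is to multiply the absolute Uplift by the size of the Exposed Population to get the additional behaviours, then multiply by the unit value. This formula is an assumption about how the figures combine, not a documented Reelevant calculation. The inputs reuse the loyalty enrolment unit value of €85 with a hypothetical +1.5 point Uplift on 15,000 exposed contacts.

```python
def incremental_value(absolute_uplift, exposed_population, unit_value_eur):
    """Return (additional behaviours, incremental value in euros).

    Assumption: additional behaviours = absolute Uplift (as a fraction)
    multiplied by the number of exposed contacts.
    """
    additional_behaviours = absolute_uplift * exposed_population
    return additional_behaviours, additional_behaviours * unit_value_eur


# Hypothetical inputs: +1.5 points, 15,000 exposed contacts, €85 per enrolled member.
additional, value_eur = incremental_value(0.015, 15_000, 85.0)
print(f"Additional enrolments: {additional:.0f}")
print(f"Incremental value: €{value_eur:,.0f}")
```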
Document, interpret, and reproduce
A good test must be recorded in writing with its conditions, results, and limitations. This is the requirement for reproducing it and getting internal validation.
What you CAN affirm
- Reelevant generated X additional behaviours on this population
- This represents €Y of Incremental Value over the measured period
- The effect is statistically significant (unlikely to be due to chance alone)
What you CANNOT affirm
- That the effect will be the same on another segment or another period
- That the effect will persist indefinitely without new personalisations
- That unit values will remain unchanged over time
Priority Segments
Each segment has its own logic: a question, an observation window, and a unit value for financial valuation.
One-timers
Question: “Did I convert one-timers into repeat buyers?”
Window: 90 days
Unit value: CLV × estimated retention rate (e.g. €72 per converted customer)
Prospects (first purchase)
Question: “Does Reelevant Content increase first-purchase rate on non-buyer contacts?”
Window: 7–14 days
Unit value: Average basket of first purchase
Dormant customers
Question: “Does the reactivation block re-engage customers inactive for 3+ months?”
Window: 30–60 days
Unit value: Average revenue over 12 months post-return (e.g. €180)
Non-members (loyalty)
Question: “Did I enrol non-members into the loyalty programme?”
Window: 14–30 days
Unit value: Annual spend(member) − Annual spend(non-member) (e.g. €215 − €130 = €85)
Reference Thresholds
| Situation | Threshold / Rule |
|---|---|
| Minimum Non-Exposed Population size | 1,700 contacts (base rate 4%, target Uplift +1.5 pts) |
| Minimum Exposed Population size | 15,000 contacts under the same conditions |
| Profile gap between populations | Alert if > 5% on average basket, frequency, or tenure |
| Statistical significance | p-value < 0.05 |
| Segment definition snapshot | Always frozen at the send date — never recalculated after |
Three Levels of Measurement
You can analyse Reelevant’s impact at three levels of depth. Each level provides a different reading.
| Level | Name | Example | Pros | Cons |
|---|---|---|---|---|
| 1 | Behaviour | “+5.4 pts second-purchase rate” | Quick to measure, easy to understand | Does not yet tell you how much it is worth in euros |
| 2 | Immediate revenue | “+€146,952 incremental revenue over 90 days” | Directly financial, defensible in steering committees | Does not capture effects on customer lifetime |
| 3 | Lifetime value (LTV) | “+€238,797 incremental CLV at 12 months” | Complete picture: 1.6× more value than revenue alone | Requires reliable historical data for projection |
How Reelevant’s Assignment Works
Unlike a classic A/B test where populations are defined before the send, Reelevant assigns at the moment of email open. When the server generates the Content in real time, a random draw determines whether the customer sees personalised blocks (90%) or the standard version (10%). This mechanism is statistically valid because:
- Stable assignment: A customer assigned to the Non-Exposed Population for a given send stays in that population for the entire experiment (via a persistent customer identifier).
- Independent of behaviour: The draw is independent of past behaviour — opens, profile, and purchase history do not influence the result.
- Law of large numbers: The 90/10 split is respected on average across the population. With 5,000+ individuals, both groups converge to near-identical behavioural profiles with no adjustment needed.
Analogy: Whether you draw all lottery tickets at once before the event or one by one as people arrive — if the draw is random, the statistical properties are identical.
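To make the stable and behaviour-independent properties concrete, here is a minimal sketch of one common technique: deriving a pseudo-random draw from a hash of the customer identifier and an experiment identifier. This is an assumption for illustration only and is not presented as Reelevant's actual implementation. The names `assign_population`, `experiment_id`, and the identifier formats are hypothetical.

```python
import hashlib


def assign_population(customer_id: str, experiment_id: str,
                      non_exposed_share: float = 0.10) -> str:
    """Return 'non_exposed' or 'exposed' for a customer at open time.

    The draw depends only on the two identifiers, so it is stable across
    opens and independent of profile or purchase history.
    """
    key = f"{experiment_id}:{customer_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    draw = int.from_bytes(digest[:8], "big") / 2**64
    return "non_exposed" if draw < non_exposed_share else "exposed"


# The same customer always gets the same answer within one experiment.
first_open = assign_population("customer-123", "experiment-A")
second_open = assign_population("customer-123", "experiment-A")

# Across many customers the split approaches 90/10.
assignments = [assign_population(f"customer-{i}", "experiment-A") for i in range(20_000)]
non_exposed_observed = assignments.count("non_exposed") / len(assignments)
print(first_open == second_open, f"{non_exposed_observed:.1%}")
```

Hashing avoids storing each assignment, at the cost of a draw that is pseudo-random. A true random draw saved against the persistent customer identifier would satisfy the same properties.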