Episode five: Test


Even with a thorough diagnosis of the existing behaviour and detailed designs grounded in solid behavioural theory, there is still a chance that, within the particular area of focus, the interventions will not achieve the desired outcome. To address this risk, the impact of interventions must be tested in the field using experimental impact evaluation techniques such as randomised controlled trials. Based on the outcome, a decision can then be made on whether to scale the intervention or move back to the design phase.


When it comes to rigorously testing the impact of interventions in the field, there are five key steps that need to be successfully completed. These five steps are outlined below.

1. Build a strong hypothesis

The first step is to define the hypothesis that is going to be tested in the field. This can be done by clearly and simply stating how the intervention is expected to achieve the behavioural objective.

A strong hypothesis should have the following characteristics: 

  • A well-defined target audience
  • A clear understanding of the outcome measure that defines success
  • A summarised explanation of the intervention  
  • A well-defined underlying 'theory of change'
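One way to make these characteristics concrete is to treat them as a checklist. A minimal sketch in Python (the field names here are illustrative, not part of any standard framework):

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    target_audience: str       # who the intervention is aimed at
    outcome_measure: str       # the metric that defines success
    intervention_summary: str  # what will be done, in a sentence or two
    theory_of_change: str      # why the intervention should move the outcome

    def is_complete(self) -> bool:
        # A strong hypothesis leaves none of the four components blank.
        return all([self.target_audience, self.outcome_measure,
                    self.intervention_summary, self.theory_of_change])
```

A hypothesis instance with any component missing would fail the `is_complete` check, signalling that it is not yet ready to be tested in the field.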

2. Design the field experiment

Once the hypothesis has been clearly stated, the technical aspects of the field experiment need to be decided on, and a plan clearly outlining its structure must be put in place. The ideal field experiment is a randomised controlled trial, in which a representative sample of users is randomly divided into two groups. One group is exposed to the intervention, while the other serves as a control group. Randomisation ensures that, as long as the two groups are large enough, they are on average statistically equivalent.
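The random division described above can be sketched in a few lines of Python; the function and parameter names are illustrative:

```python
import random

def randomise(participants, seed=None):
    """Randomly split a sample into a treatment and a control group
    of equal size (for an even number of participants)."""
    rng = random.Random(seed)   # seeding makes the assignment reproducible
    shuffled = participants[:]  # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (treatment, control)

treatment, control = randomise(list(range(1000)), seed=42)
```

Because assignment is random rather than based on any participant characteristic, any such characteristic is expected to be balanced across the two groups as the sample grows.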

However, this approach is not always feasible. In such cases, a non-random cross-sectional or time-series approach can still produce useful results, provided that variables which could bias the results are accounted for.


Randomised controlled trial

The RCT is a design first developed for medical experimentation and later adapted for the social sciences. It is characterised by high internal validity, because it minimises potential sources of error by randomly assigning participants to either a treatment group (which receives the intervention) or a control group (which does not). Because rigid constraints are placed on the environment in which participants are tested, the only systematic difference between the two groups is the intervention itself. These factors allow an experimenter to say with some confidence that the intervention caused any differences measured between the groups.


Cross-section experiment

A cross-section experiment is typically used to measure differences between large groups of participants at a single point in time. Comparisons can be made between companies, cities or even countries. A typical example of this method is drawing a "cross-section" of 1,000 people at random and measuring their income, education or employment status.
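A cross-sectional comparison of this kind boils down to summarising one outcome across groups at a single point in time. A minimal sketch, with hypothetical record and key names:

```python
from collections import defaultdict
from statistics import mean

def group_means(records, group_key, outcome_key):
    """Compare an outcome across groups in a single cross-section,
    e.g. mean income by city at one point in time."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[outcome_key])
    return {group: mean(values) for group, values in groups.items()}
```

Note that because group membership is not randomly assigned, any difference found this way is descriptive: confounding variables (age, industry, region) need to be controlled for before drawing causal conclusions.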


Time-series field experiment

A time-series experiment uses panel data consisting of multiple equally spaced measurements taken before an intervention and multiple equally spaced measurements taken after it. This design is typically used to measure the effects of widespread policy changes, where it is difficult to place environmental constraints on a large number of people in varied settings.
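The simplest time-series contrast compares the average outcome before the intervention with the average after it. A minimal sketch (a full analysis would also model trend and seasonality, which this deliberately ignores):

```python
from statistics import mean

def pre_post_change(series, intervention_index):
    """Difference in the mean outcome after vs. before an intervention,
    given an equally spaced time series and the index at which the
    intervention took effect."""
    before = series[:intervention_index]
    after = series[intervention_index:]
    return mean(after) - mean(before)
```

A positive return value indicates that the outcome rose after the intervention, though without a control series the change cannot be attributed to the intervention alone.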


3. Conduct the field experiment

When launching the experiment, it is extremely important to ensure that the conditions experienced by the treatment group are identical to those of the control group, apart from the intervention itself (the independent variable). This limits the possibility of error and increases the accuracy of the final results.

4. Collect the data and analyse the results

Once the data has been collected and cleaned, an appropriate statistical test can be run to interpret the results. In addition to the test that best suits the data, further checks can be run to assess the reliability of the results and minimise the risk of bias, leading to a more accurate understanding of effect size and statistical significance.
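As an illustration, a comparison of treatment and control means can be sketched with Welch's t-statistic (which does not assume equal variances), computed here from scratch rather than with a statistics library; interpreting the statistic still requires comparing it against the appropriate t-distribution:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(treatment, control):
    """Welch's t-statistic for the difference in means between a
    treatment and a control group (unequal variances allowed)."""
    m1, m2 = mean(treatment), mean(control)
    v1, v2 = variance(treatment), variance(control)  # sample variances
    n1, n2 = len(treatment), len(control)
    return (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
```

A larger absolute value of the statistic indicates a difference between the groups that is less likely to be due to chance alone.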

5. Scale the intervention or move back to the design phase for adaptation

In deciding whether to scale an intervention or move back to the design phase, there are several complexities to consider. A commonly used approach is to assess the potential profitability of an intervention by comparing the cost of scaling to the expected impact on turnover, assuming the effect size observed in the trial holds at scale. The decision becomes more complicated when additional components are considered, such as the societal value of the behaviour change.
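The cost-versus-impact comparison described above can be sketched as a rough break-even calculation; the parameter names are hypothetical, and the assumption that the trial effect size holds at scale is often optimistic in practice:

```python
def scaling_net_value(effect_per_user, users_at_scale,
                      value_per_unit_effect, scaling_cost):
    """Expected net value of scaling an intervention, assuming the
    effect size measured in the trial remains constant at scale."""
    expected_benefit = effect_per_user * users_at_scale * value_per_unit_effect
    return expected_benefit - scaling_cost  # positive => scaling looks worthwhile
```

For example, a 0.02 effect per user across 100,000 users, each unit of effect worth 10 in turnover, against a scaling cost of 15,000, yields a positive net value, which would favour scaling over returning to the design phase.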