
Report

Illegal online gambling: Consumer engagement and trends

The Gambling Commission’s report into estimated trends in consumer engagement with illegal gambling websites.

Annex B - Bootstrapping methodology for web traffic estimates

Overview

To quantify the uncertainty in web traffic estimates, we employed a bootstrapping approach. This uses the properties of a sample to estimate properties of the population that the sample came from.

Bootstrapping estimates the variability of a given statistic by randomly resampling the data with replacement. For each month, we drew 1,000 bootstrapped datasets and computed an estimated mean number of visits and mean visit duration for each. We defined the realistic range of our estimates as the 95 percent confidence interval spanning the 2.5th to the 97.5th percentiles of the bootstrap-estimated values.

This method allows us to generate confidence intervals for key metrics without relying on parametric assumptions about the underlying data distribution. It is particularly useful when working with observational data that may exhibit skewness, outliers, or non-normality.
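The percentile bootstrap described above can be sketched in a few lines. This is a minimal illustration, not the Commission's actual code: the function name, the lognormal dummy data and the random seed are all assumptions chosen to mimic skewed traffic data.

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_ci(values, n_resamples=1000, ci=95):
    """Percentile bootstrap: resample with replacement, collect the
    statistic of interest (here, the mean), and take the central ci%
    of the bootstrap distribution as the confidence interval."""
    values = np.asarray(values, dtype=float)
    boot_means = np.empty(n_resamples)
    for i in range(n_resamples):
        sample = rng.choice(values, size=len(values), replace=True)
        boot_means[i] = sample.mean()
    half_tail = (100 - ci) / 2
    lower, upper = np.percentile(boot_means, [half_tail, 100 - half_tail])
    return lower, upper

# Illustrative skewed data: most sites see little traffic, a few see a lot
visits = rng.lognormal(mean=8, sigma=1.5, size=200)
lo, hi = bootstrap_ci(visits)
print(f"95% CI for mean visits: [{lo:.0f}, {hi:.0f}]")
```

Because the interval comes from the empirical bootstrap distribution rather than a formula, no normality assumption is needed, which is the point made above about skewness and outliers.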

Metrics considered

The analysis focused on 2 primary metrics:

  • estimated visits per site per month
  • average visit duration, provided in seconds and converted to minutes.

These were combined to derive a third metric: total time spent on identified illegal gambling websites, expressed in millions of minutes.

Bootstrapping methodology

For each month in the dataset:

  1. a subset of the data corresponding to that month was extracted
  2. from this subset, 1,000 bootstrap samples were drawn with replacement
  3. for each sample, the following statistics were calculated:
    1. mean number of visits
    2. total number of visits
    3. mean visit duration.

The 2.5th and 97.5th percentiles of the bootstrap distributions were used to construct 95 percent confidence intervals for each metric.

This process was repeated for every month in the dataset, resulting in a time series of bootstrapped estimates and associated confidence intervals.
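The per-month procedure above can be sketched as follows. The data layout (a mapping from month to per-site visit and duration figures) and all names are illustrative assumptions, not the Commission's schema.

```python
import numpy as np

rng = np.random.default_rng(0)

def monthly_bootstrap(records, n_resamples=1000):
    """For each month: resample the sites with replacement n_resamples
    times; for each resample record the mean visits, total visits and
    mean duration; report the 2.5th/97.5th percentiles of each as a
    95% confidence interval."""
    results = {}
    for month, rows in records.items():
        visits = np.array([r[0] for r in rows], dtype=float)
        durations = np.array([r[1] for r in rows], dtype=float)
        n = len(rows)
        stats = {"mean_visits": [], "total_visits": [], "mean_duration": []}
        for _ in range(n_resamples):
            idx = rng.integers(0, n, size=n)  # resample sites with replacement
            stats["mean_visits"].append(visits[idx].mean())
            stats["total_visits"].append(visits[idx].sum())
            stats["mean_duration"].append(durations[idx].mean())
        results[month] = {
            k: tuple(np.percentile(v, [2.5, 97.5])) for k, v in stats.items()
        }
    return results

# Illustrative dataset: (visits, duration in minutes) per site, per month
records = {
    "2024-01": [(1200, 6.5), (300, 4.0), (9500, 8.1), (450, 3.2)],
    "2024-02": [(1100, 5.9), (250, 4.4), (400, 3.0)],
}
cis = monthly_bootstrap(records)
```

Running this over every month yields the time series of intervals described above.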

Combining metrics

To estimate total time spent:

  • the mean number of visits was multiplied by the number of sites reporting data and the mean visit duration
  • confidence intervals for total time spent were derived by multiplying the lower bounds of the visit and duration intervals, and likewise for the upper bounds.

This approach assumes independence between the visit and duration metrics.
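Under that independence assumption, combining the bounds is simple multiplication. The figures below are hypothetical, chosen only to show the arithmetic.

```python
def total_time_ci(visits_ci, duration_ci, n_sites):
    """Combine interval bounds as described in the text: lower x lower
    and upper x upper, scaled by the number of reporting sites, with
    the result expressed in millions of minutes. Assumes the visit and
    duration metrics vary independently."""
    lo = visits_ci[0] * duration_ci[0] * n_sites / 1e6
    hi = visits_ci[1] * duration_ci[1] * n_sites / 1e6
    return lo, hi

# Hypothetical bounds: mean visits per site and mean duration (minutes)
lo, hi = total_time_ci((800, 1400), (4.0, 6.5), n_sites=250)
print(f"Total time spent: {lo:.2f} to {hi:.2f} million minutes")
```

Note that multiplying lower bounds together and upper bounds together gives a conservative (wide) interval if the two metrics are in fact positively correlated.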

Interpretation

The resulting confidence intervals provide a range within which the true values of each metric are likely to fall given the observed data. This helps to communicate the inherent uncertainty in web traffic estimates.

The confidence intervals calculated through this analysis are not uniform across the time series – they vary in magnitude between months. These variations are driven by the level of variation within the population of websites in a given month. The confidence interval will generally be narrower in months where the gap between the websites with the most and least traffic is smaller. We see wider confidence intervals in months where overall traffic is dominated by a small number of websites with high volumes of traffic.
