Live return to player performance monitoring of games of chance

Remote operators are required to monitor the performance of the games they offer.

Even though games must be tested prior to release (and after updates which may affect fairness) it is possible for a design, implementation or operational issue to evade identification during testing or deployment. This may affect the game’s RTP and result in either an overpaying or underpaying product.

If you provide applicable products you must have processes in place to measure the ongoing performance of games. These would usually be periodic reports or automated backend processes running over the stored transactional data.

You should raise alerts where a game appears to be falling outside the acceptable performance range. You should keep appropriate records as evidence of these processes as well as any more detailed investigations that have been performed as a result of an alert or an escalated customer complaint that warrants such an investigation.

Where a game indicates an error in its performance that is confirmed upon further investigation a key event must be submitted.

For businesses in scope for the annual games testing audit (those who produce and update games and obtain external testing from approved test houses), part of this audit will involve a review of RTP monitoring processes for adequacy.

While the focus of this guidance is on RNG driven products with a statistically defined RTP, such as slots games, there will be some performance monitoring applicable for other products as indicated in some of the examples. This will generally include products such as bingo, peer-to-peer poker, blackjack and virtual sports where skill or player choices can influence the return. For these products the focus of monitoring would instead be on the frequency and distribution of possible event outcomes to ensure they are acceptably random.

The overall aim of this monitoring is to ensure games are operating fairly as designed and advertised.

How to calculate return to player (RTP)

You can determine the actual RTP by dividing the win figure generated by a game by its turnover figure.

For example, if after one month of play a game, designed with a 91.68% RTP, has accrued £1,200,000 of turnover and £1,085,000 in wins the RTP can be calculated as follows:

1,085,000 / 1,200,000 = 0.9042

Therefore this game has achieved an actual RTP of 90.42%, which is below the designed RTP.
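As a minimal sketch of this calculation in Python (the function name actual_rtp is illustrative only), the same division can be expressed as:

```python
def actual_rtp(total_win: float, total_turnover: float) -> float:
    """Return the actual RTP as a fraction (0.9042 means 90.42%)."""
    if total_turnover <= 0:
        raise ValueError("turnover must be positive")
    return total_win / total_turnover


# Figures from the example: one month of play on a game designed at 91.68%.
rtp = actual_rtp(total_win=1_085_000, total_turnover=1_200_000)
print(f"{rtp:.2%}")  # 90.42%
```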

Importantly however the volatility of the game must also be considered as it will inform the allowable tolerance above or below the theoretical RTP. The tolerance will be wider when only a limited amount of play has been measured, but as the volume of play increases the tolerance will decrease.

After a significant number of plays the actual RTP should be very close or equal to the theoretical RTP. To continue with the above example, if the game had a volatility (standard deviation) of 5.6 then the acceptable upper and lower tolerances would be as below:

Number of games played	Range (+/- percentage points from the mean)†
50,000	+/- 4.90862
100,000	+/- 3.47092
200,000	+/- 2.45431
300,000	+/- 2.00393
400,000	+/- 1.73546
500,000	+/- 1.55224
600,000	+/- 1.41700
700,000	+/- 1.31188
800,000	+/- 1.22715
900,000	+/- 1.15697
1,000,000	+/- 1.09760

† This deviation from the mean is calculated at a 95% confidence level. This means a non-defective game might still fall outside the range in approximately 1 in 20 tests. A higher confidence level can be selected to reduce the chance of false alarms; however, caution should be exercised so as not to create tolerances that are too wide. The confidence level should not exceed 99%, at which a non-defective game might fall outside the range in only approximately 1 in 100 tests. One measurement failure does not confirm that the game or RNG is faulty; however, sequential failures, or a number of failures over a given frequency of measurements, might.

So if 400,000 games had been played to accrue the turnover and win figures in the example, the allowable tolerance would be approximately 1.74 percentage points above or below 91.68%. The game could return between approximately 89.94% and 93.42% and still be considered to be performing as expected. The actual RTP of 90.42% from the example falls within this range.
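The guidance does not state the formula behind the table, but the published values are consistent with the usual normal approximation: tolerance = z × standard deviation / √(number of plays), expressed in percentage points, with z of about 1.96 for 95% confidence. The Python sketch below works on that assumption. It also assumes the standard deviation is expressed per play in units of stake and that stakes are uniform. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def rtp_tolerance(std_dev: float, plays: int, confidence: float = 0.95) -> float:
    """Half-width of the tolerance band in percentage points of RTP.

    Assumes a normal approximation; std_dev is per play, in units of stake.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * std_dev / sqrt(plays) * 100


def within_tolerance(actual_pct: float, theoretical_pct: float,
                     std_dev: float, plays: int, confidence: float = 0.95) -> bool:
    """True if the actual RTP lies inside the band around the theoretical RTP."""
    return abs(actual_pct - theoretical_pct) <= rtp_tolerance(std_dev, plays, confidence)


tolerance = rtp_tolerance(std_dev=5.6, plays=400_000)
print(round(tolerance, 2))                           # 1.74
print(within_tolerance(90.42, 91.68, 5.6, 400_000))  # True
```

Because this sketch uses the unrounded z value rather than 1.96, its results differ from the table only in the fourth or fifth decimal place. Passing confidence=0.99 widens the band, which reduces false alarms at the cost of sensitivity.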

The game’s designers will have calculated the exact theoretical RTP as well as the game’s volatility (these figures will also be reviewed as part of the required external testing). These are the figures against which the actual performance should be measured.

What volume of play should be achieved before measuring the actual RTP?

Measuring a game with only a small amount of play will be pointless as the tolerance will be too large for meaningful results. On the other hand, waiting for millions of game plays to occur might mean a game with errors is not detected for an unreasonably long time. The volatility of a game determines the acceptable tolerance range and must be taken into account regardless of the amount of play accrued. In this way measurements can be performed at any time and the volatility-derived tolerance will determine whether the game is performing as expected or not.

As one instance of a remote game will be released to potentially thousands of players the game will quickly accrue thousands of game plays, particularly if it’s a popular game offered by multiple remote casinos. In cases where the game is offered via a B2B supplier they will likely hold the aggregated win and turnover figures for all of the B2Cs offering the game to their customers.

It will be up to the licensee (often based on the game designer’s instructions) to determine the measurement approach and frequency. One approach could be to perform daily measurements based on the previous 30 days of play, which will ensure fresh data sets are measured as time progresses. Measuring months and months of activity could hide errors that have been introduced by new updates. A wider date range (for example, measuring a rolling 90 days of activity) could be measured in parallel so that a greater volume of play is considered (meaning the data will have a much finer tolerance).
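The rolling-window approach could be sketched as follows in Python. The data layout (a dictionary of daily turnover and win totals) and the figures are hypothetical, chosen to show how a 30-day window exposes a recent drop that a 90-day window partly hides.

```python
from datetime import date, timedelta


def rolling_rtp(daily_totals, end_day, window_days):
    """Actual RTP over the window_days ending on end_day (inclusive).

    daily_totals maps a date to a (turnover, win) pair. Returns None if
    there was no turnover in the window.
    """
    start = end_day - timedelta(days=window_days - 1)
    turnover = win = 0.0
    for day, (day_turnover, day_win) in daily_totals.items():
        if start <= day <= end_day:
            turnover += day_turnover
            win += day_win
    return win / turnover if turnover else None


# Hypothetical data: 60 days returning 92%, then 30 days returning 85%
# (for example, after a faulty update).
today = date(2025, 1, 29)
daily = {
    today - timedelta(days=i): (10_000.0, 8_500.0 if i < 30 else 9_200.0)
    for i in range(90)
}

print(round(rolling_rtp(daily, today, 30), 4))  # 0.85
print(round(rolling_rtp(daily, today, 90), 4))  # 0.8967
```

In practice each window's result would be compared against a tolerance based on the number of plays in that same window.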

Another approach, instead of basing the measurement on a number of days, could be to measure once X number of plays has been achieved. This would account for the difference in volume of play between popular and unpopular games.
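A play-count trigger might look like the sketch below, which yields one measurement each time a fixed number of plays has accumulated. The function name and the tiny batch size are illustrative only. A real batch would contain many thousands of plays.

```python
def rtp_per_batch(plays, batch_size):
    """Yield the actual RTP for each complete batch of batch_size plays.

    plays is an iterable of (stake, win) pairs in the order they occurred.
    Stakes are assumed to be positive. An incomplete final batch is ignored.
    """
    turnover = win = 0.0
    count = 0
    for stake, prize in plays:
        turnover += stake
        win += prize
        count += 1
        if count == batch_size:
            yield win / turnover
            turnover = win = 0.0
            count = 0


plays = [(1.0, 0.0), (1.0, 2.0), (1.0, 0.5), (1.0, 1.0), (1.0, 0.0)]
print(list(rtp_per_batch(plays, batch_size=2)))  # [1.0, 0.75]
```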

When these measurements are plotted, the theoretical RTP is the centre line and the allowable tolerance above and below it is represented by the green and red lines respectively, as determined by the game’s volatility. As more gameplay is achieved the tolerance narrows and the actual RTP should be very close to the theoretical RTP.

Other considerations for live return to player performance monitoring of games of chance

As remote gambling records transactions in databases, often at a very granular level, it readily facilitates more sophisticated measurements. The granularity of recorded gaming transactions and performance measurements should be commensurate with the game’s design and complexity, and should enable accurate performance monitoring. Below are more detailed examples. These could be performed as part of the normal monitoring processes, on an ad-hoc basis, or when investigating an apparent discrepancy.

Measuring each stake level

A game that allows players to alter the stake per spin will produce turnover and win figures that mix stakes and wins from all bet levels. This means that activity played at maximum bet levels might drown out activity played at minimum bet levels. After a high number of games the influence of this will reduce; however, measuring a game independently for each main bet level can give more accurate results. It can also detect a problem that only exists at certain stake levels (for example, a designed multiplier might not be working properly). Consideration should be given to monitoring games at stake level where possible, particularly when investigating possible game faults.
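The effect can be illustrated with a small Python sketch that groups plays by stake. The record layout (stake and win in pence) and the data are hypothetical. The data is deliberately extreme to show the maximum bet dominating the combined figure.

```python
from collections import defaultdict


def rtp_by_stake(plays):
    """Return {stake: (number of plays, actual RTP)} for each stake level.

    plays is an iterable of (stake_pence, win_pence) pairs.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # turnover, win, number of plays
    for stake, win in plays:
        bucket = totals[stake]
        bucket[0] += stake
        bucket[1] += win
        bucket[2] += 1
    return {stake: (n, win / turnover)
            for stake, (turnover, win, n) in totals.items()}


plays = [(10, 0), (10, 20), (500, 0), (500, 500)]
per_stake = rtp_by_stake(plays)
overall = sum(win for _, win in plays) / sum(stake for stake, _ in plays)

print(per_stake)          # {10: (2, 1.0), 500: (2, 0.5)}
print(round(overall, 4))  # 0.5098
```

The combined figure sits almost on top of the 500p result, so a fault affecting only the 10p stake would be hard to see without the per-stake breakdown. Each stake level would then be judged against a tolerance based on its own number of plays.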

Measuring per channel

Remote games are often released separately on different platforms or channels (for example, mobile, Flash, download). Depending on the game’s design and architecture, a game fault might exist in only one channel, and this will be harder to detect if activity from all channels is aggregated into one measurement. We have seen instances where, although the mobile version of a game was based on a previous Flash version, mistakes made in the adaptation process resulted in errors that only existed in the mobile version. Where there is the potential for fairness to be affected as a result of differences between channels, measurements should be made at both an aggregated level and a per channel level.

Segregating base game activity from bonus features or progressive jackpots

Where games are designed with complex bonus features the ability to monitor the game at both a base game and feature game level should be included. This will be particularly important where the feature has a large effect on the overall game’s RTP and is certainly important where a game implements a skill component in the feature (as the skill RTP component will vary greatly depending on players’ actions). For example, some games will offer a 50/50 (double or quits) gamble feature; monitoring the overall game RTP including gambles will not necessarily confirm that the gamble is operating as a true and fair 50/50. Similarly for games connected to progressive jackpots, the base game should be measured independently of the progressive component.
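One way to check a 50/50 gamble feature in isolation is to compare the observed number of gamble wins with a binomial expectation. The sketch below is illustrative only: the function name and the counts are hypothetical, and the 1.96 threshold assumes a 95% confidence level under a normal approximation.

```python
from math import sqrt


def gamble_z_score(wins: int, attempts: int, p: float = 0.5) -> float:
    """Standard score of the observed win count against Binomial(attempts, p)."""
    expected = attempts * p
    std_dev = sqrt(attempts * p * (1 - p))
    return (wins - expected) / std_dev


# Hypothetical counts: 49,200 gamble wins from 100,000 gamble attempts.
z = gamble_z_score(wins=49_200, attempts=100_000)
print(round(z, 2))     # -5.06
print(abs(z) <= 1.96)  # False, so this result would warrant investigation
```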

Virtual sports products

In a similar way to skill games, virtual sports returns will mostly depend on the player’s choices and so there won’t be a single theoretical RTP. In its place operators could monitor the hit rate and distribution of each possible event outcome against the designed probability. For example, if there are seven virtual horses in a racing event, ensure each horse is winning the expected number of races according to its designed probability, as reflected by the offered odds (with overround). A similar approach might apply to roulette and blackjack.
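A chi-square goodness-of-fit statistic is one common way to compare observed outcome counts with designed probabilities. The guidance does not prescribe a particular test, so the Python sketch below is only one possible approach. The probabilities and win counts are hypothetical. The probabilities are assumed to be the designed ones (summing to 1), not the odds-implied ones that include the overround.

```python
def chi_square_statistic(observed, probabilities):
    """Pearson chi-square statistic for observed counts against designed probabilities."""
    total = sum(observed)
    return sum((count - total * p) ** 2 / (total * p)
               for count, p in zip(observed, probabilities))


# Hypothetical designed win probabilities for seven virtual horses.
designed = [0.30, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05]
# Hypothetical wins per horse over 10,000 races.
observed = [3010, 1985, 1520, 1190, 1005, 790, 500]

# Critical value of the chi-square distribution for 6 degrees of freedom
# (seven outcomes minus one) at the 95% level.
CRITICAL_95_DF6 = 12.592

statistic = chi_square_statistic(observed, designed)
print(round(statistic, 2))          # 0.65
print(statistic <= CRITICAL_95_DF6) # True, consistent with the design
```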

What about live dealer casino games?

The primary focus of RTP monitoring is on RNG driven software products. Live dealers use physical equipment (such as roulette wheels and decks of cards) to determine the results and there is a range of other integrity measures that surround such provision. For example, there will be controls over the supply (from casino standard manufacturers), installation and continuing operation of these devices. Fair shuffling of cards, ongoing integrity measurements of roulette wheels and dealing processes all have an influence over the fairness and will be part of everyday provision.

There is still merit in measuring certain outputs after a period of play, such as the distribution of results on a roulette wheel over an extended period to see if they are acceptably random. This information will be held in databases and can easily be measured.

Measuring progressive jackpots

Games connected to progressive jackpot systems will need to be measured at a base game level (that is, the game without the jackpot component). Jackpots are reaching very high values and their performance should be separately monitored. Given that jackpots tend to be infrequent and large, they will have high volatility ratings, so measuring their RTP might not be feasible. In its place other checks can be performed, such as whether the frequency and distribution of jackpots and average jackpot levels are as expected. Designers of the jackpot system will be best placed to define the monitoring approach.
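As one illustration of a frequency check, the sketch below assumes a jackpot with a fixed, independent hit probability per play, so that the number of hits over many plays is approximately Poisson distributed. This is a modelling assumption that will not suit every jackpot design, and the hit rate and play count are hypothetical.

```python
from math import exp


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X following a Poisson distribution with the given mean."""
    term = exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(X = i) from P(X = i - 1)
        total += term
    return total


# Hypothetical design: 1 jackpot per 2,000,000 plays, 10,000,000 plays observed.
expected_hits = 10_000_000 / 2_000_000  # 5.0

# Probability of seeing no jackpots at all if the system works as designed.
print(round(poisson_cdf(0, expected_hits), 4))  # 0.0067
# Probability of seeing three or fewer jackpots.
print(round(poisson_cdf(3, expected_hits), 3))  # 0.265
```

A very small probability, such as the first result, would suggest the observed frequency is unlikely under the design and is worth investigating.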

Key terms relating to live return to player performance monitoring of games of chance

Theoretical RTP

This is the designed return to player percentage of the game. It will also be the advertised RTP of the game as displayed in the player facing rules, as per RTS 3C.

Actual RTP

Calculated using the generated win and turnover figures of the live (operational) game. It shows the RTP the game has actually achieved for the past period as covered by the selected win and turnover amounts.

Volatility

Most commonly the standard deviation of the game is used to represent the game’s volatility. A highly volatile game will have a larger tolerance and might be comprised of prizes falling into the ‘very large but rare’ category. A low volatility game will be much more predictable and mostly comprised of prizes falling into the ‘small and often’ category. Standard deviation is a mathematically calculated figure (square root of the game’s variance, where the variance depends on the game’s cycle and prize frequency).
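To show how the two figures relate, the following Python sketch derives a theoretical RTP and standard deviation from a simplified paytable. The paytable is hypothetical and far smaller than a real game's. Each entry is a probability paired with a payout expressed as a multiple of the stake, and the losing outcome is included so the probabilities sum to 1.

```python
from math import sqrt


def rtp_and_std_dev(paytable):
    """Return (theoretical RTP, standard deviation) per unit staked.

    paytable is a list of (probability, payout multiple) pairs whose
    probabilities sum to 1.
    """
    mean = sum(p * payout for p, payout in paytable)
    variance = sum(p * (payout - mean) ** 2 for p, payout in paytable)
    return mean, sqrt(variance)


# Hypothetical paytable: 70% lose, 20% pay 2x, 9% pay 5x, 1% pay 10x.
paytable = [(0.70, 0.0), (0.20, 2.0), (0.09, 5.0), (0.01, 10.0)]
rtp, std_dev = rtp_and_std_dev(paytable)

print(round(rtp, 4))      # 0.95
print(round(std_dev, 2))  # 1.77
```

Moving probability from the frequent small prizes to a rarer, larger prize while keeping the RTP the same would raise the standard deviation, which is the "very large but rare" profile described above.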

Turnover

The total of all stakes made on the game. This will include reinvested winnings awarded during play.

Win

The total of all prizes awarded during game play. The GGY of a game will be the turnover minus win.

Last updated: 29 January 2025

Formatting changes only.
