The Timeform Knowledge: Probability

How likely is a big-priced winner at the Cheltenham Festival?

In the latest edition of the Timeform Knowledge, Simon Rowlands examines basic probability, and shows you how it can be applied to betting on horse racing.

Winners at 33/1 or longer have happened only around 7.5% of the time at the Cheltenham Festival, but across 27 races in any one year it becomes much more probable than not that at least one will occur.

One aspect of probability which often trips up the novice (and even sometimes the expert) is that the probability of throwing two consecutive identical numbers is NOT the same as the probability of throwing two consecutive specified numbers, such as two sixes (1/36, or 0.0277).

It is a certainty that a number - any number, not a specified number - is thrown with the first dice, so the calculation simply becomes that certainty (probability = 1) multiplied by the 1/6 chance (probability = 0.1666) that the second dice shows the same number. That is, the probability of throwing two unspecified identical numbers consecutively is 0.1666 again.
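
For those who like to check such things by brute force, the short Python sketch below lists all 36 equally likely outcomes of two throws and counts the relevant ones. It is only an illustration: the variable names are chosen here, and a fair six-sided dice is assumed.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two throws of a fair six-sided dice (assumed).
outcomes = list(product(range(1, 7), repeat=2))

# Two specified numbers in a row, here two sixes.
two_sixes = [o for o in outcomes if o == (6, 6)]
p_two_sixes = Fraction(len(two_sixes), len(outcomes))

# Any two identical numbers in a row (the first throw can be anything).
any_double = [(a, b) for a, b in outcomes if a == b]
p_any_double = Fraction(len(any_double), len(outcomes))

print(p_two_sixes, float(p_two_sixes))    # 1/36, roughly 0.028
print(p_any_double, float(p_any_double))  # 1/6, roughly 0.167
```

The second figure is six times the first because there are six ways of throwing a double but only one way of throwing a named double.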

This can be illustrated in horseracing terms by reference to the US Triple Crown, which is the achievement of winning the Kentucky Derby, Preakness Stakes and Belmont Stakes with the same horse. American Pharoah, in 2015, was the first horse to manage this feat since Affirmed in 1978: a magnificent effort by the horse himself but nowhere near as improbable an event as stated in some quarters.

The probability of this happening in any given year was not the multiplied probability (perhaps derived from race-day odds) of a given horse in the three separate legs, let alone the implied probability that one specific horse from the tens of thousands bred each year would pull off the feat.

That a horse - any horse - would win the first leg, the Kentucky Derby, was a certainty (for these purposes), so the true probability of a horse winning the Triple Crown was the probability that a horse which HAD WON the Kentucky Derby then won the Preakness Stakes and the Belmont Stakes.

Given that a horse which won the Kentucky Derby would, barring injury, almost certainly contest the second leg, the Preakness Stakes, at a short price, and then, if it won that second leg, contest the third leg as a dual classic winner at a very short price, the probability clearly becomes much greater.

Perhaps something like 1 (certainty) multiplied by 0.28 (approximately 5/2 in terms of odds) multiplied by 0.4 (6/4 in terms of odds), which is 0.112, or around 8/1 (though others would doubtless assign different probabilities).
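
The arithmetic can be sketched in a few lines of Python. The probabilities are the illustrative guesses given above, not measured values, and the helper names are invented for this example.

```python
def fractional_to_probability(numerator, denominator):
    """Probability implied by fractional odds numerator/denominator (no margin)."""
    return denominator / (numerator + denominator)


def odds_against_to_one(p):
    """Odds against, expressed as 'x to 1', for an event of probability p."""
    return (1 - p) / p


# Illustrative figures from the text; others would assign different values.
p_derby = 1.0       # some horse is certain to win the first leg
p_preakness = 0.28  # approximately 5/2
p_belmont = 0.40    # 6/4

p_triple_crown = p_derby * p_preakness * p_belmont

print(round(fractional_to_probability(5, 2), 3))       # 0.286, close to 0.28
print(round(fractional_to_probability(6, 4), 3))       # 0.4
print(round(p_triple_crown, 3))                        # 0.112
print(round(odds_against_to_one(p_triple_crown), 1))   # 7.9, i.e. around 8/1
```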

You might expect the US Triple Crown to be won about once a decade. It has been won 12 times in the last 100 years and the "famine" between 1978 and 2015 (which included several near misses) can be seen as unrepresentative.

Moving from the throwing of dice (an "aleatory" activity for those who like to know these things) to the uncertainty and subjectivity of horseracing has taken us into the far more interesting world of subjective and conditional probabilities.

In the above example, the probability of a horse winning the Preakness given that it had won the Kentucky Derby might be 0.28, and the probability of a horse winning the Belmont given that it had won the Kentucky Derby might be similar. But the probability of a horse winning the Belmont given that it had won the Kentucky Derby AND the Preakness will nearly always be greater.

The last-named scenario includes additional information, with probabilities clearly being influenced by the hypothetical chain of events. That is why bookmakers quote separate odds about conditional or contingent events, such as a horse winning both the 2000 Guineas and The Derby.

Probability Theory - not to mention common sense - shows that they are right to do so.

Probabilities case study: longest-priced winner at the Cheltenham Festival

Among the many "special bet" markets available at the Cheltenham Festival every year is one which illustrates the benefit of tackling things in terms of probabilities rather than more obvious but less suitable alternatives. This is the "longest-priced winner at the Cheltenham Festival" market. The categories can vary from year to year, but the principles are the same. For this illustration, "shorter than 33/1", "longer than 40/1" and "in between (33/1 and 40/1)" will be used.

History tells us that shorter has obliged no times in the last seven years, that longer has obliged once, and that in between has paid off on six occasions.

It would appear that in between is by far the likeliest option, but we are dealing with crude outcome measures of what is a complex phenomenon. Is that misleading, and, in any case, what would be a "fair" price?

In terms of frequencies across all races, shorter has accounted for 92.5% of winners across the last seven years, longer has accounted for just 0.5% and in between makes up the rest (7.0%). There have been 186 winners from 3368 runners.

The complexity of the proposition must be clearer to the reader now, for converting these frequencies into probabilities requires an understanding of what constitutes a "win" for any given category and some mathematics.

A win for longer requires that at least one race - any race - is won by a horse starting at 50/1 or greater; a win for shorter requires that every race - all of them - is won by a horse starting at shorter than 33/1; and in between will be what is left when those two likelihoods are deducted from the certainty that horses of various prices will win the races in question.

It is also the case that frequencies are poor ways of determining likely outcomes when those outcomes are rare (for the definitive explanation of this please read "The Black Swan" by Nassim Taleb: in fact, read it anyway).

Fortunately, we have a better measure of the likelihood of each outcome, and that is provided by the Betfair Starting Price in a 100% book. Conventional starting prices have a "favourite-longshot bias" - whereby longer-priced horses are nonetheless shorter than they should be for precautionary reasons - but that is negligible with BSP.

Using BSP suggests that a random race at the Cheltenham Festival has a 92.9% chance of a shorter-priced horse winning, a 2.1% chance of a longer-priced horse winning (more than four times what is implied by frequencies above) and a 5.0% chance of in between.

Races at the Cheltenham Festival are not random - outsiders will be rare in some of them but more plentiful in others, for instance - but over 27 of them this assumption should bear out.

The likelihood that every race out of 27 will be won by a category which has a 92.9% chance of winning an individual race is much less than may be supposed.

It is, in terms of probabilities, 0.929 to the power of 27, or (to use Excel notation) =POWER(0.929, 27), which is 0.137 or 13.7%.

The likelihood that at least one race out of 27 will be won by a category which has a 2.1% chance of winning an individual race is the same as 1 minus the likelihood that this won't happen, or 1 - 0.979 to the power of 27: that is, 0.437 or 43.7%.

It follows that the in between value will be 1 minus the two above likelihoods, or 1 minus 0.137 minus 0.437: that is, 0.426 or 42.6%.
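
The three calculations can be reproduced with the Python sketch below. It assumes, as the text does, 27 independent races with the same per-race chances, and it uses the rounded per-race figures quoted above, so the results may differ from the quoted ones by around 0.001 (presumably because unrounded inputs were used originally). The variable names are chosen for this example.

```python
N_RACES = 27

# Rounded per-race probabilities quoted in the text (BSP-based).
p_shorter_race = 0.929  # winner starts at shorter than 33/1
p_longer_race = 0.021   # winner starts at 50/1 or greater

# "Shorter" pays only if every one of the 27 races has a shorter-priced winner.
p_shorter = p_shorter_race ** N_RACES

# "Longer" pays if at least one race has a longer-priced winner,
# i.e. 1 minus the chance that no race does.
p_longer = 1 - (1 - p_longer_race) ** N_RACES

# "In between" is whatever is left.
p_between = 1 - p_shorter - p_longer

# Cross-check: no longer-priced winner at all, but not every winner shorter.
p_between_direct = (1 - p_longer_race) ** N_RACES - p_shorter_race ** N_RACES

print(round(p_shorter, 3), round(p_longer, 3), round(p_between, 3))
```

The cross-check arrives at the in-between figure from the other direction: it is the chance that no winner starts at 50/1 or greater, less the chance that every winner starts at shorter than 33/1.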

Percentage likelihoods of 13.7 (shorter), 43.7 (longer) and 42.6 (in between) are equivalent to 7.3, 2.29 and 2.35 in decimal odds terms, or roughly 13/2, 5/4 and 11/8 in fractional odds.
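
Converting a probability to odds is a one-line calculation, sketched here in Python with hypothetical helper names. No bookmaker's margin is included, so these are "fair" prices only.

```python
def decimal_odds(p):
    """Fair decimal odds (stake included) for an event of probability p."""
    return 1 / p


def odds_against_to_one(p):
    """Fair fractional odds, expressed as 'x to 1', for probability p."""
    return (1 - p) / p


# Likelihoods quoted in the text.
likelihoods = {"shorter": 0.137, "longer": 0.437, "in between": 0.426}

for category, p in likelihoods.items():
    print(category, round(decimal_odds(p), 2), round(odds_against_to_one(p), 2))
```

The "x to 1" figures come out at about 6.3, 1.29 and 1.35, which is where the rough fractional prices of 13/2, 5/4 and 11/8 come from.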

The in-between category (six wins out of seven) has been over-performing, but it is still a long-odds-on shot that either it or longer will prevail.

Probabilities tell you that where frequencies could put you away.
