Flipping a coin until reaching equal heads and tails

Today’s question is: how many coin flips, on average, would we have to do if we keep flipping a coin until the number of heads equals the number of tails?

Obviously it takes at least two, and might take many more. After two flips there are four possible outcomes:

  1. HH: Heads on flip 1, heads on flip 2.
  2. HT: Heads on flip 1, tails on flip 2.
  3. TH: Tails on flip 1, heads on flip 2.
  4. TT: Tails on flip 1, tails on flip 2.

From basic probability theory (which you should go review if necessary), each of these four outcomes is equally likely. Two out of the four of them have an equal number of heads and tails; therefore, we know that 50% of the time we try: “Flip a coin until the number of heads and tails are equal” we will be done in two flips. But the other 50% of the time we will have to keep going.

How much longer in that other 50% of the time? The first thing that should be obvious is that there will have to be an even number of flips. There will never be an equal number of heads and tails in 3 flips, or 5 flips, or any odd number of flips.

So we have to do at least 2 more flips 50% of the time we start out (i.e., whenever we start out with HH or TT).

Starting with HH the next string of possibilities is:

  1. HH HH
  2. HH HT
  3. HH TH
  4. HH TT – this one reaches equal heads/tails

Similarly, starting with TT:

  1. TT HH – this one reaches equal heads/tails.
  2. TT HT
  3. TT TH
  4. TT TT

This time there were eight possibilities, and only 2 of them reach equality. Note that even though there are four flips of the coin there aren’t 16 possibilities, because eight of the sequences start with either HT or TH and therefore reach equality of heads and tails at 2 flips. Or looked at another way, those branches of the 2**n tree (2 possibilities for 1 flip, 4 possibilities for 2 flips,  8 for 3 flips, 16 for 4 flips, etc) were pruned off.

To see this explicitly, here is the complete list of all 16 possible sequences of 4 flips:

HHHH
HHHT
HHTH
HHTT

HTHH ** Never happens; equality after HT start
HTHT **    ...
HTTH **    ...
HTTT **    ...

THHH ** Never happens; equality after TH start
THHT **    ...
THTH **    ...
THTT **    ...

TTHH
TTHT
TTTH
TTTT

So, in review:

  • 50% of the time 2 flips is enough.
  • In the remaining 50% case:
    • 25% of the time (2 cases out of 8 as shown above) 4 flips is enough.
    • 75% of the time (6 cases out of 8) more than 4 will be needed.

We can start to do some math:

Percent of time reach equality in 4 flips or fewer:
    50% in 2 flips
    25% of 50% = 12.5% in 4 flips
    75% of 50% = 37.5% takes more than 4 flips

We can now say 62.5% (50% + 12.5%) of the time it will take 4 flips or fewer (i.e., 2 or 4) to get to equality, and 37.5% of the time it will take more than 4.

How many more? Carrying this case-by-case analysis out further gets messy quickly. Fortunately this problem is related to something called a Catalan number. A description of them can be found in the Online Encyclopedia of Integer Sequences (A000108). There is also a wikipedia entry which I paraphrase here:

There are many counting problems in combinatorics whose solution is given by the Catalan numbers.

C(n) is the number of Dyck words of length 2n. A Dyck word is a string consisting of n X’s and n Y’s such that no initial segment of the string has more Y’s than X’s.

For example, the following are the Dyck words of length 6:
XXXYYY XYXXYY XYXYXY XXYYXY XXYXYY.

The way they describe that is a bit confusing, plus we are actually interested in both XXXYYY (HHHTTT) and YYYXXX (TTTHHH). Here’s another description from wikipedia:

C(n) is the number of ways to form a “mountain range” with n upstrokes and n downstrokes that all stay above a horizontal line. The mountain range interpretation is that the mountains will never go below the horizon.

and this accompanying picture:

source/attribution: Wikipedia File:Mountain_Ranges.png

This is better and more directly illustrates why the numbers we need are two times the Catalan number, because we are interested in both mountain ranges that start by going up first (e.g., first flip is heads and eventually we need to get back “down” to equal heads/tails) or by going down first (e.g., first flip is tails and eventually we need to get back “up” to equal heads/tails).

Here is a listing of all possible sequences of 2 flips, 4 flips, 6 flips, 8 flips, and 10 flips that end up with an equal number of heads and tails:

# 2 sequences of 2 flips: 2 * Catalan(n=0)
HT
TH

# 2 sequences of 4 flips: 2 * Catalan(n=1)
HHTT
TTHH

# 4 sequences of 6 flips: 2 * Catalan(n=2)
HHHTTT
HHTHTT
TTHTHH
TTTHHH

# 10 sequences of 8 flips: 2 * Catalan(n=3)
HHHHTTTT
HHHTHTTT
HHHTTHTT
HHTHHTTT
HHTHTHTT
TTHTHTHH
TTHTTHHH
TTTHHTHH
TTTHTHHH
TTTTHHHH

# 28 sequences of 10 flips: 2 * Catalan(n=4)
HHHHHTTTTT
HHHHTHTTTT
HHHHTTHTTT
HHHHTTTHTT
HHHTHHTTTT
HHHTHTHTTT
HHHTHTTHTT
HHHTTHHTTT
HHHTTHTHTT
HHTHHHTTTT
HHTHHTHTTT
HHTHHTTHTT
HHTHTHHTTT
HHTHTHTHTT
TTHTHTHTHH
TTHTHTTHHH
TTHTTHHTHH
TTHTTHTHHH
TTHTTTHHHH
TTTHHTHTHH
TTTHHTTHHH
TTTHTHHTHH
TTTHTHTHHH
TTTHTTHHHH
TTTTHHHTHH
TTTTHHTHHH
TTTTHTHHHH
TTTTTHHHHH

Because we have two variations (heads first, tails first), the number of sequences possible for a given generation (“n” value) is double the catalan number C(n). The formula for the nth Catalan number is:

# python code. Note: // is integer division
def catalan(n):
    return comb(2*n, n) // (n + 1)

where comb() is the combinatoric function “m take n”.

Given all this we can compute how many patterns of n flips will result in our experiment finishing with an equal number of heads and tails. First let’s just print out a bunch of Catalan numbers (times 2):

import math

def comb(m, n):
    if m < n: 
        raise ValueError("requires m >= n")

    top = 1
    for i in range(n):
        top *= m-i
    return top // math.factorial(n)

def catalan(n):
    return comb(2*n, n) // (n + 1)

for n in range(20):
    print("flips={:2d}, 2C(n={})={}".format(
            2*(n+1), n, 2*catalan(n)))

Running this gives:

flips= 2, 2C(n=0)=2
flips= 4, 2C(n=1)=2
flips= 6, 2C(n=2)=4
flips= 8, 2C(n=3)=10
flips=10, 2C(n=4)=28
flips=12, 2C(n=5)=84
flips=14, 2C(n=6)=264
flips=16, 2C(n=7)=858
flips=18, 2C(n=8)=2860
flips=20, 2C(n=9)=9724
flips=22, 2C(n=10)=33592
flips=24, 2C(n=11)=117572
flips=26, 2C(n=12)=416024
flips=28, 2C(n=13)=1485800
flips=30, 2C(n=14)=5348880
flips=32, 2C(n=15)=19389690
flips=34, 2C(n=16)=70715340
flips=36, 2C(n=17)=259289580
flips=38, 2C(n=18)=955277400
flips=40, 2C(n=19)=3534526380

In the output above, think of n as the generation number where each generation is two flips. Like many things in computers and math we number the generations starting at zero, so the zeroth generation has 2 flips, and there are 2*C(0) == 2 combinations (HT, TH) that reach equality at that generation. Similarly the second generation (n=1) has four flips, and two of the sequences of four flips will reach equality at that generation. We showed this explicitly above by listing all the possibilities. These Catalan numbers become large quickly, confirming that explicit case-by-case analysis is going to get quite difficult. We need to figure out some general math instead.

The next thing we need to figure out is how many of the sequences there will be in each generation. We already know that it is less than 2**flips because some of them are cut off each generation – as was shown when there were only 8 sequences to consider at four flips (instead of 2**4 which would be 16).

If there are q sequences in generation n, the number of them that have equal numbers of heads and tails is given by 2*C(n); those sequences will not continue any further. Each remaining active sequence in generation n becomes four possible sequences in generation n+1, from appending HH, HT, TH, and TT outcomes.

Therefore, if q(n) is the number of active sequences in generation n, the number of sequences in generation n+1 will be:

q(0) = 4      # by definition
q(n+1) = 4 * (q(n) - 2*C(n))

Picking that back apart: q(n) is the number of sequences there will be at generation n, and 2*C(n) is the number of those that will reach equality. Therefore we subtract 2*C(n) from q(n) and that is the number of sequences that need to go for more flips; each of those sequences will expand to four possibilities in the next generation, hence the multiplication by 4.

It’s more convenient to express that equation in terms of the current generation and the previous generation (rather than current and next):

q(0) = 4
q(n) = 4 * (q(n-1) - 2*C(n-1))

As alread noted, some of those sequences will reach equality; for example this formula gives eight sequences at n=1 (four flips). We listed those eight before, and two of them (2*C(1)) reach equality at that generation (so six of them go on to n=2 / six flips).

This is a recursive formula that doesn’t let us compute an arbitrary q(n) without going through all the cases leading up to it, but just for grins lets write some code and print out a list of q(n) values and see if we agree with it:

for n in range(10):
    if n == 0:
	q = [4]
    else:
        q.append(4 * (q[n-1] - (2*catalan(n - 1))))

# q[i] will be the computed value at generation i
print(q)

This is some quick-and-dirty code that only works if we request the q values sequentially (as is done in this example), but it’s good enough for now. Running this (with the catalan function as previous defined) results in:

[4, 8, 24, 80, 280, 1008, 3696, 13728, 51480, 194480]

This matches the results we got manually – at n=0 (the very first pair of flips) there are 4 sequences to consider: HH, HT, TH, TT. At n=1, four flips, there are eight sequences to consider and we listed them above. We know, from that list, that 2 of those eight sequences reach equality, leaving 6 “active” sequences for the next generation, and we know that each of those 6 active sequences will turn into four new sequences (from appending HH, HT, TH, and TT) … so there will be 24 sequences to consider at the next generation.

Ok, so what does this tell us about how many times we’re going to have to flip that coin, on average, to get back to equal?

We have all the tools needed to calculate the chance that you reach equality by (or before) generation n. Let E(n) be the probability that we have an equal number of heads and tails by generation n (i.e.,  in 2*(n+1) flips). We know, by definition (inspection of all the cases: HH, HT, TH, TT) that E(0) = 0.50 (50%). The probability of reaching equality by arbitrary generation n will be made up of two components:

  • the chance you reached equality by or before generation n-1, which is E(n-1)
  • the chance you had to keep going, which is whatever is left after you take out the E(n-1) chance you already finished; so that is 1 - E(n-1), which we will then multiply by the chance that you reach equality on this nth generation.

We know from the above formulas that q(n) is the number of sequences “in consideration” in generation n. We also know that the number of sequences that will reach equality in generation n is given by: 2C(n) (twice the catalan number for n).

Therefore we know the chance you reach equality in a specific nth generation is the number of such sequences that reach equality, divided by the total number of sequences that would be in consideration at that generation:

2C(n)/q(n)

You can verify that yourself with the four flip case (n=1), where there were 8 sequences (q(1) == 8) and 2 of them (C(1) == 1, 2 times that is 2) reached equality, leading to a 25% chance of being done in that case (with that case only happening 50% of the time). Putting all that together we finally get:

E(n) = E(n-1) + (1 – E(n-1)) * 2C(n)/q(n)

Again we have a recursive formula that requires us to iterate through the cases. Here’s some code:

def q(n):
    if n == 0:
	return 4
    else:
        return (4 * (q(n-1) - (2*catalan(n-1))))

def bigE(n):
    if n == 0:
	return 0.50
    else:
	em1 = bigE(n-1)
	return em1 + ((1 - em1)*((2*catalan(n)) / q(n)))

Python experts will immediately realize there are several efficiency and memory problems lurking here;  on my machine trying bigE(500) takes about 5 seconds, and for large enough value bigE will fail with an exception because the recursion gets too deep. We can fix those problems but for now this code suffices to just get a feel for some results. I ran the calculations for some values and printed out the chance of NOT reaching equality (1-E(n)) by the nth generation:

Chance equality NOT reached after    2 flips: 50.00%
Chance equality NOT reached after    4 flips: 37.50%
Chance equality NOT reached after    6 flips: 31.25%
Chance equality NOT reached after    8 flips: 27.34%
Chance equality NOT reached after   10 flips: 24.61%
Chance equality NOT reached after   20 flips: 17.62%
Chance equality NOT reached after   50 flips: 11.23%
Chance equality NOT reached after  100 flips:  7.96%
Chance equality NOT reached after  200 flips:  5.63%
Chance equality NOT reached after  400 flips:  3.99%
Chance equality NOT reached after  600 flips:  3.26%
Chance equality NOT reached after  800 flips:  2.82%
Chance equality NOT reached after 1000 flips:  2.52%

These values seem pretty counter-intuitive to me. A 2.52% chance of not reaching equality within 1000 flips is huge. It means that if you sit down periodically and try this yourself with a coin (assuming it is a fair coin), about 1 time out of 40 you will have to do 1000 flips and still would have not reached equality. I think most people when asked how frequently that would happen (especially people who misunderstand the “law of averages”) would guess it to be very rare, and certainly not as likely as a 1 in 40 shot.

I opened this posting with the question “what would be the average number of flips it takes to reach equality” but I’m still not able to answer that question. I can’t tell yet whether these formulas, with their tailing-off values as they get large, converge onto a bounded number or whether in fact the average number of flips to reach equality might be unbounded (i.e., infinite). I’m going to wrap up this post just to get it “out there” and do some more google research later. I do wonder if I may have reached the limit of my mathematical pay grade in this area.

I have written a python simulator to do some “flip until equality” experiments, but I’m not yet prepared to write it up as I want to ensure that its results are valid, and without better math support I’m not sure yet. I will say the preliminary results from the python simulator suggest that the answer to “what is the average” might be “infinity” (!!) though I am not yet certain of that. I have definitely had some astoundingly-long runs come out of the simulator so far though — the longest I’ve seen at the moment being over 380 billion flips before equality was reached (I *think* the python random number generator is suitable for this task, having a period of 2**19937, but that’s just one of many questions I have at the moment about my simulation code).

With the caveat that I’m not 100% certain my code’s results are correct, here are some preliminary results of interest from the simulation.

I’ve run 1,607,009 trials (sequences of coin flips, each starting with one flip and finishing when the number of heads and tails is equal).  Out of those trials, 804,083 reached equality in two flips. That’s 50.036% which tracks the computed/expected result correctly. Equality in four flips was reached in 200,809 trials, or 12.496% which also tracks the 12.5% prediction. The longest sequence seen was 380,172,077,974 flips — 380 billion flips!

The top ten longest flip sequences at the moment are:

380,172,077,974
 33,378,243,830
 27,175,971,118
 25,567,423,414
 22,337,890,602
 11,709,891,508
 11,678,616,684
 10,918,847,300
 10,600,505,508
  9,424,437,132

These results are fueling my suspicion that the overall average might be unbounded… i.e., that these huge sequence length numbers will occur frequently enough to drive the average towards infinity. But I’m not certain of that (can’t really be certain of it until we do more math).

Sequences of length 2 occurred 804,093 times, length 4 occurred 200,809 times, etc so if you imagined a very long list of numbers – 804,093 entries with “2”, 200,809 entries with “4”, etc, and added them all up and computed the average entry value you’d get 398,917 (obviously I didn’t do this manually, I did it by weighted-average of my recorded data points). This number might be meaningless though, if my speculation about the average being unbounded is correct (i.e., if the average will continue to increase as more trials are run). But at the moment if you ask me to answer “Dammit, Neil, what’s the average?” I guess I’d say 400K flips needed, on average, to reach equality. Again that’s a huge number and somewhat counter-intuitive; you’d think a fair coin would reach my heads/tails equality criteria in “some reasonable” amount of time. Most human intuitions about probability are wrong, and assuming my simulation is returning valid results, this appears to be yet another case of that.

Correlation vs. Causation

In case you needed any more evidence of the link between fossil fuel imports (at least, from Norway) and harm to people’s health, I present you:

Solving an arithmagon

One of my nephews was working on an “arithmagon” puzzle. Here’s an example puzzle:
The idea is to solve for a, b, and c such that:

a + b = 26
a + c = 33
b + c = 35

and, I guess, the idea at his young age is to try different numbers ad-hoc until you find the ones that work.

Well, of course, that’s not how I think these should be done. 🙂

In general:

we have:

a + b = X
a + c = Y
b + c = Z

Three unknowns (a, b, c) and three linear equations; we can solve for the general case:

b = X - a
c = Y - a

therefore

b + c = Z
using b = X - a and c = Y - a:
X - a + Y - a = Z
X + Y - 2a = Z
X + Y - Z = 2a

a = (1/2) * (X + Y - Z)

So we get a = 12 for the original example shown at the start, and from that (using c = Y – a and b = X – a) c = 21 and b = 14.

Easy! Sorry for ruining all the puzzles for my nephew (once he can understand algebra).

 

Politifacts

Inspired by a friend’s recent Facebook posting, I tallied up the Politifact statistics for 4 Democrats and 4 Republicans: Obama, Biden, Pelosi, Reid, Boehner, Romney, Perry, and Gingrich.

Politifact has six ratings: True, Mostly True, Half True, Mostly False, False, and the crowd-pleasing Pants On Fire.

Here is the raw data (counts for each ruling):

NAME      True      Mostly True      Half True       Mostly False      False      Pants on Fire

Obama      80              74                   79                  42                 51             4

Biden       11               11                   11                   8                   6              3

Pelosi        1                 2                     9                   2                   3              2

Reid          2                 2                     3                   3                   3              1

Boehner   15                3                     4                   9                  16             1

Romney    20             15                   19                  13                 13             8

Perry        15             10                   30                   22                 23            11

Gingrich     3              3                     9                    6                    9              6

There are, of course, problems with this data. I don’t know how Politifact decides which statements to check. The selection process is likely biased in several ways. For example, perhaps controversial or likely-false statements are more likely to become Politifact fodder in the first place. Thus I would be hesitant to conclude that any of the above people are lying X% of the time in general, just because X% of their rulings are False or PantsOnFire. There are a variety of other problems lurking here, especially when the data is thin like for Pelosi and Reid.

Nevertheless, let’s do some math for the fun of it.

In aggregate, 21% of the statements ruled on are TRUE. Boehner is highest at 31% and Pelosi is lowest at 5% (but her data is so thin; one more “true” statement and she would have been 10%). The four Democrats scored 23% in aggregate and the GOP scored 19%.

Expanding to include “mostly true” (and the fully-true), the aggregate is 38%, with Obama best at 47%, and Pelosi and Gingrich bringing up the rear at around 16%. The four Democrats were 44% mostly-true or true, and the GOP was 30%.

False or Pants on Fire is 23% in aggregate. Gingrich is the run away winner here at 42% with Obama the best, of a sorry lot, at 17%. GOP 31%, Democrats 18%.

I also looked at the rulings for the DNC and the RNC. Their numbers:

 

True      Mostly True      Half True       Mostly False      False      Pants on Fire

DNC    6             5                   8                       6                     2                 0

RNC     2             4                   8                      5                     5                  2

This nets out to DNC being Mostly True or True 41% of the time, and the RNC only 23%.

The data is at the very least suggestive that the Republicans, in aggregate, are less truthful than the Democrats though we’d really have to do a more scientific survey (better selection methodology) to draw any real conclusion. Neither party appears to deserve a gold star for veracity.

Welcome to neilwebber.com

Well, I finally decided to upgrade my site from vintage 1997 “I edited this by hand with emacs, just barely one step up from hammer and chisel” … trying out WordPress. Nothing terribly fancy, just another blog.