I’m not a gambler myself, but I do enjoy probability and statistics problems associated with games of chance. Here’s one I came up with recently:

A manufacturer has invented a new game of chance based on a three-reel slot machine. Instead of fruit, each reel contains the digits from 0 through 9, so that pulling the handle generates a number between 000 and 999.

The game is played as follows: a player inserts a coin and pulls the handle repeatedly to generate a series of random numbers. The machine keeps track of the numbers generated on each pull. As long as all the numbers are different, the player continues to pull the handle. The game ends when any number comes up a second time. The machine then pays out a dollar for each time the handle was pulled.

For example, here are the numbers generated in a representative game:

{94, 845, 913, 994, 96, 269, 377, 913}

Since the number generated on the 8th pull (913) is a repeat of the number generated on the third pull, the game ends there and the machine pays out 8 dollars.

On average, how many times can a player expect to pull the handle before the game ends?

First I’ll write a program to simulate this game, and in the next post I’ll develop an analytical solution. The solution begins after the fold.

I began with a program that generates one random number at a time using the **sample** function. A game is simulated as follows: For each number generated (after the first one), the program uses **%in%** to check if that number is already on the list of previously generated numbers. If it is, the game ends and the number of pulls is added to *total*. If the number is not already on the list, it is appended to the list and the game continues with a new random number.

All of this is wrapped in a **for** loop to simulate a large number of games. When the maximum number of iterations is reached, the expected number of pulls is found by dividing *total* by the number of iterations. Here is the program:

t <- Sys.time() maxiter <- 50000 total <- 0 for(i in 1:maxiter) { numberlist <- sample(1000,1) - 1 npulls <- 1 repeat { g <- sample(1000,1) - 1 npulls <- npulls + 1 if (g %in% numberlist) break numberlist <- append(numberlist,g) } total <- total + npulls } t <- Sys.time() - t tp <- 1000*t/maxiter cat("Simulated value: ",total/maxiter,"\n",sep="") cat("Timing: ",tp," ms/iteration.\n",sep="")

The result is:

Simulated value: 40.30798 Timing: 0.2447502 ms/iteration.

40 pulls is quite a lot when you think about it. The players’ arms would get tired after a while.

The program runtime is not too bad, but R documentation advises against the use of **for** loops. The **append** statement is also said to be slow. So I rewrote the code as follows:

total <- 0 maxiter <- 50000 estimate <- 45 allnums <- 0:999 samplsize <- maxiter*estimate t <- Sys.time() numberlist <- sample(allnums,samplsize,replace=TRUE) indx <- 1 niter <- 1 while(niter < maxiter) { bcard <- rep(FALSE,1000) npulls <- 1 g <- numberlist[indx] bcard[g+1] <- TRUE repeat { npulls <- npulls + 1 indx <- indx + 1 if (indx > samplsize) stop("Reached end of sample. Increase your estimate.") g <- numberlist[indx] if (bcard[g+1]) break else bcard[g+1] <- TRUE } total <- total + npulls niter <- niter + 1 } t <- Sys.time()-t tp <- 1000*t/maxiter cat("Simulated value: ",total/maxiter,"\n",sep="") cat("Timing: ",tp," ms/iteration. \n",sep="")

There are a number of differences here. First, all of the random numbers are generated at once at the beginning of the program (line 7). Unfortunately in order to generate enough of these numbers, the expected number of pulls in each game must be known in advance. From the previous code we know the expected value is a little over 40, so to be on the safe side I used 45 here. Just in case, a check in line 18 ensures that we don’t try to read beyond the end of the list.

Another difference in the new code is that the numbers that turn up in each game are no longer stored in a list. Instead, a logical vector, *bcard*, is used to keep track of them. At the beginning of each game all 1000 values of this array are set to FALSE. As each number *g* is generated, the value of *bcard[g+1]* is set to TRUE (the index is *g+1* because R’s array indexing begins with 1 and not 0 as in other languages). Then for each new *g*, instead of using **%in%** as before, the program just checks if *bcard[g+1]* is TRUE (line 20).

I’m sure an experienced R programmer would be able to eliminate some of the loops, but I didn’t try — though I did decide to use a **while** loop instead of a **for** loop. I’m not sure that’s any faster. But here are the results:

Simulated value: 40.32498 Timing: 0.1039554 ms/iteration.

So the runtime has been reduced by about 57%, which is pretty good.

## One thought on “Gambling Problem (part 1)”