# Gambling Problem (part 1)

I’m not a gambler myself, but I do enjoy probability and statistics problems associated with games of chance. Here’s one I came up with recently:

A manufacturer has invented a new game of chance based on a three-reel slot machine. Instead of fruit, each reel contains the digits from 0 through 9, so that pulling the handle generates a number between 000 and 999.

The game is played as follows: a player inserts a coin and pulls the handle repeatedly to generate a series of random numbers. The machine keeps track of the numbers generated on each pull. As long as all the numbers are different, the player continues to pull the handle. The game ends when any number comes up a second time. The machine then pays out a dollar for each time the handle was pulled.

For example, here are the numbers generated in a representative game:

{94, 845, 913, 994, 96, 269, 377, 913}

Since the number generated on the 8th pull (913) is a repeat of the number generated on the third pull, the game ends there and the machine pays out 8 dollars.

On average, how many times can a player expect to pull the handle before the game ends?

First I’ll write a program to simulate this game, and in the next post I’ll develop an analytical solution. The solution begins after the fold.

I began with a program that generates one random number at a time using the sample function. A game is simulated as follows: For each number generated (after the first one), the program uses %in% to check if that number is already on the list of previously generated numbers. If it is, the game ends and the number of pulls is added to total. If the number is not already on the list, it is appended to the list and the game continues with a new random number.

All of this is wrapped in a for loop to simulate a large number of games. When the maximum number of iterations is reached, the expected number of pulls is found by dividing total by the number of iterations. Here is the program:

```t <- Sys.time()
maxiter <- 50000
total <- 0
for(i in 1:maxiter) {
numberlist <- sample(1000,1) - 1
npulls <- 1
repeat {
g <- sample(1000,1) - 1
npulls <- npulls + 1
if (g %in% numberlist) break
numberlist <- append(numberlist,g)
}
total <- total + npulls
}
t <- Sys.time() - t
tp <- 1000*t/maxiter
cat("Simulated value: ",total/maxiter,"\n",sep="")
cat("Timing: ",tp," ms/iteration.\n",sep="") ```

The result is:

```Simulated value: 40.30798
Timing: 0.2447502 ms/iteration.```

40 pulls is quite a lot when you think about it. The players’ arms would get tired after a while.

The program runtime is not too bad, but R documentation advises against the use of for loops. The append statement is also said to be slow. So I rewrote the code as follows:

```total <- 0
maxiter <- 50000
estimate <- 45
allnums <- 0:999
samplsize <- maxiter*estimate
t <- Sys.time()
numberlist <- sample(allnums,samplsize,replace=TRUE)
indx <- 1
niter <- 1
while(niter < maxiter) {
bcard <- rep(FALSE,1000)
npulls <- 1
g <- numberlist[indx]
bcard[g+1] <- TRUE
repeat {
npulls <- npulls + 1
indx <- indx + 1
if (indx > samplsize)
stop("Reached end of sample. Increase your estimate.")
g <- numberlist[indx]
if (bcard[g+1]) break else bcard[g+1] <- TRUE
}
total <- total + npulls
niter <- niter + 1
}
t <- Sys.time()-t
tp <- 1000*t/maxiter
cat("Simulated value: ",total/maxiter,"\n",sep="")
cat("Timing: ",tp," ms/iteration. \n",sep="")
```

There are a number of differences here. First, all of the random numbers are generated at once at the beginning of the program (line 7). Unfortunately in order to generate enough of these numbers, the expected number of pulls in each game must be known in advance. From the previous code we know the expected value is a little over 40, so to be on the safe side I used 45 here. Just in case, a check in line 18 ensures that we don’t try to read beyond the end of the list.

Another difference in the new code is that the numbers that turn up in each game are no longer stored in a list. Instead, a logical vector, bcard, is used to keep track of them. At the beginning of each game all 1000 values of this array are set to FALSE. As each number g is generated, the value of bcard[g+1] is set to TRUE (the index is g+1 because R’s array indexing begins with 1 and not 0 as in other languages). Then for each new g, instead of using %in% as before, the program just checks if bcard[g+1] is TRUE (line 20).

I’m sure an experienced R programmer would be able to eliminate some of the loops, but I didn’t try — though I did decide to use a while loop instead of a for loop. I’m not sure that’s any faster. But here are the results:

```Simulated value: 40.32498
Timing: 0.1039554 ms/iteration.
```

So the runtime has been reduced by about 57%, which is pretty good.