Words With Friends Scores

Over a period of several months, a friend and I played 189 games of Words With Friends, a Scrabble-like game popular on Facebook. I kept track of our scores, and the resulting dataset, which I make available here, offers a few insights into the game.

The file's structure is very simple: one line per game, each containing my score followed by my opponent's score. There is no need to record the game number, since it is the same as the row number that R adds automatically. Reading the file with the readr package (loaded as part of the tidyverse):

library(tidyverse)   # loads readr, dplyr, tidyr and ggplot2
scores <- read_csv("gameset1.csv")

Here’s what the data looks like:

> scores
# A tibble: 189 × 2
      Me  Opp1
1    360   313
2    365   388
3    458   349
4    378   419
5    440   348
6    388   353
7    358   376
8    332   379
9    362   325
10   353   326
# ... with 179 more rows

(If you print scores in R, the tibble's print method also shows the type of each column, int for both of these, on a line beneath the column names. Because the type is enclosed in angle brackets, WordPress apparently took it for an HTML tag and did not display it.)
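To make the file layout concrete, here is a minimal sketch that writes the first three games from the printout above to a temporary CSV and reads it back. The temporary file and the explicit col_types specification are my own assumptions, not part of the original workflow; declaring the types makes readr store both columns as integers instead of relying on its type guessing.

```r
library(readr)

# Write a tiny CSV in the same layout as gameset1.csv:
# a header row, then one line per game (my score, opponent's score).
path <- tempfile(fileext = ".csv")
writeLines(c("Me,Opp1", "360,313", "365,388", "458,349"), path)

# Read it back, declaring both columns as integers rather than letting readr guess.
scores_demo <- read_csv(path, col_types = cols(Me = col_integer(), Opp1 = col_integer()))
print(scores_demo)
```

With the real file, the same col_types argument can be passed to read_csv("gameset1.csv").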

If the fundamental unit of observation is the game, the data is about as tidy as it can be. However, to compare the distributions of our scores with ggplot, I needed all the scores in a single column. This gave me a chance to use tidyr::gather for the first time. I also had to add a game-number column after all:

sep_scores <- scores %>%
  mutate(game = row_number()) %>%      # game number = row number
  gather(player, points, -game) %>%    # one row per (game, player) pair
  arrange(game)                        # put each game's two rows together

The data frame sep_scores then looks like this:

> sep_scores
# A tibble: 378 × 3
    game player points
1      1     Me    360
2      1   Opp1    313
3      2     Me    365
4      2   Opp1    388
5      3     Me    458
6      3   Opp1    349
7      4     Me    378
8      4   Opp1    419
9      5     Me    440
10     5   Opp1    348
# ... with 368 more rows
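gather() still works, but since tidyr 1.0.0 it has been superseded by pivot_longer(). Below is a sketch of the same reshaping with pivot_longer(), run on the first three games only (the scores_demo subset is my own illustration, not the full dataset). Because pivot_longer() emits each game's two rows next to each other, the arrange(game) step is no longer needed.

```r
library(tibble)
library(dplyr)
library(tidyr)

# First three games from the printout of scores (illustrative subset).
scores_demo <- tibble(Me = c(360L, 365L, 458L), Opp1 = c(313L, 388L, 349L))

sep_demo <- scores_demo %>%
  mutate(game = row_number()) %>%                                   # game number = row number
  pivot_longer(-game, names_to = "player", values_to = "points")   # Me and Opp1 into one column

print(sep_demo)
```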

The following code generates the paired density plot:

ggplot(sep_scores, aes(points, fill = player)) +
  geom_density(alpha = 0.65, bw = 25) +   # bw = kernel bandwidth, in points
  labs(title = "Density Plot of Scores", x = "Points per game")

I was surprised at how normal (in the Gaussian sense) my scores looked. In fact, they passed a Shapiro-Wilk normality test with p = 0.9; strictly speaking, that p-value means we cannot reject the null hypothesis that my scores are drawn from a normal distribution. My scores have a mean of 391 and a standard deviation of 50.2.

My friend's scores were a bit more skewed. They failed a Shapiro-Wilk test with p = 0.003, meaning there is sufficient evidence to reject the null hypothesis that they are drawn from a normal distribution. My friend's scores had a mean of 342 and a standard deviation of 43.6.
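The post does not show how these statistics were computed. A plausible base-R sketch uses mean(), sd() and shapiro.test(); the helper score_summary() is hypothetical, my own wrapper. It is run here on the ten games printed earlier, so its numbers will not match the full-dataset values quoted above; with the full data you would call it on scores$Me and scores$Opp1.

```r
# First ten games from the printout of scores (illustrative subset only).
me  <- c(360, 365, 458, 378, 440, 388, 358, 332, 362, 353)
opp <- c(313, 388, 349, 419, 348, 353, 376, 379, 325, 326)

# Hypothetical helper: mean, standard deviation and Shapiro-Wilk p-value.
score_summary <- function(x) {
  c(mean = mean(x), sd = sd(x), shapiro_p = shapiro.test(x)$p.value)
}

score_stats <- rbind(Me = score_summary(me), Opp1 = score_summary(opp))
print(score_stats)
```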

Since the player with the higher score wins the game, what really matters is the difference between the two scores. To plot the density of this difference, I added a new column, delta, to scores:

scores <- scores %>% mutate(delta = Me - Opp1)   # positive delta = I won
ggplot(scores, aes(delta)) +
  geom_density(fill = "gold1", alpha = 1/2, bw = 25) +
  labs(title = "Density Plot of Winning Margin", x = "Winning Margin")

Although the distribution looks bimodal, it passes a Shapiro-Wilk test for normality, with p = 0.25. The mean is 49.3 and the standard deviation is 77.4, so I beat my friend by an average of 49 points per game. But the standard deviation is large enough that he still wins about 25% of the games.
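The "about 25%" figure follows from treating the winning margin as normal: I lose whenever the margin falls below zero, which has probability Φ(-49.3/77.4) under that assumption. A minimal check using only the reported mean and standard deviation:

```r
# Reported summary of the winning margin (delta = Me - Opp1).
delta_mean <- 49.3
delta_sd   <- 77.4

# Assuming delta ~ Normal(delta_mean, delta_sd), a loss means delta < 0.
p_loss <- pnorm(0, mean = delta_mean, sd = delta_sd)
print(round(p_loss, 3))
```

This gives roughly 0.26, about one game in four. The empirical loss rate can be compared directly with mean(scores$delta < 0).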

The figure above looked strangely familiar. Where had I seen it before? Of course:

It’s not a hat, it’s a boa constrictor swallowing an elephant! [The reference is to Le Petit Prince, a book fondly remembered by many students who studied French in high school.]

My next question was whether our scores were correlated. There are three possibilities. First, there could be no correlation at all: our scores in each game are just independent random draws from our respective score distributions.

The second possibility is that the scores are positively correlated. It could be that when one player does well, the other player rises to the challenge and does well too.

The third possibility is that the scores are negatively correlated. One possible explanation is that the J, Q, X and Z tiles are worth a lot of points and there are only a few of them in the bag, so the more of them one player draws, the fewer are left for the other.

Let’s look at a plot of my opponent’s score vs. my score for each game, along with the regression line:

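# Intercept and slope of the regression of my opponent's score on mine
mod_coef <- coef(lm(Opp1 ~ Me, data = scores))

ggplot(scores, aes(Me, Opp1)) +
  geom_point(color = ifelse(scores$delta > 0, "blue", "red"), size = 2.5, shape = 19) +
  geom_abline(intercept = mod_coef[1], slope = mod_coef[2], linewidth = 1) +  # size = 1 on ggplot2 < 3.4.0
  labs(title = "Scores in 189 games", x = "My Score", y = "Opponent's Score") +
  coord_cartesian(ylim = c(100, 600))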

Points in blue represent games I won, and points in red are games I lost. It looks like the correlation is negative, meaning the higher my score is, the lower my opponent’s score (and vice versa). So the game is to some extent “zero sum”.

Let’s look at the details of the linear regression:

> summary(lm(Opp1 ~ Me, data=scores))

Call:
lm(formula = Opp1 ~ Me, data = scores)

Residuals:
     Min       1Q   Median       3Q      Max 
-106.915  -27.256   -1.494   23.824  131.806 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 464.98353   23.36229  19.903  < 2e-16 ***
Me           -0.31417    0.05921  -5.306 3.15e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 40.72 on 187 degrees of freedom
Multiple R-squared:  0.1308,	Adjusted R-squared:  0.1262 
F-statistic: 28.15 on 1 and 187 DF,  p-value: 3.154e-07

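The R-squared is only 0.1308, so my score explains only about 13% of the variation in my opponent's score, but the slope is highly significant (p = 3.15e-07). The slope of -0.31417 means that for every ten additional points I score, my opponent's score is, on average, about 3 points lower.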
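Because this is a simple regression with a single predictor, the correlation coefficient is the square root of R-squared carrying the sign of the slope, so r ≈ -0.36. The reported standard deviations give an independent check, since Var(Me - Opp1) = Var(Me) + Var(Opp1) - 2·r·SD(Me)·SD(Opp1). A short sketch using only numbers quoted in this post:

```r
# Figures reported in the post.
r_squared <- 0.1308
slope     <- -0.31417
sd_me     <- 50.2
sd_opp    <- 43.6
sd_delta  <- 77.4

# Simple regression: |r| = sqrt(R^2), and r has the sign of the slope.
r_from_fit <- sign(slope) * sqrt(r_squared)

# Solve Var(Me - Opp1) = Var(Me) + Var(Opp1) - 2 r SD(Me) SD(Opp1) for r.
r_from_sds <- (sd_me^2 + sd_opp^2 - sd_delta^2) / (2 * sd_me * sd_opp)

print(c(r_from_fit = r_from_fit, r_from_sds = r_from_sds))
```

Both routes land near -0.36; with the data loaded, the direct value is cor(scores$Me, scores$Opp1).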

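So, as a rough approximation, the pair of scores in each game looks like a draw from a bivariate normal distribution with a negative correlation, although my friend's scores on their own failed the normality test.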
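One way to see what the bivariate-normal description implies is to simulate from it. The sketch below plugs in the reported means (391 and 342) and standard deviations (50.2 and 43.6), plus a correlation of about -0.36, taken as -sqrt(0.1308) from the regression's R-squared (treating that value as exact is my assumption), and draws simulated games with MASS::mvrnorm(). The simulated loss rate should come out close to the one-in-four figure above; a clear mismatch with the real mean(scores$delta < 0) would suggest the normal model is too crude.

```r
library(MASS)  # for mvrnorm(); MASS ships with standard R installations

set.seed(1)
mu  <- c(Me = 391, Opp1 = 342)   # reported means
sds <- c(50.2, 43.6)             # reported standard deviations
r   <- -0.36                     # approx. correlation, -sqrt(0.1308)

# Covariance matrix: D %*% R %*% D with D = diag(sds) and R the correlation matrix.
Sigma <- diag(sds) %*% matrix(c(1, r, r, 1), 2) %*% diag(sds)

# Each row is one simulated game: column 1 = my score, column 2 = opponent's score.
games <- mvrnorm(n = 100000, mu = mu, Sigma = Sigma)

# Fraction of simulated games my opponent wins.
p_loss_sim <- mean(games[, 2] > games[, 1])
print(p_loss_sim)
```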
