I have been reading Kevin Murphy’s book Machine Learning: a Probabilistic Perspective to study machine learning in more depth. I love the book and it has been an exciting experience so far, but I wish it could have been better.

Once I started reading the first new topic (for me) Sec. 3.3 **The Beta Binomial Model**, I started seeing that he has used inconsistent notation in the formulae, especially with the shape parameters for the Beta distribution. I am not going to get into the details here, but if you read that section, you will understand the sloppiness.

Here, I thought I could share the R code for producing the plot in Figure 3.7. It is quite unfortunate that the Matlab code for the book (pmtk3) also uses inconsistent variable names. Fig. 3.7 is using `M = 10`

, which is not even mentioned in the text! For the plug-in approximation you need to use the MAP estimate given in Eq. (3.21) and a binomial distribution.

I hope the code below will help map the variable names to the ones defined in the text correctly and will clarify how the plots are produced.

########################################## # beta binomial model ########################################## library(ggplot2) # prior is beta(a, b) with a = 2, b = 2 a <- 2 b <- 2 # likelihood; 17 tails and 3 heads N0 <- 17 N1 <- 3 N <- N0 + N1 # predict number of heads in M = 10 more tosses M <- 10 # posterior predictive distribution (ppd) ppd <- rep(NA, M) for (k in 0:M) { # Eq (3.34) - (a, b) replaced by (a + N1, b + N0) ppd[k + 1] <- choose(M, k) * beta(a + k + N1, M - k + N0 + b) / beta(a + N1, b + N0) } # store results in a date frame for plotting with ggplot # store the title for faceting pltDf <- data.frame(x = 0:M, d = ppd, title = rep("Posterior Pred", M + 1)) # plug-in approx. # first get MAP estimate = mode of posterior. Eq (3.21) thetaHat <- (a + N1 - 1) / (a + b + N - 2) pluginApprox <- rep(NA, M + 1) for (k in 0:M) { pluginApprox[k + 1] <- choose(M, k) * thetaHat ^ k * (1 - thetaHat) ^ (M - k) } pluginDf <- data.frame(x = 0:M, d = pluginApprox, title = rep("Plug-in Approx", M + 1)) pltDf <- rbind(pltDf, pluginDf) ggplot(pltDf, aes(x, d)) + geom_bar(stat = "identity", fill = "navyblue", alpha = 0.8) + scale_x_continuous(breaks = pltDf$x) + facet_wrap(~title)

That produces the following plot: