I have been reading Kevin Murphy’s book Machine Learning: a Probabilistic Perspective to study machine learning in more depth. I love the book and it has been an exciting experience so far, but I wish it could have been better.

Once I started reading the first new topic (for me) Sec. 3.3 **The Beta Binomial Model**, I started seeing that he has used inconsistent notation in the formulae, especially with the shape parameters for the Beta distribution. I am not going to get into the details here, but if you read that section, you will understand the sloppiness.

Here, I thought I could share the R code for producing the plot in Figure 3.7. It is quite unfortunate that the Matlab code for the book (pmtk3) also uses inconsistent variable names. Fig. 3.7 is using `M = 10`

, which is not even mentioned in the text! For the plug-in approximation you need to use the MAP estimate given in Eq. (3.21) and a binomial distribution.

I hope the code below will help map the variable names to the ones defined in the text correctly and will clarify how the plots are produced.

##########################################
# beta binomial model
##########################################
library(ggplot2)
# prior is beta(a, b) with a = 2, b = 2
a <- 2
b <- 2
# likelihood; 17 tails and 3 heads
N0 <- 17
N1 <- 3
N <- N0 + N1
# predict number of heads in M = 10 more tosses
M <- 10
# posterior predictive distribution (ppd)
ppd <- rep(NA, M)
for (k in 0:M) {
# Eq (3.34) - (a, b) replaced by (a + N1, b + N0)
ppd[k + 1] <- choose(M, k) *
beta(a + k + N1, M - k + N0 + b) /
beta(a + N1, b + N0)
}
# store results in a date frame for plotting with ggplot
# store the title for faceting
pltDf <- data.frame(x = 0:M, d = ppd,
title = rep("Posterior Pred", M + 1))
# plug-in approx.
# first get MAP estimate = mode of posterior. Eq (3.21)
thetaHat <- (a + N1 - 1) / (a + b + N - 2)
pluginApprox <- rep(NA, M + 1)
for (k in 0:M) {
pluginApprox[k + 1] <- choose(M, k) *
thetaHat ^ k *
(1 - thetaHat) ^ (M - k)
}
pluginDf <- data.frame(x = 0:M, d = pluginApprox,
title = rep("Plug-in Approx", M + 1))
pltDf <- rbind(pltDf, pluginDf)
ggplot(pltDf, aes(x, d)) +
geom_bar(stat = "identity", fill = "navyblue",
alpha = 0.8) +
scale_x_continuous(breaks = pltDf$x) +
facet_wrap(~title)

That produces the following plot: