STA 141C - Big Data & High Performance Statistical Computing Spring 2022
Homework 4
Lecturer: Bo Y.-C. Ning
Due June 02, 2023 by 11:59pm.
A few notes:
1. Submit your homework using the file name "LastName FirstName hw4".
2. Answer all questions with complete sentences.
3. Your code should be readable; writing a piece of code should be compared to writing a page of a book.
Adopt the one-statement-per-line rule. Consider splitting a lengthy statement into multiple lines
to improve readability. (You will lose one point for each line that does not follow the one-statement-per-line rule.)
4. To help readers understand and maintain your code, you should always add comments to explain your code. (Homework with no comments will receive 0 points.) For a very long comment, break it into multiple lines.
5. Submit your final work as one .pdf (or .html) file to Canvas. I encourage you to use LaTeX for
writing equations. Handwriting is acceptable, but you have to scan it and then combine it with the coding
part into a single .pdf (or .html) file. Handwriting should be clean and readable.
1 Handwriting recognition
In this homework, we work on a model-based method for handwritten digit recognition. The following figure
shows example bitmaps of handwritten digits from U.S. postal envelopes.
Each digit is represented by a 32 × 32 bitmap in which each element indicates one pixel with a value of white
or black. Each 32 × 32 bitmap is divided into blocks of 4 × 4, and the number of white pixels is counted
in each block. Therefore, each handwritten digit is summarized by a vector x = (x_1, . . . , x_64) of length 64,
where each element is a count between 0 and 16.
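For intuition only, the block counts could be computed from a raw bitmap as in the sketch below, where bitmap is a hypothetical 32 × 32 0/1 matrix (the homework data are already provided in this summarized form):

# Summarize a 32 x 32 binary bitmap into 4 x 4 block counts and flatten
# the resulting 8 x 8 grid into a length-64 count vector (illustration only).
block_counts <- function(bitmap) {
  counts <- matrix(0L, 8, 8)
  for (r in 1:8) {
    for (s in 1:8) {
      rows <- (4 * (r - 1) + 1):(4 * r)  # rows of block (r, s)
      cols <- (4 * (s - 1) + 1):(4 * s)  # columns of block (r, s)
      counts[r, s] <- sum(bitmap[rows, cols])
    }
  }
  as.vector(t(counts))  # each entry is a count between 0 and 16
}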
By a model-based method, we mean to impose a distribution on the count vector and carry out classification
using probabilities. The goal is to predict the handwritten digit from its count vector. We separate the dataset into training data and
test data. The training set contains 3823 handwritten digits and the test set contains 1797 digits.
A common distribution for count vectors is the multinomial distribution. However, it is not a good model for
handwritten digits. Let’s work on a more flexible model for count vectors, the Dirichlet-multinomial model.
For a multivariate count vector $x = (x_1, \ldots, x_d)$ with batch size $|x| = \sum_{j=1}^d x_j$, the probability mass function
for the Dirichlet-multinomial distribution is
$$
f(x \mid \alpha) = \binom{|x|}{x} \frac{\prod_{j=1}^d (\alpha_j)_{x_j}}{(|\alpha|)_{|x|}},
$$
where $(a)_k = \prod_{i=0}^{k-1} (a + i)$.
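Because $(a)_k = \Gamma(a + k)/\Gamma(a)$, the density is most conveniently evaluated on the log scale with lgamma. The sketch below does this for a single count vector; the provided 'ddirmult.R' plays this role in the homework, so the function name here is only illustrative and not that file.

# Illustrative log-density of one Dirichlet-multinomial count vector x with
# parameter vector alpha, using (a)_k = Gamma(a + k) / Gamma(a) on the log scale.
ddirmult_sketch <- function(x, alpha, log = TRUE) {
  logcoef <- lgamma(sum(x) + 1) - sum(lgamma(x + 1))           # log of |x| choose x
  lognum  <- sum(lgamma(alpha + x) - lgamma(alpha))            # log of prod_j (alpha_j)_{x_j}
  logden  <- lgamma(sum(alpha) + sum(x)) - lgamma(sum(alpha))  # log of (|alpha|)_{|x|}
  out <- logcoef + lognum - logden
  if (log) out else exp(out)
}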
Given independent data points $x_1, \ldots, x_n$, the log-likelihood is given by
$$
L(\alpha) = \sum_{i=1}^n \log \binom{|x_i|}{x_i} + \sum_{i=1}^n \sum_{j=1}^d \left[ \log \Gamma(\alpha_j + x_{ij}) - \log \Gamma(\alpha_j) \right] - \sum_{i=1}^n \left[ \log \Gamma(|\alpha| + |x_i|) - \log \Gamma(|\alpha|) \right].
$$
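As a hedged sketch (not the course code), the full log-likelihood for an n × d count matrix X and a length-d vector alpha can be evaluated with a few vectorized lgamma calls; X and alpha here are assumed objects:

# Total Dirichlet-multinomial log-likelihood over the rows of a count matrix X,
# following the formula above term by term (illustration only).
dirmult_loglik <- function(alpha, X) {
  n <- nrow(X)
  rowtot <- rowSums(X)                                    # |x_i| for each observation
  const  <- sum(lgamma(rowtot + 1)) - sum(lgamma(X + 1))  # sum_i log of |x_i| choose x_i
  term1  <- sum(lgamma(sweep(X, 2, alpha, "+"))) - n * sum(lgamma(alpha))
  term2  <- sum(lgamma(sum(alpha) + rowtot)) - n * lgamma(sum(alpha))
  const + term1 - term2
}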
How do you calculate the MLE?
In this exercise, we use Newton’s method. First, the score function is given by
$$
\frac{\partial}{\partial \alpha_j} L(\alpha) = \sum_{i=1}^n \left[ \Psi(x_{ij} + \alpha_j) - \Psi(\alpha_j) \right] - \sum_{i=1}^n \left[ \Psi(|x_i| + |\alpha|) - \Psi(|\alpha|) \right],
$$
where $\Psi(x) = \frac{d}{dx} \log \Gamma(x) = \Gamma'(x)/\Gamma(x)$ is the digamma function.
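The score can be coded directly from this formula with R's digamma(); the sketch below uses the same hypothetical X and alpha objects as above:

# Score vector of the Dirichlet-multinomial log-likelihood, using digamma() for Psi.
dirmult_score <- function(alpha, X) {
  n <- nrow(X)
  rowtot <- rowSums(X)
  # sum_i [Psi(x_ij + alpha_j) - Psi(alpha_j)], one entry per j
  part1 <- colSums(digamma(sweep(X, 2, alpha, "+"))) - n * digamma(alpha)
  # sum_i [Psi(|x_i| + |alpha|) - Psi(|alpha|)], the same scalar for every j
  part2 <- sum(digamma(rowtot + sum(alpha)) - digamma(sum(alpha)))
  part1 - part2
}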
Next, the observed information matrix is given by
$$
-d^2 L(\alpha) = D - c \, \mathbf{1}_d \mathbf{1}_d',
$$
where $D$ is a diagonal matrix with entries
$$
D_{jj} = \sum_{i=1}^n \left[ \Psi'(\alpha_j) - \Psi'(x_{ij} + \alpha_j) \right] = \sum_{i=1}^n \sum_{k=0}^{x_{ij}-1} \frac{1}{(\alpha_j + k)^2},
$$
and
$$
c = \sum_{i=1}^n \left[ \Psi'(|\alpha|) - \Psi'(|x_i| + |\alpha|) \right] = \sum_{i=1}^n \sum_{k=0}^{|x_i|-1} \frac{1}{(|\alpha| + k)^2}.
$$
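Both pieces of the observed information can be computed with R's trigamma(), which evaluates $\Psi'$; the object names below are again only illustrative:

# Diagonal of D and the scalar c from the observed information matrix above.
dirmult_obsinfo <- function(alpha, X) {
  n <- nrow(X)
  rowtot <- rowSums(X)
  d_diag <- n * trigamma(alpha) - colSums(trigamma(sweep(X, 2, alpha, "+")))  # D_jj
  c_val  <- n * trigamma(sum(alpha)) - sum(trigamma(rowtot + sum(alpha)))     # c
  list(d_diag = d_diag, c_val = c_val)
}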
Then, given an initial value $\alpha^{(0)} = (\alpha_1^{(0)}, \ldots, \alpha_d^{(0)})$, Newton's method keeps updating
$$
\alpha^{(t)} = \alpha^{(t-1)} + \left[ -d^2 L(\alpha^{(t-1)}) \right]^{-1} dL(\alpha^{(t-1)})
$$
for $t = 1, \ldots, T$. We stop when $|L(\alpha^{(t)}) - L(\alpha^{(t-1)})| \leq \epsilon$ for a small tolerance $\epsilon > 0$, and then take $\hat{\alpha} = \alpha^{(t)}$.
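The provided 'ddirmult.fit.R' implements this iteration; the loop below is only a minimal sketch of the update and stopping rule, built on the hypothetical helpers sketched above and an assumed tolerance eps:

# Newton iteration for the Dirichlet-multinomial MLE (sketch, not the course code).
newton_dirmult <- function(alpha, X, eps = 1e-8, maxit = 100) {
  loglik_old <- dirmult_loglik(alpha, X)
  for (t in 1:maxit) {
    score <- dirmult_score(alpha, X)
    info  <- dirmult_obsinfo(alpha, X)
    hess  <- diag(info$d_diag) - info$c_val  # -d^2 L(alpha) = D - c 1 1'
    alpha <- alpha + solve(hess, score)      # Newton update
    loglik_new <- dirmult_loglik(alpha, X)
    if (abs(loglik_new - loglik_old) <= eps) break
    loglik_old <- loglik_new
  }
  alpha
}

In practice the raw update may need safeguards (for example, step-halving to keep every alpha_j positive); the provided fit routine may handle this differently.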
Since $D$ is a diagonal matrix, we should use the Sherman-Morrison formula to write
$$
\left[ -d^2 L(\alpha) \right]^{-1} = D^{-1} + \frac{1}{1/c - \sum_j d_j^{-1}} D^{-1} \mathbf{1} \mathbf{1}' D^{-1},
$$
where $d_j = D_{jj}$.
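Applied to the score vector, this identity gives the Newton direction without ever forming or factorizing the d × d matrix; d_diag, c_val, and score below denote the quantities from the earlier hypothetical sketches:

# Newton direction [-d^2 L(alpha)]^{-1} %*% score via the Sherman-Morrison form.
newton_direction <- function(d_diag, c_val, score) {
  Dinv_s <- score / d_diag  # D^{-1} score
  Dinv_s + sum(Dinv_s) / (1 / c_val - sum(1 / d_diag)) / d_diag
}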
In the folder uploaded on Piazza, you will find
• Data containing the training data and the testing data
• ‘ddirmult.R’, which evaluates the likelihood function (if log = FALSE) or the log-likelihood function
(if log = TRUE) of the Dirichlet-multinomial density
• ‘ddirmult.fit.R’, which computes the maximum likelihood estimator (MLE) by Newton’s method
• ‘trainingMLE.R’, which computes the MLE based on the training data
Question 1. Open ‘trainingMLE.R’ and obtain the MLE for each of the 10 handwritten digits
(0, 1, 2, . . . , 9). (You may need to change the path when loading the data.)
Question 2. Read in the testing data. Use the MLE estimated for each digit from the training data to predict
the handwritten digits in the testing data.
• Hint 1: To predict the handwritten digit, you should use the ‘ddirmult.R’ function. The following code
can be helpful:

# testDigitProb stores the posterior probability of each digit being 0, 1, ..., 9
testDigitProb <- matrix(0, dim(testdata)[1], 10)
for (dig in 0:9) {
  testDigitProb[, dig + 1] <-
    ddirmult(testdata[, -65], alphahat[dig + 1, ], log = TRUE)
}
# add the log prior proportion of each digit in the training data
testDigitProb <- testDigitProb +
  rep(log(digitCount / sum(digitCount)), each = nrow(testdata))
# predicted digit = column with the largest (log) posterior probability
digitPredicted <- max.col(testDigitProb) - 1
• Hint 2: To summarize the result, you can construct a confusion table using the code
table(testdata[, 65], digitPredicted)
The output should be a 10 × 10 confusion table with the true digits as rows and the predicted digits as columns.
Question 3. Comment on using gradient descent to obtain the MLE instead of Newton’s method. (You
do not need to implement this.)
Question 4. What are the advantages and disadvantages of using gradient descent instead of Newton’s method?
Question 5. Do you think the current method is satisfactory for predicting handwritten digits? Do you
know any other method(s) that can achieve a higher accuracy?