
CSC 580 Homework 3
Due: 11/16 (Wed) 5pm
Instructions:
If you use math symbols, please define them clearly before you use them (unless they are standard
from the lecture).
You must provide the derivation for obtaining the answer, and the full source code for any
problem where you use programming. Please email your source code to csc580homeworks@gmail.com.
Please use the problem & subproblem numbering of this document; do not recreate or
renumber them.
Submit your homework on time to gradescope. NO LATE DAYS, NO LATE SUBMISSIONS
ACCEPTED.
The submission must be one single PDF file (use Acrobat Pro from the UA software library if
you need to merge multiple PDFs).
Please include your answers to all questions in your submission to Gradescope. (Do not store
your answers in your source code or Jupyter notebooks - I will not look at them by default.)
– You can use word processing software like Microsoft Word or LaTeX.
– You can also hand-write your answers and then scan them. If you use your phone camera, I
recommend using TurboScan (a smartphone app) or a similar one to avoid slanted pages or a
visible background.
– Watch the video and follow the instructions: https://youtu.be/KMPoby5g_nE.
Collaboration policy: do not discuss answers with your classmates. You may discuss the homework
at a high level for clarification or for math/programming issues. If you do, please mention who
you have talked to in your submission. Declaring your collaborators will not result in a
deduction of points; however, failure to declare your collaborators counts as an academic
integrity violation.
Problem 1. Probabilistic Reasoning.
(a) Denote background evidence by event E. Suppose X,Y are two other events. Prove the
conditional version of Bayes’ rule:
P(X ∣ Y, E) = [P(Y ∣ X, E) P(X ∣ E)] / P(Y ∣ E)
(b) Consider the following Bayesian network (picture by Lawrence Saul):
Figure 1: A Bayesian network for a house
(i) Using Bayes’ rule, calculate P (E = 1 ∣ A = 1). Is it larger than P (E = 1)? Does it make
intuitive sense?
(ii) Using Bayes’ rule, calculate P (E = 1 ∣ A = 1,B = 1). Is it larger than P (E = 1 ∣ A = 1)? Use
this as an example to demonstrate the “explain away” phenomenon discussed in class.
(iii) Is E ⊥ B ∣ M? Justify your answer. Does your answer match your intuition?
(iv) Calculate the joint distribution of (J,M). Is J ⊥ M? Is J ⊥ M ∣ A? Justify your answers.
Problem 2. Maximum Likelihood Estimation.
(a) Let (n_1, . . . , n_K) ~ Multinomial(n, p) where p ∈ Δ^{K−1} (recall that Δ^{K−1} denotes the
K-dimensional probability simplex). We’d like to estimate p using this single observation.
(i) Write down the maximum log likelihood optimization problem (it is okay to omit terms that
do not matter w.r.t. the optimization problem). Don’t forget to specify the constraints.
(ii) Write down the Lagrangian of the optimization problem you have in (i).
(iii) Solve (ii) to find the MLE solution p̂. (Hint: we did something similar to this in the naive
Bayes model lecture.)
(b) Suppose you model a dataset of n iid D-dimensional sensor measurements S = (x_1, . . . , x_n)
(where each x_i ∈ R^D) using a spherical Gaussian distribution, where μ ∈ R^D, σ > 0 are the
distribution parameters:
p(x; μ, σ) = N(x; μ, σ²I_D) = (1 / (2πσ²))^{D/2} exp( −∥x − μ∥₂² / (2σ²) )
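(As a quick numerical sanity check of the density above, not part of the required derivation, the closed-form expression can be compared against a library implementation. The values of D, μ, σ, and x below are made up for illustration.)

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical example values (not from the homework) to check the density formula.
D, sigma = 4, 1.5
rng = np.random.default_rng(0)
mu = rng.normal(size=D)
x = rng.normal(size=D)

# Closed-form spherical Gaussian density from the formula above.
closed_form = (1.0 / (2 * np.pi * sigma**2)) ** (D / 2) * np.exp(
    -np.sum((x - mu) ** 2) / (2 * sigma**2))

# Library evaluation with covariance sigma^2 * I_D; the two values should agree.
library = multivariate_normal(mean=mu, cov=sigma**2 * np.eye(D)).pdf(x)
print(closed_form, library)
```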
(i) What is the maximum likelihood estimator of this model given S?
(ii) As discussed in Piazza, there are some errors in the CIML book Eqs. (16.14-16.16) and
(16.20-16.22). Based on your results in (i), can you write down their respective correct
formulae? Justify your answer.
Problem 3. Language Identification with Naïve Bayes
Implement a character-based Naive Bayes classifier that classifies a document as English, Japanese,
or Spanish - all written with the 26 lower case characters and space.
The dataset is languageID.tgz and can be found on our Piazza page. You need to unpack it. This
dataset consists of 60 documents in English, Japanese, and Spanish. The correct class label is the
first character of the filename: y ∈ {E, J, S}.
We will be using a character-based multinomial naïve Bayes model. You need to view each document
as a bag of characters, including space (we say ‘bag’ because we ignore the order). We have made
sure that there are only 27 different types of printable characters (a to z, and space) – there may be
additional control characters such as newline; please ignore those. Your vocabulary will be these
27 character types.
Here is the model. Let n_i be the length of the i-th document (i.e., the total number of characters
in the document, including the space character). For i ∈ [n] := {1, ..., n}:
– Generate y_i ∈ {E, J, S} from Categorical(π), where π ∈ Δ^2 (i.e., π_1 = P(y_i = E), π_2 = P(y_i = J),
π_3 = P(y_i = S)).
– Generate x_{i,j} ~ Categorical(θ_{y_i}) for all j ∈ [n_i], where θ_y ∈ Δ^{26} for each y ∈ {E, J, S}.
Background on smoothing: When estimating a multinomial parameter, add-α smoothing is a
popular technique. This amounts to performing the MLE, i.e., counting the occurrences and
normalizing them, assuming that we have α > 0 additional observations for each outcome (note: α
does not have to be an integer). For example, if n_1, . . . , n_K ~ Multinomial(n; p), then we estimate p by
p̂_i = (α + n_i) / ∑_{l=1}^{K} (α + n_l).
This helps avoid the issue of assigning zero probability to test data points.
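(A small illustration of the add-α formula above, as a sketch only; the function name and example counts are hypothetical, and the homework itself uses add-1 smoothing.)

```python
import numpy as np

def add_alpha_estimate(counts, alpha=1.0):
    """Add-alpha smoothed estimate of a multinomial parameter p.

    counts: outcome counts (n_1, ..., n_K); alpha: pseudo-count added to each outcome.
    """
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts + alpha).sum()

# Example with K = 3 outcomes and counts (2, 0, 8), using add-1 smoothing:
print(add_alpha_estimate([2, 0, 8], alpha=1.0))  # ~ [0.231, 0.077, 0.692]
```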
(a) Use files [y]0.txt to [y]9.txt, where y ∈ {E, J, S}, in each language as the training data. Estimate
the prior probabilities π with add-1 smoothing and print them. (Hint: Store all probabilities
here and below as logarithms internally to avoid underflow. This also means you need to do
arithmetic in log-space. But answer questions with probabilities, not log probabilities.)
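(A minimal sketch of how the add-1 prior in (a) might be computed, assuming the unpacked files sit in a local languageID/ directory and use lowercase filename prefixes as in e10.txt; adjust paths to your setup.)

```python
import math
from pathlib import Path

labels = ["e", "j", "s"]
data_dir = Path("languageID")   # assumed location of the unpacked dataset

# One training document per file: [y]0.txt .. [y]9.txt for each class y.
doc_counts = {y: sum(1 for i in range(10) if (data_dir / f"{y}{i}.txt").exists())
              for y in labels}

# Add-1 smoothed prior, stored as log-probabilities per the hint above.
total = sum(doc_counts.values())
log_prior = {y: math.log((doc_counts[y] + 1) / (total + len(labels))) for y in labels}

for y in labels:
    print(y, math.exp(log_prior[y]))   # report the probabilities, not the logs
```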
(b) Using the same training data, estimate the class conditional distribution for English (i.e., θ_E)
using add-1 smoothing. Ensure that the components of the vector θ_E are ordered as follows:
(a, . . . , z, space). Write down the formula for add-1 smoothing in this case. Print θ_E, which is a
vector with 27 elements. Do the same for θ_J and θ_S.
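(A sketch of one way to estimate θ_E with add-1 smoothing, again assuming the languageID/ layout and lowercase filenames used above.)

```python
import math
from pathlib import Path

alphabet = "abcdefghijklmnopqrstuvwxyz "   # the 27 character types (a, ..., z, space)
data_dir = Path("languageID")

def char_counts(label, doc_ids):
    """Total character counts over the given training documents of one class."""
    counts = {c: 0 for c in alphabet}
    for i in doc_ids:
        for ch in (data_dir / f"{label}{i}.txt").read_text():
            if ch in counts:               # ignore newlines and other control characters
                counts[ch] += 1
    return counts

counts_e = char_counts("e", range(10))
total_e = sum(counts_e.values())
# Add-1 smoothing over the 27 character types, kept as log-probabilities.
log_theta_e = [math.log((counts_e[c] + 1) / (total_e + 27)) for c in alphabet]
print([round(math.exp(v), 4) for v in log_theta_e])
```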
(c) Treat e10.txt as a test document x. Represent x as a count vector c(x) ∈ N^{27} (27 nonnegative
integer counts, one per character type). This is called a bag-of-words vector (it is actually a bag of
characters here, but bag-of-words is the standard terminology in the field of natural language
processing). Print the bag-of-words vector c(x).
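(The count vector in (c) can be built in a few lines; a sketch with the same assumed file layout.)

```python
from pathlib import Path

alphabet = "abcdefghijklmnopqrstuvwxyz "
data_dir = Path("languageID")

# Bag-of-characters count vector for e10.txt, ordered (a, ..., z, space) as in (b).
text = (data_dir / "e10.txt").read_text()
c_x = [sum(1 for ch in text if ch == c) for c in alphabet]
print(c_x)
```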
(d) Let θ_{y,i} be the i-th component of θ_y. Write down mathematically how you will compute
p̂(x ∣ y) for y ∈ {E, J, S} with our estimated parameters. Here, we use p̂ to denote that the
probability is evaluated using the estimated parameters. Then, compute and show the following three:
p̂(x ∣ y = E), p̂(x ∣ y = J), p̂(x ∣ y = S).
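(One common way to evaluate this under the bag-of-characters model is to accumulate ∑_i c_i(x) log θ_{y,i} in log-space; a minimal sketch with made-up toy numbers, not the homework data.)

```python
import math

def log_likelihood(count_vec, log_theta):
    """log p-hat(x | y) = sum_i c_i(x) * log(theta_{y,i}) under the bag-of-characters model."""
    return sum(c * lt for c, lt in zip(count_vec, log_theta))

# Toy check with a 3-symbol alphabet (hypothetical numbers):
log_theta_toy = [math.log(p) for p in (0.5, 0.3, 0.2)]
print(log_likelihood([2, 1, 0], log_theta_toy))   # = 2*log(0.5) + 1*log(0.3)
```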
(e) Write down mathematically the posterior p̂(y ∣ x) using Bayes’ rule and your estimated prior
and likelihood. Show the three values: p̂(y = E ∣ x), p̂(y = J ∣ x), p̂(y = S ∣ x). Show the
predicted class label of x based on your estimated model.
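(When the prior and likelihoods are kept in log-space, the posterior in (e) can be normalized with the log-sum-exp trick; a minimal sketch with made-up numbers and hypothetical variable names.)

```python
import math

def log_posterior(log_prior, log_lik):
    """Combine log p-hat(y) and log p-hat(x|y) into posterior probabilities p-hat(y|x)."""
    scores = {y: log_prior[y] + log_lik[y] for y in log_prior}
    m = max(scores.values())                    # log-sum-exp for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {y: math.exp(s - log_z) for y, s in scores.items()}

# Toy check: the posterior sums to 1 and favors "e" here (numbers are made up).
post = log_posterior({"e": math.log(1/3), "j": math.log(1/3), "s": math.log(1/3)},
                     {"e": -7000.0, "j": -7200.0, "s": -7100.0})
print(post, max(post, key=post.get))
```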
(f) Evaluate the performance of your classifier on the test set (files [y]10.txt to [y]19.txt in
three languages). Present the performance using a confusion matrix. A confusion matrix
summarizes the types of errors your classifier makes, as shown in the table below. The
columns are the true language a document is in, and the rows are the classified outcome of
that document. The cells are the number of test documents in that situation. For example, the
cell with row = English and column = Spanish contains the number of test documents that are
really Spanish, but misclassified as English by your classifier.
Predicted \ True    English    Spanish    Japanese
English
Spanish
Japanese
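(One possible way to assemble the confusion matrix in (f), assuming a classify(path) function that returns the predicted label for a document, built from the pieces sketched above; the function name is hypothetical.)

```python
from pathlib import Path

labels = ["e", "j", "s"]
data_dir = Path("languageID")

def build_confusion(classify):
    """confusion[predicted][true] = number of test documents classified that way."""
    confusion = {p: {t: 0 for t in labels} for p in labels}
    for true in labels:
        for i in range(10, 20):                 # test files [y]10.txt .. [y]19.txt
            pred = classify(data_dir / f"{true}{i}.txt")
            confusion[pred][true] += 1
    return confusion

# Usage, given whatever prediction function you implemented:
# print(build_confusion(my_classifier))
```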
(g) Repeat the same experiment as (f), but this time with training and test examples obtained by
loading only the first 5 lines of each document. Report the new confusion matrix.
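(Restricting each document to its first 5 lines for part (g) can be done at load time; a small sketch with a hypothetical helper name.)

```python
def read_first_lines(path, k=5):
    """Return the first k lines of a document, joined back into a single string."""
    with open(path) as f:
        return "".join(line for _, line in zip(range(k), f))

# Example: text = read_first_lines("languageID/e0.txt")
```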
Problem 4. Principal Component Analysis
Download three.txt and eight.txt, which can be found on our Piazza page. Each has 200 handwritten
digits. Each line is one digit, vectorized from a 16x16 gray scale image.
(a) Each line has 256 numbers: they are pixel values (0 = black, 255 = white), vectorized from
the image as the first column (top down), then the second column, and so on. Visualize, using
Python, the two gray scale images corresponding to the first line in three.txt and the first line in
eight.txt.
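(A sketch of the visualization in (a), assuming the files are whitespace-separated with one 256-number row per digit; pass delimiter="," to np.loadtxt if they are comma-separated. Note the column-major order described above, hence order="F" in the reshape.)

```python
import numpy as np
import matplotlib.pyplot as plt

# Load the 200 x 256 matrices (assumes whitespace-separated values, one digit per line).
threes = np.loadtxt("three.txt")
eights = np.loadtxt("eight.txt")

fig, axes = plt.subplots(1, 2)
for ax, row, title in [(axes[0], threes[0], "first digit in three.txt"),
                       (axes[1], eights[0], "first digit in eight.txt")]:
    # Pixels were vectorized column by column, so reshape in column-major (Fortran) order.
    ax.imshow(row.reshape(16, 16, order="F"), cmap="gray", vmin=0, vmax=255)
    ax.set_title(title)
plt.show()
```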
(b) Put the two data files together (threes first, eights next) to form an n × d matrix X, where n = 400
digits and d = 256 pixels. The i-th row of X is x_i^⊤, where x_i ∈ R^d is the i-th image in the
combined data set. Compute the sample mean x̄ = (1/n) ∑_{i=1}^{n} x_i. Visualize x̄ as a 16x16 gray
scale image.
(c) Center X using x̄ above. Then form the sample covariance matrix S = X^⊤X / (n − 1). Show the 5x5
submatrix S(1 . . . 5, 1 . . . 5).
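(A sketch of (b) and (c), under the same file-format assumption as the previous snippet.)

```python
import numpy as np

threes = np.loadtxt("three.txt")          # 200 x 256
eights = np.loadtxt("eight.txt")          # 200 x 256

X = np.vstack([threes, eights])           # n = 400 rows, d = 256 columns (threes first)
x_bar = X.mean(axis=0)                    # sample mean; view as a 16x16 image as in (a)

Xc = X - x_bar                            # centered data matrix
S = Xc.T @ Xc / (X.shape[0] - 1)          # 256 x 256 sample covariance
print(S[:5, :5])                          # the 5x5 submatrix S(1..5, 1..5)
```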
(d) Use appropriate software/library to compute the two largest eigenvalues λ1 ≥ λ2 and the
corresponding eigenvectors v1, v2 of S. For example, in Python one can use scipy.sparse.linalg.eigs.
Show the values of λ1, λ2. Visualize v1, v2 as two 16x16 gray scale images. Hint: you may
need to scale the values to be in the valid range of grayscale ([0, 255] or [0, 1] depending on
which function you use). You can shift and scale them in order to show a better picture. It is
best if you can show an accompanying ‘colorbar’ that maps gray scale to values.
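(A sketch of (d) using scipy.sparse.linalg.eigs as suggested; it rebuilds S as in (b)-(c) so it can be run on its own. Since S is symmetric, the returned values are real up to numerical noise.)

```python
import numpy as np
from scipy.sparse.linalg import eigs

# Rebuild the centered data and covariance from parts (b)-(c).
X = np.vstack([np.loadtxt("three.txt"), np.loadtxt("eight.txt")])
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (X.shape[0] - 1)

vals, vecs = eigs(S, k=2, which="LM")     # two eigenvalues of largest magnitude
vals, vecs = vals.real, vecs.real         # S is symmetric, so these are real
order = np.argsort(vals)[::-1]            # ensure lambda1 >= lambda2
lam1, lam2 = vals[order]
v1, v2 = vecs[:, order[0]], vecs[:, order[1]]
print(lam1, lam2)

# Each eigenvector can be shown as a 16x16 image after shifting/scaling into [0, 1]:
# img = (v1 - v1.min()) / (v1.max() - v1.min())
# plt.imshow(img.reshape(16, 16, order="F"), cmap="gray"); plt.colorbar(); plt.show()
```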
(e) Now we project (the centered) X down to the two PCA directions. Let V = [v1, v2] be the
d × 2 matrix. The projection is simply XV . (To be precise, these are the coefficients along
the principal directions, not the projection itself.) Show the resulting two coordinates for the
first line in three.txt and the first line in eight.txt, respectively.
(f) Report the average reconstruction error (1/n) ∑_{i=1}^{n} ∥x_i V V^⊤ − x_i∥², where x_i ∈ R^{1×d} is the
i-th row of the centered data matrix X.
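(A sketch covering the projection in (e) and the reconstruction error in (f); it rebuilds the earlier quantities so it runs on its own, again under the whitespace-format assumption.)

```python
import numpy as np
from scipy.sparse.linalg import eigs

# Rebuild the centered data and the top-2 directions (as in parts (b)-(d)).
X = np.vstack([np.loadtxt("three.txt"), np.loadtxt("eight.txt")])
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (X.shape[0] - 1)
_, V = eigs(S, k=2, which="LM")
V = V.real                                 # d x 2 matrix [v1, v2]

coords = Xc @ V                            # 400 x 2 PCA coordinates (part (e))
print(coords[0], coords[200])              # first rows of three.txt and eight.txt

# Part (f): average reconstruction error (1/n) * sum_i ||x_i V V^T - x_i||^2.
recon = coords @ V.T                       # back in the original 256-dimensional space
print(np.mean(np.sum((recon - Xc) ** 2, axis=1)))
```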
(g) Now plot the 2D point cloud of the 400 digits after projection. For visual interest, color points
in three.txt red and points in eight.txt blue. But keep in mind that PCA is an unsupervised
learning method and it does not know such class labels.
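(And a sketch of the 2D scatter plot in (g), reusing the coords array from the previous sketch; rows 0-199 come from three.txt and rows 200-399 from eight.txt.)

```python
import matplotlib.pyplot as plt

# coords is the 400 x 2 array of PCA coordinates from the sketch above.
plt.scatter(coords[:200, 0], coords[:200, 1], c="red", label="three")
plt.scatter(coords[200:, 0], coords[200:, 1], c="blue", label="eight")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.show()
```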
Problem 5: Project check-in.
Please answer the following in a point-by-point manner.
(a) Respond to my feedback on your project proposal. (Of course, if some points in my feedback
do not make sense to you, please point them out – I am happy to discuss more.)
(b) (Answer it around Nov. 14) Where are you in terms of your project progress:
What have you achieved?
What difficulties have you encountered?
What remains to be done? List all your todo items and give your deadline for each of them.
(The last homework will be shorter, to give you more time for your project.) Make sure you
allocate 1-2 weeks for writing up the project report.
What difficulties do you anticipate, and what are your contingency plans?
