
ST3SML

ASSIGNMENT

This assessment is due 12:00pm (mid-day) Friday 20th March 2020.

You are required to submit a hardcopy of your solutions AND a .R file

containing your annotated R code used to obtain your solutions.

Submit the hardcopy in the drop box in the Support Centre, JJT, and fill in a coursework

coversheet.

Submit your .R file on Blackboard.

Note that it is not necessary to include code used for data manipulation and plotting in your

.R file. Plotting and data manipulation may be performed in any suitable software package.

Question

In linguistics, a phoneme is the smallest unit of sound that distinguishes one word from

another. This question concerns classification of the phonemes “aa”, as in the sound of the

vowel “a” in the word “dark”, and “ao”, as in the sound of the vowel “a” in the word

“water”, using samples from the TIMIT speech recognition database (TIMIT Acoustic-Phonetic

Continuous Speech Corpus, NTIS, US Dept of Commerce).

The data for this question were obtained from speech frames of phonemes taken from samples

of continuous speech by different speakers. The data are available on Blackboard as ASCII

files and consist of a column labelled “speaker”, a response column labelled “g” and 256

columns labelled “x.1” - “x.256”. Each row of “x.1” - “x.256” is a log-periodogram computed

from a speech frame of either “aa” or “ao” measured at 256 frequencies. A log-periodogram

is a widely used method for converting speech to a form suitable for speech recognition.

The training data are in the file phoneme_train.txt and the test data are in the file phoneme_test.txt.

(a) Using the training data, draw line plots of a sample of 20 log-periodograms for the

phoneme “aa” and a sample of 20 log-periodograms for the phoneme “ao” against frequencies

1 to 256, on the same graph. The log-periodogram values should be on the

y-axis and the frequencies on the x-axis.

Comment on the plots.

[5 marks]
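
One way the plot in part (a) might be drawn is sketched below. The real training file is not reproduced here, so the sketch builds a small synthetic stand-in with the same column names ("g", "x.1" to "x.256"); the object names `train`, `idx_aa`, `idx_ao` and `Y` and the colours are illustrative assumptions, not part of the assignment.

```r
# Synthetic stand-in for the training data (assumption: the real data would be
# read from phoneme_train.txt and have columns "g" and "x.1" ... "x.256").
set.seed(1)
n <- 100
p <- 256
g <- factor(rep(c("aa", "ao"), each = n / 2))
X <- matrix(rnorm(n * p), n, p) + outer(as.integer(g == "ao"), sin((1:p) / 40))
colnames(X) <- paste0("x.", 1:p)
train <- data.frame(g = g, X)

# Draw 20 rows of each phoneme at random.
idx_aa <- sample(which(train$g == "aa"), 20)
idx_ao <- sample(which(train$g == "ao"), 20)

# Transpose so that each column of Y is one log-periodogram (256 x 40).
xcols <- paste0("x.", 1:256)
Y <- t(as.matrix(train[c(idx_aa, idx_ao), xcols]))

# matplot() draws one line per column; the first 20 columns are "aa".
matplot(1:256, Y, type = "l", lty = 1,
        col = rep(c("steelblue", "tomato"), each = 20),
        xlab = "Frequency", ylab = "Log-periodogram")
legend("topright", legend = c("aa", "ao"),
       col = c("steelblue", "tomato"), lty = 1)
```

Colouring by phoneme rather than by individual curve keeps the two groups distinguishable when 40 lines overlap.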


(b) Perform logistic regression on the training data in order to predict a phoneme using its

log-periodogram and obtain the confusion matrix and training error rate. Do not include

an intercept term in your model.

Also compute the test error rate and a bootstrap 95% confidence interval for the test

error based on 1000 bootstrap estimates. You should include your code for computing

the bootstrap estimate in the hard copy of your solution.

Comment on your results.

[20 marks]
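
The steps in part (b) can be sketched as follows, on synthetic data with 10 predictors standing in for the 256 frequencies. The helper names `make_data` and `predict_class` and the variable `n_boot` are hypothetical. The bootstrap shown here resamples the per-observation misclassification indicators of the test set and takes a percentile interval; this is only one reading of the question, and resampling the training rows and refitting the model each time would be another.

```r
set.seed(2)
p <- 10  # stand-in for the 256 frequencies

# Hypothetical generator for synthetic data with columns g, x.1, ..., x.p.
make_data <- function(n) {
  X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x.", 1:p)))
  eta <- as.numeric(X %*% seq(-1, 1, length.out = p))
  g <- factor(ifelse(runif(n) < plogis(eta), "ao", "aa"), levels = c("aa", "ao"))
  data.frame(g = g, X)
}
train <- make_data(400)
test <- make_data(200)

# "- 1" removes the intercept; "." uses every other column as a predictor.
# With the real data the "speaker" column would have to be dropped first.
fit <- glm(g ~ . - 1, data = train, family = binomial)

# glm() models the probability of the second factor level, here "ao".
predict_class <- function(fit, newdata) {
  prob_ao <- predict(fit, newdata = newdata, type = "response")
  factor(ifelse(prob_ao > 0.5, "ao", "aa"), levels = c("aa", "ao"))
}

pred_train <- predict_class(fit, train)
conf_train <- table(observed = train$g, predicted = pred_train)
train_err <- mean(pred_train != train$g)

# TRUE where a test observation is misclassified.
miss_test <- predict_class(fit, test) != test$g
test_err <- mean(miss_test)

# Percentile bootstrap: resample the test-set indicators with replacement.
n_boot <- 1000
boot_err <- replicate(n_boot, mean(sample(miss_test, replace = TRUE)))
ci <- quantile(boot_err, c(0.025, 0.975))
```

Resampling the indicators avoids refitting a 256-parameter model 1000 times, at the cost of ignoring the variability that comes from the training sample.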

(c) Repeat part (b) using a QDA model and comment on the results. Also compare the

error rates with those obtained for the logistic regression model.

[20 marks]
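
A minimal sketch of the QDA fit in part (c), assuming the `qda()` function from the MASS package is used. The data are synthetic with 5 predictors, and `make_data` is a hypothetical helper. QDA estimates a separate covariance matrix for each class, so with the full 256 predictors each class needs more observations than predictors for the fit to succeed.

```r
library(MASS)  # provides qda()

set.seed(3)
p <- 5  # stand-in for the 256 frequencies

# Hypothetical generator: the "ao" class has a shifted mean and a larger spread,
# so the two classes differ in covariance as well as in location.
make_data <- function(n) {
  g <- factor(sample(c("aa", "ao"), n, replace = TRUE), levels = c("aa", "ao"))
  is_ao <- as.numeric(g == "ao")
  X <- matrix(rnorm(n * p), n, p) * (1 + is_ao) + is_ao
  colnames(X) <- paste0("x.", 1:p)
  data.frame(g = g, X)
}
train <- make_data(300)
test <- make_data(200)

fit_qda <- qda(g ~ ., data = train)

# predict() on a qda fit returns a list; $class holds the predicted labels.
pred_train <- predict(fit_qda, train)$class
pred_test <- predict(fit_qda, test)$class

conf_train <- table(observed = train$g, predicted = pred_train)
train_err <- mean(pred_train != train$g)
test_err <- mean(pred_test != test$g)
```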

(d) Here we investigate improving the test error rate of the logistic regression model in part

(b) by constructing a simple filter.

The figure above is a plot of the estimated parameters $\hat{\beta}_1, \ldots, \hat{\beta}_{256}$ from the fitted logistic

model in part (b) with predictors x.1, . . . , x.256 against the frequencies 1, . . . , 256. The rapid

fluctuations seen in the plot indicate strong negative correlation between neighbouring

estimates and are due to the neighbouring frequencies in the speech frames being

highly positively correlated.

We now wish to construct a logistic regression model in which the parameter estimates

are forced to vary smoothly with frequency.

(i) Write R code to generate 13 natural cubic spline basis functions with knots uniformly

placed over the integers 1, 2, . . . , 256, representing the frequencies, and construct

the 256 × 13 basis matrix B.

What are the elements in the last 6 rows of your matrix B?

Note: To avoid the elements in B becoming too large, you should rescale the frequencies

1, 2, . . . , 256 to take values between 0 and 1. The following code constructs

13 knots evenly placed between the rescaled frequencies.

knots <- quantile(1:256 / 256, probs = seq(0, 1, length.out = 13))

[22 marks]
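
One possible construction of B is sketched below, assuming `ns()` from the splines package is acceptable. The 13 knots from the note are split into 11 interior knots and 2 boundary knots, and `intercept = TRUE` is set so that the basis has 11 + 2 = 13 columns. The name `freq` is an illustrative assumption.

```r
library(splines)  # provides ns()

# Rescaled frequencies and the 13 knots given in the note above.
freq <- 1:256 / 256
knots <- quantile(freq, probs = seq(0, 1, length.out = 13))

# First and last knots act as boundary knots, the other 11 as interior knots.
# With intercept = TRUE, ns() returns (interior knots + 2) = 13 basis functions.
B <- ns(freq,
        knots = knots[2:12],
        Boundary.knots = knots[c(1, 13)],
        intercept = TRUE)

dim(B)        # number of rows and columns of the basis matrix
B[251:256, ]  # the last 6 rows
```

Without `intercept = TRUE` the same knots would give only 12 columns, so the flag matters for matching the required 256 × 13 dimension.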

(ii) “Filter” the predictors x = (x.1, . . . , x.256) in the training data by computing $x^* = xB$ and fit a linear logistic regression model (without an intercept term) using $x^*$ as your predictor variables.

What are the values of the parameter estimates $\hat{\beta}^*_1, \ldots, \hat{\beta}^*_{13}$ in your model?

[5 marks]
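
The filtering step can be sketched as below on synthetic data; `X`, `beta_true`, `Xstar`, `fit_star` and `beta_smooth` are illustrative names. Because $x^* \beta^* = (xB)\beta^* = x(B\beta^*)$, the 13 fitted coefficients imply a smooth set of 256 per-frequency coefficients $B\hat{\beta}^*$, which the last line computes.

```r
library(splines)

set.seed(4)
freq <- 1:256 / 256
knots <- quantile(freq, probs = seq(0, 1, length.out = 13))
B <- ns(freq, knots = knots[2:12], Boundary.knots = knots[c(1, 13)],
        intercept = TRUE)

# Synthetic predictors and a smooth "true" coefficient curve (assumptions).
n <- 500
X <- matrix(rnorm(n * 256), n, 256)
beta_true <- 0.15 * sin(2 * pi * freq)
prob_ao <- plogis(as.numeric(X %*% beta_true))
g <- factor(ifelse(runif(n) < prob_ao, "ao", "aa"), levels = c("aa", "ao"))

# Filter: each row x of X becomes x* = xB, reducing 256 columns to 13.
Xstar <- X %*% B

# Logistic regression on the filtered predictors, without an intercept.
fit_star <- glm(g ~ Xstar - 1, family = binomial)
beta_star <- coef(fit_star)  # the 13 estimates

# Implied coefficients on the original 256 frequencies.
beta_smooth <- as.numeric(B %*% beta_star)
```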

(iii) Construct the plot of $\hat{\beta}_1, \ldots, \hat{\beta}_{256}$ from the model in part (b) against frequencies

1, . . . , 256, as shown in the figure above,

Comment on your smoothed curve.

[8 marks]

(iv) Obtain the training and test confusion matrices and error rates for your model

based on the filtered predictors.

[15 marks]
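
A sketch of how the confusion matrices and error rates for the filtered model might be obtained, again on synthetic data. The helpers `make_data` and `evaluate` are hypothetical. Since the model was fitted on a matrix of filtered predictors, the sketch computes the fitted probabilities directly from $x^* \hat{\beta}^*$ instead of relying on `predict()` with new data; the test set must be filtered with the same B as the training set.

```r
library(splines)

set.seed(5)
freq <- 1:256 / 256
knots <- quantile(freq, probs = seq(0, 1, length.out = 13))
B <- ns(freq, knots = knots[2:12], Boundary.knots = knots[c(1, 13)],
        intercept = TRUE)

# Hypothetical generator for synthetic predictors and labels.
beta_true <- 0.15 * sin(2 * pi * freq)
make_data <- function(n) {
  X <- matrix(rnorm(n * 256), n, 256)
  prob_ao <- plogis(as.numeric(X %*% beta_true))
  g <- factor(ifelse(runif(n) < prob_ao, "ao", "aa"), levels = c("aa", "ao"))
  list(X = X, g = g)
}
train <- make_data(500)
test <- make_data(300)

# Fit on the filtered training predictors, without an intercept.
Xstar_train <- train$X %*% B
fit_star <- glm(train$g ~ Xstar_train - 1, family = binomial)

# Filter X with the same B, then apply the inverse logit to x* beta*.
evaluate <- function(X, g) {
  prob_ao <- plogis(as.numeric((X %*% B) %*% coef(fit_star)))
  pred <- factor(ifelse(prob_ao > 0.5, "ao", "aa"), levels = c("aa", "ao"))
  list(confusion = table(observed = g, predicted = pred),
       error = mean(pred != g))
}

res_train <- evaluate(train$X, train$g)
res_test <- evaluate(test$X, test$g)
```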

(e) Compare the error rates obtained in part (d) with those obtained for the logistic regression

model in part (b) and write a summary of your findings.

[5 marks]
