
EECS595: Natural Language Processing Homework 4


Homework 4, Fall 2023
Due 10/30/2023
Student Name: xxx — uniqname: xxx
Submission Guidelines
1. Please insert your student information in line 63 of this LaTeX file;
2. Please insert your answers between each pair of \begin{solution} and \end{solution};
3. Zip the files and submit to Canvas. Checklist: hw4.pdf.
Problem 1: Probabilistic Context Free Grammar
Your friend decides to build a Treebank. He finally produces a corpus which contains the following
three parse trees:
(S (NP John)
   (VP (V1 said)
       (SBAR (COMP that)
             (S (NP Sally)
                (VP (VP (V2 snored))
                    (ADVP loudly))))))

(S (NP Sally)
   (VP (V1 declared)
       (SBAR (COMP that)
             (S (NP Bill)
                (VP (VP (V2 ran))
                    (ADVP quickly))))))

(S (NP Fred)
   (VP (V1 pronounced)
       (SBAR (COMP that)
             (S (NP Jeff)
                (VP (VP (V2 swam))
                    (ADVP elegantly))))))
You then purchase the Treebank and decide to build a PCFG, and a parser, using your friend’s
data. Now answer the following three questions:
1. (Written) Show the PCFG that you would derive from this Treebank.
Solution:
2. (Written) Show two parse trees for the string “Jeff pronounced that Fred snored loudly”, and
calculate their probabilities under the PCFG.
Solution:
3. (Written) You are surprised that “Jeff pronounced that Fred snored loudly” has two possible
parses, and that one of them - that Jeff is doing the pronouncing loudly - has relatively high
probability. This type of high attachment is never seen in the corpus, so the PCFG is clearly
missing something. You decide to fix the Treebank, by altering some non-terminal labels in
the corpus. Show one such transformation which results in a PCFG that gives zero probability
to parse trees with high attachments. (Your solution should systematically refine some non-terminals
in the Treebank, in a way that slightly increases the number of non-terminals in the
grammar, but allows the grammar to capture the distinction between high and low attachment
to VPs.)
Solution:
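For reference, the mechanics behind questions 1 and 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not a required method: it assumes the usual maximum-likelihood estimate, q(A -> beta) = count(A -> beta) / count(A), and scores a tree as the product of the probabilities of the rules it uses. The nested-tuple tree encoding, the function names, and the two-tree toy treebank are all hypothetical and are deliberately not the trees from this problem.

```python
from collections import Counter
from fractions import Fraction

# Assumed encoding: a tree is a nested tuple (label, child, child, ...);
# a leaf (word) is a plain string.

def rules(tree):
    """Yield (lhs, rhs) for every rule used in the tree, top-down."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def estimate_pcfg(treebank):
    """Maximum-likelihood estimate: count(A -> beta) / count(A)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {r: Fraction(n, lhs_counts[r[0]]) for r, n in rule_counts.items()}

def tree_probability(tree, pcfg):
    """Product of rule probabilities; an unseen rule gives probability 0."""
    p = Fraction(1)
    for r in rules(tree):
        p *= pcfg.get(r, Fraction(0))
    return p

# Hypothetical toy treebank (NOT the corpus from this assignment).
toy_treebank = [
    ("S", ("NP", "dogs"), ("VP", ("V", "bark"))),
    ("S", ("NP", "cats"),
          ("VP", ("VP", ("V", "sleep")), ("ADVP", "soundly"))),
]
toy_pcfg = estimate_pcfg(toy_treebank)

# An unseen sentence that only uses rules observed in the toy treebank.
new_tree = ("S", ("NP", "dogs"),
                 ("VP", ("VP", ("V", "bark")), ("ADVP", "soundly")))
print(tree_probability(new_tree, toy_pcfg))
```

Exact fractions are used instead of floats so that the estimated probabilities can be compared directly with hand calculations.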
Problem 2: Dependency Parsing
This exercise is to get you familiar with dependency parsing and the Stanford CoreNLP [1] toolkit.
You may also need to consult the inventory of universal dependency relations. You have two options
to complete this exercise.
• Install the toolkit. Please check Stanza and follow the instructions to install the toolkit. You
may need to use the toolkit for your final project.
• Run the demo system. You can also use the demo system without installing the toolkit.
You should experiment with different sentences and paragraphs to get a feel for how the
parser works. In particular, you need to run the following paragraph and answer some questions.
The unveiling event for the innovative ChatGPT was shared online yesterday. This
event, powered by the potent GPT-4, was projected for next month but was expedited
after AI enthusiasts showed an enormous interest. All individuals now have the chance
to explore its advanced capabilities. The AI community, though already familiar with
preceding models, is buzzing with discussions and analyses. OpenAI confirmed that
GPT-3.5/GPT-4 was the driving force behind ChatGPT, leading to its accelerated launch
and widespread acclaim.
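If you choose the installation option, the paragraph above can be parsed with a short script. The following is a minimal sketch, assuming Stanza has been installed (for example with pip) and that the default English models can be downloaded; the helper names `build_pipeline` and `dependency_edges` are hypothetical. It relies on Stanza reporting each word's `head` as a 1-based index into the sentence, with 0 marking the root.

```python
def build_pipeline(lang="en"):
    """Create a Stanza pipeline with the default processors for `lang`.

    Stanza is imported lazily so that the rest of this file can be used
    without the package installed. The first call downloads models.
    """
    import stanza
    stanza.download(lang)
    return stanza.Pipeline(lang)

def dependency_edges(doc):
    """Return (dependent, relation, head) triples for every word in doc.

    `doc` is expected to look like a Stanza Document: it has `.sentences`,
    each sentence has `.words`, and each word has `.text`, `.head`
    (1-based index of the head, 0 for the root) and `.deprel`.
    """
    edges = []
    for sentence in doc.sentences:
        words = sentence.words
        for word in words:
            head = "ROOT" if word.head == 0 else words[word.head - 1].text
            edges.append((word.text, word.deprel, head))
    return edges
```

A typical use would be `nlp = build_pipeline()`, then `for edge in dependency_edges(nlp(paragraph)): print(edge)`, where `paragraph` holds the text above. Printing one triple per line makes it easier to compare each relation against the Universal Dependencies inventory.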
Please answer the following questions:
1. (Written) Give three examples where the parsed results are incorrect.
Solution:
2. (Written) What would be the correct relation for each of these examples you identified above?
Consult the Universal Dependencies documentation of relations to answer this question.
Solution:
3. (Written) What is your general impression of the parsed results? Does the length of the sentence
affect the performance?
Solution:
References
[1] Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., & McClosky, D. (2014,
June). The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd
Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp.
55-60).