
HW4 for CIS667: Intro to AI NetID: kli146 Due Sat Nov 21, 11:59pm EST
Instructions: Answer the following questions and submit your answers on Blackboard in a single pdf file
named written.pdf. It is fine to photograph hand-written answers or type answers in MS Word, as long
as you collate your answers and convert them into this single pdf file before submission. It is also fine
to use Python to perform intermediate computations, as long as you show code snippets to explain your
reasoning. Questions begin on the following page.
1. Consider a neural network y = f(w, x) with the following form:

   v = conv(w, x)
   y = tanh( Σ_{m,n} v_{m,n} )

where w, x, and v are 2D arrays, and y is a scalar. The v = conv(w, x) operation is defined as follows:

   v_{m,n} = Σ_{i,j} w_{i,j} x_{m+i,n+j}

The indices i, j range over the rows and columns of w. The result v_{m,n} is only defined for indices m, n where x_{m+i,n+j} will not produce an array out-of-bounds error. In other words, the shape of v will be smaller than the shape of x.
The network is given a dataset with two images:
• x^(0):
  +1 +1 -1 -1 -1 -1
  +1 +1 +1 +1 +1 +1
  +1 +1 +1 +1 +1 +1
• x^(1):
  -1 +1 -1 +1 +1 +1
  +1 -1 -1 +1 +1 +1
  -1 -1 -1 +1 +1 +1
Each image x^(n) is fed through the network to produce a hidden activation v^(n) = conv(w, x^(n)) and an output activation y^(n) = f(w, x^(n)). The target outputs are -1 for x^(0) and +1 for x^(1). Therefore the total squared error is

   e = (y^(0) − (−1))^2 + (y^(1) − (+1))^2

The filter weights w will be optimized using gradient descent to minimize e. Currently, the filter weights are:

   w =
   +1 -1
   -1 +1
For each of the following, give your answers to four decimal places (e.g., "0.1234"):
• For each image (n ∈ {0, 1}), what are the numeric values of the hidden activation v^(n) and output activation y^(n)? What is the numeric value of the total squared error e?
• What is the numeric value of the gradient ∇_w e?
• After one step of gradient descent, using a learning rate η = 1000, what are the new numeric values of the filter weights w? What are the new values of each v^(n) and y^(n)? What is the new total squared error?
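If you use Python for the intermediate computations, one possible numpy sketch of the forward pass and a finite-difference check of the gradient is shown below. The helper names (conv, forward, error) and the finite-difference step size eps are illustrative choices, not part of the assignment.

import numpy as np

def conv(w, x):
    # "Valid" correlation: v[m, n] = sum_{i,j} w[i, j] * x[m+i, n+j]
    wr, wc = w.shape
    xr, xc = x.shape
    v = np.zeros((xr - wr + 1, xc - wc + 1))
    for m in range(v.shape[0]):
        for n in range(v.shape[1]):
            v[m, n] = (w * x[m:m + wr, n:n + wc]).sum()
    return v

def forward(w, x):
    v = conv(w, x)
    y = np.tanh(v.sum())
    return v, y

def error(w, xs, targets):
    # Total squared error over the dataset, as defined in the question.
    return sum((forward(w, x)[1] - t) ** 2 for x, t in zip(xs, targets))

x0 = np.array([[+1, +1, -1, -1, -1, -1],
               [+1, +1, +1, +1, +1, +1],
               [+1, +1, +1, +1, +1, +1]], dtype=float)
x1 = np.array([[-1, +1, -1, +1, +1, +1],
               [+1, -1, -1, +1, +1, +1],
               [-1, -1, -1, +1, +1, +1]], dtype=float)
w = np.array([[+1., -1.], [-1., +1.]])
xs, targets = [x0, x1], [-1.0, +1.0]

print("e =", error(w, xs, targets))

# Finite-difference estimate of de/dw: one way to check a hand-derived gradient.
eps = 1e-6
grad = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        wp = w.copy(); wp[i, j] += eps
        wm = w.copy(); wm[i, j] -= eps
        grad[i, j] = (error(wp, xs, targets) - error(wm, xs, targets)) / (2 * eps)
print("grad estimate:\n", grad)

# One gradient-descent step with the learning rate from the question.
eta = 1000.0
w_new = w - eta * grad
print("new e =", error(w_new, xs, targets))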
2. Figure 1 shows one particular state of a simple toy domain. Each position contains either a wall, a
mouse, or nothing. This state will be encoded in a numeric array so that it can be used as input to a
neural network.
• Consider a monotonic encoding in which 0 represents empty positions, 1 represents walls, and 2
represents mice. What is the monotonic encoding of the state in Figure 1? Your answer should
be a 4 × 8 array with one entry per grid position.
• Consider a one-hot encoding in which indices 0, 1, and 2 correspond to walls, empty positions, and mice, respectively. What is the one-hot encoding of the state in Figure 1? Your answer should show the three “pages” of a 3 × 4 × 8 array, where there is a one-hot vector for each grid position, and the one-hot vectors extend along the first array dimension. (A small conversion sketch follows Figure 1's caption.)
Figure 1: Mouse grid world. Black = wall, gray = mouse, white = empty.
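A minimal numpy sketch of how a monotonic array can be converted into this one-hot layout. The 2 × 3 grid below is a made-up placeholder, not the actual contents of Figure 1.

import numpy as np

# Monotonic encoding: 0 = empty, 1 = wall, 2 = mouse. Placeholder grid only.
monotonic = np.array([[1, 0, 2],
                      [0, 1, 0]])

# One-hot index order from the question: 0 = wall, 1 = empty, 2 = mouse.
channel_of = {1: 0, 0: 1, 2: 2}   # monotonic value -> one-hot channel
one_hot = np.zeros((3,) + monotonic.shape)
for value, channel in channel_of.items():
    one_hot[channel][monotonic == value] = 1

print(one_hot)   # shape (3, 2, 3): one "page" per category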
3. Consider an artificial neuron with two synaptic inputs (x, y) ∈ R^2 and scalar output z ∈ R:

   z = f(x, y) = sign(w0 x + w1 y + b)

The neuron parameters are weights (w0, w1) ∈ R^2 and bias b ∈ R.
The neuron is provided a dataset with four examples of input (x^(n), y^(n)) and target output z^(n), where n ∈ {0, 1, 2, 3}. The goal is to find values for w0, w1, and b such that f(x^(n), y^(n)) = z^(n) for all n.
For dataset 1 below, do such values for w0, w1, and b exist? Repeat the question for datasets 2 and 3.
• Dataset 1:
  n       0   1   2   3
  x^(n)  +2  -1  +3  -1
  y^(n)  +2  -1  -3  +1
  z^(n)  +1  +1  +1  -1
• Dataset 2:
  n       0   1   2   3
  x^(n)  -3  -2  +3  +2
  y^(n)  -3  +1  -1  +3
  z^(n)  +1  -1  +1  -1
• Dataset 3:
  n       0   1   2   3
  x^(n)  -2  +3  +2  -2
  y^(n)  +1  -1  +2  -2
  z^(n)  -1  -1  +1  +1
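One optional way to check these questions numerically is to pose strict separation as a linear feasibility problem. The sketch below uses scipy.optimize.linprog; scipy is not mentioned in the assignment, so treat this as just one possible approach rather than the required method.

import numpy as np
from scipy.optimize import linprog

def separable(xs, ys, zs):
    # Parameters (w0, w1, b) with sign(w0*x + w1*y + b) = z exist exactly when
    # there are (w0, w1, b) with z*(w0*x + w1*y + b) >= 1 for every example
    # (strict separation, rescaled). Written as A_ub @ [w0, w1, b] <= b_ub.
    A_ub = np.array([[-z * x, -z * y, -z] for x, y, z in zip(xs, ys, zs)])
    b_ub = -np.ones(len(xs))
    res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3)
    return res.success

# Example call with dataset 1 from the question:
print(separable([+2, -1, +3, -1], [+2, -1, -3, +1], [+1, +1, +1, -1]))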
4. • Consider the following three arrays:

      a^(0) = [+0, +1, −1, +0, +0]
      a^(1) = [−3, −2, −4, −3, −3]
      a^(2) = [+1, +2, +0, +1, +1]

   What are the element-wise differences a^(1) − a^(0) and a^(2) − a^(0)? What are the numeric values (give your answer to four decimal places) of softmax(a^(n)) for each n ∈ {0, 1, 2}? (A small softmax sketch follows this question.)
• Suppose you are using a neural network to fit a dataset. The target output values fall in the range
[-1, 4]. Which of the following activation functions fk is most appropriate for the output layer?
f1(a) = 3.0 · tanh(a) − 1.0
f2(a) = 9 · sigmoid(a) − 8
f3(a) = 4 · relu(a) − 1
Here,
   tanh(x) = (e^(2x) − 1)/(e^(2x) + 1) is the standard hyperbolic tangent,
   sigmoid(x) = 1/(1 + e^(−x)) is the standard logistic sigmoid, and
   relu(x) = max(x, 0) is the standard linear rectifier.
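If you compute the softmax values in Python, a minimal numpy sketch is shown below; the helper name softmax and the max-subtraction step are illustrative additions, not part of the assignment.

import numpy as np

def softmax(a):
    # Subtracting the max is a standard numerical-stability trick; it does not
    # change the result because softmax is invariant to adding a constant.
    e = np.exp(a - a.max())
    return e / e.sum()

a0 = np.array([+0., +1., -1., +0., +0.])
a1 = np.array([-3., -2., -4., -3., -3.])
a2 = np.array([+1., +2., +0., +1., +1.])

print(a1 - a0, a2 - a0)              # element-wise differences
for a in (a0, a1, a2):
    print(np.round(softmax(a), 4))   # four decimal places, as requested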
5. Consider the Markov Decision Process (MDP) shown in Figure 2. The agent uses a discount factor of γ = 1/2.
• Determine the probabilities for the unlabeled edges. What is the probability transition array for
this MDP? Your answer should consist of two 7 × 7 matrices, one for each action. Use the
convention that row index i and column index j reflect transitions from state si to state sj.
• Suppose the agent starts in state s0 at time-step t = 0, and then performs the following sequence of actions: [a0, a1, a0]. What is the expected net discounted reward after this sequence is complete? More formally, what is the numeric value of E[ Σ_{t=0}^{3} γ^t R(t) ]? Give your answer as an exact fraction or to four decimal places (e.g., "0.1234").
• Using an initial policy estimate π^(0) = [a1, a1, a1, a1, a0, a1, a1]^T, perform the policy iteration algorithm to find an optimal policy in this MDP. For each iteration k ∈ {0, 1, 2}, report the numerical values (to four decimal places) of the following: the current policy estimate π^(k), the current “collapsed” probability transition matrix P_{π^(k)}, and the current utility vector estimate u^(k). Is π^(2) an optimal policy?
Hint: In policy iteration you need to solve the linear system of equations (I − γ P_{π^(k)}) u^(k) = r for u^(k). numpy has a method that can do this:
https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html
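For example, a single policy-evaluation solve might look like the following sketch, where P_pi and r are placeholders; the real reward vector and collapsed transition matrix must be read off Figure 2.

import numpy as np

gamma = 0.5
n_states = 7

# Placeholder reward vector and collapsed transition matrix (NOT the real ones).
r = np.zeros(n_states)
P_pi = np.eye(n_states)

# Solve (I - gamma * P_pi) u = r for the utility vector u.
u = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r)
print(np.round(u, 4))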
Figure 2: A toy MDP with 7 states and 2 actions. Each circle represents a state si and is labeled with that state's reward ri. Arrows represent transitions between states, and outer arrows are labeled with their transition probability. Transition probabilities for inner arrows are not shown. If there is no arrow between two states, it means that the corresponding transition has probability 0. The left and right subplots show transition probabilities when actions a0 and a1 are performed, respectively.
