#
讲解 COMPSCI 369、辅导 Java/Python语言编程

THE UNIVERSITY OF AUCKLAND

FIRST SEMESTER, 2023

COMPUTER SCIENCE

Computational Methods in Interdisciplinary Science

NOTE: This is a restricted book exam. You are allowed a single sheet of A4 paper with notes written

on it.

This exam has 16 questions, and it is worth 120 marks in total.

There are 4 sections.

Section A consists 4 short answer questions worth 30 marks in total.

Section B consists 5 short answer questions worth 20 marks in total.

Section C consists 4 short answer questions worth 32 marks in total.

Section D consists 3 short answer questions worth 38 marks in total.

Answer all questions

The exam is worth 55% of the ﬁnal grade

Page 1 of 7COMPSCI 369

Section A: Computational Biology, Numerical Integration &

Game Theory

Computational Game Theory

1. In lectures we discussed David Chess’s paper ‘Simulating the evolution of behavior: the iterated

prisoners’ dilemma problem’. In this paper, Chess reported on four phases in his model: “The Era

of Exploitation,” “The Nadir,” “The Growth of Trust,” and “Equilibrium.”

(a) Describe each of the four phases and their relation to each other. [4 marks]

(b) Explain two reasons why it was necessary to use computational methods to study this model.

[3 marks]

Modelling Dynamical Systems

2. The following equation speciﬁes a discrete-time dynamical system. In this equation, α is a parameter.

xt+1

= α min(xt, 1 − xt)

(a) When α < 1, there is a single ﬁxed point. What is it? [1 mark]

(b) When α = 1, there are an inﬁnite number of ﬁxed points. What are they? [2 marks]

(c) What would be appropriate to use as labels for each axis of a bifurcation diagram of this

system? [2 marks]

(d) Write pseudocode for generating a bifurcation diagram for this system. [10 marks]

3. Brieﬂy describe the Euler and Runge-Kutta methods for numerical integration and explain the

relationship between them. [4 marks]

4. Identify a situation where Euler integration would be perfectly accurate and explain why this is the

case. [4 marks]

Page 2 of 7COMPSCI 369

Section B: Sequence Alignment

5. The partially completed F matrix for calculating the local alignment of the sequences GCT and

TAACT is given below. The score matrix is given by s(a, b) = −2 when a 6= b and s(a, a) = 4.

The linear gap penalty is d = −3.

T C C A T

0 0 0 0 0 0

G 0 0 0 0 0 0

C 0 0 4 4 1 u

T 0 4 1 v w x

(a) Complete the matrix by ﬁnding values for u, v, w and x and showing traceback pointers.

[4 marks]

(b) Give the score for the best local alignment of these two sequences and provide an alignment

that has this score. [3 marks]

6. What is the biological motivation for using an afﬁne rather than a linear gap penalty? [2 marks]

7. Computationally, how can one efﬁciently perform alignment with an afﬁne gap penalty and what

is the computational cost of doing so when compared to a linear gap? Use asymptotic notation as

part of your answer. [4 marks]

8. Describe the main barrier to ﬁnding an exact solution to the multiple alignment problem. Use

asymptotic notation as part of your answer. [2 marks]

9. Describe the main steps of the heuristic algorithm we discussed in lectures for solving the multiple

alignment problem, including the use of neutral characters. (You do not need to give precise

formulae for how the distances are calculated.) [5 marks]

Page 3 of 7COMPSCI 369

Section C: Simulation and HMMs

10. What does it mean for a sequence of random variables X0, X1, X2, . . . to have the Markov property?

Express your answer in plain English and in mathematical notation. [2 marks]

11. You are given a method choice(x,prob), where the arrays x and prob are of equal length,

and the sum of the elements of prob is 1. choice(x,prob) returns x[i] with probability

prob[i].

Write a pseudo-code method simHMM(a,e,L,s) that takes as input a transition matrix a, an

emission matrix e, a length L and a start state s. It should return state and symbol sequences of

length L with the state sequence starting in state s. Use integers corresponding to array indices to

represent states and emissions. [6 marks]

12. Given the method choice(x,prob) as deﬁned in Question 11, write a pseudo-code method

randwalk(k) that simulates a random walk of length k starting at 0 where steps of -1 and +1

are equally likely. Assume the argument k is a positive integer. Your method should return an

array of length k where walk[i] is the position of the random walk after i steps. Show how you

can use this method to estimate the probability that the position of a random walker after 50 steps

is more than 10 steps from its starting point. [5 marks]

Page 4 of 7COMPSCI 369

13. Consider an HMM with states A, B, C each of which emit symbols Q, R, S, T. The transitions are

given by the following table which has omitted the transition probabilities into state C.

The model starts in state A 60% of the time, state C 40% of the time and never in state B.

The emission probabilities for the model are given by the following table.

Q R S T

A 0.4 0.2 0.15 0.15

B 0.2 0.6 0.1 0.1

C 0.05 0.2 0.2 0.55

(a) Write down the values of the missing elements in the transition matrix. [2 marks]

(b) Sketch a diagram of the HMM, showing all states, possible transitions and transition probabilities.

Include the begin state but no end state. Do not include emission probabilities in the

diagram. [3 marks]

(c) Explain why the length of a run of Bs in a state sequence follows a geometric distribution and

give the length of an average run of Bs. [3 marks]

(d) What is the joint probability P(x, π) of the state sequence π = ABB and the symbol sequence

x = QTR? Leave your answer as a product or sum of numbers. [3 marks]

(e) Complete the entries i, j and k in the forward matrix below using the recursion fk(i + 1) =

ek(xi+1)

P

l

alkfl(xi). Remember to show your working.

0 Q T

0 1 0 0

A 0 0.24 k

B 0 i

C 0 j

[5 marks]

(f) The forward algorithm is used to calculate P(x). When π = ABB and x =QRR, is P(x)

greater than, less than, or equal to P(x, π)? Justify your answer. [3 marks]

Page 5 of 7COMPSCI 369

Section D: Trees

14. Let the symmetric matrix

specify the pairwise distances, Dij , between the four sequences x1, . . . , x4.

(a) Construct a UPGMA tree from D showing your working. [5 marks]

(b) Will UPGMA or neighbour-joining (or both or neither) reconstruct the correct tree in this

case? Explain your answer. [2 marks]

(c) Describe when you would use neighbour-joining and when you would use UPGMA. [3 marks]

15. Consider the four aligned sequences, W,X,Y, and Z:

12345

W: CCGTT

X: GCAAT

Y: CCATT

Z: GAGAT

(a) Explain what parsimony informative means, and identify the parsimony informative sites in

the alignment. [2 marks]

(b) By calculating the parsimony score for each possible tree topology for these four taxa, ﬁnd

the maximum parsimony tree. [5 marks]

(c) Demonstrate (for example, on a single branch in a one of your trees) how ancestral reconstructions

can be used to estimate branch length on the maximum parsimony tree. [4 marks]

(d) Describe two signiﬁcant drawbacks of the parsimony method. [3 marks]

Page 6 of 7COMPSCI 369

16. (a) Why do we rely on heuristic methods to ﬁnd a maximum likelihood tree? Describe one such

heuristic and explain whether this heuristic will typically ﬁnd the tree that maximises the

likelihood. [4 marks]

(b) Given mutation rate parameter µ and normalised rate matrix Q, how do you calculate the

probability that a C mutates to a T along a lineage of length t = 3? (Recall we denote, for

example, the (A, A)th entry of a matrix B by BAA.) [3 marks]

(c) Let X and Y be sequences of length L. How can you use the calculation in part (b) to

calculate the probability that X mutates into Y over a lineage of length t = 3? Explain any

assumptions you are making. [2 marks]

(d) In order to efﬁciently calculate the likelihood of the tree, what assumption do we make about

the mutation process on different lineages? [2 marks]

(e) In parsimony and distance based methods, sites that are constant across all sequences are

not informative about the tree. Explain whether or not the same applies to likelihood based

methods. [3 marks]

Page 7 of 7