Tung Wah College
HIM3002: Computer Programming for Healthcare
Individual assignment: Finding Patterns in Sequence
Background
In Lecture 6, we learned programming techniques for finding patterns in biological sequences,
specifically fixed patterns and flexible patterns. In this tutorial exercise, you will get
further practice in finding patterns in sequences. Name your file “T06_PatternAnalyser.py”.
Task 1
Write a function that takes a sequence and an integer k as inputs. It returns True if the input
sequence contains a sub-sequence of size k that occurs more than once, and False otherwise. Your
function should be similar to the one below.
def repeated_subseq(seq, k):
    """Return True if the sequence seq has repeated sub-sequence with size
    k and False otherwise"""
    # To be completed

print(repeated_subseq("ACGTAGAGGCGTATTAGCGT", 3))
print(repeated_subseq("ACGTAGAGGCGTATTAGCGT", 5))
The output is
True
False
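One possible approach, shown here only as a minimal sketch and not as the expected model answer, is to slide a window of size k along the sequence and remember every sub-sequence seen so far in a set. The check for k <= 0 is an added assumption, since the task does not say how such input should be handled.

```python
def repeated_subseq(seq, k):
    """Return True if seq contains a sub-sequence of size k more than once."""
    # Assumption: a non-positive k is treated as invalid input.
    if k <= 0:
        raise ValueError("k must be a positive integer")

    seen = set()  # sub-sequences of size k encountered so far
    # The last window of size k starts at index len(seq) - k.
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in seen:
            return True  # this window already appeared earlier
        seen.add(sub)
    return False


print(repeated_subseq("ACGTAGAGGCGTATTAGCGT", 3))
print(repeated_subseq("ACGTAGAGGCGTATTAGCGT", 5))
```

With k = 3 the sub-sequence "CGT" appears at positions 1 and 9, so the first call returns True; no sub-sequence of size 5 occurs twice, so the second call returns False.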
Task 2
By making use of the function “re_all_match”, find the following patterns in the sequence.
a) A DNA pattern with exactly four symbols, with “A” as the first symbol and “T” as the
last, e.g., “AGGT”, “ACTT”.
b) A DNA pattern with at least two symbols, with “A” as the first symbol and “T” as the
last, e.g., “AT”, “ACT”, “AGGGT”.
c) A DNA pattern with at least three symbols, with “A” at the beginning, “T” at the end,
and any symbols except “C” in between, e.g., “AGAT”, “AAGT”, “AATGT” but not
“ACGT”, assuming only “A”, “G”, “C” and “T” are in the sequence.
d) A protein pattern with between 10 and 15 symbols, with “M” at the beginning and “D”
at the end.
For example,
a) Sequence: AGGTAGTTTGACGTTACTG
Found pattern: AGGT located at 0
Found pattern: AGTT located at 4
Found pattern: ACGT located at 10
b) Sequence: AGGTGCAAGTGACGAACAAG
Found pattern: AGGTGCAAGT located at 0
Found pattern: AAGT located at 6
Found pattern: AGT located at 7
c) Sequence: AGGTGCAAGTGACGAACAAG
Found pattern: AGGT located at 0
Found pattern: AAGT located at 6
Found pattern: AGT located at 7
d) Sequence: CDEMECMEDDFEMECMEDDFEMECMEDDFEMECMEDDFEGHIEJMCEE
Found pattern: MECMEDDFEMECMED located at 3
Found pattern: MEDDFEMECMEDD located at 6
Found pattern: MECMEDDFEMECMED located at 12
Found pattern: MEDDFEMECMEDD located at 15
Found pattern: MECMEDDFEMECMED located at 21
Found pattern: MEDDFEMECMEDD located at 24
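The example outputs above contain overlapping matches (for instance, “AAGT” at 6 lies inside “AGGTGCAAGT” at 0), so the search has to be attempted from every position in the sequence. The sketch below illustrates one way to do this with Python's re module and a lookahead. The helper name find_overlapping and the four regular expressions are assumptions made for illustration; the “re_all_match” function from the lecture may be written differently.

```python
import re


def find_overlapping(pattern, seq):
    """Return (match, start) pairs for pattern, including overlapping matches.

    Hypothetical helper: the lookahead (?=(...)) matches zero characters,
    so the search is retried at every position of seq.
    """
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.group(1), m.start()) for m in lookahead.finditer(seq)]


# Assumed regular expressions for parts a) to d).
patterns = {
    "a": "A[ACGT]{2}T",  # exactly four symbols, A first and T last
    "b": "A[ACGT]*T",    # at least two symbols, A first and T last
    "c": "A[AGT]+T",     # at least three symbols, no C in between
    "d": "M.{8,13}D",    # 10 to 15 symbols, M first and D last
}

sequences = {
    "a": "AGGTAGTTTGACGTTACTG",
    "b": "AGGTGCAAGTGACGAACAAG",
    "c": "AGGTGCAAGTGACGAACAAG",
    "d": "CDEMECMEDDFEMECMEDDFEMECMEDDFEMECMEDDFEGHIEJMCEE",
}

for part in "abcd":
    print(part + ") Sequence:", sequences[part])
    for found, start in find_overlapping(patterns[part], sequences[part]):
        print("Found pattern:", found, "located at", start)
```

Because the quantifiers are greedy, each starting position reports the longest match that begins there, which is consistent with the example outputs listed above.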