CS11 HW8: Lineage
In this assignment, you will augment your HW6 code (or start from code we provide you) to handle some
new queries in a recursive manner! Specifically, you have the option of starting from some code we provide
to you or augmenting your mutations.cpp code from HW6 (and HW7 if you completed the extra credit).
The second option is much more challenging than starting with the code we provide!
Regardless, run the command
pull-code11 hw08
to retrieve new test files and a starter lineage.cpp file.
Next, pick one of these options of how to proceed on this assignment:
1. Use lineage.cpp as provided as your starter code. We believe this option will make things easier for
you on this assignment (it already contains all the File I/O and struct design you need).
2. Use your mutations.cpp from the previous assignment(s) as your starter code. This option will add
an additional layer of challenge to this assignment. If you go this route, just make a copy of your
mutations.cpp file, put it in your hw08 directory, and rename it lineage.cpp (which will overwrite
our lineage.cpp). Make sure you can then compile this new lineage.cpp and run it on the old test
inputs from HW6 to double-check that you copied over your code correctly.
After selecting one of the above options, read the rest of this assignment to get the big picture of what we’re
asking of you. Then, complete the parts of this assignment in the order they’re presented here (starting with
Programming Part 1 of 2). This assignment will go much better for you if you follow our suggested
ordering here! We also suggest that you complete Programming Part 1 by Friday evening (we’ve already
given you an implementation plan for it below) or earlier to ensure you’ve given yourself enough time for
the rest of the assignment.
Learning Objectives
The purpose of this assignment is to:
• Practice thinking recursively.
• Practice designing recursive solutions to problems.
• Practice writing, testing, and debugging recursive functions.
Introduction
NASA is thrilled with the work you’ve done so far on the Mars Initiative (great job!), and would like you
to expand the scope of your research. Your previous mutations.cpp program could only answer queries
about a single mutation step. Now, we’re interested in how different strands of DNA may be connected over
multiple mutations. Let’s kick it up a notch and flex our new recursive problem solving muscles!
Your job is to augment your lineage.cpp program to respond to four new queries outlined below. Each
of these new queries corresponds to a new problem that your program should solve, and each of these new
1
problems must be solved recursively, i.e., with a function that calls itself. We have given you the recursive
designs for the first two queries; you must come up with the other two yourself!
The Data File Format
Your program will again answer queries pertaining to a single data file. The format will be exactly the same
as the data files for HW6, with one change: each gene will either have 0 or 1 mutations.
For those who took option (1) at the start of this assignment: your lineage.cpp program already has the
structs and file reading code implemented to read in and store the contents of these data files. It would
behoove you to understand that starter code now before proceeding to the next section!
For those who took option (2) at the start of this assignment: You may reuse the sampleA.data and
sampleB.data files from HW6 for additional testing if you like, but only the first mutation for each gene
(if it exists) will be used in this assignment. Furthermore, we will only be running tests using the new
sampleC.data and sampleD.data files provided with this assignment’s starter code. You may keep your file
reading code and struct design the same as for HW6 or modify it so each gene only gets associated with its
first mutation listed in the data file.
What We Provide
We have provided you with a demo program, lineage demo, that has all of the functionality described below.
We have also provided you with pared down versions of the two data files from HW6, sampleC.data and
sampleD.data, and four test input files (one for each of the new queries): query1 test.in, query2 test.in,
query3 test.in, and query4 test.in. For this assignment, you may assume that the mutation data files
adhere to the format described above, and that the user will always enter valid, well-formatted input. As
always, you should create additional files to help you test your program and ensure that it diffs properly.
Programming (Part 1 of 2)
Point breakdown: 5 for compiling, 5 for passing valgrind, 20 for passing the provided tests for these two
queries, 15 for passing our additional tests for these two queries
In this first part of the assignment, you are given two new queries to support (listed below). Each query
must be solved with a recursive function; if the function supporting the query contains any
loops, you will not receive credit. We have provided the designs for these recursive solutions here; your
job is to translate the designs into C++ recursive functions and test those functions sufficiently.
Evolution
We say that a gene can evolve into another gene if we can get from the first to the second via one or more
mutations. The user can perform this query by entering the command e followed by two arguments: the
source gene and the target gene. If the source can evolve into the target, your program must print:
[Source] can evolve into [Target]
2
Otherwise, your program must print:
[Source] cannot evolve into [Target]
How can you implement this query using a recursive function? First, as with any new function being designed,
you need to decide what the input and output should be. Here is a suggestion:
Input: the source gene (src) and the target gene (tgt). We assume these are represented with pointers
into an array of genes, and that all genes have a boolean member named “seen” that is set to false before
this function is called for the first time. (The “seen” member will help us terminate the function if we get
stuck in a cycle of mutations.)
Output: a boolean indicating whether src can evolve into tgt.
If the function is given a src and tgt and returns true, then we know that the source can evolve into the
target and thus the first message should be printed out. Otherwise, the evolution cannot occur, and we
should print out the second message. Make sure you understand this input/output relation and
how it helps us solve this query before moving on.
Since we’re trying to solve this recursively, we need to think about how evolution (seeing if we can get from
one gene to another over one or more mutations) can be described recursively. Specifically, we want to try to
describe the problem in terms of a smaller version of the same problem. The key insight is that, the question
“can src evolve into tgt?” can be equivalently asked as “can src mutate into some other gene, we’ll call
it new src, such that new src can evolve into tgt?” Note that we’ve now defined evolution in terms of a
simpler operation (one mutation) joined with a “smaller” version of the same problem. There’s our recursive
case!
Now we just need to come up with any base cases so that the recursion will stop at some point. It turns out
there are three base cases for this problem; the following psuedocode captures those base cases, as well as
the recursive case:
Pseudocode:
Base Cases:
• If src cannot mutate into another gene, it definitely cannot evolve into tgt.
• If src can mutate into tgt, then it can evolve into tgt in one step!
• If src was already “seen”, then tgt is not reachable from src via mutations. (See the next few steps
to understand why.)
Recursive Case:
• In all other cases, since we have just “seen” src, we need to remember that fact.
Then, let new src be the gene src can mutate into.
Finally, determine if new src is able to evolve into tgt.
First, follow these psuedocode instructions yourself on a few examples drawn out on paper. Once you have
followed them and see how they lead to the correct answer, translate this pseudocode into a C++ function
(each line of pseudocode corresponds to 1-2 lines of C++ code). Then test the function to make sure it
works before even thinking of moving onto the next one!
Evolution Steps
The user can perform this query by entering the command es followed by two arguments: the source gene
and the target gene. If the source can evolve into the target and it takes n mutations to do so, your program
must print:
3
It will take [n] evolutionary steps to get from [Source] to [Target]
Otherwise, your program must print:
It will take -1 evolutionary steps to get from [Source] to [Target]
The recursive solution for this problem is very similar to the solution for the previous query. The input is
identical, but the output should be an integer: -1 if src cannot evolve into tgt; n if src can evolve into tgt
in n mutations.
Here’s the pseudocode for this query’s recursive solution (it has the same assumptions as the previous query’s
pseudocode). To translate this one into a C++ function, you need to ask yourself: what should your function
return in each case?
Pseudocode:
Base Cases:
• If src cannot mutate into another gene, it definitely cannot evolve into tgt.
• If src can mutate into tgt, then it can evolve into tgt in one step!
• If src was already “seen”, then tgt is not reachable from src via mutations.
Recursive Case:
• In all other cases, since we have just “seen” src, we need to remember that fact.
Then, let new src be the gene src can mutate into.
Next, let n be the number of evolution steps to get from new src to tgt.
If n is not -1, set n to n + 1.
Return n.
Written
Now that you’ve seen a few recursive designs for gene queries (and implemented them with recursive functions), it’s time for you to practice making these designs yourself!
For the written portion of this assignment, you must describe your recursive design for solving queries 3 and
4 below. This should be done after you have coded and tested the recursive functions for Programming Part
1, but before you’ve written any code for Programming Part 2. The TAs will only help you with your
code for Programming Part 2 if you’ve both finished writing and testing the functions for Part
1 and you’ve come up with a recursive design for the new query you’re coding up.
More specifically, read about the queries in the next Programming section (Part 2), then come up with a
recursive design for each. You should list the following information about your recursive designs:
1. The data your recursive solution will take as input, and how that data is represented.
2. The output your recursive solution will produce.
3. The base case(s) of your solution.
4. The recursive case(s) of your solution.
5. An example input for each base case and each recursive case, and what your solution would produce
as output for each of those inputs.
Be sure to answer all five questions for both of the functions you’ll implement in the section below. Your
answers must be typed, saved as written.pdf, and uploaded to Gradescope.
4
Programming (Part 2 of 2)
Point breakdown: 20 for passing the provided tests for these two queries, 15 for passing our additional tests
for these two queries
Here are two more queries (query 3 and query 4) to solve with recursive functions. After reading what
they do, complete the Written Part 1 section above before writing any code. Then, you will use
those recursive designs to help guide your work! Again, you may not use any loops in the functions you
implement for these queries.
Energetic Evolution
The user can perform this query by entering the command ene followed by three arguments: the source
gene, the target gene, and an amount of energy. If the source can evolve into the target using at most the
given amount of energy, your program must print:
[Source] can evolve into [Target] with at most [ENERGY] evolutionary cost
Otherwise, your program must print:
[Source] cannot evolve into [Target] with at most [ENERGY] evolutionary cost
Look back at the recursive designs for the first two queries: how is this query different? What can you reuse
from those designs? What needs to change?
Evolutionary Path
The user can perform this query by entering the command path followed by two arguments: the source
gene and the target gene. If the source can evolve into the target, your program must print the sequence of
mutations:
[Source] -> [Step1] -> [Step2] -> [Step3] -> [Target]
(Note: the number of steps will vary depending on the number of mutations)
For example, if the [Source] is GATC, the [Target] is GCTA, and there are 2 steps in between:
GATC -> GTTC -> GTTA -> GCTA
Otherwise, your program must print:
There is no path from [Source] to [Target]
Like the other queries, you may not use any loops. You also should not call the recursive function that you
wrote for any of the previous queries. (You don’t need to, and you’re probably making an inefficient solution
if you do.)
5
Requirements and expectations for your solutions
Here are the requirements that your code must follow. Violating any of these requirements will result in a
score of 0 points for a functional correctness portion of this assignment. Make sure that you check your work
against these requirements well before the submission deadline!
• Your program must be valid C++ code and compile successfully. Specifically, running the following
command on the Halligan servers must produce no error messages:
g++ -o lineage -Wall -Wextra lineage.cpp
• Your solution may not include any external packages other than , , ,
and .
• You should only use exit() to quit the program if a file name wasn’t provided on the command line,
or if something goes wrong when you try to open the file named. You may not use exit() in any
other case, like for the quit query.
• Make sure you can run and diff your program using the exact commands shown above. Too often
students assume that their program should take in input in a way that differs from these commands,
leading to many tests failing; it is your responsibility to make sure you’re following our directions!
Here are our expectations for your code. Ignoring any of these expectations may result in an unsatisfactory
grade for either a functional correctness portion or a style portion of the assignment. Feel free to ask a
member of the course staff to look at your work if you’re not sure whether it meets an expectation here.
• Your lineage.cpp program must implement each query with a recursive function. No loops may
appear in any of these functions or any functions called by the recursive functions.
• All functions you write (except main()) must be preceded by a function contract.
• No function (including main()) should be larger than 30 lines (a few lines longer isn’t a huge deal).
• You should use diff to test your programs, and only feel confident about passing those tests if diff
does not produce any output whatsoever.
• Your programs should begin with a header comment with your name, the name of the program file,
the date you started working on it, and a succinct description of the program.
• Follow the typical line length, indentation, naming, and commenting conventions covered in the course
style guide.
How to organize and submit your homework
Your submission for this assignment will comprise these files:
• A PDF file written.pdf containing your recursive designs for queries 3 and 4.
• A file lineage.cpp containing your solution to the programming problem.
• A file README identifying yourself, anyone with whom you have collaborated or discussed the assignment,
and any other information you wish to pass on to us.
When you are ready to submit, you will do so in two parts. First, submit your written.pdf file to Gradescope. Second, submit the lineage.cpp and README files by running this command on the Halligan servers
in your hw08 directory:
submit11-hw08
6
Do submit early, then keep submitting until your work is complete; we always grade the last submission
received before the late token deadline.