DNS Security, zone file.
Purpose
This homework will teach you to explore DNS-related concepts, including zone files and their attributes, record types, domain names, and associated security notions. In each question you are asked to write code that provides a summary or performs a computation, and to include the outcome in your report. The output format of your code is not important and will not be graded unless otherwise specified (e.g., the last question). Use your own judgment for structuring your code (e.g., functions).
Your code should take the two zone files as two separate command-line parameters (in order).
Read more about zone files and their format here:
https://en.wikipedia.org/wiki/Zone_file
In this homework, you will use the Jaccard index (a similarity measure; the corresponding Jaccard distance is one minus the index), defined here:
https://en.wikipedia.org/wiki/Jaccard_index
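As an illustration of the definition linked above, the following is a minimal Python sketch of the Jaccard index applied to two strings. It assumes each domain name is treated as the set of its characters; the assignment does not fix the set representation, so character n-grams or label sets would be equally valid choices. The function names jaccard_index and score_histogram are hypothetical and exist only for this example.

```python
from collections import Counter
from itertools import combinations


def jaccard_index(a: str, b: str) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of the character sets of a and b.

    Assumption: a name is represented by the set of its characters.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention assumed here: two empty strings are identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def score_histogram(names, digits=2):
    """Count unordered pairs of names per Jaccard value rounded to `digits`."""
    return Counter(
        round(jaccard_index(a, b), digits) for a, b in combinations(names, 2)
    )
```

For example, jaccard_index("abc", "abd") shares two characters out of four distinct ones and evaluates to 0.5. Note that 10,000 names produce 49,995,000 unordered pairs, so counting rounded values as above is cheaper than storing every score.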
Zone file analysis
Write code in your favorite programming language (among the three above) to provide a summary of the two zone files above. Include the answers to each question in your report. The summary should include the following information:
a. A count of the records in each file (that is, the count of unique lines).
b. For each column (5 columns), a count of unique values (names, ttl, class, etc.).
c. A list of the different record classes (sorted by name) with a count of the different names associated with each record class.
d. A list of the different record types (sorted by name) with a count of the different names associated with each record type.
e. A list of the different ttl values (sorted by value) with a count of the different names associated with each ttl value.
f. A list of the different record data values (sorted by record), and the name records associated with each of them.
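One way to approach the summary items above is sketched below in Python. This is a simplified sketch under stated assumptions, not a complete zone file parser: it assumes every record sits on a single line with five whitespace-separated columns (name, ttl, class, type, record data), and it skips comment lines starting with ";" and directives starting with "$". Real zone files may omit fields or span several lines using parentheses, which this sketch does not handle. The function names parse_zone_lines and names_per_value are hypothetical.

```python
from collections import defaultdict


def parse_zone_lines(lines):
    """Yield unique (name, ttl, class, type, rdata) tuples.

    Assumption: one fully specified record per line, five columns.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and directives such as $ORIGIN or $TTL.
        if not line or line.startswith(";") or line.startswith("$"):
            continue
        if line in seen:  # count each unique line once
            continue
        seen.add(line)
        parts = line.split(None, 4)  # rdata may itself contain spaces
        if len(parts) < 5 or not parts[1].isdigit():
            continue  # not in the assumed five-column layout
        name, ttl, rclass, rtype, rdata = parts
        yield name.lower(), int(ttl), rclass.upper(), rtype.upper(), rdata


def names_per_value(records, column):
    """Map each value in `column` (sorted) to its count of distinct names."""
    groups = defaultdict(set)
    for rec in records:
        groups[rec[column]].add(rec[0])
    return {value: len(names) for value, names in sorted(groups.items())}
```

With records = list(parse_zone_lines(open(path))), where path is one of the two command-line parameters, names_per_value(records, 2) groups by class, names_per_value(records, 3) by type, and names_per_value(records, 1) by ttl. Because ttl is converted to an integer, the ttl values sort numerically rather than as text.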
Question
Based on your answers to the questions above, do the following (you may develop your own code to answer these questions, but the answers should go into the report).
a. Comment on the difference in the distribution (based on the answer above) between the root zone and the net zone files (by comparing the results of e to f). Your answer should include a comment on the size of the zone, the types of records, the different values of ttl, etc. Your comment should outline the intrinsic differences between the two zones (not only by size, but also by the diversity of the records and values in each of them).
b. (For each file) For each ttl value, provide a comment on the rationale of that value.
c. (For each file) For each record class value, provide a one-line definition (use external resources, but write in your own words).
d. (For each file) For each record type value, provide a one-line definition (use external resources, but write in your own words).
e. (For each file) For each record data value where a service provider could be used, list the most popular service providers (e.g., managed DNS providers as inferred from common name servers). Why do we not see as many different providers in the root zone file as in the .net zone file?
f. From your analysis and further investigation, list all security-related record types. Comment on the representation of those types in both zone files.
Answer the following questions
a. In domain name security, similarity relationships are established among domain names using forms of distance to group domains and to tell whether they are malicious or benign. For example, the Jaccard index (see above) is often used to tell if two domain names are related. Write code that uses the Jaccard index to calculate a similarity score between each pair of domains in the first 10,000 names in the .net zone file. Draw (using a histogram) the distribution of the Jaccard index values. The x-axis should be the Jaccard index values, and the y-axis should be the number of pairs that have the given x-axis value. Include the diagram in your report and the code in your .zip file. DON’T PROVIDE THE OUTPUT OF THE EXECUTION OF YOUR CODE.
b. Using the same notion of entropy you used in homework 4, write code that calculates the entropy of each domain name in the .net zone file. Your code should output the domain name as the first column and the entropy as the second column (separated by a tab), sorted by the entropy values (highest to lowest). For probability calculations, use the entire list of domains in the zone file.
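The entropy ranking described in the last item can be sketched as follows in Python. The notion of entropy from homework 4 is not restated here, so this sketch assumes Shannon entropy over characters: character probabilities are estimated once from the entire list of domains, and the entropy of a name is the sum of -p(c) log2 p(c) over the distinct characters c in that name. This is one possible reading, not necessarily the intended one, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_probabilities(domains):
    """Estimate character probabilities from the entire list of domains."""
    counts = Counter(ch for d in domains for ch in d)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def entropy(domain, probs):
    """Assumed definition: sum of -p(c) * log2 p(c) over distinct chars."""
    return -sum(probs[ch] * math.log2(probs[ch]) for ch in set(domain))


def ranked_by_entropy(domains):
    """Return (domain, entropy) rows sorted from highest to lowest entropy."""
    probs = char_probabilities(domains)
    rows = [(d, entropy(d, probs)) for d in domains]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def format_rows(rows):
    """Render rows as tab-separated lines: domain, then entropy."""
    return "\n".join(f"{name}\t{value:.4f}" for name, value in rows)
```

Calling print(format_rows(ranked_by_entropy(names))) on the list of names from the .net zone file produces the two tab-separated columns requested, with the highest entropy first.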