THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY
Department of Computer Science and Engineering
MSBD5008: Introduction to Social Computing
Fall 2020 Assignment 1
IMPORTANT NOTES
Late submission: 25 marks will be deducted for every 24 hours after the deadline.
ZERO tolerance for plagiarism: all involved parties will receive zero marks.
NetworkX
In this question, you are required to use NetworkX to do basic data analysis on a Wikipedia vote network dataset. It
contains 7,115 nodes and 103,689 (directed) edges. The dataset can be downloaded from
http://snap.stanford.edu/data/wiki-Vote.html.
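As a rough illustration of getting such an edge list into NetworkX, the sketch below parses a few toy lines into a directed graph. The line format (comment lines starting with `#`, then one whitespace-separated "source target" pair per line) is an assumption about the SNAP file, and `sample_lines` is made-up data, not taken from the dataset.

```python
import networkx as nx

# Toy stand-in for the start of the SNAP file (assumed format:
# '#' comment lines, then one "source target" pair per line).
sample_lines = [
    "# FromNodeId\tToNodeId",
    "30\t1412",
    "30\t3352",
    "3\t28",
    "28\t30",
]

# parse_edgelist skips the comment line and builds one directed edge per pair.
G = nx.parse_edgelist(
    sample_lines, comments="#", create_using=nx.DiGraph, nodetype=int
)

print(G.number_of_nodes(), G.number_of_edges())
```

For the downloaded and decompressed file, `nx.read_edgelist(path, comments="#", create_using=nx.DiGraph, nodetype=int)` takes the same options; the local `path` is whatever you saved the file as.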
2. Output the following information related to degree:
average degree, average in-degree, average out-degree;
degree distribution (plot both the degree and frequency in log scale);
density (E/N^2), where E is the number of edges and N is the number of nodes;
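The degree quantities listed above can be sketched on a toy directed graph as follows. The graph is invented for illustration. Note that the density here follows the handout's E/N^2 definition, which differs from what `nx.density` returns for a directed graph (E/(N(N-1))).

```python
from collections import Counter

import networkx as nx

# Toy directed graph (made-up edges, for illustration only).
G = nx.DiGraph([(1, 2), (1, 3), (2, 3), (3, 1)])
N = G.number_of_nodes()
E = G.number_of_edges()

# For a DiGraph, G.degree() is in-degree + out-degree.
avg_in = sum(d for _, d in G.in_degree()) / N
avg_out = sum(d for _, d in G.out_degree()) / N
avg_total = sum(d for _, d in G.degree()) / N

# Degree distribution: how many nodes have each total degree.
degree_counts = Counter(d for _, d in G.degree())

# Density as defined in the handout (E / N^2), computed by hand.
density = E / N**2

print(avg_total, avg_in, avg_out, dict(degree_counts), density)
```

To plot the distribution with both axes in log scale, one option is matplotlib's `plt.loglog` on the sorted degrees and their counts; degree-0 nodes have to be dropped first because log(0) is undefined.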
3. Find the largest strongly connected component (giant component), and output the number of nodes in it;
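A minimal sketch of extracting the largest strongly connected component, again on an invented toy graph: `nx.strongly_connected_components` yields node sets, and the largest is picked by size.

```python
import networkx as nx

# Toy directed graph: nodes 1, 2, 3 form a cycle; 4 and 5 hang off it.
G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)])

# Each component is a set of nodes; take the one with the most nodes.
largest_scc = max(nx.strongly_connected_components(G), key=len)

# Materialise the component as its own graph for later analysis.
giant = G.subgraph(largest_scc).copy()

print(giant.number_of_nodes())
```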
distribution of path length;
average path length;
distribution of clustering coefficient;
average clustering coefficient.
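The path-length and clustering quantities above might be computed along the following lines. This is a sketch on an invented, strongly connected toy graph; which graph the handout intends these statistics for (for example, the giant component) is not stated in the surviving text, so that choice is left open here.

```python
from collections import Counter

import networkx as nx

# Toy strongly connected directed graph (made-up edges).
H = nx.DiGraph([(1, 2), (2, 3), (3, 1), (1, 3)])

# Distribution of shortest-path lengths over ordered pairs (u, v), u != v.
path_counts = Counter()
for u, lengths in nx.shortest_path_length(H):
    for v, d in lengths.items():
        if u != v:
            path_counts[d] += 1

# Requires a strongly connected graph when H is directed.
avg_path = nx.average_shortest_path_length(H)

# Per-node clustering coefficients and their mean.
clustering = nx.clustering(H)
avg_clustering = nx.average_clustering(H)

print(dict(path_counts), avg_path, avg_clustering)
```

`nx.average_shortest_path_length` raises an error on a directed graph that is not strongly connected, which is one reason to restrict the computation to a single component.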
5. Treat the network as undirected. Output the following information related to degree:
average degree;
degree distribution (plot both the degree and frequency in log scale);
density (E/N^2).
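For the undirected view, a sketch on an invented toy graph is shown below. `to_undirected()` merges a reciprocal pair such as 1→2 and 2→1 into a single edge, so E generally shrinks relative to the directed graph.

```python
from collections import Counter

import networkx as nx

# Toy directed graph with one reciprocal pair (1 -> 2 and 2 -> 1).
G = nx.DiGraph([(1, 2), (2, 1), (2, 3), (3, 4)])

# Reciprocal directed edges collapse into a single undirected edge.
U = G.to_undirected()
N, E = U.number_of_nodes(), U.number_of_edges()

# Each undirected edge contributes to the degree of two nodes.
avg_degree = 2 * E / N
degree_counts = Counter(d for _, d in U.degree())

# Density as defined in the handout (E / N^2).
density = E / N**2

print(avg_degree, dict(degree_counts), density)
```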
Deep Graph Library (DGL)
In this question, you are required to use DGL to build a graph neural network for node classification. The dataset
view?usp=sharing
1. Load the dataset with the following command:
This file contains a dictionary object with the following information of a directed graph:
nodes: a list containing the IDs of all the nodes in the graph;
labels: a list containing the label of each node;
num_classes: the total number of distinct node labels;
features: a matrix of size number-of-nodes × feature-dimensionality;
source_nodes: a list containing the source node ID of each (directed) edge;
target_nodes: a list containing the target node ID of each (directed) edge;
train_mask: a list (of values “True” or “False”) indicating whether each node is used in the training set;
val_mask: same format as train_mask; indicates whether each node is used in the validation set.
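To illustrate how a mask of this form selects nodes, the sketch below computes accuracy over only the masked positions. It assumes that position i in the labels and mask lists refers to the same node. `masked_accuracy` and the `predictions` list are hypothetical, not part of the handout.

```python
def masked_accuracy(predictions, labels, mask):
    """Fraction of correct predictions among nodes where mask is True."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        raise ValueError("mask selects no nodes")
    correct = sum(predictions[i] == labels[i] for i in selected)
    return correct / len(selected)


# Toy data in the same shape as the dictionary fields (made-up values).
labels = [0, 1, 1, 0, 2]
val_mask = [False, True, True, False, True]
predictions = [0, 1, 0, 0, 2]  # hypothetical model output

acc = masked_accuracy(predictions, labels, val_mask)
print("Validation Acc {:.4}".format(acc))
```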
2. You have to use the graph neural network model dgl.nn.pytorch.conv.GINConv in DGL. It implements the following
neighborhood aggregation:
This model includes the graph neural network model discussed in class, but is more general. For details, read
https://docs.dgl.ai/api/python/nn.pytorch.html#dgl.nn.pytorch.conv.GINConv.
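According to the DGL documentation, the GINConv update has the form h_i^(l+1) = f_Θ((1 + ε) h_i^(l) + aggregate({h_j^(l) : j ∈ N(i)})), where N(i) is the set of in-neighbours of node i. The plain-Python sketch below illustrates only the inner part with sum aggregation, taking f_Θ as the identity. It assumes node IDs are the indices 0..n-1 and uses edge lists in the source/target format described above. `gin_aggregate` is a hypothetical helper, not a DGL function.

```python
def gin_aggregate(features, source_nodes, target_nodes, eps=0.0):
    """One sum-aggregation GIN step with f_Theta taken as the identity."""
    dim = len(features[0])
    # Start from (1 + eps) * h_i for every node i.
    out = [[(1.0 + eps) * x for x in h] for h in features]
    # Add h_j for every edge j -> i (node i receives from its in-neighbours).
    for j, i in zip(source_nodes, target_nodes):
        for k in range(dim):
            out[i][k] += features[j][k]
    return out


# Toy graph with 3 nodes and edges 0 -> 2, 1 -> 2, 2 -> 0 (made-up data).
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
source_nodes = [0, 1, 2]
target_nodes = [2, 2, 0]

h_next = gin_aggregate(features, source_nodes, target_nodes)
print(h_next)
```

In GINConv itself, f_Θ is the module passed as `apply_func` (for example a linear layer or small MLP), and the aggregator can be chosen among sum, max, and mean.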
classification accuracy on a test set (which is hidden from you). We will use the following code to test your model.
Your code should include a test function (with your model and a mask as inputs) so that we do not need to retrain
print("Testing Acc {:.4}".format(accuracy))
Please also use the following functions:
import torch

def save_checkpoint(checkpoint_path, model):
    # state_dict: a Python dictionary object that:
    # - for a model, maps each layer to its parameter tensor;
    state = {'state_dict': model.state_dict()}
    torch.save(state, checkpoint_path)
    print('model saved to %s' % checkpoint_path)

save_checkpoint("best_model.pth", model)
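
def load_checkpoint(checkpoint_path, model):
    # Restore the parameters stored under the 'state_dict' key.
    state = torch.load(checkpoint_path)
    model.load_state_dict(state['state_dict'])
    print('model loaded from %s' % checkpoint_path)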
Submission Guidelines
Please submit two Python notebooks (A1.ipynb and A2.ipynb) and a report (report.pdf) for your results and conclusions.
Zip all the files into A1_awangab_12345678 (replace awangab with your ust
Note that the assignment should be clearly legible, otherwise you may lose some points if the assignment is difficult 