Homework 2

This homework is worth a total of 100 points.

Problem 1 is optional and will not be included in your final score. However, it is highly recommended that you complete this section.

Problem 2 - "Short Questions" is mandatory and will be graded.

Problem 1

The TensorFlow Playground is a handy neural network simulator built by the TensorFlow team. In this exercise, you will train several binary classifiers in just a few clicks, and tweak the model’s architecture and its hyperparameters to gain some intuition on how neural networks work and what their hyperparameters do. Take some time to explore the following:

1. The patterns learned by a neural net. Try training the default neural network by clicking the Run button (top left). Notice how it quickly finds a good solution for the classification task. The neurons in the first hidden layer have learned simple patterns, while the neurons in the second hidden layer have learned to combine the simple patterns of the first hidden layer into more complex patterns. In general, the more layers there are, the more complex the patterns can be.

2. Activation functions. Try replacing the tanh activation function with a ReLU activation function, and train the network again. Notice that it finds a solution even faster, but this time the boundaries are linear. This is due to the shape of the ReLU function.
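For intuition about why this happens, the sketch below simply plots the two functions with numpy and matplotlib (this is only an illustration; the playground exercise itself requires no code). tanh is a smooth, saturating S-curve, while ReLU is piecewise linear with a single kink at zero, which is why ReLU-based networks carve the plane into linear pieces.

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-4, 4, 400)

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].plot(z, np.tanh(z))          # smooth, saturating S-curve
axes[0].set_title("tanh(z)")
axes[1].plot(z, np.maximum(0, z))    # piecewise linear, kink at z = 0
axes[1].set_title("ReLU(z)")
for ax in axes:
    ax.grid(True)
plt.tight_layout()
plt.show()
```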

3. The risk of local minima. Modify the network architecture to have just one hidden layer with three neurons. Train it multiple times (to reset the network weights, click the Reset button next to the Play button). Notice that the training time varies a lot, and sometimes it even gets stuck in a local minimum.

4. What happens when neural nets are too small. Remove one neuron to keep just two. Notice that the neural network is now incapable of finding a good solution, even if you try multiple times. The model has too few parameters and systematically underfits the training set.

5. What happens when neural nets are large enough. Set the number of neurons to eight, and train the network several times. Notice that it is now consistently fast and never gets stuck.

This highlights an important finding in neural network theory: large neural networks rarely get stuck in local minima, and even when they do, these local optima are often almost as good as the global optimum. However, they can still get stuck on long plateaus for a long time.

6. The risk of vanishing gradients in deep networks. Select the spiral dataset (the bottom-right dataset under “DATA”), and change the network architecture to have four hidden layers with eight neurons each. Notice that training takes much longer and often gets stuck on plateaus for long periods of time. Also notice that the neurons in the highest layers (on the right) tend to evolve faster than the neurons in the lowest layers (on the left). This problem, called the vanishing gradients problem, can be alleviated with better weight initialization and other techniques, better optimizers (such as AdaGrad or Adam), or batch normalization.
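To see the same effect outside the playground, here is a rough numpy sketch (an illustration under assumed Xavier-style random weights, not the playground’s actual code) of the backpropagated gradient shrinking layer by layer in a deep tanh network:

```python
import numpy as np

rng = np.random.default_rng(42)
n_layers, width = 8, 8
weights = [rng.normal(scale=1 / np.sqrt(width), size=(width, width))
           for _ in range(n_layers)]

# Forward pass through a stack of tanh layers, keeping each activation.
activations = [rng.normal(size=(1, width))]
for W in weights:
    activations.append(np.tanh(activations[-1] @ W))

# Backward pass: each layer multiplies the gradient by tanh'(z) = 1 - a**2
# (always below 1) and by W transposed, so its norm tends to decay as we
# move from the output (right) toward the input (left).
grad = np.ones((1, width))
for layer in reversed(range(n_layers)):
    grad = (grad * (1 - activations[layer + 1] ** 2)) @ weights[layer].T
    print(f"layer {layer}: gradient norm = {np.linalg.norm(grad):.3e}")
```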

7. Go further. Take an hour or so to play around with other parameters and get a feel for what they do, to build an intuitive understanding about neural networks.

Problem 2

Each question is worth 10 points.

For Questions 3 and 6:

- Please submit your work by providing the following files:

  1. The Jupyter Notebook/Lab file in .ipynb format.

  2. The converted .html version of the Jupyter Notebook/Lab.

- Ensure that your notebook includes:

  - Clear comments and documentation explaining each step of your process.

  - A detailed description of both the process and the results to demonstrate your understanding.

1. What is the fundamental idea behind support vector machines?

2. Why is it important to scale the inputs when using SVMs?

3. Train an SVM classifier on the wine dataset, which you can load using sklearn.datasets.load_wine(). This dataset contains the chemical analyses of 178 wine samples produced by 3 different cultivators: the goal is to train a classification model capable of predicting the cultivator based on the wine’s chemical analysis. Since SVM classifiers are binary classifiers, you will need to use one-versus-all to classify all three classes. What accuracy can you reach?
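A possible starting point is sketched below (a hedged sketch, not the required solution: the 80/20 split, C=1.0, and the choice of LinearSVC are illustrative assumptions).

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Scale first (see question 2), then fit a linear SVM; LinearSVC handles
# the three classes with a one-versus-rest strategy by default.
clf = make_pipeline(StandardScaler(),
                    LinearSVC(C=1.0, max_iter=10_000, random_state=42))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```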

4. What is the approximate depth of a decision tree trained (without restrictions) on a training set with one million instances?

5. If a decision tree is underfitting the training set, is it a good idea to try scaling the input features?

6. Train and fine-tune a decision tree for the moons dataset by following these steps (a starter sketch appears after the list):

a. Use make_moons(n_samples=10000, noise=0.4) to generate a moons dataset.

b. Use train_test_split() to split the dataset into a training set and a test set.

c. Use grid search with cross-validation (with the help of the GridSearchCV class) to find good hyperparameter values for a DecisionTreeClassifier. Hint: try various values for max_leaf_nodes.

d. Train it on the full training set using these hyperparameters, and measure your model’s performance on the test set. You should get roughly 85% to 87% accuracy.
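Putting steps a through d together, a starter sketch might look like the following (the grid values are illustrative guesses based on the hint, not the required answer):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)  # step a
X_train, X_test, y_train, y_test = train_test_split(            # step b
    X, y, test_size=0.2, random_state=42)

param_grid = {"max_leaf_nodes": list(range(2, 100)),            # step c
              "min_samples_split": [2, 3, 4]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=3)
search.fit(X_train, y_train)

# Step d: GridSearchCV refits the best estimator on the whole training
# set by default (refit=True), so we can score it on the test set directly.
print("best params:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))
```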

7. Why was the sigmoid activation function a key ingredient in training the first MLPs?

8. Name three popular activation functions. Can you draw them?

9. Suppose you have an MLP composed of one input layer with 10 passthrough neurons, followed by one hidden layer with 50 artificial neurons, and finally one output layer with 3 artificial neurons. All artificial neurons use the ReLU activation function. (A shape-checking sketch follows the sub-questions.)

1. What is the shape of the input matrix X?

2. What are the shapes of the hidden layer’s weight matrix Wh and bias vector bh?

3. What are the shapes of the output layer’s weight matrix Wo and bias vector bo?

4. What is the shape of the network’s output matrix Y?

5. Write the equation that computes the network’s output matrix Y as a function of X, Wh, bh, Wo, and bo.
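Once you have worked out the shapes on paper, a tiny numpy sketch like the one below can help you verify them (the batch size m is an arbitrary assumption, and ReLU is applied at both layers as the question specifies):

```python
import numpy as np

m = 32                          # arbitrary batch size
X = np.random.rand(m, 10)       # one row per instance, one column per input
Wh = np.random.rand(10, 50)     # hidden layer weights
bh = np.random.rand(50)         # hidden layer biases
Wo = np.random.rand(50, 3)      # output layer weights
bo = np.random.rand(3)          # output layer biases

relu = lambda z: np.maximum(0, z)
Y = relu(relu(X @ Wh + bh) @ Wo + bo)   # forward pass
print(Y.shape)                          # should match your answer to 9.4
```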

10. How many neurons do you need in the output layer if you want to classify email into spam or ham? What activation function should you use in the output layer? If instead you want to tackle MNIST, how many neurons do you need in the output layer, and which activation function should you use? What about for getting your network to predict housing prices?




