
Implementing an AdaBoost classifier.

In this assignment, your task is to train an AdaBoost classifier on synthetic data. For reference, you are provided with the posterior `P(y = 1 | x)`, with x regularly sampled over the domain `X = [0, 1] × [0, 1]`, so that you can see, in the end, how the output of the AdaBoost classifier better approximates the posterior at each round.

Please read the assignment entirely before you start coding, in order to get a sense of how it is organized. In particular, note that the AdaBoost algorithm is only run at the very last cell of the “Train the classifier” section. Before that, a number of functions are defined, one of which you need to complete.

Fill in the missing parts to implement the AdaBoost algorithm described in class (slide 64 of the course). This involves iterating over the following steps:

- a. Find the best weak learner h at each round.
- b. Using the weak learner’s weighted error e, compute the coefficient `α_t`.
- c. Update the weight distribution D of the training samples.
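Steps b and c can be sketched as follows. This is only a sketch under the assumption that the course uses the standard discrete AdaBoost formulas, `α_t = ½ ln((1 − e) / e)` and `D(i) ← D(i) exp(−α_t y_i h(x_i))` followed by renormalization; the slide's notation may differ. The function name `adaboost_round_update` and the toy arrays are hypothetical and not part of the assignment's code.

```python
import numpy as np

def adaboost_round_update(weights, labels, predictions):
    """One AdaBoost round: weighted error, alpha, and renormalized weights.

    weights: current distribution D over the samples (assumed to sum to 1)
    labels, predictions: arrays with entries in {-1, +1}
    """
    misclassified = predictions != labels
    error = np.sum(weights[misclassified])
    # Clip to avoid log(0) or division by zero when a weak learner is perfect.
    error = np.clip(error, 1e-12, 1 - 1e-12)
    alpha = 0.5 * np.log((1 - error) / error)
    # labels * predictions is -1 for misclassified samples, so they gain weight.
    new_weights = weights * np.exp(-alpha * labels * predictions)
    new_weights /= new_weights.sum()
    return error, alpha, new_weights

# Toy example (hypothetical data): one of four samples is misclassified.
labels = np.array([1, 1, -1, -1])
predictions = np.array([1, -1, -1, -1])
weights = np.full(4, 0.25)
error, alpha, new_weights = adaboost_round_update(weights, labels, predictions)
```

With these formulas, the misclassified samples carry half of the total weight after the update, which is why the same weak learner is no better than chance on the reweighted problem.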

Modify your loop to compute the loss E at each round. Then plot E and make sure that it decreases monotonically over the rounds. Verify that E provides an upper bound for the number of errors.
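The bound can be checked numerically. The sketch below assumes that E is the unnormalized exponential loss, the sum of `exp(−y_i f(x_i))` over the training samples; if the course defines E as a mean, the same argument bounds the error rate instead. The function names and the response values are hypothetical.

```python
import numpy as np

def exponential_loss(labels, response):
    """Sum of exp(-y_i * f(x_i)) over the training samples."""
    return np.sum(np.exp(-labels * response))

def count_errors(labels, response):
    """Number of samples with y_i * f(x_i) <= 0 (ties are counted as errors)."""
    return int(np.sum(labels * response <= 0))

labels = np.array([1, -1, 1, -1])
response = np.array([0.8, -0.3, -0.2, 0.1])  # hypothetical values of f(x_i)
loss = exponential_loss(labels, response)
errors = count_errors(labels, response)
print(loss, errors)
```

The bound holds term by term: `exp(−y f) ≥ 1` whenever `y f ≤ 0`, and `exp(−y f) > 0` otherwise, so every misclassified sample contributes at least 1 to the loss.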

First show the approximate posterior of your strong learner side-by-side with the original posterior. Then, show the approximate posteriors for each step at which the learner’s response has been saved. Make sure that they look increasingly similar to the original posterior.

- The response of a weak learner h for the sample x is `h(x) ∈ {−1, 1}`.
- At each round t we find the best weak learner `h_t`. The overall response of the strong learner at round t for the sample x is the weighted sum `f_t(x) = Σ_{τ ≤ t} α_τ h_τ(x)`. In order to be coherent with the weak learner’s expression, we can also define `H(x) = sign(f(x)) ∈ {−1, 1}`, which can also be called the overall response. However, in this assignment, we are only interested in f.

`import matplotlib.pyplot as plt`

`features, labels, posterior = construct_data(500, 'train', 'nonlinear', plusminus=True)`

The weak learner we use for this classification problem is a decision stump (see slide 63 of the course), whose response is defined as `h(x) = s(2[x_d ≥ θ] − 1)`, where

- d is the dimension along which the decision is taken,
- [·] is 1 if its argument is true and 0 otherwise,
- θ is the threshold applied along dimension d, and
- `s ∈ {−1, 1}` is the polarity of the decision stump (this is a multiplicative factor, not a function!).

For example, if s = 1, the decision stump will consider that all samples whose d-th feature is greater than or equal to θ are in the positive class `(h(x) = +1)`, and all samples whose d-th feature is strictly lower than θ are in the negative class `(h(x) = −1)`.

`def evaluate_stump(features, coordinate_wl, polarity_wl, theta_wl):`
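The body of this function is not shown here. As an illustration only, the stump response `h(x) = s(2[x_d ≥ θ] − 1)` could be computed along the following lines. The sketch keeps the argument names of the signature above but uses a hypothetical function name, and it assumes that `features` is a NumPy array of shape `(npoints, ndims)`.

```python
import numpy as np

def evaluate_stump_sketch(features, coordinate_wl, polarity_wl, theta_wl):
    """Return h(x) = s * (2 * [x_d >= theta] - 1) for every row of `features`."""
    feature_slice = features[:, coordinate_wl]
    # Boolean comparison -> {0, 1} -> {-1, +1}, then flipped by the polarity.
    return polarity_wl * (2 * (feature_slice >= theta_wl) - 1)

# Hypothetical samples in [0, 1] x [0, 1].
features = np.array([[0.2, 0.9], [0.7, 0.1], [0.5, 0.5]])
print(evaluate_stump_sketch(features, coordinate_wl=0, polarity_wl=1, theta_wl=0.5))
```

Because the polarity is a plain multiplicative factor, switching `polarity_wl` from 1 to −1 flips the sign of every response without changing the threshold.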

At each round of AdaBoost, the samples are reweighted, thus producing a new classification problem, where the samples with a larger weight count more in the classification error. The first step of a new round is to find the weak learner with the best performance for this new problem, that is, the one with the smallest weighted classification error (the smallest total weight of misclassified samples).

Notes on the implementation:

- The error is normalized in the course’s slides, but in practice you don’t need to normalize it, since the weights themselves are already normalized in the main loop of the algorithm.
- When searching for the best weak learner, you don’t need to consider all possible combinations of θ, d, s. For a given dimension d, the relevant values of θ to try are the `x_{i,d}`, the d-th features of the training samples (where i indexes the training samples).
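The search described in these notes can be sketched as an exhaustive loop over dimensions, candidate thresholds, and polarities. This is one possible approach, not the assignment's reference solution: the function name `find_best_stump_sketch`, the dictionary it returns, and the toy data are all hypothetical, and the weights are assumed to be already normalized.

```python
import numpy as np

def find_best_stump_sketch(weights, features, labels):
    """Exhaustive search over (d, theta, s) for the lowest weighted error."""
    ndims = features.shape[1]
    best = {"error": np.inf, "coordinate": None, "polarity": None, "theta": None}
    for d in range(ndims):
        # Only observed feature values can change how the samples are split.
        for theta in np.unique(features[:, d]):
            base = 2 * (features[:, d] >= theta) - 1  # response with s = +1
            for s in (1, -1):
                # Weighted error: total weight of the misclassified samples.
                error = np.sum(weights[s * base != labels])
                if error < best["error"]:
                    best = {"error": error, "coordinate": d,
                            "polarity": s, "theta": theta}
    return best

# Toy data (hypothetical): the classes are separable along dimension 0.
features = np.array([[0.1, 0.8], [0.4, 0.3], [0.6, 0.9], [0.9, 0.2]])
labels = np.array([-1, -1, 1, 1])
weights = np.full(len(labels), 1 / len(labels))
best = find_best_stump_sketch(weights, features, labels)
print(best)
```

The loop costs on the order of `ndims × npoints²` operations, which is fine for a few hundred samples; sorting each dimension once and updating the error incrementally would be faster but is harder to read.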

`def find_best_weak_learner(weights, features, labels):`

`npoints = features.shape[0]`

`## TODO (Question 2)`

It can be shown (cf. slide 69 of the course*) that the AdaBoost strong classifier’s response converges to half the posterior log-ratio:

`f(x) → ½ ln(P(y = 1 | x) / P(y = −1 | x))`

Therefore, we can check how well the response approximates the posterior.

*NB: There is a typo in this slide: the factor ½ is missing.

`approx_posterior_10 = 1 / (1 + np.exp(-2 * f_10))`
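The expression above is the inverse of the half log-ratio: solving `f = ½ ln(p / (1 − p))` for p gives `p = 1 / (1 + exp(−2f))`. The small round-trip check below illustrates this; the function names and the posterior values are hypothetical and chosen only for the example.

```python
import numpy as np

def response_to_posterior(response):
    """Map a strong-learner response f(x) to an estimate of P(y = 1 | x)."""
    return 1.0 / (1.0 + np.exp(-2.0 * response))

def posterior_to_response(posterior):
    """Half the posterior log-ratio, the limit the response should approach."""
    return 0.5 * np.log(posterior / (1.0 - posterior))

posterior = np.array([0.1, 0.5, 0.9])  # hypothetical posterior values
recovered = response_to_posterior(posterior_to_response(posterior))
print(recovered)
```

A response of 0 maps to a posterior of 0.5, so the sign of f alone determines which class is considered more likely.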
