Nearest Neighbor Classifier
Given a new example <math>X_i^{unseen}</math>, the general idea behind this method is to find, within the training set, the example <math>X_m^{train}</math> that is closest according to a distance metric <math>dist(i,m)</math> (normally Euclidean), and to return the class this particular example <math>m</math> belongs to: <math>\widehat{Y}_i^{unseen} = Y_m^{train}</math>.
This classifier is categorical, can predict <math>C</math> classes, and gives no information about confidence a priori.
The labels <math>Y \in \mathbb{R}^{M \times 1}</math> take <math>C</math> different categorical values.
In order to find the nearest example we want to minimize the distance <math>dist(i,m)</math> between the new example <math>X_{i} \in \mathbb{R}^{1 \times N}</math> and the <math>m^{th}</math> training example <math>X_{m}^{train}</math>:
<math>dist(X_{i}^{unseen},X_{m}^{train}) = \sqrt{(X_{i}-X_{m}) \cdot (X_{i}-X_{m})^{T}}</math>
We will have to search within the training set, so with a big database we have to play it smart so the search does not take forever: an exhaustive search costs <math>O(M*N)</math>, where <math>M</math> is the number of training examples and <math>N</math> the number of features.
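As a minimal sketch of this exhaustive search (the function and variable names below are illustrative, not from any particular library):

```python
import numpy as np

def predict_1nn(X_train, Y_train, x_unseen):
    """Return the label of the training example closest to x_unseen
    (Euclidean distance, exhaustive O(M*N) search)."""
    dists = np.sqrt(np.sum((X_train - x_unseen) ** 2, axis=1))  # distance to each of the M examples
    m = np.argmin(dists)                                        # index of the nearest example
    return Y_train[m]                                           # Y_hat_i^unseen = Y_m^train
```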
There are methods to reduce the <math>N</math> dimension of the examples. We call them Dimensionality Reduction algorithms.
There are also methods to reduce the <math>M</math> dimension, which typically consist of selecting a subset of examples or creating representative examples artificially (clustering), which are supposed to be a good approximation of the original 1-Nearest Neighbor classification. Support Vector Machines (explained below) tackle this problem by finding a constant-time classifier that successfully separates the classes, thus turning the computational cost of search into <math>O(N)</math>.
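As a sketch of the second idea (replacing the <math>M</math> examples by a few representatives per class), one could cluster each class with K-means and keep only the centroids as prototypes; the helper name and the per_class parameter below are our own choices:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_prototypes(X_train, Y_train, per_class=10):
    """Summarize each class by a few K-means centroids, so 1-NN can
    search the prototypes instead of all M training examples."""
    protos, labels = [], []
    for c in np.unique(Y_train):
        X_c = X_train[Y_train == c]
        k = min(per_class, len(X_c))
        centroids = KMeans(n_clusters=k, n_init=10).fit(X_c).cluster_centers_
        protos.append(centroids)
        labels.append(np.full(k, c))
    return np.vstack(protos), np.concatenate(labels)
```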
If, however, we don't want to sacrifice the number of examples, we can use a tree data structure like a Ball Tree, which will reduce the Nearest-Neighbor search cost to <math>O(log(M)N)</math>.
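For example, scikit-learn provides a BallTree that can replace the exhaustive loop; a small sketch with synthetic data (the sizes are arbitrary):

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))       # M = 1000 examples, N = 5 features
Y_train = rng.integers(0, 3, size=1000)    # C = 3 classes
x_unseen = rng.normal(size=(1, 5))

tree = BallTree(X_train)                   # built once, before any queries
dist, ind = tree.query(x_unseen, k=1)      # nearest-neighbor search in ~O(log(M)N)
y_hat = Y_train[ind[0, 0]]                 # predicted class of the unseen example
```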
This algorithm is intuitive and its exhaustive search version is easy to implement.
Additionally, it does not need any time for training. Setting up a tree data structure to optimize search can however take additional time.
The 1-Nearest Neighbor Classifier (1-NN or simply NN Classifier) is not very robust to noise, which can be seen as outliers. In order to have a classifier that generalizes a bit more, we can take not 1 but <math>k</math> nearest neighbors and perform a majority vote, thus being able to ignore isolated misplaced examples or outliers. In the above graphs, the left one corresponds to a 1-NN and the right one to a k-NN. We can see that the boundaries become smoother and more tolerant to outliers.
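A sketch of the majority-vote variant, in the same exhaustive-search style as above (the choice of <math>k</math> is a free parameter):

```python
import numpy as np
from collections import Counter

def predict_knn(X_train, Y_train, x_unseen, k=5):
    """Majority vote among the k training examples closest to x_unseen."""
    dists = np.sqrt(np.sum((X_train - x_unseen) ** 2, axis=1))
    nearest_k = np.argsort(dists)[:k]                 # indices of the k closest examples
    votes = Counter(Y_train[nearest_k].tolist())      # count each class among the neighbors
    return votes.most_common(1)[0][0]                 # most voted class
```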
The k-Nearest Neighbor classifier (k-NN), in turn, has the problem of not being robust to skewed class distributions. When facing this problem, it is common to assign weights to each of the <math>k</math> neighbors. These weights can be pre-defined or depend on the actual data, for example on the distance to each of the <math>k</math> neighbors.
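For instance, scikit-learn's KNeighborsClassifier supports distance-based weights; a minimal sketch on synthetic data (the sizes and the choice k=5 are arbitrary):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))
Y_train = rng.integers(0, 3, size=1000)

# weights="distance" makes closer neighbors count more in the vote.
clf = KNeighborsClassifier(n_neighbors=5, weights="distance")
clf.fit(X_train, Y_train)
y_hat = clf.predict(rng.normal(size=(1, 5)))
```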
Despite these improvements, a Nearest-Neighbor Classifier is very dependent on the training set: whenever the training set does not represent the problem's distribution well (for example, when there are too few training examples), it will perform poorly.