
Nearest Neighbor Classifier



Nearest Neighbor Classifier (NN)

Given a new example <math>X_i^{unseen}</math>, the general concept behind this method is to find, within the training set, the example <math>X_m^{train}</math> that is closest under a distance metric <math>dist(i,m)</math> (normally Euclidean), and to return the class that this particular example <math>m</math> belongs to: <math>\widehat{Y}_i^{unseen} = Y_m^{train}</math>.

Objective

This classifier is categorical: it can predict <math>C</math> classes and, a priori, gives no information about confidence.

<math>Y \in \mathbb{R}^{M \times 1}</math>, taking <math>C</math> different categorical values, where <math>M</math> is the number of training examples.

In order to find the nearest example, we want to minimize the distance <math>dist(i,m)</math> between the new example <math>X_{i} \in \mathbb{R}^{1 \times N}</math> and the <math>m^{th}</math> training example <math>X_{m}^{train}</math>:

<math>\epsilon = dist(X_{i}^{unseen},X_{m}^{train}) = \sqrt{(X_{i} - X_{m}) \cdot (X_{i} - X_{m})^T}</math>
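The prediction is then the label of the training example that minimizes this distance (the same rule as above, written in argmin form):

<math>\widehat{Y}_i^{unseen} = Y_{m^*}^{train}, \quad m^* = \arg\min_{m} \; dist(X_i^{unseen}, X_m^{train})</math>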

Model and Learning

We will have to search within the training set, so with a big database we will have to play it smart so prediction does not take forever: an exhaustive search costs <math>O(M \cdot N)</math> per query, where <math>M</math> is the number of training examples and <math>N</math> the number of features.
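As an illustration, a minimal exhaustive 1-NN sketch could look like the following (NumPy-based; the function name, variable names, and toy data are assumptions for illustration, not part of this page):

```python
import numpy as np

def nn_predict(X_train, Y_train, X_unseen):
    """Exhaustive 1-NN: return the label of the closest training example."""
    # Euclidean distances from the unseen example to all M training examples: O(M*N)
    dists = np.sqrt(np.sum((X_train - X_unseen) ** 2, axis=1))
    m_star = np.argmin(dists)       # index of the nearest neighbor
    return Y_train[m_star]          # predicted class

# Toy usage with hypothetical data: two classes in 2-D
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
Y_train = np.array([0, 0, 1, 1])
print(nn_predict(X_train, Y_train, np.array([2.8, 3.2])))   # -> 1
```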

There are methods to reduce the dimensionality <math>N</math> of the examples. We call them Dimensionality Reduction algorithms.
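As a hedged illustration of the "reduce <math>N</math>" idea, one might project the data onto fewer dimensions before running the neighbor search; the choice of PCA and of the sizes below is an assumption for illustration only:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20))        # M = 500 examples, N = 20 features

# Fit the projection on the training data and apply it to any unseen example,
# so the neighbor search runs in a lower-dimensional space.
pca = PCA(n_components=5)
X_train_low = pca.fit_transform(X_train)    # shape (500, 5)
X_unseen_low = pca.transform(rng.normal(size=(1, 20)))
```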

There are also methods to reduce the <math>M</math> dimension, which typically consist of selecting a subset of examples or creating representative examples artificially (clustering); these are meant to be a good approximation of the original 1-Nearest Neighbor classification. Support Vector Machines (explained below) tackle this problem by finding a constant-time classifier that successfully separates the classes, thus turning the computational cost of a query into <math>O(N)</math>.
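A sketch of the "reduce <math>M</math>" idea via clustering is shown below; the use of scikit-learn's KMeans, the per-class prototype count, and the function name are illustrative assumptions rather than a prescribed method:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_prototypes(X_train, Y_train, n_prototypes_per_class=5):
    """Replace each class by a few cluster centroids (artificial representatives)."""
    protos, proto_labels = [], []
    for c in np.unique(Y_train):
        Xc = X_train[Y_train == c]
        k = min(n_prototypes_per_class, len(Xc))
        km = KMeans(n_clusters=k, n_init=10).fit(Xc)
        protos.append(km.cluster_centers_)
        proto_labels.append(np.full(k, c))
    # The 1-NN search can now run against this (much smaller) prototype set.
    return np.vstack(protos), np.concatenate(proto_labels)
```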

If, however, we don't want to sacrifice any examples, we can use a tree data structure like a Ball Tree, which will reduce the Nearest-Neighbor search cost to <math>O(N \log M)</math>.
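For example, with scikit-learn's BallTree a nearest-neighbor query over the full training set could look roughly like this (the random data and sizes are illustrative assumptions):

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10000, 8))       # M = 10000 examples, N = 8 features
Y_train = rng.integers(0, 3, size=10000)    # 3 classes

tree = BallTree(X_train)                    # built once; queries then cost O(N log M)
dist, ind = tree.query(rng.normal(size=(1, 8)), k=1)
prediction = Y_train[ind[0, 0]]             # label of the nearest neighbor
```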

Discussion

Advantages

This algorithm is intuitive and its exhaustive search version is easy to implement.

Additionally, it does not need any time for training. Setting up a tree data structure to speed up the search can, however, take additional time.

Extensions

The 1-Nearest Neighbor Classifier (1-NN or simply NN Classifier) is not very robust to noise, which can be seen as outliers. In order to have a classifier that generalizes a bit more, we can take not 1 but <math>k</math> nearest neighbors and perform a majority vote, thus being able to ignore isolated misplaced examples or outliers. Comparing the decision boundaries of a 1-NN and a k-NN, the boundaries of the latter become smoother and more tolerant to outliers.
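A minimal majority-vote k-NN sketch, under the same assumptions as the earlier 1-NN sketch (NumPy, hypothetical names, integer class labels), could look like this:

```python
import numpy as np

def knn_predict(X_train, Y_train, X_unseen, k=3):
    """k-NN: majority vote among the k closest training examples."""
    dists = np.sqrt(np.sum((X_train - X_unseen) ** 2, axis=1))
    nearest = np.argsort(dists)[:k]           # indices of the k nearest neighbors
    votes = np.bincount(Y_train[nearest])     # count labels (assumed non-negative ints)
    return np.argmax(votes)                   # class with the most votes
```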

The k-Nearest Neighbor classifier (k-NN), in turn, has the problem of not being robust to skewed class distributions. When facing this problem, it is common to assign a weight to each of the <math>k</math> neighbors. These weights can be pre-defined or dependent on the actual data, like the distance to each neighbor.
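In scikit-learn, for instance, distance-based weighting is a one-line switch; this particular library call and the toy data are illustrations, not something this page prescribes:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
Y_train = np.array([0, 0, 1, 1])

# weights="distance" weights each of the k neighbors by the inverse of its
# distance, so closer neighbors count more in the vote.
clf = KNeighborsClassifier(n_neighbors=3, weights="distance")
clf.fit(X_train, Y_train)
print(clf.predict([[2.8, 3.2]]))   # -> [1]
```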

Limitations

Despite these improvements, a Nearest-Neighbor Classifier is very dependent on the training set, and whenever the training set does not represent the problem's distribution well (for example, when there are too few training examples), it will perform poorly.

Further reading
