# k-Nearest Neighbors Classification (k-NN)¶

$$k$$-NN classification algorithm infers the class for the new feature vector by computing majority vote of the $$k$$ nearest observations from the training set.

 Operation Computational methods Programming Interface Training Brute-force k-d tree train(…) train_input train_result Inference Brute-force k-d tree infer(…) infer_input infer_result

## Mathematical formulation¶

### Training¶

Let $$X = \{ x_1, \ldots, x_n \}$$ be the training set of $$p$$-dimensional feature vectors, let $$Y = \{ y_1, \ldots, y_n \}$$ be the set of class labels, where $$y_i \in \{ 0, \ldots, c-1 \}$$, $$1 \leq i \leq n$$. Given $$X$$, $$Y$$ and the number of nearest neighbors $$k$$, the problem is to build a model that allows distance computation between the feature vectors in training and inference sets at the inference stage.

#### Training method: brute-force¶

The training operation produces the model that stores all the feature vectors from the initial training set $$X$$.

#### Training method: k-d tree¶

The training operation builds a $$k$$-$$d$$ tree that partitions the training set $$X$$ (for more details, see k-d Tree).

### Inference¶

Let $$X' = \{ x_1', \ldots, x_m' \}$$ be the inference set of $$p$$-dimensional feature vectors. Given $$X'$$, the model produced at the training stage and the number of nearest neighbors $$k$$, the problem is to predict the label $$y_j'$$ for each $$x_j'$$, $$1 \leq j \leq m$$, by performing the following steps:

1. Identify the set $$N(x_j') \subseteq X$$ of the $$k$$ feature vectors in the training set that are nearest to $$x_j'$$ with respect to the Euclidean distance.

2. Estimate the conditional probability for the $$l$$-th class as the fraction of vectors in $$N(x_j')$$ whose labels $$y_j$$ are equal to $$l$$:

(1)$P_{jl} = \frac{1}{| N(x_j') |} \Big| \big\{ x_r \in N(x_j') : y_r = l \big\} \Big|, \quad 1 \leq j \leq m, \; 0 \leq l < c.$
3. Predict the class that has the highest probability for the feature vector $$x_j'$$:

(2)$y_j' = \mathrm{arg}\max_{0 \leq l < c} P_{jl}, \quad 1 \leq j \leq m.$

#### Inference method: brute-force¶

Brute-force inference method determines the set $$N(x_j')$$ of the nearest feature vectors by iterating over all the pairs $$(x_j', x_i)$$ in the implementation defined order, $$1 \leq i \leq n$$, $$1 \leq j \leq m$$. The final prediction is computed according to the equations (1) and (2).

#### Inference method: k-d tree¶

K-d tree inference method traverses the $$k$$-$$d$$ tree to find feature vectors associated with a leaf node that are closest to $$x_j'$$, $$1 \leq j \leq m$$. The set $$\tilde{n}(x_j')$$ of the currently-known nearest $$k$$-th neighbors is progressively updated during tree traversal. The search algorithm limits exploration of the nodes for which the distance between the $$x_j'$$ and respective part of the feature space is not less than the distance between $$x_j'$$ and the most distant feature vector from $$\tilde{n}(x_j')$$. Once tree traversal is finished, $$\tilde{n}(x_j') \equiv N(x_j')$$. The final prediction is computed according to the equations (1) and (2).

## Usage example¶

### Training¶

knn::model<> run_training(const table& data,
const table& labels) {
const std::int64_t class_count = 10;
const std::int64_t neighbor_count = 5;
const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

const auto result = train(knn_desc, data, labels);

return result.get_model();
}


### Inference¶

table run_inference(const knn::model<>& model,
const table& new_data) {
const std::int64_t class_count = 10;
const std::int64_t neighbor_count = 5;
const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

const auto result = infer(knn_desc, model, new_data);

print_table("labels", result.get_labels());
}


## Examples¶

Batch Processing:

Batch Processing:

Batch Processing:

## Programming Interface¶

All types and functions in this section are declared in the oneapi::dal::knn namespace and be available via inclusion of the oneapi/dal/algo/knn.hpp header file.

### Descriptor¶

template<typename Float = detail::descriptor_base<>::float_t, typename Method = detail::descriptor_base<>::method_t, typename Task = detail::descriptor_base<>::task_t>
class descriptor
Template Parameters
• Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

• Method – Tag-type that specifies an implementation of algorithm. Can be method::v1::brute_force or method::v1::kd_tree.

• Task – Tag-type that specifies type of the problem to solve. Can be task::v1::classification.

Constructors

descriptor(std::int64_t class_count, std::int64_t neighbor_count)

Creates a new instance of the class with the given class_count and neighbor_count property values.

Public Methods

auto &set_class_count(std::int64_t value)
auto &set_neighbor_count(std::int64_t value)

#### Method tags¶

struct brute_force

Tag-type that denotes brute-force computational method.

struct kd_tree

Tag-type that denotes k-d tree computational method.

using by_default = brute_force

Alias tag-type for brute-force computational method.

#### Task tags¶

struct classification

Tag-type that parameterizes entities used for solving classification problem.

using by_default = classification

Alias tag-type for classification task.

### Model¶

template<typename Task = task::by_default>
class model
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::v1::classification.

Constructors

model()

Creates a new instance of the class with the default property values.

### Training train(...)¶

#### Input¶

template<typename Task = task::by_default>
class train_input
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::v1::classification.

Constructors

train_input(const table &data, const table &labels)

Creates a new instance of the class with the given data and labels property values.

Properties

const table &data = table{}

The training set X.

Getter & Setter
const table & get_data() const
auto & set_data(const table &data)
const table &labels = table{}

Vector of labels y for the training set X.

Getter & Setter
const table & get_labels() const
auto & set_labels(const table &labels)

#### Result¶

template<typename Task = task::by_default>
class train_result
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::v1::classification.

Constructors

train_result()

Creates a new instance of the class with the default property values.

Properties

const model<Task> &model = model<Task>{}

The trained k-NN model.

Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

#### Operation¶

template<typename Descriptor>
knn::train_result train(const Descriptor &desc, const knn::train_input &input)
Template Parameters
• desc – k-NN algorithm descriptor knn::desc

• input – Input data for the training operation

Preconditions
input.data.has_data == true
input.labels.has_data == true
input.data.row_count == input.labels.row_count
input.labels.column_count == 1
input.labels[i] >= 0
input.labels[i] < desc.class_count

### Inference infer(...)¶

#### Input¶

template<typename Task = task::by_default>
class infer_input
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::v1::classification.

Constructors

infer_input(const table &data, const model<Task> &model)

Creates a new instance of the class with the given model and data property values.

Properties

const table &data = table{}

The dataset for inference $$X'$$.

Getter & Setter
const table & get_data() const
auto & set_data(const table &data)
const model<Task> &model = model<Task>{}

The trained k-NN model.

Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &m)

#### Result¶

template<typename Task = task::by_default>
class infer_result
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::v1::classification.

Constructors

infer_result()

Creates a new instance of the class with the default property values.

Properties

const table &labels = table{}

The predicted labels.

Getter & Setter
const table & get_labels() const
auto & set_labels(const table &value)

#### Operation¶

template<typename Descriptor>
knn::infer_result infer(const Descriptor &desc, const knn::infer_input &input)
Template Parameters
• desc – k-NN algorithm descriptor knn::desc

• input – Input data for the inference operation

Preconditions
input.data.has_data == true
Postconditions
result.labels.row_count == input.data.row_count
result.labels.column_count == 1
result.labels[i] >= 0
result.labels[i] < desc.class_count