---
title: "Data Classification with k-Nearest Neighbors (kNN) in C++11"
author: "Wisnu Adi Nurcahyo"
date: "December 2nd, 2018"

output:
  html_notebook:
    css: presentation.css
---

## Problem Description

Given 800 rows of data from `DataTrain_Tugas3_AI.csv`, each containing five attributes $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$.
Each row also has a single output (named $Y$) taking one of four labels: $0$, $1$, $2$, or $3$.
We need to build a classification system with k-Nearest Neighbors (kNN) to determine a label for each test row in `DataTest_Tugas3_AI.csv`.
Then, create a file `TebakanTugas3.csv` containing a single column of 200 predicted labels ($0$, $1$, $2$, or $3$).

## Implementation

Assume that several helper functions are already included.
The full source code is available at [wisn/knncpp](https://github.com/wisn/knncpp/blob/master/knn.cpp).

### Preparation

First, load the training data from the `DataTrain_Tugas3_AI.csv` file.

```{cpp}
Data data_train = readCSV("data/DataTrain_Tugas3_AI.csv");
```
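The `readCSV` helper is elided here. For concreteness, this is a hypothetical sketch of what `Datum`, `Data`, and `readCSV` might look like — the real helpers in wisn/knncpp may differ (e.g. in header handling):

```{cpp}
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch only; the real helpers in wisn/knncpp may differ.
// One record: five numeric attributes plus a label character ('0'..'3').
struct Datum {
  double x1, x2, x3, x4, x5;
  char y;
};

typedef std::vector<Datum> Data;

// Parses rows shaped like "x1,x2,x3,x4,x5,y" (no header row assumed).
// Test rows without a label get y = '?'.
Data readCSV(const std::string& path) {
  Data data;
  std::ifstream in(path);
  std::string line;
  while (std::getline(in, line)) {
    std::stringstream ss(line);
    std::string cell;
    std::vector<std::string> cells;
    while (std::getline(ss, cell, ','))
      cells.push_back(cell);
    if (cells.size() < 5) continue;  // skip malformed rows

    Datum d;
    d.x1 = std::stod(cells[0]);
    d.x2 = std::stod(cells[1]);
    d.x3 = std::stod(cells[2]);
    d.x4 = std::stod(cells[3]);
    d.x5 = std::stod(cells[4]);
    d.y = (cells.size() > 5 && !cells[5].empty()) ? cells[5][0] : '?';
    data.push_back(d);
  }
  return data;
}
```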

$$
\\
$$

Second, load the test data from the `DataTest_Tugas3_AI.csv` file.

```{cpp}
Data data_test = readCSV("data/DataTest_Tugas3_AI.csv");
```

$$
\\
$$

Lastly, build a function to calculate the distance between two objects.
We will use the Euclidean distance.

```{cpp}
// Included in kNN abstract data type
  double distance(Datum a, Datum b) {
    double ret = 0;
    ret += pow(a.x1 - b.x1, 2);
    ret += pow(a.x2 - b.x2, 2);
    ret += pow(a.x3 - b.x3, 2);
    ret += pow(a.x4 - b.x4, 2);
    ret += pow(a.x5 - b.x5, 2);

    return sqrt(ret);
  }
```
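As a quick sanity check, the same formula can be exercised as a free function (a hypothetical standalone copy of the member function above): two points whose attributes differ by $(3, 4, 0, 0, 0)$ should be exactly $5$ apart, by the 3-4-5 triangle.

```{cpp}
#include <cmath>

// Hypothetical helper type and a standalone copy of the distance
// member function above, for illustration only.
struct Point5 { double x1, x2, x3, x4, x5; };

double euclidean(const Point5& a, const Point5& b) {
  double ret = 0;
  ret += std::pow(a.x1 - b.x1, 2);
  ret += std::pow(a.x2 - b.x2, 2);
  ret += std::pow(a.x3 - b.x3, 2);
  ret += std::pow(a.x4 - b.x4, 2);
  ret += std::pow(a.x5 - b.x5, 2);
  return std::sqrt(ret);
}
```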

$$
\\
$$

### kNN Prediction

This is our main focus.
We are trying to predict the classification result based on kNN algorithm with Euclidean distance.

```{cpp}
// Included in kNN abstract data type
  char predictions(Datum test) {
    int train_size = this->train.size();
    
    vector<Point> points;
    for (int j = 0; j < train_size; j++) {
      Datum train = this->train[j];

      Point point;
      point.label = train.y;
      point.distance = this->distance(test, train);

      points.push_back(point);
    }

    auto comparator = [](Point x, Point y) {
      if (x.distance == y.distance)
        return x.label < y.label;

      return x.distance < y.distance;
    };

    sort(points.begin(), points.end(), comparator);

    int labels[4];
    fill(labels, labels + 4, 0);
    for (int i = 0; i < this->k; i++) {
      labels[points[i].label - '0'] += 1;
    }

    int label = 0;
    int best = labels[0];
    for (int i = 0; i < 4; i++) {
      if (labels[i] > best) {
        best = labels[i];
        label = i;
      }
    }

    return '0' + label;
  }
```
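To see the vote in action on something small, here is a toy 1-D version of the same sort-then-vote procedure (a hypothetical helper, not part of the kNN class above): with $k = 3$, a query point gets whichever label dominates its three nearest neighbors. Sorting `(distance, label)` pairs lexicographically reproduces the comparator's tie-break by label.

```{cpp}
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical toy sketch: nearest-neighbor vote on 1-D points.
// train holds (position, label) pairs with labels '0'..'3'.
char toyPredict(const std::vector<std::pair<double, char>>& train,
                double query, int k) {
  std::vector<std::pair<double, char>> dist;  // (distance, label)
  for (const auto& t : train)
    dist.push_back({std::fabs(t.first - query), t.second});

  // Lexicographic pair order: by distance, then by label on ties.
  std::sort(dist.begin(), dist.end());

  int votes[4] = {0, 0, 0, 0};
  for (int i = 0; i < k && i < (int) dist.size(); i++)
    votes[dist[i].second - '0'] += 1;

  int label = 0;
  for (int i = 1; i < 4; i++)
    if (votes[i] > votes[label]) label = i;

  return '0' + label;
}
```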

$$
\\
$$

Now, let's try predicting some labels with $K = 5$.

```{cpp}
cout << '[';

kNN knn (data_train, 5);
for (int i = 0; i < (int) data_test.size(); i++) {
  if (i > 0) cout << ", ";

  Datum test = data_test[i];
  cout << knn.predictions(test);
}

cout << ']' << endl;
```

```
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 2, 0, 0, 2, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 3, 1, 1, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0, 1, 0, 0, 2, 2, 3, 3, 2, 2, 3, 3, 2, 3, 3, 0, 3, 3, 2, 2, 2, 2, 2, 0, 2, 3, 2, 2, 3, 3, 2, 2, 3, 2, 2, 3, 0, 3, 3, 2, 2, 2, 2, 2, 2, 0, 3, 2, 2, 2, 3, 3, 3, 2, 2, 3, 3, 3, 3, 2, 3, 2, 2, 3, 2, 3, 3, 2, 3, 2, 3, 2, 2, 2, 2, 3, 0, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2, 2, 1, 2, 3, 2, 3, 3, 2, 3, 2, 3, 3, 3, 3, 3, 3, 3]
```

$$
\\
$$

Our kNN prediction function works just as expected.
However, is $K = 5$ the right choice?
We don't know!
Hence, we need a validation set to find a better $K$ value.
In this case, let's shuffle the training data, hold out part of it as a validation set, and calculate the accuracy on it.
(Strictly speaking this is a single holdout split rather than full k-fold cross-validation, which would average over several folds — one fold keeps things simple.)
Why this scheme?
Well, it just happens to be the one I understand.

### kNN Validation
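The code below shuffles the training data with a `randomnize` helper before splitting it 600/200. A minimal sketch of such a helper, assuming C++11 `<random>` (the real one in wisn/knncpp may seed or shuffle differently):

```{cpp}
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical sketch of the randomnize helper: returns a shuffled
// copy of the data using a C++11 random engine. A fixed seed makes
// the validation split reproducible across runs.
template <typename T>
std::vector<T> randomnize(std::vector<T> data) {
  std::mt19937 rng(42);
  std::shuffle(data.begin(), data.end(), rng);
  return data;
}
```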

```{cpp}
Data validation_data = randomnize(data_train);
Data validation_train, validation_test;

int n = data_train.size();
int m = data_test.size();
for (int i = 0; i < (n - m); i++)
  validation_train.push_back(validation_data[i]);

for (int i = (n - m); i < n; i++)
  validation_test.push_back(validation_data[i]);

int max_k = m;
vector<pair<int, double>> accuracy (max_k);
for (int k = 1; k <= max_k; k++) {
  kNN knn (validation_train, k);
  int match = 0;

  for (int i = 0; i < m; i++) {
    Datum test = validation_test[i];
    char prediction = knn.predictions(test);
    if (prediction == test.y)
      match += 1;
  }

  accuracy[k - 1] = {k, (match / (double) m) * 100};
}

cout << "Accuracy calculation:" << endl;
cout << '[';
for (int i = 0; i < max_k; i++) {
  if (i > 0) cout << ", ";

  cout << "(K = " << accuracy[i].first << ", " << accuracy[i].second << "%)";
}

cout << ']' << endl;
```

```
Accuracy calculation:
[(K = 1, 81%), (K = 2, 80.5%), (K = 3, 83%), (K = 4, 82%), (K = 5, 84.5%), (K = 6, 83%), (K = 7, 86%), (K = 8, 84.5%), (K = 9, 85.5%), (K = 10, 85.5%), (K = 11, 86%), (K = 12, 85.5%), (K = 13, 86.5%), (K = 14, 86.5%), (K = 15, 87.5%), (K = 16, 86.5%), (K = 17, 86%), (K = 18, 85.5%), (K = 19, 85%), (K = 20, 85.5%), (K = 21, 86%), (K = 22, 85.5%), (K = 23, 85%), (K = 24, 83.5%), (K = 25, 83%), (K = 26, 83.5%), (K = 27, 84%), (K = 28, 83%), (K = 29, 84%), (K = 30, 85%), (K = 31, 84%), (K = 32, 85%), (K = 33, 84%), (K = 34, 84%), (K = 35, 83.5%), (K = 36, 83.5%), (K = 37, 82.5%), (K = 38, 84%), (K = 39, 84.5%), (K = 40, 84%), (K = 41, 83.5%), (K = 42, 84%), (K = 43, 84%), (K = 44, 84%), (K = 45, 85%), (K = 46, 85%), (K = 47, 85%), (K = 48, 84%), (K = 49, 84%), (K = 50, 83.5%), (K = 51, 83.5%), (K = 52, 83.5%), (K = 53, 83.5%), (K = 54, 83.5%), (K = 55, 83%), (K = 56, 82.5%), (K = 57, 82.5%), (K = 58, 82.5%), (K = 59, 83%), (K = 60, 82.5%), (K = 61, 84%), (K = 62, 82.5%), (K = 63, 82.5%), (K = 64, 82.5%), (K = 65, 81.5%), (K = 66, 81.5%), (K = 67, 80.5%), (K = 68, 81%), (K = 69, 81%), (K = 70, 81%), (K = 71, 81%), (K = 72, 81%), (K = 73, 81%), (K = 74, 82%), (K = 75, 81%), (K = 76, 81.5%), (K = 77, 81.5%), (K = 78, 82%), (K = 79, 82%), (K = 80, 82.5%), (K = 81, 81.5%), (K = 82, 82%), (K = 83, 83%), (K = 84, 82%), (K = 85, 82.5%), (K = 86, 83%), (K = 87, 82.5%), (K = 88, 82%), (K = 89, 82%), (K = 90, 82.5%), (K = 91, 82%), (K = 92, 82.5%), (K = 93, 81.5%), (K = 94, 81%), (K = 95, 81.5%), (K = 96, 80.5%), (K = 97, 81%), (K = 98, 81%), (K = 99, 81%), (K = 100, 81%), (K = 101, 81%), (K = 102, 80%), (K = 103, 80.5%), (K = 104, 81%), (K = 105, 80.5%), (K = 106, 80%), (K = 107, 80%), (K = 108, 80%), (K = 109, 79.5%), (K = 110, 79.5%), (K = 111, 80%), (K = 112, 80%), (K = 113, 80.5%), (K = 114, 80.5%), (K = 115, 80%), (K = 116, 79%), (K = 117, 79.5%), (K = 118, 78.5%), (K = 119, 78.5%), (K = 120, 79%), (K = 121, 79%), (K = 122, 79%), (K = 123, 79%), (K = 124, 78.5%), (K = 125, 
79%), (K = 126, 78%), (K = 127, 79%), (K = 128, 79.5%), (K = 129, 79.5%), (K = 130, 79%), (K = 131, 79%), (K = 132, 78%), (K = 133, 78.5%), (K = 134, 78.5%), (K = 135, 78.5%), (K = 136, 78.5%), (K = 137, 78.5%), (K = 138, 78.5%), (K = 139, 78.5%), (K = 140, 78%), (K = 141, 77%), (K = 142, 76.5%), (K = 143, 76.5%), (K = 144, 76.5%), (K = 145, 77%), (K = 146, 76.5%), (K = 147, 76.5%), (K = 148, 76%), (K = 149, 76%), (K = 150, 75.5%), (K = 151, 75%), (K = 152, 75%), (K = 153, 75.5%), (K = 154, 75.5%), (K = 155, 75.5%), (K = 156, 74.5%), (K = 157, 75%), (K = 158, 74.5%), (K = 159, 75%), (K = 160, 74.5%), (K = 161, 74.5%), (K = 162, 73%), (K = 163, 73%), (K = 164, 73%), (K = 165, 73%), (K = 166, 72%), (K = 167, 72%), (K = 168, 72%), (K = 169, 72.5%), (K = 170, 71.5%), (K = 171, 71%), (K = 172, 71%), (K = 173, 71%), (K = 174, 70.5%), (K = 175, 70.5%), (K = 176, 70%), (K = 177, 69%), (K = 178, 69.5%), (K = 179, 69%), (K = 180, 69%), (K = 181, 69%), (K = 182, 68.5%), (K = 183, 69%), (K = 184, 69%), (K = 185, 67.5%), (K = 186, 67.5%), (K = 187, 66.5%), (K = 188, 65.5%), (K = 189, 64%), (K = 190, 63.5%), (K = 191, 64%), (K = 192, 62%), (K = 193, 62.5%), (K = 194, 61.5%), (K = 195, 61%), (K = 196, 60.5%), (K = 197, 60.5%), (K = 198, 59.5%), (K = 199, 58%), (K = 200, 58%)]
```

$$
\\
$$

Now, based on the accuracy results above, we can determine which $K$ value performs best.

```{cpp}
int current_k = accuracy[0].first;
double current_accuracy = accuracy[0].second;
for (int i = 0; i < max_k; i++) {
  if (accuracy[i].second > current_accuracy) {
    current_accuracy = accuracy[i].second;
    current_k = accuracy[i].first;
  }
}

K = current_k;
cout << K << " with " << current_accuracy << "% accuracy." << endl;
```

```
K = 15 with 87.5% accuracy.
```

$$
\\
$$

### kNN Solution

We have now retrieved the best $K$ value.
Let's apply it to the kNN prediction function.

```{cpp}
vector<char> labels;
kNN knn (data_train, K);
int n = data_test.size();
for (int i = 0; i < n; i++) {
  Datum test = data_test[i];
  labels.push_back(knn.predictions(test));
}

cout << "Retrieved solution:" << endl;
cout << '[';

for (int i = 0; i < n; i++) {
  if (i > 0) cout << ", ";

  cout << labels[i];
}

cout << ']' << endl;
```

```
Retrieved solution:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 2, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 3, 1, 0, 0, 1, 1, 0, 1, 2, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2, 3, 3, 2, 2, 3, 3, 2, 3, 3, 0, 3, 3, 2, 2, 2, 2, 2, 2, 0, 3, 2, 2, 3, 3, 2, 2, 3, 2, 2, 3, 2, 3, 3, 2, 2, 2, 2, 2, 2, 0, 3, 2, 2, 2, 3, 3, 3, 2, 2, 3, 3, 3, 3, 2, 1, 0, 2, 3, 2, 3, 3, 2, 3, 2, 3, 2, 2, 2, 2, 3, 0, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2, 2, 2, 2, 3, 2, 3, 3, 2, 3, 2, 3, 3, 3, 3, 3, 3, 3]
```

$$
\\
$$

Store the solution to `TebakanTugas3.csv` and we are all set!
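The write-out step is not shown above; here is a minimal sketch, assuming the `labels` vector from the solution code and simple one-label-per-row output (the actual helper may differ):

```{cpp}
#include <fstream>
#include <string>
#include <vector>

// Hypothetical sketch: writes the predicted labels to the output file,
// one label per row, producing a single-column CSV.
void writeLabels(const std::string& path, const std::vector<char>& labels) {
  std::ofstream out(path);
  for (char label : labels)
    out << label << '\n';
}
```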
