An Improvised k-NN Respecting Diversity of Data for Network Intrusion Detection
Authors: Yasir Hamid, Balasaraswathi Ranganathan, Ludovic Journaux, Qaiser Farooq Sugumaran Muthukumarasamy
Number of views: 269
Network Intrusion Detection is a complex classification problem aimed at discriminating the legitimate from illegitimate and potentially harmful network connections over the communication network. What adds to the complexity of the problem is the near real-time response to a threat, imbalanced datasets to deal with and finally the data being mixed in nature with some features being numeric some discrete and some nominal. In this work we have applied Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset and eliminate the skewness of the class distribution. The success of k-Nearest Neighbour (k-NN) depends upon the set of neighbours deemed to be very close or similar to a data point which is in turn determined by the similarity/distance metric employed, where most of the metrics employed in literature deal with numeric data only, and either need conversion of categorical features to numeric features or simply eliminated the categorical features, which often leads to reduction in the results. As for this work is considered, we take into consideration both the categories of features simultaneously by replacing the conventional Euclidean metric with Gower metric, which is better suited for mixed data. Gower metric provides a mechanism to deal with heterogeneous features differently and ultimately yields a quantifiable value that determines the similarity of the two instances. Experimental results show that improvised version of k-NN outperforms its conventional counterpart in terms of the Accuracy, Detection Rate, Precision, Recall, f-Measure, and Receiver Operating Characteristic (ROC) curve.