A Novel Classification Model Based on Hybrid K-Means and Neural Network for Classification Problems

Cui Chenghu, Arit Thammano

Abstract


We propose a new classification model for problems with overlapping clusters, based on K-Means clustering and neural networks. K-Means is a classic unsupervised learning algorithm for clustering: it assigns each object to the cluster whose center is nearest and refines the centers through repeated iteration. Because cluster membership is determined purely by distance, the results tend to converge to local optima and handle cluster boundaries poorly: clusters overlap, and outliers that do not belong to their assigned cluster are common, so the clustering results are often unsatisfactory. Our model offers a new way to segment the non-ideal data in these overlapping regions. Since clustering algorithms cannot reliably identify and classify such data, we separate it out, train a neural network on it, and then integrate the network's predictions back into the clustered data. In our experiments, k-fold cross-validation ensures the stability of the results; accuracy evaluates the quality of the model, and standard deviation and mean deviation assess the clustering results. Five sets of cross-validation experiments show that our model achieves markedly higher accuracy than the plain K-Means classification model.
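The pipeline the abstract describes — cluster with K-Means, flag the points in overlapping regions, reclassify only those with a neural network, and merge the labels — can be sketched as follows. This is a minimal illustration in NumPy, not the authors' exact method: the margin heuristic for detecting overlap, the network size, and all function names are our assumptions.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's K-Means; returns labels, centers, and point-center distances."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)  # (n, k)
        lab = d.argmin(axis=1)
        C = np.array([X[lab == j].mean(axis=0) if (lab == j).any() else C[j]
                      for j in range(k)])
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return d.argmin(axis=1), C, d

def train_nn(X, y, k, hidden=8, epochs=400, lr=0.5, seed=0):
    """Tiny one-hidden-layer softmax network, full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, k)); b2 = np.zeros(k)
    Y = np.eye(k)[y]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / len(X)                       # softmax cross-entropy gradient
        dH = (G @ W2.T) * (1 - H ** 2)
        W2 -= lr * (H.T @ G);  b2 -= lr * G.sum(0)
        W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(0)
    return W1, b1, W2, b2

def hybrid_kmeans_nn(X, k=2, margin=0.25, seed=0):
    lab, C, d = kmeans(X, k, seed=seed)
    ds = np.sort(d, axis=1)
    # A point whose two nearest centers are nearly equidistant sits in the
    # overlapping region and is handed to the neural network instead.
    overlap = (ds[:, 1] - ds[:, 0]) < margin * ds[:, 1]
    if overlap.any() and (~overlap).any():
        Xs = (X - X.mean(0)) / X.std(0)            # standardize for the NN
        W1, b1, W2, b2 = train_nn(Xs[~overlap], lab[~overlap], k, seed=seed)
        P = np.tanh(Xs[overlap] @ W1 + b1) @ W2 + b2
        lab = lab.copy()
        lab[overlap] = P.argmax(axis=1)
    return lab, overlap

# Two overlapping Gaussian blobs as a toy stand-in for the paper's datasets.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.2, (150, 2)),
               rng.normal([3.0, 3.0], 1.2, (150, 2))])
labels, overlap = hybrid_kmeans_nn(X, k=2)
```

Here the network is trained on the confidently clustered points (using the K-Means labels as targets) and only relabels the flagged boundary points; accuracy under k-fold cross-validation, as in the paper, would then compare these merged labels against ground truth.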

 

Doi: 10.28991/HIJ-2024-05-03-012

Full Text: PDF


Keywords


Overlapping Clustering; K-Means Classification; Neural Network; Machine Learning.




Copyright (c) 2024 Cui Chenghu