Machine Learning-Based News Classification: Comparison of KNN Accuracy with Hyperparameter Tuning

Muhamad Nur Gunawan
Nuryasin
Syopiansyah Jaya Putra
Sarah Arhami

Abstract

This study aims to develop an automatic news text classification system using the K-Nearest Neighbor (KNN) algorithm with a hyperparameter tuning approach. Because manual classification by editors is inefficient, an accurate and lightweight automated approach is needed. News datasets were obtained through web scraping of bbc.com, covering five main categories: business, technology, entertainment, science, and health. This research follows the CRISP-DM methodology, which consists of six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Features are represented with TF-IDF, and preprocessing includes stopword removal as well as pattern-based noise cleaning. Two experimental scenarios were performed: first, using the complete data without balancing; second, using undersampled, more balanced data. Hyperparameter tuning was performed by varying k from 1 to 50, validated with 5-fold cross-validation. The results showed that the model trained on balanced data with k=11 achieved accuracy, precision, recall, and F1-score of 95%. The system was also implemented as a Flask-based web application that news editors can use for real-time text classification. This study emphasizes the importance of parameter optimization and preprocessing in text classification and shows that simple algorithms such as KNN remain competitive when supported by good data processing.
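The pipeline described in the abstract (TF-IDF features, KNN, and a k-grid from 1 to 50 validated with 5-fold cross-validation) can be sketched as follows. This is a minimal illustration, assuming scikit-learn and a tiny synthetic two-category corpus in place of the scraped BBC dataset; the category words and document texts are invented for demonstration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Tiny synthetic stand-in for the scraped BBC corpus (two of the five
# categories shown; the trailing index just makes each document distinct).
texts = (
    [f"stock market shares profit company earnings {i}" for i in range(40)]
    + [f"software chip computer internet technology {i}" for i in range(40, 80)]
)
labels = ["business"] * 40 + ["technology"] * 40

# TF-IDF representation with stopword removal, feeding a KNN classifier.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("knn", KNeighborsClassifier()),
])

# Vary k from 1 to 50 with 5-fold cross-validation, as in the study.
search = GridSearchCV(
    pipe,
    param_grid={"knn__n_neighbors": list(range(1, 51))},
    cv=5,
    scoring="accuracy",
)
search.fit(texts, labels)
print(search.best_params_, round(search.best_score_, 3))
```

On the real, noisier corpus the best k reported by the study was 11; on this toy data the optimum will differ, which is exactly why the grid search is re-run per dataset.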
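The Flask deployment mentioned in the abstract could look like the sketch below. The route name (`/classify`), the JSON payload shape, and the `EchoModel` placeholder are all assumptions for illustration; in the real system the trained TF-IDF + KNN pipeline would be loaded in place of the placeholder.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


class EchoModel:
    """Placeholder for the trained TF-IDF + KNN pipeline loaded at startup."""

    def predict(self, texts):
        # Always returns "business"; a real model would classify each text.
        return ["business" for _ in texts]


model = EchoModel()


@app.route("/classify", methods=["POST"])
def classify():
    # Expect JSON like {"text": "..."} from the editor-facing front end.
    text = request.get_json()["text"]
    return jsonify({"category": model.predict([text])[0]})


if __name__ == "__main__":
    app.run()
```

An editor-facing front end would POST the article text and display the returned category, giving the real-time classification described in the study.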

Article Details

How to Cite
Muhamad Nur Gunawan, Nuryasin, Syopiansyah Jaya Putra, & Sarah Arhami. (2025). Machine Learning-Based News Classification: Comparison of KNN Accuracy with Hyperparameter Tuning. Jurnal Informasi Dan Teknologi, 114-120. https://doi.org/10.60083/jidt.vi0.661
Section
Articles
