Classification of Multiple Emotions in Indonesian Text Using The K-Nearest Neighbor Method

Authors

  • Ahmad Zamsuri, Universitas Lancang Kuning
  • Sarjon Defit, Universitas Putra Indonesia “YPTK” Padang
  • Gunadi Widi Nurcahyo, Universitas Putra Indonesia “YPTK” Padang

DOI:

https://doi.org/10.37385/jaets.v4i2.1964

Keywords:

Emotions, TF-IDF, BoW, KNN, Data Splitting

Abstract

Emotions are expressions manifested by individuals in response to what they see or experience. In this study, emotions were examined through individuals' tweets about the 2024 election in Indonesia. The collected tweets were labeled using the emotion wheel, which consisted of six categories: joy, love, surprise, anger, fear, and sadness. After labeling, the terms were weighted using two techniques: TF-IDF (Term Frequency-Inverse Document Frequency) and Bag-of-Words (BoW). The model was then built with the K-Nearest Neighbor (KNN) algorithm and evaluated under three data splitting ratios: 80:20, 70:30, and 60:40. Accuracy was first calculated for the six-label model. The six labels were then merged into positive and negative categories, and the modeling was repeated using the same process as for the six labels. The results showed that TF-IDF outperformed BoW. The highest accuracy was achieved with the 80:20 data splitting ratio: 58% for the six-label classification and 79% for the two-label classification.
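
The pipeline described in the abstract (term weighting, a train/test split, then KNN classification) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The abstract does not state the TF-IDF variant, the distance measure, or the value of k, so the sketch assumes a smoothed IDF, cosine similarity, and k = 3. The tokenized documents are invented placeholders, not tweets from the study, and all function names are hypothetical.

```python
import math
from collections import Counter

def bow_vector(tokens, vocab):
    """Bag-of-Words: raw term counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocab]

def idf_weights(train_docs, vocab):
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1 (one common variant; an assumption)."""
    n = len(train_docs)
    weights = []
    for term in vocab:
        df = sum(1 for tokens in train_docs if term in tokens)
        weights.append(math.log((1 + n) / (1 + df)) + 1.0)
    return weights

def tfidf_vector(tokens, vocab, idf):
    """TF-IDF: term counts scaled by the IDF weight of each term."""
    return [count * w for count, w in zip(bow_vector(tokens, vocab), idf)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def split(items, train_ratio):
    """Ordered split, e.g. train_ratio=0.8 for an 80:20 ratio (no shuffling)."""
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

# Invented, already-tokenized placeholder documents (NOT data from the study).
data = [
    (["happy", "vote", "today"], "joy"),
    (["happy", "proud", "vote"], "joy"),
    (["angry", "fraud", "vote"], "anger"),
    (["angry", "unfair", "result"], "anger"),
    (["happy", "result", "today"], "joy"),
]

train, test = split(data, 0.8)
train_docs = [tokens for tokens, _ in train]
train_labels = [label for _, label in train]
vocab = sorted({term for tokens in train_docs for term in tokens})
idf = idf_weights(train_docs, vocab)

query_tokens, true_label = test[0]

# Same classifier, two different weighting schemes.
pred_bow = knn_predict(bow_vector(query_tokens, vocab),
                       [bow_vector(t, vocab) for t in train_docs],
                       train_labels)
pred_tfidf = knn_predict(tfidf_vector(query_tokens, vocab, idf),
                         [tfidf_vector(t, vocab, idf) for t in train_docs],
                         train_labels)
print(pred_bow, pred_tfidf, true_label)
```

Changing the second argument of `split` to 0.7 or 0.6 gives the other two ratios mentioned in the abstract. On a real corpus the data would normally be shuffled before splitting, and accuracy would be computed over the whole test portion rather than a single document.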

Published

2023-06-12

How to Cite

Zamsuri, A., Defit, S., & Nurcahyo, G. W. (2023). Classification of Multiple Emotions in Indonesian Text Using The K-Nearest Neighbor Method. Journal of Applied Engineering and Technological Science (JAETS), 4(2), 1012–1021. https://doi.org/10.37385/jaets.v4i2.1964