Product Codefication Accuracy With Cosine Similarity And Weighted Term Frequency And Inverse Document Frequency (TF-IDF)
DOI:
https://doi.org/10.37385/jaets.v2i2.210
Keywords:
TF-IDF, Cosine Similarity, Term Frequency, Inverse Document Frequency, Search Accuracy
Abstract
In the SiPaGa application, the codification search process is still inaccurate, so OPD often make mistakes when choosing goods codes. Cosine Similarity and TF-IDF methods are therefore needed to improve search accuracy. Cosine Similarity is a method for calculating similarity, here using keywords from the goods codes. Term Frequency and Inverse Document Frequency (TF-IDF) is a way to assign a weight to each word (term). The purpose of this research is to improve the accuracy of the search for goods codification. The study processed 14,417 goods codification records sourced from the Goods and Price Planning Information System (SiPaGa) application database. The search keywords were processed using the Cosine Similarity method to measure similarity and TF-IDF to calculate the weighting. This research produces the cosine similarity calculation and TF-IDF weighting, which are expected to be applied to the SiPaGa application so that its search process is more accurate than before and OPD can choose the product code they want.
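The weighting and ranking described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the item codes and descriptions in `ITEMS` are invented, the function names are hypothetical, and the smoothed IDF form `log(N / df) + 1` is an assumption, since the abstract does not state which TF-IDF variant was used.

```python
import math
from collections import Counter

# Hypothetical goods codes and descriptions, invented for illustration only.
ITEMS = [
    ("A001", "ballpoint pen black"),
    ("A002", "ballpoint pen blue"),
    ("B001", "printer paper a4"),
]

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def build_idf(docs_tokens):
    """IDF per term; log(N / df) + 1 is an assumed smoothing variant."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, items, top_k=3):
    """Rank item codes by cosine similarity between query and description."""
    docs = [tokenize(desc) for _, desc in items]
    idf = build_idf(docs)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [
        (code, cosine(query_vec, tfidf_vector(tokens, idf)))
        for (code, _), tokens in zip(items, docs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

if __name__ == "__main__":
    for code, score in search("black pen", ITEMS):
        print(code, round(score, 3))
```

In this sketch the query "black pen" ranks the black pen above the blue one because the rarer term "black" carries a higher IDF weight than the shared term "pen", while the unrelated paper item scores zero.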