Analysis of SMS Spam Detection using Tf-Idf: A Study On SMS Spam Collection Dataset
DOI:
https://doi.org/10.59188/jurnalsostech.v4i4.1214
Keywords:
SMS spam detection, TF-IDF analysis, classification metrics, SMS Spam Collection dataset
Abstract
This study examines SMS spam detection using TF-IDF analysis on the SMS Spam Collection dataset, a collection of text messages labeled as spam or ham (non-spam). The research aims to evaluate how effectively TF-IDF features distinguish spam from ham messages. Performance of the classification model is assessed with the precision, recall, and F1-score metrics. The results are promising: the model achieves a high accuracy rate in classifying spam and ham messages. The study also reports the distribution of spam and ham labels in the test data, which gives further insight into SMS spam detection techniques.
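The workflow summarized in the abstract (TF-IDF weighting followed by evaluation with precision, recall, and F1-score) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the study's method: the toy messages, the smoothed IDF variant, the whitespace tokenizer, and the nearest-centroid classifier are all stand-ins, since the abstract does not name the classifier or preprocessing used.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; real pipelines clean more.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed IDF (one common variant): ln((1 + N) / (1 + df)) + 1
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf):
    # Raw term count times IDF, L2-normalised; unseen terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def centroid(vectors):
    # Mean TF-IDF vector of one class.
    total = Counter()
    for vec in vectors:
        total.update(vec)  # adds weights term by term
    return {t: v / len(vectors) for t, v in total.items()}

def score(vec, cen):
    # Dot product between a message vector and a class centroid.
    return sum(v * cen.get(t, 0.0) for t, v in vec.items())

def classify(text, idf, centroids):
    # Hypothetical classifier: pick the class with the closest centroid.
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: score(vec, centroids[label]))

def precision_recall_f1(y_true, y_pred, positive="spam"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy data invented for this sketch (not taken from the dataset).
train = [
    ("win a free prize now", "spam"),
    ("free entry claim your cash prize", "spam"),
    ("urgent call now to claim free cash", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("see you at home tonight", "ham"),
    ("call me when you get home", "ham"),
]
test = [("claim your free prize", "spam"), ("lunch at home tonight", "ham")]

idf = fit_idf([text for text, _ in train])
centroids = {
    label: centroid([tfidf(t, idf) for t, l in train if l == label])
    for label in ("spam", "ham")
}
y_true = [label for _, label in test]
y_pred = [classify(text, idf, centroids) for text, _ in test]
print(precision_recall_f1(y_true, y_pred))
```

Treating spam as the positive class means precision measures how many flagged messages are truly spam, while recall measures how much of the spam is caught. Because ham typically outnumbers spam in SMS corpora, these per-class metrics are generally more informative than accuracy alone.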
License
Copyright (c) 2024 Jurnal Sosial Teknologi
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.