Analysis of SMS Spam Detection using Tf-Idf: A Study On SMS Spam Collection Dataset

Authors

  • Nesan Jaya Saputra, President University, Indonesia

DOI:

https://doi.org/10.59188/jurnalsostech.v4i4.1214

Keywords:

SMS spam detection, TF-IDF analysis, classification metrics, SMS Spam Collection dataset

Abstract

This study examines SMS spam detection using TF-IDF analysis on the SMS Spam Collection dataset, a collection of text messages labeled as spam or ham (non-spam). The research evaluates how effectively TF-IDF features distinguish spam from ham messages. The performance of the classification model is assessed with precision, recall, and F1-score. The results are promising, with the model achieving a high accuracy rate in classifying spam and ham messages. The study also reports the distribution of spam and ham labels in the test data, which gives context for interpreting the classification results.
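
To make the quantities named in the abstract concrete, the following is a minimal sketch of TF-IDF weighting and of the precision, recall, and F1-score calculation. It is not the author's implementation: the abstract does not state the tokenization, the IDF variant, or the classifier, so the whitespace tokenizer, the smoothed IDF formula, the toy messages, and all function names below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's choice is not stated.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """L2-normalized TF-IDF weights; terms unseen when fitting are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy messages, not taken from the SMS Spam Collection dataset.
train_texts = [
    "win a free prize now",
    "free cash claim now",
    "are we meeting for lunch",
    "see you at home tonight",
]
idf = fit_idf(train_texts)
vec = tfidf_vector("free prize now", idf)

# Hypothetical labels and predictions, used only to exercise the metric function.
y_true = ["spam", "spam", "spam", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In this sketch a term that appears in fewer messages (such as "prize") receives a larger IDF weight than one that appears in more of them (such as "free"). The metric function is independent of the classifier, so it applies to predictions from any model trained on such vectors.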

Published

2024-04-30

How to Cite

Jaya Saputra, N. (2024). Analysis of SMS Spam Detection using Tf-Idf: A Study On SMS Spam Collection Dataset. Jurnal Sosial Teknologi, 4(4), 213–217. https://doi.org/10.59188/jurnalsostech.v4i4.1214