Machine Learning Implementation for Spam Email Classification Using IndoBERT, Hugging Face Transformers, and Streamlit
DOI:
https://doi.org/10.59188/jurnalsostech.v6i1.32659
Keywords:
spam email detection, text classification, IndoBERT, deep learning, digital security
Abstract
Advances in information technology have made email a primary medium of modern communication, but this dominance has also brought a serious challenge: a growing volume of spam email that disrupts productivity and threatens digital security. Conventional spam detection methods based on keywords and simple rules are increasingly ineffective because they cannot keep up with the dynamic, context-dependent language patterns of spam. Their main limitation is an inability to capture the semantic meaning of email text. This study aims to design and implement an automatic and accurate spam email detection system using a modern language model. The method applied is deep-learning-based binary text classification. The research process comprises data preprocessing to clean and standardize the email text, tokenization with the IndoBERT tokenizer, and classification with an IndoBERT model fine-tuned on a dataset of Indonesian-language emails. The dataset is split into training, validation, and test sets to ensure validity and to assess the model's ability to generalize. The evaluation shows very good performance, with 97% accuracy on the test data, and 98% precision and 95% recall for the spam class. End-to-end system testing also demonstrates that the model was implemented successfully in a realistic usage scenario. The study concludes that a local language model such as IndoBERT is an effective and reliable approach to spam email detection, and that it could serve as a basis for more advanced digital security systems in the future.
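The non-model stages named in the abstract (text cleaning, the three-way dataset split, and the spam-class metrics) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names `clean_email`, `split_dataset`, and `binary_metrics`, the `<url>` and `<email>` placeholder tokens, the 80/10/10 split ratio, and the toy labels are all hypothetical. Tokenization and the fine-tuned IndoBERT classifier are deliberately left out.

```python
import random
import re


def clean_email(text: str) -> str:
    """Lowercase the text, mask URLs and addresses, collapse whitespace.

    The masking rules and placeholder tokens are illustrative assumptions.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " <url> ", text)
    text = re.sub(r"\S+@\S+\.\S+", " <email> ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_dataset(rows, train=0.8, val=0.1, seed=42):
    """Shuffle rows reproducibly, then cut into train, validation, and test.

    The 80/10/10 ratio is an assumed default, not a value from the paper.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return (
        rows[:n_train],
        rows[n_train:n_train + n_val],
        rows[n_train + n_val:],
    )


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy plus precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels (1 = spam, 0 = not spam), invented purely for illustration.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_scores = binary_metrics(toy_true, toy_pred)
```

Reporting precision and recall for the spam class alongside accuracy matters because accuracy alone can look high on an imbalanced mailbox even when many spam messages are missed.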
License
Copyright (c) 2026 Riza Adrianti Supono, Muhammad Irgi Imani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.




