Machine learning models against fraud: a hybrid approach to protecting billions in transactions
DOI: https://doi.org/10.47187/perf.v1i33.324
Keywords: Credit card fraud, Machine learning, Class imbalance, SMOTE and ADASYN, Voting Classifier, SHAP
Abstract
Credit card fraud is a persistent problem that significantly affects banks and consumers, with reported global losses of $33.5 billion in 2022 and an upward trend over the years. This work addresses the problem by implementing machine learning models, focusing on the design, evaluation, and improvement of fraudulent transaction identification with high precision and accuracy.
The models had to contend with a significant class imbalance, which was addressed with oversampling techniques such as SMOTE and ADASYN to improve the representation of the minority class (fraud cases). Additionally, Principal Component Analysis (PCA) was used to reduce dimensionality and lower the computational cost.
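The oversampling idea behind SMOTE can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `smote_sketch`, the toy fraud points, and the parameter values are assumptions chosen for illustration, and a real project would more likely rely on an established library such as imbalanced-learn. The sketch only shows the core step, which is creating a synthetic minority sample on the line segment between a fraud sample and one of its nearest fraud neighbours.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical toy data: two features per fraudulent transaction
fraud = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.4)]
legit_count = 12  # assumed size of the majority class

# Oversample fraud until both classes have the same number of samples
new_points = smote_sketch(fraud, n_new=legit_count - len(fraud))
balanced_fraud = fraud + new_points
print(len(balanced_fraud), "fraud samples after oversampling")
```

ADASYN follows the same interpolation scheme but generates more synthetic points around minority samples that are harder to learn, that is, those surrounded by many majority-class neighbours. In either case, oversampling should be applied to the training split only, so that synthetic points do not leak into the evaluation data.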
The results showed that, in terms of scalability and adaptability, the neural network model performed very well on large datasets. For the hybrid approach, a Voting Classifier was implemented, achieving a favorable balance between adaptability, precision, and efficiency by combining the strengths of several models. The system's interpretability was enhanced with SHAP (SHapley Additive exPlanations), which makes it possible to explain the model's decisions when flagging fraudulent transactions.
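How a voting ensemble and Shapley-based explanations fit together can be sketched on a toy example. Everything below is an assumption made for illustration: the three hand-written scoring functions stand in for trained classifiers, soft voting (averaging probabilities) is assumed because the abstract does not state the voting scheme, and the Shapley values are computed by exact enumeration over feature subsets, whereas SHAP tooling typically relies on faster approximations.

```python
from itertools import combinations
from math import factorial

# Hypothetical base models: each maps a transaction
# (amount, hour of day, km from home) to a fraud probability in [0, 1].
def model_amount(x):
    return min(1.0, x[0] / 1000.0)

def model_night(x):
    return 0.8 if x[1] < 6 else 0.1

def model_distance(x):
    return min(1.0, x[2] / 500.0)

MODELS = [model_amount, model_night, model_distance]

def soft_vote(x):
    """Soft voting: average the probabilities of the base models."""
    return sum(m(x) for m in MODELS) / len(MODELS)

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a baseline input.
    Features outside a coalition are set to their baseline value."""
    n = len(x)

    def value(coalition):
        mixed = tuple(x[i] if i in coalition else baseline[i] for i in range(n))
        return f(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

transaction = (800.0, 3.0, 250.0)  # suspicious: large, at night, far away
baseline = (50.0, 14.0, 5.0)       # assumed "typical" legitimate transaction

score = soft_vote(transaction)
phi = shapley_values(soft_vote, transaction, baseline)
for name, contribution in zip(("amount", "hour", "distance"), phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the ensemble score for this transaction and the score for the baseline, which is the additivity property that makes such explanations easy to read: each feature is credited with a share of the increase in the fraud score.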
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.