MACHINE LEARNING MODELS AGAINST FRAUD: A HYBRID APPROACH TO PROTECTING TRILLIONS IN TRANSACTIONS

Josué Vladimir Galarza Tulcanazo; Pablo Andrés Trejo Tapia

doi:10.47187/perf.v1i33.324

Authors

Josué Vladimir Galarza Tulcanazo, Universidad Central del Ecuador, Facultad de Ciencias Económicas, Estadística, Quito, Ecuador
Pablo Andrés Trejo Tapia, Universidad Central del Ecuador, Facultad de Ciencias Económicas, Estadística, Quito, Ecuador

Keywords:

Credit card fraud, Machine learning, Class imbalance, SMOTE and ADASYN, Voting Classifier, SHAP

Abstract

Credit card fraud is a persistent problem that significantly affects banks and consumers, with reported global losses of USD 33.5 billion in 2022 and a rising trend over the years. This work addresses the problem by implementing machine learning models, focusing on the design, evaluation, and improvement of fraudulent transaction identification with high precision and accuracy.
The models had to cope with a significant class imbalance, which was addressed with oversampling techniques such as SMOTE and ADASYN to improve the representation of the minority class, that is, the fraud cases. Principal Component Analysis (PCA) was also used to reduce dimensionality and improve computational performance.
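To illustrate the oversampling idea, the following is a minimal NumPy sketch of SMOTE-style interpolation: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy `fraud` array are assumptions made for illustration; this is not the authors' code, and in practice a maintained implementation such as the one in imbalanced-learn would normally be used.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; needs at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # a point cannot have more neighbours than other points

    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest per sample

    base = rng.integers(0, n, size=n_new)   # which sample to start from
    pick = rng.integers(0, k, size=n_new)   # which of its neighbours to use
    target = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # position along the segment

    return X_min[base] + gap * (X_min[target] - X_min[base])


# Toy minority class (made-up values, two features per transaction).
fraud = np.array([[1.0, 2.0], [1.2, 1.9], [0.8, 2.2], [1.1, 2.1]])
synthetic = smote_sketch(fraud, n_new=6, k=2)
print(synthetic.shape)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the bounding box of the original fraud cases. ADASYN follows the same interpolation step but generates more points around minority samples that are surrounded by majority-class neighbours.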
The results showed that, in terms of scalability and adaptability, the neural network model performed very well on large datasets. For the hybrid models, a Voting Classifier was implemented, achieving an optimal balance between adaptability, precision, and efficiency by combining the strengths of several models. The system's interpretability was improved by applying SHAP, which makes it possible to explain the model's decisions when detecting fraudulent transactions.
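As a rough sketch of how such a hybrid model can be assembled, the example below builds a soft-voting ensemble with scikit-learn's `VotingClassifier`, which averages the class probabilities predicted by its base models. The abstract does not state which base models, hyperparameters, or data were used, so the three estimators, their settings, and the synthetic imbalanced dataset here are all assumptions chosen only to make the example runnable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption): about 5% of samples in the positive class.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Base models are illustrative choices, not the ones reported in the paper.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)),
    ],
    voting="soft",  # average predicted probabilities instead of counting votes
)
ensemble.fit(X_train, y_train)

# Probability of the positive (fraud-like) class for each test sample.
fraud_scores = ensemble.predict_proba(X_test)[:, 1]
print(average_precision_score(y_test, fraud_scores))
```

Average precision is used here rather than accuracy because, with so few positive cases, a model that never flags fraud would still score about 95% accuracy. If resampling such as SMOTE is added, it should be applied to the training split only, so that the test set keeps the original class ratio.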
