Temporal-Semantic Fusion Network for Identification of Online Gambling and Child Exploitation Financial Transactions
This study addresses the detection of illicit digital payments, specifically those linked to online gambling and child exploitation, which are frequently hidden within legitimate transaction streams. The primary objective is to overcome the limitations of traditional rule-based systems and unimodal models, which struggle with class imbalance and sophisticated evasion tactics. The proposed Temporal-Semantic Fusion Network (TSFN) is a novel architecture that integrates Temporal Convolutional Networks (TCNs) for numerical transaction sequences with FinBERT for semantic encoding of transaction text. The key novelty is a bidirectional cross-modal attention mechanism that enables dynamic information exchange between behavioral patterns and transaction descriptions. Evaluated on 10,000 synthetic transactions, TSFN achieved a macro F1-score of 0.847, outperforming concatenation-based fusion by 6.5 percentage points (p < 0.001). Significant improvements were observed for the minority classes, with F1-scores of 0.823 for gambling and 0.741 for exploitation, while precision on legitimate transactions remained at 99.4%. Ablation studies confirm that bidirectional attention allows the model to adaptively prioritize temporal features for gambling and semantic cues for exploitation. This research provides a robust framework for multimodal financial crime detection and offers a significant improvement over existing baselines in identifying complex illicit patterns.
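The abstract does not specify the internals of the bidirectional cross-modal attention, so the following is only a minimal pure-Python sketch of the general idea, not the authors' implementation. It assumes that the TCN outputs (one row per time step) and the FinBERT token embeddings (one row per text token) have already been projected into a shared feature dimension, and it replaces learned query/key/value projections and multiple heads with a single head using identity projections. Residual addition and mean pooling are likewise assumed choices, and all function names (`attend`, `bidirectional_cross_attention`, `concat_fusion`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time steps or text tokens, columns = features


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with identity projections.

    Each query row becomes a softmax-weighted average of the value rows,
    weighted by its scaled dot product with every key row.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise residual addition of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mean_pool(m: Matrix) -> List[float]:
    """Average over rows, giving one summary vector per modality."""
    return [sum(row[j] for row in m) / len(m) for j in range(len(m[0]))]


def concat_fusion(temporal: Matrix, semantic: Matrix) -> List[float]:
    """Baseline: pool each modality independently, then concatenate."""
    return mean_pool(temporal) + mean_pool(semantic)


def bidirectional_cross_attention(temporal: Matrix, semantic: Matrix) -> List[float]:
    """Hypothetical fusion: each modality attends to the other before pooling."""
    # Time steps query the text tokens, so behavior is contextualized by text.
    t_enriched = add(temporal, attend(temporal, semantic, semantic))
    # Text tokens query the time steps, so text is contextualized by behavior.
    s_enriched = add(semantic, attend(semantic, temporal, temporal))
    return mean_pool(t_enriched) + mean_pool(s_enriched)


# Toy inputs: 3 time steps and 2 text tokens, both already in a shared d=4 space.
temporal = [[0.1, 0.2, 0.0, 0.5], [0.3, 0.1, 0.4, 0.0], [0.0, 0.6, 0.2, 0.1]]
semantic = [[0.2, 0.0, 0.1, 0.3], [0.5, 0.4, 0.0, 0.1]]
fused = bidirectional_cross_attention(temporal, semantic)
print(len(fused))  # → 8
```

In this sketch, the difference from concatenation-based fusion is that each pooled vector is computed only after the other modality has reweighted it. A burst in the time series can therefore shift which description tokens dominate, and vice versa, whereas `concat_fusion` summarizes each stream in isolation.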
This work (including HTML and PDF files) is licensed under a Creative Commons Attribution 4.0 International License.