Temporal-Semantic Fusion Network for Identification of Online Gambling and Child Exploitation Financial Transactions

Keywords: TSFN; Multimodal Fusion; Illicit Financial Activities; Cross-Modal Attention; Financial Crime Detection

Vol. 7 No. 2 (2026): June
Research Articles

This study addresses the detection of illicit digital payments, specifically those linked to online gambling and child exploitation, which are frequently hidden within legitimate transaction streams. The primary objective is to overcome the limitations of traditional rule-based systems and unimodal models, which struggle with class imbalance and sophisticated evasion. The study proposes the Temporal-Semantic Fusion Network (TSFN), a novel architecture that integrates a Temporal Convolutional Network (TCN) for numerical transaction sequences with FinBERT for semantic encoding of transaction text. The key novelty is a bidirectional cross-modal attention mechanism that enables dynamic information exchange between behavioral patterns and transaction descriptions. Evaluated on 10,000 synthetic transactions, TSFN achieved a macro F1-score of 0.847, outperforming concatenation-based fusion by 6.5 percentage points (p < 0.001). Significant improvements were observed in the minority classes, with F1-scores of 0.823 for gambling and 0.741 for exploitation, while the model maintained 99.4% precision on legitimate transactions. Ablation studies confirm that bidirectional attention allows the model to adaptively prioritize temporal features for gambling and semantic cues for exploitation. This research provides a framework for multimodal financial crime detection that identifies complex illicit patterns more accurately than the baseline methods evaluated.
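
The bidirectional cross-modal attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes both modalities have already been projected to a shared feature dimension, omits the learned query/key/value projections and multiple heads, and uses a residual sum plus mean pooling as an assumed fusion step. The function names (`attend`, `bidirectional_cross_attention`, `mean_pool`) and the toy feature values are hypothetical. Plain Python lists are used so the example runs without dependencies; a real model would use a tensor library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention.

    Each query row is replaced by a weighted average of the value rows,
    weighted by its similarity to the corresponding key rows.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


def bidirectional_cross_attention(temporal, semantic):
    """Let each modality query the other (assumed residual fusion)."""
    # Direction 1: time steps attend over text tokens.
    t_ctx, w_t2s = attend(temporal, semantic, semantic)
    # Direction 2: text tokens attend over time steps.
    s_ctx, w_s2t = attend(semantic, temporal, temporal)
    # Assumption: add the cross-modal context back onto the original features.
    temporal_fused = [[a + b for a, b in zip(row, ctx)]
                      for row, ctx in zip(temporal, t_ctx)]
    semantic_fused = [[a + b for a, b in zip(row, ctx)]
                      for row, ctx in zip(semantic, s_ctx)]
    return temporal_fused, semantic_fused, w_t2s, w_s2t


def mean_pool(rows):
    """Average a sequence of feature rows into one vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# Toy inputs (hypothetical values): 3 time steps and 2 text tokens, dim 4.
temporal = [[0.1, 0.4, -0.2, 0.3],
            [0.5, -0.1, 0.2, 0.0],
            [0.3, 0.3, 0.1, -0.4]]
semantic = [[0.2, -0.3, 0.5, 0.1],
            [-0.1, 0.4, 0.0, 0.6]]

t_fused, s_fused, w_t2s, w_s2t = bidirectional_cross_attention(temporal, semantic)

# Concatenate the pooled vectors into one representation for a classifier.
fused = mean_pool(t_fused) + mean_pool(s_fused)
```

Here `w_t2s` holds one row of attention weights per time step (over text tokens) and `w_s2t` one row per text token (over time steps). Inspecting these weights is one way a model of this kind could show whether it leans on temporal or semantic evidence for a given transaction.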