A Novel Hybrid ViT-CNN Approach for Pneumonia and Lung Opacity Detection in X-Ray Images

Keywords: Automated Medical Imaging, Chest X-Ray Classification, Convolutional Neural Network, Grad-CAM Visualization, Hybrid Deep Learning Architecture

This study presents a novel hybrid deep learning architecture that combines a Vision Transformer (ViT) with a Convolutional Neural Network (CNN) to automatically classify chest X-ray images into three diagnostic categories: Normal, Lung Opacity, and Viral Pneumonia. By fusing the ViT's global feature representations with the ResNet-18 CNN's strength in local texture analysis, the proposed model addresses the limitations of single-architecture systems. In experimental evaluations, the hybrid ViT-CNN architecture outperforms state-of-the-art methods, reaching 94.2% classification accuracy, with precision, recall, and F1-scores consistently above 94% for most categories. The model also performs strongly in difficult cases where conventional methods often struggle, such as distinguishing Lung Opacity from Normal cases, and it discriminates well between classes, with AUC values above 0.95 for every class. Inference is computationally efficient, taking about 0.0012 seconds (1.2 ms) per image, which makes the system well suited to real-time clinical deployment. Grad-CAM visualizations highlight the image regions that drive each diagnostic decision, supporting the model's interpretability. Overall, this work sets a new benchmark for chest X-ray classification performance and provides a practical foundation for automated diagnostic assistance in resource-constrained healthcare settings.
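The abstract reports overall accuracy together with per-class precision, recall, F1, and AUC, but does not spell out how a three-class output is reduced to per-class figures. The sketch below assumes the common one-vs-rest convention (an assumption; the study's exact evaluation protocol is not given here) and computes these metrics in plain Python from predicted labels and class probabilities. The helper names and the `Y_TRUE`/`SCORES` values are hypothetical toy data for illustration, not results from the study.

```python
from typing import Dict, List, Sequence

# Class order assumed for illustration only.
CLASSES = ["Normal", "Lung Opacity", "Viral Pneumonia"]

# Hypothetical toy data: true labels and per-class probabilities for 6 images.
Y_TRUE = [0, 0, 1, 1, 2, 2]
SCORES = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],  # a Lung Opacity image whose top score is Normal
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]


def argmax_predictions(scores: Sequence[Sequence[float]]) -> List[int]:
    """Predicted label = index of the highest class probability."""
    return [max(range(len(s)), key=s.__getitem__) for s in scores]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of images whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                      n_classes: int) -> List[Dict[str, float]]:
    """Precision, recall and F1 for each class, treating it one-vs-rest."""
    results = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def one_vs_rest_auc(y_true: Sequence[int], scores: Sequence[Sequence[float]],
                    cls: int) -> float:
    """AUC for class `cls` via the Mann-Whitney formulation: the probability
    that a random positive scores higher than a random negative (ties = 0.5)."""
    pos = [s[cls] for t, s in zip(y_true, scores) if t == cls]
    neg = [s[cls] for t, s in zip(y_true, scores) if t != cls]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_pred = argmax_predictions(SCORES)
    print(f"accuracy = {accuracy(Y_TRUE, y_pred):.3f}")
    for c, m in enumerate(per_class_metrics(Y_TRUE, y_pred, len(CLASSES))):
        auc = one_vs_rest_auc(Y_TRUE, SCORES, c)
        print(f"{CLASSES[c]:>15}: P={m['precision']:.3f} R={m['recall']:.3f} "
              f"F1={m['f1']:.3f} AUC={auc:.3f}")
```

In this toy example one Lung Opacity image is labeled Normal, so accuracy drops to 5/6 and Lung Opacity recall to 0.5, yet every one-vs-rest AUC remains 1.0. AUC measures how well the scores rank positives above negatives rather than whether the top-scoring class is correct, which is why accuracy and AUC are worth reporting side by side.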