A Governed Lakehouse DataOps Architecture: Design and Evaluation in Healthcare

Keywords: DataOps, Agile Data Management, Data-Driven Enterprise, Data Governance, Medallion Lakehouse, Data Pipeline, Healthcare

Vol. 7 No. 2 (2026): June
Research Articles

Healthcare organizations increasingly require secure, governed, and AI-ready data pipelines capable of handling heterogeneous and sensitive data sources. This study designs and evaluates a unified DataOps reference architecture that operationalizes the full data lifecycle through a governed Medallion Lakehouse model. The proposed architecture integrates data-centric CI/CD, Infrastructure as Code, workflow orchestration, governance and metadata management, monitoring, and explicit promotion contracts across the Bronze, Silver, and Gold layers. The framework was implemented and evaluated in a controlled healthcare testbed using approximately 3.5 GB of multi-source clinical data over a 25-day workload. The architecture achieved a DataOps Operational Excellence Index (DOEI) of 0.92, an ingestion throughput of approximately 100 MB/s, a data quality score of 97.87%, and a 72% reduction in infrastructure provisioning time, from 3 hours to 50 minutes. The main novelty of this work lies in combining a governed Lakehouse-based DataOps architecture with explicit promotion contracts and a composite benchmarking index for assessing operational maturity. The result is a reproducible, auditable, and scalable framework for secure data operations in regulated environments such as healthcare.
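
The abstract does not specify how a promotion contract is expressed, so the following is only a minimal sketch of the general idea: a gate that a dataset must pass before moving from one Medallion layer to the next. The class name `PromotionContract`, its fields, the column names, and the 0.95 threshold are all hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass

# Medallion layer order as named in the abstract.
LAYERS = ("bronze", "silver", "gold")


@dataclass(frozen=True)
class PromotionContract:
    """Hypothetical contract gating promotion between adjacent layers."""

    source: str
    target: str
    min_quality_score: float     # illustrative threshold in [0, 1], not from the paper
    required_columns: frozenset  # illustrative schema requirement, not from the paper

    def __post_init__(self):
        # A contract covers exactly one forward step, e.g. bronze -> silver.
        # Unknown layer names raise ValueError from tuple.index().
        if LAYERS.index(self.target) - LAYERS.index(self.source) != 1:
            raise ValueError("contract must link adjacent layers in order")

    def violations(self, columns, quality_score):
        """Return a list of violation messages; an empty list means promotable."""
        problems = []
        missing = self.required_columns - set(columns)
        if missing:
            problems.append(f"missing columns: {sorted(missing)}")
        if quality_score < self.min_quality_score:
            problems.append(
                f"quality {quality_score:.4f} below {self.min_quality_score:.4f}"
            )
        return problems


if __name__ == "__main__":
    contract = PromotionContract(
        source="bronze",
        target="silver",
        min_quality_score=0.95,  # assumed threshold
        required_columns=frozenset({"patient_id", "event_time"}),  # assumed schema
    )
    # 0.9787 mirrors the reported data quality score, used here only as sample input.
    print(contract.violations(["patient_id", "event_time", "code"], 0.9787))
    print(contract.violations(["patient_id"], 0.90))
```

Returning a list of violations rather than a boolean keeps the gate auditable: an orchestrator could log each reason a promotion was refused, which fits the abstract's emphasis on auditability, although the paper's actual mechanism may differ.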