Technical Architecture

Under the hood of the Oregon FQHC Landscape

Architecture
Python
CI/CD
Author

E. Pitzer

Published

December 4, 2025

Overview

This project replaces manual, GUI-based reporting in Tableau with code-first analytics engineering.

The goal was to build a self-healing, automated data product that tracks Health Center Service Delivery Sites in Oregon without requiring manual intervention.

Advanced Analytics & QA

Data Enrichment: Ingested 2024 UDS grantee data from a direct FOIA endpoint and added a pandas cleaning layer that handles mixed numeric/text columns in government Excel files.
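To illustrate the kind of cleaning described here, the following is a minimal sketch rather than the project's actual code. The function name `clean_numeric_columns`, the column names, and the sample values are all hypothetical. It assumes that non-numeric placeholders such as "-" or "N/A" should become missing values.

```python
import pandas as pd


def clean_numeric_columns(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Coerce mixed numeric/text columns to numbers; unparseable cells become NaN."""
    out = df.copy()
    for col in columns:
        # Strip whitespace and thousands separators from text cells only
        tidy = out[col].map(
            lambda v: v.strip().replace(",", "") if isinstance(v, str) else v
        )
        # Placeholders such as "-" or "N/A" cannot be parsed and turn into NaN
        out[col] = pd.to_numeric(tidy, errors="coerce")
    return out


# Made-up rows imitating a spreadsheet with mixed cell types
raw = pd.DataFrame({
    "BHCMISID": ["0910010", "0910020", "0910030"],
    "total_patients": ["1,250", 980, "-"],
    "uninsured": [" 300 ", "N/A", 12],
})
cleaned = clean_numeric_columns(raw, ["total_patients", "uninsured"])
print(cleaned)
```

Coercing to NaN instead of raising keeps a single malformed cell from stopping the run, while downstream checks can still count and report the missing values.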

Relational Joining: Engineered a deterministic join between HRSA geospatial data and UDS patient demographics using BHCMISID keys, achieving a 97% match rate (318/328 sites).
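A key-based join with a measurable match rate could look roughly like the sketch below. The frames, column names other than `BHCMISID`, and all values are invented for illustration, and the sketch assumes that several sites may share one grantee ID while each ID appears only once in the UDS table.

```python
import pandas as pd

# Hypothetical site-level rows (several sites can share one grantee ID)
sites = pd.DataFrame({
    "BHCMISID": ["0910010", "0910010 ", "0910020", "0910030"],
    "site_name": ["Site A", "Site B", "Site C", "Site D"],
})
# Hypothetical grantee-level rows, one per ID
uds = pd.DataFrame({
    "BHCMISID": ["0910010", "0910020"],
    "total_patients": [1200, 850],
})

# Normalize keys so stray whitespace does not cause false mismatches
for frame in (sites, uds):
    frame["BHCMISID"] = frame["BHCMISID"].astype(str).str.strip()

merged = sites.merge(
    uds,
    on="BHCMISID",
    how="left",               # keep every site, matched or not
    validate="many_to_one",   # raise if a grantee ID repeats in the UDS table
    indicator=True,           # adds a _merge column: "both" or "left_only"
)

matched = int((merged["_merge"] == "both").sum())
match_rate = matched / len(merged)
print(f"{matched}/{len(merged)} sites matched ({match_rate:.0%})")
```

The `validate` argument is what keeps the join deterministic in this sketch: a duplicated key on the UDS side raises an error instead of silently multiplying site rows.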

Automated Quality Assurance: Integrated pytest into the CI/CD pipeline. The system now enforces data integrity checks (e.g., Uninsured < Total Patients, Lat/Lon within Oregon bounds) before every deployment, preventing “silent failures” in production.
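The two integrity checks named above might be expressed as pytest-style test functions along these lines. This is a sketch under assumptions: `load_sites`, the column names, and the sample rows are hypothetical, and the bounding box is an approximate, slightly padded envelope for Oregon, not the values used in the project.

```python
import pandas as pd

# Approximate, padded bounding box for Oregon (assumed values)
OR_LAT = (41.9, 46.4)
OR_LON = (-124.8, -116.3)


def load_sites() -> pd.DataFrame:
    """Hypothetical stand-in; a real suite would read the pipeline's output CSV."""
    return pd.DataFrame({
        "site_name": ["Site A", "Site B"],
        "total_patients": [1200, 850],
        "uninsured": [300, 120],
        "latitude": [45.52, 44.06],
        "longitude": [-122.68, -121.31],
    })


def within_oregon(df: pd.DataFrame) -> pd.Series:
    """Row-wise check that coordinates fall inside the bounding box."""
    return df["latitude"].between(*OR_LAT) & df["longitude"].between(*OR_LON)


def test_uninsured_below_total_patients():
    df = load_sites()
    bad = df[df["uninsured"] >= df["total_patients"]]
    assert bad.empty, f"{len(bad)} rows have uninsured >= total patients"


def test_coordinates_within_oregon():
    df = load_sites()
    outside = df[~within_oregon(df)]
    assert outside.empty, f"{len(outside)} sites fall outside the Oregon bounds"
```

pytest collects functions prefixed with `test_` and exits with a non-zero status when any assertion fails, which is what lets a CI job block a deployment.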

The Pipeline

The system is architected as a serverless ETL pipeline running on GitHub Actions.

%%{init: {'theme': 'base'}}%%
flowchart LR
    A[HRSA Data Warehouse] -->|HTTPS Request| B(Ingest Script)
    B -->|Raw .xlsx| C{Clean & Transform}
    C -->|Pandas| D[Oregon Subset .csv]
    D -->|Quarto| E[Static HTML Site]
    E -->|Deploy| F[GitHub Pages]
    
    classDef runner fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    class B,C,D,E runner

Automated Data Pipeline Architecture
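As a rough illustration of the Clean & Transform step in the diagram, the sketch below filters a national table down to Oregon rows and serializes the result as CSV. The function `subset_to_oregon`, the `state` column, and the sample rows are hypothetical, and the output goes to an in-memory buffer where the real pipeline would presumably write a file.

```python
import io

import pandas as pd


def subset_to_oregon(df: pd.DataFrame, state_col: str = "state") -> pd.DataFrame:
    """Keep rows whose state abbreviation is OR, ignoring case and padding."""
    states = df[state_col].astype(str).str.strip().str.upper()
    return df[states == "OR"].reset_index(drop=True)


# Made-up national rows; the real input would be the downloaded .xlsx
raw = pd.DataFrame({
    "site_name": ["Site A", "Site B", "Site C"],
    "state": ["OR", "WA", " or "],
})

oregon = subset_to_oregon(raw)

# Serialize to CSV in memory; a real run would pass a file path instead
buffer = io.StringIO()
oregon.to_csv(buffer, index=False)
print(buffer.getvalue())
```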