PGK Data Platform
DWH + Data Lake + Delta Lake — 8+ consumer products, Oracle TCO $19.8M → $0, vendor selection across 6 tool classes
What doesn't work
Company data was fragmented across 5+ stores: SAP BW, Oracle IBD, Vertica, Cognos TM1, and dozens of product databases. Product teams spent weeks searching for data. Oracle databases ran on VMware hypervisors whose CPU cores were not licensed for Oracle, creating an audit penalty risk of $19.8M (833 unlicensed CPU cores). There was no unified business glossary or data catalog.
Architectural approach
Corporate data platform built in 4 stages: DWH → Data Lake → Delta Lake → Data Gateway. Vendor selection across 6 tool classes (DWH, ETL, Data Catalog, Business Glossary, MDM, Data Quality). Unified data catalog with a business glossary. MDM/RDM for master data. Migration from Oracle to a license-compliant stack.
What made it hard
Discovering the $19.8M licensing risk (833 unlicensed Oracle CPU cores on VMware) required immediate action while 8+ products depended on those databases. Vendor selection across 6 tool classes: every vendor promised 'everything out of the box', so real validation required a POC on live data. Migrating from SAP BW without stopping business reporting: data had to keep flowing throughout the migration.
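For a sense of scale, the two figures quoted above imply an average exposure per unlicensed core. The sketch below only divides one stated number by the other. It does not model Oracle's actual licensing metrics (editions, core factors, support fees), and the function name is hypothetical.

```python
def exposure_per_core(total_exposure_usd: float, unlicensed_cores: int) -> float:
    """Average audit exposure per unlicensed CPU core (plain division)."""
    if unlicensed_cores <= 0:
        raise ValueError("unlicensed_cores must be positive")
    return total_exposure_usd / unlicensed_cores


# Figures taken from the text: $19.8M across 833 unlicensed cores.
per_core = exposure_per_core(19_800_000, 833)
print(f"~${per_core:,.0f} per core")  # prints "~$23,770 per core"
```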
My role & contribution
CTO / Technical Director
Initiated and led the Oracle migration. Personally conducted vendor selection across 6 tool classes (DWH, ETL, Data Catalog, Business Glossary, MDM, Data Quality). Designed the 4-stage migration architecture. Identified the $19.8M licensing risk (833 unlicensed CPU cores) and developed the remediation plan.
How it works
- Stage 0: prototyping of the business glossary, data catalog, ETL and DWH. Vendor selection: DWH (Vertica vs Greenplum vs ClickHouse), ETL (Informatica vs NiFi vs Airflow), MDM (Gartner MQ 2021).
- Stage 1: source consolidation, data marts for 8+ products (Optimizer, Navigator, Predictive Maintenance, Demand Forecasting, Sales Planning, PM).
- Stage 2: Data Quality and security.
- Stage 3: Data Lake (Hadoop/Cloudera), Delta Lake.
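One common way to compare candidates within a tool class after a POC is a weighted scoring matrix. The sketch below is illustrative only: the criteria, weights, candidate names and scores are hypothetical placeholders, not the evaluation actually used on this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: int  # relative importance; normalized inside weighted_score


def weighted_score(scores: dict[str, float], criteria: list[Criterion]) -> float:
    """Weighted average of 0-10 POC scores across all criteria."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(scores[c.name] * c.weight for c in criteria) / total_weight


# Hypothetical criteria and weights, for illustration only.
CRITERIA = [
    Criterion("poc_on_live_data", 4),
    Criterion("license_cost", 3),
    Criterion("integration_effort", 3),
]

# Placeholder scores for anonymous candidates, not real evaluation results.
candidates = {
    "candidate_a": {"poc_on_live_data": 8, "license_cost": 5, "integration_effort": 7},
    "candidate_b": {"poc_on_live_data": 6, "license_cost": 9, "integration_effort": 6},
}

# Highest weighted score first.
ranking = sorted(
    candidates,
    key=lambda name: weighted_score(candidates[name], CRITERIA),
    reverse=True,
)
```

Keeping the weights explicit makes the trade-off reviewable: changing the weight of `license_cost` visibly changes the ranking, which is harder to argue about when the comparison lives only in vendor slide decks.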
Why this way
4-stage migration instead of big bang
Alternative considered: simultaneous replacement of all data stores with the new stack (big bang migration).
Big bang: 8+ products depend on the data, so a simultaneous migration would paralyze the business. Staged approach: each stage delivers measurable results, and products migrate when ready.
Outcome: continuous product operation during migration. Each stage is a separate business case with its own ROI.
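The "products migrate when ready" rule can be made explicit as a cutover gate evaluated per product during a parallel run. The following is a minimal sketch under assumptions: readiness is reduced to row-count agreement between legacy and new data marts, and all names (`MartCheck`, `ready_to_cut_over`, the mart names, the 0.1% tolerance) are hypothetical. A real gate would likely also compare checksums, freshness and report outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MartCheck:
    mart: str
    legacy_rows: int
    new_rows: int


def ready_to_cut_over(checks: list[MartCheck], max_drift: float = 0.001) -> bool:
    """True only if every mart's row count on the new stack is within
    max_drift (relative) of the legacy count. An empty list counts as
    not ready, because nothing has been verified yet."""
    if not checks:
        return False
    for check in checks:
        if check.legacy_rows == 0:
            # Avoid division by zero: an empty legacy mart must stay empty.
            if check.new_rows != 0:
                return False
            continue
        drift = abs(check.new_rows - check.legacy_rows) / check.legacy_rows
        if drift > max_drift:
            return False
    return True


# Hypothetical parallel-run results for two products.
product_a = [MartCheck("mart_1", 1_000_000, 1_000_000), MartCheck("mart_2", 50_000, 50_020)]
product_b = [MartCheck("mart_3", 2_000_000, 1_900_000)]

print(ready_to_cut_over(product_a))  # True: worst drift is 0.04%
print(ready_to_cut_over(product_b))  # False: drift is 5%
```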
Results
- 01. Oracle TCO $19.8M → $0 (833 unlicensed CPU cores)
- 02. 8+ products on unified DWH
- 03. 4 implementation stages (0–3)
- 04. Vendor selection across 6 tool classes
- 05. Unified Data Gateway for all consumers
- 06. Business glossary + data catalog + MDM/RDM
Impact on business
Eliminated $19.8M licensing risk — critical for a company with tens of billions in revenue. Vendor selection across 6 classes prevents platform choice errors. Reduced data onboarding from weeks to days. Foundation for all data-driven products (IBP, Predictive Maintenance, Navigator).
Technologies
- SAP BW
- Oracle
- Vertica
- Hadoop/Cloudera
- Apache Kafka
- Apache Airflow
- NiFi
- Informatica
- Delta Lake
- MDM/RDM
- Data Gateway