PGK Digital Platform
₽93.4M budget, ₽397M savings over 5 years: 26 products, 21 infrastructure tools, DevOps throughput up from 1.75 to 5 products per engineer
What doesn't work
Russia's largest private freight railcar operator (140K+ railcars) suffered from duplication: 45 of 83 services had functional clones across products. Each DevOps engineer supported only 1.75 products on average. Time-to-Market for a new product was 16 weeks. There were no reusable services, no unified standards, and no DSML infrastructure for Data Science.
Architectural approach
Designed and launched a unified digital platform of 26 products: reusable services (service mesh), unified development standards, 21 infrastructure tools, and a DSML platform for Data Science (JupyterHub, MLFlow, AirFlow, DVC). An architecture control process guards against duplication: every new service is checked against existing ones.
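The duplication check can be pictured as a catalog lookup before a new service is approved. The sketch below is illustrative only: the `Service` record, the capability tags, the Jaccard similarity measure, and the 0.6 threshold are assumptions made for the example, not the actual process or tooling used on the platform.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical catalog record; the capability tags are illustrative."""
    name: str
    product: str
    capabilities: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of capabilities two services have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_overlaps(candidate: Service, catalog: list[Service],
                  threshold: float = 0.6) -> list[tuple[str, float]]:
    """Return existing services whose overlap with the candidate meets the threshold."""
    hits = [
        (s.name, round(jaccard(candidate.capabilities, s.capabilities), 2))
        for s in catalog
        if s.name != candidate.name
    ]
    # Strongest overlap first, so a reviewer sees the likeliest clone at the top.
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Made-up services, used only to exercise the check.
catalog = [
    Service("notify-core", "platform", frozenset({"email", "sms", "templates"})),
    Service("geo-tracker", "dispatch", frozenset({"gps", "routing"})),
]
candidate = Service("wagon-alerts", "fleet",
                    frozenset({"email", "sms", "templates", "push"}))
overlaps = find_overlaps(candidate, catalog)
print(overlaps)  # [('notify-core', 0.75)]
```

A non-empty result would send the proposal to architecture review with the existing service named as the reuse candidate. An empty result lets the new service proceed.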
What made it hard
Political resistance: every product team considered its stack unique and resisted platform migration. Had to prove the savings to the Board with TCO numbers. The audit of 83 services revealed 45 duplicates, but each owner insisted 'their version is better.' Running 26 products in parallel with a limited DevOps team required prioritizing by utilization, not by who shouts loudest.
My role & contribution
CTO / Technical Director
Personally audited 83 services and identified 45 duplicates. Selected the stack of 21 infrastructure tools. Designed the DSML platform (JupyterHub, MLFlow, AirFlow, DVC). Developed the 5-year TCO model. Led the architecture team. Presented results to the Board of Directors.
How it looks
Real screenshots
System architecture
How it works
Audit of 83 services → 45 duplicates identified → target reuse process. 21 unified infrastructure tools, including Zabbix, ELK, Prometheus, Grafana, ArgoCD, Vault, Rancher, Nexus, Kafka, PostgreSQL PRO, Ansible, Terraform, GitLab CI/CD, and Sentry. DSML platform: JupyterHub + MLFlow + AirFlow + DVC + Gurobi. DevSecOps: CheckMarx (SAST), SIEM, Infowatch. Testing: JMeter, Selenium, LoadRunner, TestIT. Delivery in 4 phases: prototyping → MVP → scaling → rollout.
Why this way
Platform approach instead of product autonomy
Alternative considered: each product team chooses its own stack and infrastructure independently.
Why it was rejected: under autonomy, 45 of 83 services turned out to be duplicates. DevOps engineers spent time on unique configurations instead of scaling, and infrastructure costs grew linearly with each product.
Result: 26 products on a unified platform with 21 standardized tools. T2M fell from 16 to 4 weeks. DevOps throughput rose from 1.75 to 5.0 products per engineer.
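To put the ratios in perspective, here is a rough calculation of what they imply for staffing. It is a sketch under one stated assumption: the portfolio is held at 26 products for both ratios. The case study does not report actual headcount, so the resulting numbers are illustrative, not reported figures. The function name `engineers_needed` is made up for the example.

```python
import math

PRODUCTS = 26  # platform portfolio size from the case study


def engineers_needed(products: int, products_per_engineer: float) -> int:
    """Whole engineers required to cover a portfolio at a given ratio."""
    return math.ceil(products / products_per_engineer)


# Assumption: same 26-product portfolio before and after (not stated in the source).
before = engineers_needed(PRODUCTS, 1.75)
after = engineers_needed(PRODUCTS, 5.0)

# Time-to-Market went from 16 to 4 weeks.
ttm_reduction_pct = (16 - 4) / 16 * 100

print(before, after, ttm_reduction_pct)  # 15 6 75.0
```

At the old ratio the same portfolio would need about 15 engineers. At 5.0 products per engineer it needs about 6. This is why costs stop growing linearly with each new product.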
Results
- ₽397M TCO savings over 5 years (2022→2026)
- 26 products on the platform, 21 infrastructure tools
- DevOps: 1.75 → 2.1 → 5.0 products per engineer
- Time-to-Market: 16 → 14 → 4 weeks
- ₽172M infrastructure savings, ₽65M from service reuse
- ₽80.5M DevOps savings, ₽38M from DSML centralization
- ₽41.5M other effects (TTM acceleration, quality, risk reduction)
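The five savings components above add up to the ₽397M headline figure. A short check of that arithmetic follows, with the net effect and savings-to-budget ratio against the ₽93.4M budget. The variable names are illustrative. Amounts are in millions of rubles, taken from the figures listed above.

```python
from decimal import Decimal

# Savings components, in millions of rubles.
savings_mln_rub = {
    "infrastructure": Decimal("172"),
    "service reuse": Decimal("65"),
    "DevOps": Decimal("80.5"),
    "DSML centralization": Decimal("38"),
    "other effects": Decimal("41.5"),
}
budget_mln_rub = Decimal("93.4")

total = sum(savings_mln_rub.values())  # 397.0
net = total - budget_mln_rub           # 303.6
ratio = total / budget_mln_rub         # roughly 4.25x

# Each component's share of total savings, in percent, to one decimal place.
shares = {
    name: (amount / total * 100).quantize(Decimal("0.1"))
    for name, amount in savings_mln_rub.items()
}

print(total, net, ratio.quantize(Decimal("0.01")))  # 397.0 303.6 4.25
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Infrastructure is the largest single component, and savings exceed the budget by roughly 4.25 times over the 5-year horizon.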
Impact on business
A ₽93.4M budget against ₽397M in TCO savings over 5 years (2022→2026). DevOps utilization grew from 1.75 to 5.0 products per engineer (₽80.5M savings). T2M fell from 16 to 4 weeks. The 45 duplicates found among 83 services were eliminated. The DSML platform (JupyterHub, MLFlow, AirFlow) saved ₽38M by centralizing DS infrastructure.
Algorithms & patterns
Technologies
- Python
- TypeScript
- NodeJS
- FastAPI
- Angular
- Vue.js
- PostgreSQL PRO
- Apache Kafka
- S3 Minio
- AirFlow
- MLFlow
- JupyterHub
- DVC
- Gurobi
- Docker
- Rancher
- ArgoCD
- Ansible
- GitLab CI/CD
- Nexus
- Zabbix
- Grafana
- Prometheus
- ELK
- Sentry
- Hashicorp Vault
- CheckMarx
- JMeter
- Selenium
- TestIT