
PGK Digital Platform

₽93.4M budget, ₽397M savings over 5 years — 26 products, 21 infrastructure tools, DevOps from 1.75 to 5 products per engineer

Problem

What doesn't work

Russia's largest private freight railcar operator (140K+ railcars) suffered from duplication: 45 of 83 services had functional clones across products. Each DevOps engineer supported only 1.75 products. Time-to-market for a new product was 16 weeks. There were no reusable services, no unified standards, and no DSML infrastructure for data science.

Solution

Architectural approach

Designed and launched a unified digital platform of 26 products: reusable services (service mesh), unified development standards, 21 infrastructure tools, and a DSML platform for data science (JupyterHub, MLFlow, AirFlow, DVC). Introduced an architecture control process against duplication: every new service is checked against existing ones before it is approved.
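The duplication check can be illustrated with a minimal sketch. The case study does not describe how services were compared, so everything here is an assumption: the `Service` type, the capability tags, the Jaccard similarity measure, and the 0.5 threshold are all hypothetical and serve only to show the idea of screening a proposed service against a catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    capabilities: frozenset  # hypothetical capability tags


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of capabilities two services have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_overlaps(candidate, catalog, threshold=0.5):
    """Return (existing service, similarity) pairs at or above the threshold."""
    scored = [(s.name, jaccard(candidate.capabilities, s.capabilities)) for s in catalog]
    hits = [(name, round(score, 2)) for name, score in scored if score >= threshold]
    # Most similar existing service first.
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical catalog entries, for illustration only.
catalog = [
    Service("auth-core", frozenset({"login", "token-issue", "rbac"})),
    Service("geo-lookup", frozenset({"station-search", "route-distance"})),
]
candidate = Service("optimizer-auth", frozenset({"login", "token-issue", "audit-log"}))

overlaps = find_overlaps(candidate, catalog)
print(overlaps)  # [('auth-core', 0.5)]
```

A non-empty result would send the proposal to architecture review rather than reject it automatically; the threshold only decides what a human looks at.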

Challenges

What made it hard

Political resistance: every product team considered its stack unique and resisted platform migration. Savings had to be proven to the Board with TCO numbers. The audit of 83 services revealed 45 duplicates, but each owner insisted 'their version is better.' Running 26 products in parallel with a limited DevOps team required prioritizing by utilization, not by who shouts loudest.

Role

My role & contribution

CTO / Technical Director

Personally audited 83 services and identified 45 duplicates. Selected the stack of 21 infrastructure tools. Designed the DSML platform (JupyterHub, MLFlow, AirFlow, DVC). Developed the 5-year TCO model. Led the architecture team. Presented results to the Board of Directors.

Architecture

System architecture

Diagram: 26 products, 21 infrastructure tools, DevOps 5.0 products per engineer

  • Products: Optimizer, Navigator, Predictive Maintenance, Demand Forecast, Sales Plan, +20 more
  • Shared services: Service Mesh, API Gateway, Auth, Data Gateway
  • Infrastructure / Monitoring: Zabbix, Grafana, Prometheus, ELK, Sentry
  • Infrastructure / CI/CD: ArgoCD, GitLab CI, Nexus
  • Infrastructure / IaC: Ansible, Terraform, Docker, Rancher
  • Infrastructure / Data: PG PRO, Kafka, S3 Minio
  • Infrastructure / Security: Vault, CheckMarx
  • DSML Platform: JupyterHub, MLFlow, AirFlow, DVC

Implementation

How it works

Audited 83 services, identified 45 duplicates, and defined a target reuse process. Standardized on 21 infrastructure tools, including Zabbix, ELK, Prometheus, Grafana, ArgoCD, Vault, Rancher, Nexus, Kafka, PostgreSQL PRO, Ansible, Terraform, GitLab CI/CD, and Sentry. DSML platform: JupyterHub + MLFlow + AirFlow + DVC + Gurobi. DevSecOps: CheckMarx (SAST), SIEM, Infowatch. Testing: JMeter, Selenium, LoadRunner, TestIT. Delivered in 4 phases: prototyping → MVP → scaling → rollout.

Architecture Decision

Why this way

Platform approach instead of product autonomy

Alternative

Each product team chooses its own stack and infrastructure independently

Why it didn't fit

Under full autonomy, 45 of 83 services turned out to be duplicates. DevOps engineers spent their time on unique per-team configurations instead of scaling. Infrastructure costs grew linearly with each new product.

Result

26 products on a unified platform with 21 standardized tools. T2M reduced from 16 to 4 weeks. DevOps coverage raised from 1.75 to 5.0 products per engineer.
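The headline ratios behind this result can be worked out directly from the stated figures. The sketch below is illustrative arithmetic only: the productivity gain and T2M reduction follow from the numbers above, while the implied engineer counts rest on an assumption (headcount equals products divided by products per engineer) that the case study does not state.

```python
# Figures taken from the case study.
PRODUCTS = 26
DEVOPS_BEFORE = 1.75   # products per DevOps engineer
DEVOPS_AFTER = 5.0
T2M_BEFORE_WEEKS = 16
T2M_AFTER_WEEKS = 4

productivity_gain = DEVOPS_AFTER / DEVOPS_BEFORE
t2m_reduction = 1 - T2M_AFTER_WEEKS / T2M_BEFORE_WEEKS

# Assumption (not from the source): engineers needed = products / products per engineer.
implied_engineers_before = PRODUCTS / DEVOPS_BEFORE
implied_engineers_after = PRODUCTS / DEVOPS_AFTER

print(f"DevOps productivity gain: {productivity_gain:.2f}x")        # 2.86x
print(f"T2M reduction: {t2m_reduction:.0%}")                        # 75%
print(f"Implied engineers for 26 products: "
      f"{implied_engineers_before:.1f} -> {implied_engineers_after:.1f}")  # 14.9 -> 5.2
```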

Metrics

Results

01
₽397M savings over 5 years TCO (2022→2026)
02
26 products on platform, 21 infrastructure tools
03
DevOps: 1.75 → 2.1 → 5.0 products per engineer
04
Time-to-Market: 16 → 14 → 4 weeks
05
₽172M infrastructure savings, ₽65M from service reuse
06
₽80.5M DevOps savings, ₽38M from DSML centralization
07
₽41.5M other effects (TTM acceleration, quality, risk reduction)
Business Impact

Impact on business

₽93.4M budget against ₽397M TCO savings over 5 years (2022→2026). DevOps utilization grew from 1.75 to 5.0 products per engineer (₽80.5M savings). T2M fell from 16 to 4 weeks. The 45 duplicates found among 83 services were eliminated. Centralizing DS infrastructure on the DSML platform (JupyterHub, MLFlow, AirFlow) saved ₽38M.
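As a consistency check, the five savings components listed under Results can be summed and compared with the budget. This is a simple undiscounted sketch, not the actual 5-year TCO model, which is not published here. The net figure assumes the ₽397M is gross of the ₽93.4M budget, which the case study does not specify.

```python
# All amounts in millions of rubles, taken from the Results section.
BUDGET_M = 93.4
SAVINGS_M = {
    "infrastructure": 172.0,
    "service reuse": 65.0,
    "DevOps": 80.5,
    "DSML centralization": 38.0,
    "other effects": 41.5,
}

total_savings = sum(SAVINGS_M.values())
savings_to_budget = total_savings / BUDGET_M
# Assumption: savings are reported gross, so the budget is subtracted here.
net_effect = total_savings - BUDGET_M

print(f"Total savings: {total_savings:.1f}M")               # 397.0M
print(f"Savings-to-budget ratio: {savings_to_budget:.2f}")  # 4.25
print(f"Net effect: {net_effect:.1f}M")                     # 303.6M

for name, amount in SAVINGS_M.items():
    print(f"  {name}: {amount / total_savings:.1%} of total")
```

The components add up to the stated ₽397M exactly, with infrastructure the largest single contributor.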

Methods

Algorithms & patterns

  • Service Mesh
  • Platform Engineering
  • Architecture Review Board
  • TCO model (5 years)
  • Service Deduplication Audit
  • DSML Platform
  • DevSecOps Pipeline

Stack

Technologies

  • Python
  • TypeScript
  • NodeJS
  • FastAPI
  • Angular
  • Vue.js
  • PostgreSQL PRO
  • Apache Kafka
  • S3 Minio
  • AirFlow
  • MLFlow
  • JupyterHub
  • DVC
  • Gurobi
  • Docker
  • Rancher
  • ArgoCD
  • Ansible
  • GitLab CI/CD
  • Nexus
  • Zabbix
  • Grafana
  • Prometheus
  • ELK
  • Sentry
  • Hashicorp Vault
  • CheckMarx
  • JMeter
  • Selenium
  • TestIT


Serbia-based · CET/CEST timezone · EU-aligned working hours · International contracts experience