Day 14: Explaining ML's Neglected Concepts - 𝗢𝗟𝗧𝗣
Most ML engineers interact with OLTP systems every day and still can't explain what makes them different from the analytical systems they train on.
OLTP - 𝗢𝗻𝗹𝗶𝗻𝗲 𝗧𝗿𝗮𝗻𝘀𝗮𝗰𝘁𝗶𝗼𝗻 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴 - is the database layer built to power live applications: every login, purchase, and form submission hitting your DB in real time.
What actually happens:
OLTP handles thousands of small, fast read/write operations per second - inserts, updates, deletes, not bulk scans.
It's optimized for row-level access, so the wide, column-oriented scans your training jobs need are genuinely slow on it.
ACID guarantees (atomicity, consistency, isolation, durability) keep app data correct but add overhead that batch pipelines fight constantly.
Normalized schemas reduce write redundancy - and force 6-table joins before a single training row exists.
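Here's what that join tax looks like - a minimal sketch assuming psycopg2 against Postgres and a made-up normalized e-commerce schema (users, addresses, orders, order_items, products, payments); none of these names come from a real system:

```python
# Hypothetical schema for illustration only - the point is the shape of the query,
# not the specific tables.
import psycopg2

TRAINING_ROW_SQL = """
SELECT
    u.id               AS user_id,
    u.signup_ts,
    a.country,
    o.id               AS order_id,
    o.created_at,
    p.category,
    oi.quantity,
    oi.unit_price,
    pay.method         AS payment_method
FROM users u
JOIN addresses   a   ON a.user_id    = u.id
JOIN orders      o   ON o.user_id    = u.id
JOIN order_items oi  ON oi.order_id  = o.id
JOIN products    p   ON p.id         = oi.product_id
JOIN payments    pay ON pay.order_id = o.id
WHERE o.created_at >= %(since)s;
"""

def fetch_training_rows(dsn: str, since: str):
    # Six normalized tables have to be joined before one flat training row exists.
    # Running this against the live OLTP primary competes with user-facing writes.
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(TRAINING_ROW_SQL, {"since": since})
            return cur.fetchall()
```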
Key approaches in practice:
CDC (Change Data Capture) reads from the transaction log directly - so training pipelines don't hammer the live DB.
Incremental snapshots pull changes into a staging layer before any feature engineering touches the data (sketch below).
Read replicas offload query load but don't fix the schema mismatch problem.
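A minimal sketch of the incremental-snapshot side - it assumes psycopg2, a read-replica DSN, an updated_at column, and made-up table names; log-based CDC would instead tail the transaction log with a tool like Debezium:

```python
import psycopg2

def pull_increment(replica_dsn: str, staging_dsn: str, last_watermark: str) -> str:
    # Read only rows changed since the last run - from the replica, never the primary.
    with psycopg2.connect(replica_dsn) as src, src.cursor() as cur:
        cur.execute(
            "SELECT id, user_id, amount, status, updated_at "
            "FROM orders WHERE updated_at > %s ORDER BY updated_at",
            (last_watermark,),
        )
        rows = cur.fetchall()

    if not rows:
        return last_watermark  # nothing new; keep the old watermark

    # Land the raw changes in a staging table; feature engineering reads from here,
    # not from the live application database.
    with psycopg2.connect(staging_dsn) as dst, dst.cursor() as cur:
        cur.executemany(
            "INSERT INTO staging.orders_changes "
            "(id, user_id, amount, status, updated_at) VALUES (%s, %s, %s, %s, %s)",
            rows,
        )

    return str(rows[-1][-1])  # new watermark = max updated_at seen this run
```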
What happens in real stacks:
Most teams run Postgres → Kafka → data warehouse, then train only on the warehouse.
Feature stores cache pre-computed values so inference never hits OLTP at query time (sketch below).
Skip this separation and training-serving skew shows up at 2am.
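A minimal sketch of that cache-at-serving pattern, assuming redis-py and an illustrative key scheme - a managed feature store does the same job with more guarantees:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def publish_features(user_id: int, features: dict[str, float]) -> None:
    # Called from the batch pipeline after features are computed on the warehouse.
    key = f"features:user:{user_id}"
    r.hset(key, mapping={k: str(v) for k, v in features.items()})
    r.expire(key, 24 * 3600)  # stale features age out instead of lingering forever

def features_for_inference(user_id: int) -> dict[str, float]:
    # The serving path reads the cache only - it never queries the OLTP database,
    # and training reads the same pre-computed values, which limits skew.
    raw = r.hgetall(f"features:user:{user_id}")
    return {k: float(v) for k, v in raw.items()}
```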
Built for writes, not models. Closing that gap takes an explicit design decision.
What does your OLTP-to-feature pipeline look like - homegrown ETL or something managed?