Day 15: Explaining ML's Neglected Concepts
𝗢𝗟𝗔𝗣 (𝗢𝗻𝗹𝗶𝗻𝗲 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝗮𝗹 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴): The reason your "fast" database still can't answer a simple analytics question
You trained the model. The pipeline runs. Then someone asks "show me weekly accuracy by region by data source" - and your stack chokes.
The query isn't complex. The database is just the wrong shape.
What actually happens:
OLTP databases optimize for fast single-row reads and writes; an analytics query scans millions of rows but touches only a few columns - the opposite workload entirely.
OLAP systems pre-organize data into columnar formats so aggregations scan only the fields they need.
A "cube" is a mental model: slice by time, dice by category, drill down or roll up on demand.
The query that took 40 seconds on Postgres runs in 400ms on a columnar store - same data, different physics (sketch below).
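Here's a minimal sketch of that "weekly accuracy by region by data source" query on an embedded columnar engine (DuckDB). The file and column names (eval_results.parquet with ts, region, source, correct) are made up for illustration, not from any particular stack:

```python
import duckdb

# Columnar scan: the engine reads only the four columns the aggregation touches,
# not entire rows - which is why this stays fast as the table grows.
duckdb.sql("""
    SELECT
        date_trunc('week', ts)                          AS week,
        region,
        source,
        avg(CASE WHEN correct THEN 1 ELSE 0 END)        AS accuracy,      -- fraction correct
        count(*)                                        AS n_predictions
    FROM read_parquet('eval_results.parquet')            -- hypothetical eval log
    GROUP BY ALL
    ORDER BY week, region, source
""").show()
```

Slice (WHERE week = ...), dice (add a column to GROUP BY), roll up (drop one) - the "cube" operations are just variations of this one query.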
Key approaches in practice:
Star schemas denormalize dimensions intentionally, trading storage for cheap, predictable single-hop joins at query time.
Materialized views precompute expensive aggregations so dashboards don't recompute on every load.
Partitioning by a time column is often the single highest-leverage OLAP optimization - sketched below alongside a precomputed rollup.
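A rough sketch of two of these ideas in DuckDB, reusing the same hypothetical eval_results.parquet from above (table names, dates, and paths are placeholders):

```python
import duckdb

con = duckdb.connect("metrics.duckdb")

# Materialized-view-style precomputation: persist the expensive weekly rollup once,
# so dashboards read a small table instead of re-aggregating raw rows on every load.
con.sql("""
    CREATE OR REPLACE TABLE weekly_metrics AS
    SELECT date_trunc('week', ts) AS week,
           region,
           source,
           avg(CASE WHEN correct THEN 1 ELSE 0 END) AS accuracy,
           count(*) AS n_predictions
    FROM read_parquet('eval_results.parquet')
    GROUP BY ALL
""")

# Partitioning by a time column: rewrite the raw data Hive-partitioned by day,
# so any query with a date filter only touches the matching files.
con.sql("""
    COPY (SELECT *, CAST(ts AS DATE) AS event_date
          FROM read_parquet('eval_results.parquet'))
    TO 'eval_partitioned' (FORMAT PARQUET, PARTITION_BY (event_date))
""")

# This scan prunes down to one partition instead of reading everything.
con.sql("""
    SELECT region, avg(CASE WHEN correct THEN 1 ELSE 0 END) AS accuracy
    FROM read_parquet('eval_partitioned/**/*.parquet', hive_partitioning = true)
    WHERE event_date = '2024-06-03'   -- example date, adjust to your data
    GROUP BY region
""").show()
```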
What happens in real stacks:
BigQuery, Snowflake, Redshift, and DuckDB are all columnar OLAP engines under the hood.
ML teams hit OLAP limits first when building feature stores or evaluation dashboards at scale.
Hybrid Transactional/Analytical Processing (HTAP) is closing the gap - but most teams don't need it yet.
Your model metrics are only as queryable as your data architecture allows.