Day 15: Explaining ML's Neglected Concepts
𝗢𝗟𝗔𝗣 (𝗢𝗻𝗹𝗶𝗻𝗲 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝗮𝗹 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴): The reason your "fast" database still can't answer a simple analytics question
You trained the model. The pipeline runs. Then someone asks "show me weekly accuracy by region by data source" - and your stack chokes.
The query isn't complex. The database is just the wrong shape.
What actually happens:
OLTP databases optimize for fast single-row reads and writes; an analytics query scans millions of rows but touches only a few columns - the opposite workload entirely.
OLAP systems pre-organize data into columnar formats so aggregations scan only the fields they need.
A "cube" is a mental model: slice by time, dice by category, drill down or roll up on demand.
The query that took 40 seconds on Postgres runs in 400ms on a columnar store - same data, different physics (sketch below).
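Here's a minimal sketch of that "weekly accuracy by region by data source" query on an embedded columnar engine (DuckDB). The file and column names (eval_results.parquet with ts, region, source, correct) are made up for illustration, not from any particular stack:

```python
import duckdb

# Columnar scan: the engine reads only the four columns the aggregation touches,
# not entire rows - which is why this stays fast as the table grows.
duckdb.sql("""
    SELECT
        date_trunc('week', ts)                          AS week,
        region,
        source,
        avg(CASE WHEN correct THEN 1 ELSE 0 END)        AS accuracy,      -- fraction correct
        count(*)                                        AS n_predictions
    FROM read_parquet('eval_results.parquet')            -- hypothetical eval log
    GROUP BY ALL
    ORDER BY week, region, source
""").show()
```

Slice (WHERE week = ...), dice (add a column to GROUP BY), roll up (drop one) - the "cube" operations are just variations of this one query.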
Key approaches in practice:
Star schemas denormalize dimensions intentionally, trading storage for cheap, predictable single-hop joins at query time.
Materialized views precompute expensive aggregations so dashboards don't recompute on every load.
Partitioning by a time column is often the single highest-leverage OLAP optimization - sketched below alongside a precomputed rollup.
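A rough sketch of two of these ideas in DuckDB, reusing the same hypothetical eval_results.parquet from above (table names, dates, and paths are placeholders):

```python
import duckdb

con = duckdb.connect("metrics.duckdb")

# Materialized-view-style precomputation: persist the expensive weekly rollup once,
# so dashboards read a small table instead of re-aggregating raw rows on every load.
con.sql("""
    CREATE OR REPLACE TABLE weekly_metrics AS
    SELECT date_trunc('week', ts) AS week,
           region,
           source,
           avg(CASE WHEN correct THEN 1 ELSE 0 END) AS accuracy,
           count(*) AS n_predictions
    FROM read_parquet('eval_results.parquet')
    GROUP BY ALL
""")

# Partitioning by a time column: rewrite the raw data Hive-partitioned by day,
# so any query with a date filter only touches the matching files.
con.sql("""
    COPY (SELECT *, CAST(ts AS DATE) AS event_date
          FROM read_parquet('eval_results.parquet'))
    TO 'eval_partitioned' (FORMAT PARQUET, PARTITION_BY (event_date))
""")

# This scan prunes down to one partition instead of reading everything.
con.sql("""
    SELECT region, avg(CASE WHEN correct THEN 1 ELSE 0 END) AS accuracy
    FROM read_parquet('eval_partitioned/**/*.parquet', hive_partitioning = true)
    WHERE event_date = '2024-06-03'   -- example date, adjust to your data
    GROUP BY region
""").show()
```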
What happens in real stacks:
BigQuery, Snowflake, Redshift, and DuckDB are all columnar OLAP engines under the hood.
ML teams hit OLAP limits first when building feature stores or evaluation dashboards at scale.
Hybrid Transactional/Analytical Processing (HTAP) is closing the gap - but most teams don't need it yet.
Your model metrics are only as queryable as your data architecture allows.