Chapter 13: BI, ML, and Cost Optimization in Production

Learning Objectives

Chapter Overview Diagrams

Figure 13.1 — QuickSight Three-Layer Architecture

flowchart TD subgraph Sources["Data Sources"] RS[Redshift] S3[S3 / Athena] RDS[RDS / Aurora] SaaS[SaaS Connectors] end subgraph Connect["Connector and Security Layer"] VPC[VPC Connector] IAM[IAM Roles] RLS[Row-Level Security] end subgraph SPICE["SPICE Acceleration Layer"] Ingest[Scheduled Ingestion] Mem[("Columnar In-Memory Store
4:1-6:1 compression")] Refresh[Full / Incremental / On-Demand] end subgraph Experience["End-User Experience"] Dash[Dashboards] Embed[Embedded Analytics] Q[Q Natural-Language] end Sources --> Connect Connect --> SPICE Ingest --> Mem Refresh --> Ingest SPICE --> Experience Connect -. Direct Query .-> Experience

Figure 13.4 — End-to-End Reference Architecture

flowchart TD subgraph SRC["Sources"] DB[Operational DBs] EV[Events / IoT] SAAS[SaaS] end subgraph ING["Ingestion"] DMS[DMS CDC] KDS[Kinesis Streams] AF[AppFlow] GL[Glue Batch] end subgraph LAKE["Layered Lakehouse Storage"] BR["Bronze: S3 Raw
Parquet partitioned"] SI["Silver: Iceberg / Delta
Conformed Zone"] GO["Gold: Redshift / Snowflake
Star Schemas + MVs"] end subgraph BI["Analytics Branch"] QS[QuickSight + SPICE] ATH[Athena Ad-hoc] end subgraph ML["ML Branch"] FS["Feature Store
Online + Offline"] SM["SageMaker Training
+ Endpoints"] end subgraph CROSS["Cross-Cutting Planes"] GOV[Lake Formation Governance] OBS["Observability:
OpenLineage + Monte Carlo"] SEC["Security: IAM + KMS + VPC"] end DB --> DMS EV --> KDS SAAS --> AF DB --> GL DMS --> BR KDS --> BR AF --> BR GL --> BR BR --> SI SI --> GO GO --> QS GO --> ATH SI --> FS FS --> SM GOV -.-> LAKE GOV -.-> BI GOV -.-> ML OBS -.-> LAKE SEC -.-> ING SEC -.-> LAKE

Figure 13.5 — Multi-Tenant Isolation Models

graph TD subgraph Pool["Pool Model: Shared Everything"] P1[Shared Cluster] --> P2[Shared Schema] --> P3[Tables with tenant_id column] --> P4[Row-Level Security + Session Tags] end subgraph Bridge["Bridge Model: Schema-Per-Tenant"] B1[Shared Cluster] --> B2[Tenant A Schema] B1 --> B3[Tenant B Schema] B1 --> B4[Tenant C Schema] end subgraph Silo["Silo Model: Dedicated Infrastructure"] S1[Tenant A Account / Cluster] S2[Tenant B Account / Cluster] S3[Tenant C Account / Cluster] end Cost["Cost Efficiency:
Pool > Bridge > Silo"] -.-> Pool Iso["Isolation Strength:
Silo > Bridge > Pool"] -.-> Silo

Section 1: BI with QuickSight and SPICE

Pre-Reading Check — Section 1

1. A QuickSight Standard-edition account is hitting its dashboard refresh ceiling. The team needs hourly freshness on a 50 GB fact table. Which constraint is the binding one and what is the typical fix?

2. A business user complains that Q in QuickSight cannot answer "show me revenue by customer type" even though the data exists. The column in the dataset is named cust_acct_typ_cd_v2. Which fix most directly addresses the failure?

3. Why is incremental refresh the preferred SPICE mode for a clickstream fact table that is append-only with monotonic event timestamps?

What QuickSight Is

Business intelligence is the layer where engineered data finally meets the people who make decisions with it. A pipeline that ingests terabytes of clickstream and conforms it to a star schema delivers zero business value until a regional sales lead can open a dashboard at 7 AM and see whether yesterday's promotion worked. Amazon QuickSight is AWS's native managed BI service, designed to plug directly into Redshift warehouses, S3 data lakes, and Athena query layers. It is serverless, per-session billed, and tightly integrated with AWS identity and networking.

QuickSight has three distinct capability layers, each with its own cost model and failure modes:

SPICE: Super-fast, Parallel, In-memory Calculation Engine

SPICE is QuickSight's proprietary columnar in-memory store. When you ingest a dataset into SPICE, QuickSight extracts rows from the source, compresses them columnarly (typical ratios are 4:1 to 6:1), and stores them in a distributed memory tier the dashboard renderer queries directly. Aggregation latency falls to the 10–100 millisecond range regardless of how loaded the source warehouse is.

The mental model is a coffee thermos. The espresso machine (your warehouse) is expensive, slow to warm up, and shared. Instead of walking back for every sip, you brew a thermos in the morning (SPICE ingestion) and pour from it instantly all day. The thermos is smaller than the machine's output, so you choose what goes in; contents go stale unless you refill on a schedule.

Capacity is a finite shared resource: Standard edition gives 10 GB per account; Enterprise gives 10 GB base plus 0.5 GB per provisioned user up to 512 GB. A careless team can fill it with overlapping snapshots and starve the organization.

Animation 1: SPICE In-Memory Caching Speedup

Source warehouse query takes seconds; SPICE in-memory aggregation returns in 10–100 ms. Watch the bars grow proportionally to query latency.
Query Latency Comparison Same dashboard query: warehouse direct vs SPICE cached Direct Warehouse Query ~3.8s scan SPICE Cached (4:1-6:1 compression) ~60ms columnar compression + memory residency ~60x speedup → dashboard responsiveness independent of warehouse load

Refresh Strategies

Refresh ModeBehaviorWhen to Use
Full refreshRe-ingests entire dataset from sourceSmall datasets, dimension tables, schema changes
Incremental refreshLoads only rows newer than a timestamp watermarkLarge append-only fact tables with monotonic event time
On-demand refreshAPI or console-triggered manual runPost-ETL completion via EventBridge, ad-hoc fixes

Standard caps refreshes at four per day; Enterprise allows up to 32. Schedule refreshes during low-query windows so SPICE ingestion does not contend with interactive analytics. The trade-off is always freshness versus cost: a dashboard accurate to the minute is dramatically more expensive than one accurate to the hour.

Q Natural Language Queries

Q in QuickSight is the natural-language layer that lets a business user type "what were our top five products by revenue last quarter in EMEA" and receive a generated chart instead of a SQL prompt. Under the hood, Q is a text-to-SQL pipeline: the question is parsed by an NLP model, mapped against the dataset schema and metadata, translated to SQL, executed against SPICE or the source, and rendered as a visualization.

Q is Enterprise-only and its accuracy is almost entirely determined by metadata quality:

Embedded Analytics

Embedded analytics surfaces dashboards or the full Q bar inside another application via a signed embed URL. SaaS companies use it to expose per-customer reporting inside their own product. Three dimensions matter:

For multi-tenant scenarios, the anonymous-user embed pattern with session tags is standard: each session is tagged with the tenant ID, RLS rules filter datasets on that tag, and the host application controls which dashboard a user can request.

Section 1 Key Points

Post-Reading Check — Section 1

1. A QuickSight Standard-edition account is hitting its dashboard refresh ceiling. The team needs hourly freshness on a 50 GB fact table. Which constraint is the binding one and what is the typical fix?

2. A business user complains that Q in QuickSight cannot answer "show me revenue by customer type" even though the data exists. The column in the dataset is named cust_acct_typ_cd_v2. Which fix most directly addresses the failure?

3. Why is incremental refresh the preferred SPICE mode for a clickstream fact table that is append-only with monotonic event timestamps?

Section 2: ML Convergence and SageMaker Integration

Pre-Reading Check — Section 2

1. A churn model trained on the SageMaker offline feature store hits 99% accuracy in development but only 60% in production. What is the most likely root cause?

2. Which SageMaker Feature Store property best reflects how it integrates into the lakehouse rather than being a sibling silo?

3. Generative-BI assistants tend to fail when pointed at a raw lake. What discipline makes them succeed instead?

Lakehouse-Native ML

For most of BI's history, analytics and ML lived in separate tools with separate data copies. The convergence story of the past few years is that those worlds are merging onto the same lakehouse storage, and the bridge between them is the SageMaker Feature Store and a new generation of generative-BI assistants that operate over warehouse data.

SageMaker is AWS's managed ML platform covering notebooks, training, model registry, deployment, and monitoring. Its critical integration point is that SageMaker training jobs, batch-transform jobs, and Studio notebooks read directly from the same S3 lakehouse and Glue Data Catalog as analytics workloads. Training a churn model no longer requires an export to a separate ML store; it reads the same Iceberg or Hudi table the dashboards read.

Concrete integration patterns:

Feature Store: Online + Offline Pair

A feature store manages ML model inputs with two non-negotiable properties: low-latency online lookups for inference, and point-in-time-correct historical retrieval for training.

StoreBackendLatencyPrimary Use
OnlineManaged in-memory KVSingle-digit msReal-time inference via GetRecord / BatchGetRecord
OfflineS3 Parquet (Glue cataloged)Seconds to minutesTraining set creation, batch inference, analytics

The offline store is literally just S3 Parquet partitioned by event time, registered in the Glue Data Catalog, and queryable from Athena, Redshift Spectrum, or EMR. Feature data is part of the lakehouse, not a sibling. An analyst queries feature group history with the same Athena workgroup used for fact tables; a Lake Formation administrator grants access using the same policy primitives.

Animation 2: Feature Store Online/Offline Split with Consumer Paths

Ingestion paths converge on a single feature group, which dual-writes to the online KV (real-time inference) and offline S3 Parquet (training and analytics).
INGESTION Batch Glue / EMR Stream Kinesis / Flink CDC DMS / Debezium FEATURE GROUP PutRecord() EventTime + WriteTime dual-write Online Store In-Memory KV · ~ms Offline Store S3 Parquet / Iceberg CONSUMERS Inference Training (ASOF) Athena / Redshift Spectrum

Ingestion Patterns

  1. Batch ingestion — scheduled Glue or EMR jobs read from S3 or Redshift, transform, and call PutRecord (or use the Spark connector). The connector dual-writes to online and offline stores in one operation.
  2. Stream ingestion — Kinesis or Managed Service for Apache Flink push records through Lambda handlers calling PutRecord continuously.
  3. Backfill — historical Parquet files are written directly to the offline-store S3 path and registered, bypassing per-record API overhead.
  4. CDC ingestion — DMS or Debezium captures change events through Kinesis or Kafka; a stream handler updates features as upstream rows change.

For upsert-heavy workloads (a customer profile updated daily), the Iceberg-format offline store enables ACID upserts, schema evolution, and time travel. For append-only event histories (every transaction), the Glue/Standard offline store is simpler and cheaper.

Point-in-Time Correctness: The Discipline That Separates Production from Prototype

Every offline-store record carries both EventTime (when the value was true in the real world) and WriteTime (when it was written). Training queries use ASOF joins to retrieve the feature value current at prediction time, not the latest value. Without this, a churn model is poisoned by post-cancellation profile updates — classic label leakage that produces 99% dev accuracy and 60% production accuracy.

Generative-BI Assistants

Generative-BI assistants are LLM-powered agents that take a user's natural-language question, plan a sequence of warehouse queries, and synthesize a narrative answer with charts and citations. Q in QuickSight sits at the simpler end (single-question text-to-SQL); Bedrock Agents and similar frameworks orchestrate multi-step reasoning across catalogs, semantic layers, and external knowledge bases.

The data engineering work is largely metadata work:

The pattern that fails is pointing an LLM at a raw lake. The pattern that succeeds is publishing a curated semantic layer first and treating the LLM as a translator into it.

Section 2 Key Points

Post-Reading Check — Section 2

1. A churn model trained on the SageMaker offline feature store hits 99% accuracy in development but only 60% in production. What is the most likely root cause?

2. Which SageMaker Feature Store property best reflects how it integrates into the lakehouse rather than being a sibling silo?

3. Generative-BI assistants tend to fail when pointed at a raw lake. What discipline makes them succeed instead?

Section 3: Cost Optimization and Reference Architectures

Pre-Reading Check — Section 3

1. An Athena query SELECT * FROM events WHERE LOWER(country) = 'us' runs a full table scan on a clickstream table partitioned by country. What is the most direct fix?

2. A platform team is choosing between S3 Standard, Intelligent-Tiering, and Glacier Flexible Retrieval for a 300 TB lakehouse with mixed and unpredictable access patterns. Which choice and rationale is correct?

3. A production team adopts a multi-tenant pool model with row-level security and tenant_id columns. A new enterprise customer demands dedicated infrastructure for compliance reasons. Which evolution best fits the canonical pattern?

Cost Optimization as Workload Shaping

Cost optimization in modern data platforms is about shaping workloads to the billing model of each service. Snowflake bills credits per second (60s minimum); BigQuery bills per terabyte scanned or by reserved slots; Redshift bills per provisioned hour or per serverless second; Athena bills per terabyte scanned; S3 bills per gigabyte-month with tier-dependent rates. Cost optimization means reducing cost (compute hours, bytes scanned, storage tier) without harming value (query latency, freshness, completeness).

Storage Class and Lifecycle Tuning

TierApprox. Cost vs StandardRetrieval LatencyBest For
S3 Standard100%msHot, frequently queried
S3 Intelligent-Tiering~60%ms (auto-managed)Mixed/unknown access
S3 Standard-IA~55%ms (per-GB retrieval fee)Known infrequent, > 30 days
Glacier Instant Retrieval~25%msRare, instant needed
Glacier Flexible Retrieval~15%minutes-hoursArchive with occasional restore
Glacier Deep Archive~5%12+ hoursCompliance archive

S3 Intelligent-Tiering is the default-good choice for lakehouses with unpredictable access. Glacier tiers save 90+ percent for compliance archives and rarely-accessed raw event logs. A typical lakehouse lifecycle policy: keep raw bronze in Standard for 30 days, move to Intelligent-Tiering for 90 days, transition to Glacier Flexible Retrieval at 180 days, expire at 7 years.

Query Economics: Scan Reduction

In scan-billed services every byte read is a billed byte. The four levers, by typical impact:

Anti-pattern: SELECT * FROM big_table WHERE LOWER(country) = 'us' defeats partition pruning because the optimizer cannot prove the function preserves filter on storage layout. Normalize country codes at ingestion. Production deployments enforce per-query scan quotas (BigQuery custom quotas, Athena workgroup limits) so a typo cannot become a five-figure bill.

Right-Sizing and Reserved Capacity

Reserved capacity yields up to 75% savings for stable workloads. The trap is overcommitment: size reservations from 90 days of historical burn, cover only the steady baseline, and let on-demand absorb peaks. Per-query tagging (Snowflake QUERY_TAG, BigQuery labels, AWS resource tags) maps spend back to incurring teams — the precondition for any chargeback model.

Animation 3: Reference Architecture — Sequential Layer Reveal

Sources flow into ingestion, which feeds the bronze/silver/gold layers; gold branches to BI and ML, with governance/observability/security as cross-cutting planes.
Sources DBs · Events · SaaS · IoT Ingestion DMS · Kinesis · AppFlow · Glue Bronze · S3 Raw (Parquet) Silver · Iceberg / Delta Gold · Redshift / Snowflake Analytics Branch QuickSight + SPICE Athena ad-hoc ML Branch Feature Store (Online + Offline) SageMaker Training + Endpoints Governance Lake Formation Observability OpenLineage + MC Security IAM + KMS + VPC Layered storage progresses curation; parallel BI and ML branches share governance, observability, and security planes.

Cost Optimization Decision Flow

flowchart TD Start(["Workload Cost Review"]) --> Q1{"Storage
dominates bill?"} Q1 -->|Yes| Tier["Apply S3 Lifecycle:
Standard 30d to IT 90d to
Glacier IR 1y to Deep Archive"] Q1 -->|No| Q2{"Per-byte scan
billing?"} Tier --> Q2 Q2 -->|Yes| Scan["Add Partitioning + Clustering
+ Materialized Views
+ Result Cache"] Q2 -->|No| Q3{"Compute
under-utilized?"} Scan --> Quota["Enforce Per-Query
Scan Quotas"] Quota --> Q3 Q3 -->|Yes| Right["Right-Size:
Auto-Suspend
Serverless Mode"] Q3 -->|No| Q4{"Steady
baseline load?"} Right --> Q4 Q4 -->|Yes| Reserve["Reserved Capacity:
RIs / Slot Reservations
up to 75% savings"] Q4 -->|No| Tag["Per-Query Tagging
+ FinOps Chargeback"] Reserve --> Tag Tag --> End(["50-75% Savings"])

Reference Architectures

A reference architecture is a documented, tested combination of components solving a class of problems with known trade-offs. The point is that new projects start from a known-good template instead of relitigating fundamental decisions.

The canonical AWS shape:

Multi-Tenant Patterns

ModelIsolation BoundaryCost EfficiencyBlast RadiusUse Case
Pool (shared)Row-level (tenant_id column)HighestLargestMany small tenants, similar workloads
Bridge (silo within shared cluster)Schema or databaseMediumMediumMid-size tenants, regulatory variation
Silo (dedicated)Account or clusterLowestSmallestFew large tenants, strict compliance

Most production multi-tenant platforms end up hybrid: pool for the long tail of small tenants, silo for the handful of enterprise customers whose contracts demand dedicated infrastructure, with a clear migration path between the two as customers grow.

Capstone Discipline

The capstone is not finding the theoretically optimal stack; it is making every component choice deliberately, documenting the trade-off, and designing for the team that will operate the system at 3 AM on Tuesday when something breaks. A worked e-commerce example combines DMS + Kinesis ingestion, S3 Intelligent-Tiering bronze, Iceberg silver, Redshift Serverless gold, QuickSight Enterprise + SPICE BI, SageMaker Feature Store ML, Hightouch reverse-ETL, Lake Formation + Atlan governance, and a tight cost-control posture (lifecycle policies, workgroup scan limits, per-team tags) within a $1.5M annual budget.

Section 3 Key Points

Post-Reading Check — Section 3

1. An Athena query SELECT * FROM events WHERE LOWER(country) = 'us' runs a full table scan on a clickstream table partitioned by country. What is the most direct fix?

2. A platform team is choosing between S3 Standard, Intelligent-Tiering, and Glacier Flexible Retrieval for a 300 TB lakehouse with mixed and unpredictable access patterns. Which choice and rationale is correct?

3. A production team adopts a multi-tenant pool model with row-level security and tenant_id columns. A new enterprise customer demands dedicated infrastructure for compliance reasons. Which evolution best fits the canonical pattern?

Your Progress

Answer Explanations