Chapter 12: Search, Logs, and Observability with OpenSearch
Learning Objectives
Explain how Amazon OpenSearch indexes documents using inverted indexes, shards, and replicas to enable fast search across log corpora.
Build a log ingestion pipeline using Amazon OpenSearch Ingestion (managed Data Prepper), with sources, processors, and sinks defined in YAML.
Use OpenSearch Dashboards to explore logs, configure alerts, detect anomalies, and analyze distributed traces for application performance monitoring.
Apply hot, UltraWarm, and cold tiering through Index State Management (ISM) policies to dramatically reduce log retention costs while preserving queryability.
Every Lambda function, Glue job, Kinesis consumer, and microservice produces operational telemetry — logs, metrics, and traces — that must be searched, alerted on, and retained. This chapter examines the search and observability stack built around Amazon OpenSearch Service and shows how to control the cost curve as log volume grows from megabytes to petabytes.
Section 1: OpenSearch Fundamentals
Pre-Reading Check — OpenSearch Fundamentals
1. What core data structure makes OpenSearch text search fast?
A B-tree on the document ID column
A forward index from doc IDs to terms
An inverted index from terms to doc IDs
A row-store with bitmap filters
2. Which index setting is fixed once an index is created?
Replica count
Primary shard count
Refresh interval
Field mappings
3. Why does OpenSearch place a primary shard and its replica on different nodes?
To save disk space
To enable automatic reindexing
To survive a node failure and continue serving traffic
It is purely cosmetic; placement does not matter
4. Roughly what shard size range does AWS recommend for log workloads?
Less than 1 GB per shard
10 to 50 GB per shard
100 to 500 GB per shard
There is no recommended range
5. OpenSearch is a fork of which projects?
Solr and Lucene
Elasticsearch 7.10 and Kibana
CloudSearch and CloudWatch
PostgreSQL full-text search
Inverted Indexes
The core data structure inside OpenSearch is the inverted index. A normal forward index maps a document ID to its content; an inverted index flips that and maps each term to the list of document IDs that contain it. Consider two log lines: "Beauty is in the eye of the beholder" and "Beauty and the beast". The inverted index stores entries like beauty -> [1, 2], beholder -> [1], and beast -> [2]. When you query for "beauty", OpenSearch performs a single dictionary lookup rather than scanning every document — which is what makes search across billions of log lines feel interactive.
Analogy: the index at the back of a textbook. To find every page mentioning "shard," you look up the term and jump to the listed pages rather than reading the whole book.
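To make the term-to-document mapping concrete, here is a minimal sketch in Python that builds an inverted index over the two example lines. The tokenizer (lowercase, split on non-alphanumerics) is a simplifying assumption; real OpenSearch analyzers are configurable and do considerably more.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Simplified tokenizer (assumption): lowercase alphanumeric runs only.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Beauty is in the eye of the beholder",
    2: "Beauty and the beast",
}
index = build_inverted_index(docs)

print(index["beauty"])    # [1, 2]
print(index["beholder"])  # [1]
print(index["beast"])     # [2]
```

A query for a term is then a dictionary lookup followed by a read of the posting list, with no pass over the documents themselves.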
Animation 1: Inverted Index Lookup
A query for "beauty" looks up the term once and jumps directly to the matching documents — no full scan.
Documents are JSON objects, indexes are collections of documents, and the mapping defines field types. The OpenSearch vocabulary maps cleanly onto a relational mental model:
| Relational | OpenSearch |
| --- | --- |
| Database | Cluster |
| Table | Index |
| Row | Document |
| Column | Field |
| Schema | Mapping |
| Primary Key | _id field |
Shards: Primaries and Replicas
A single index quickly outgrows a single machine, so OpenSearch partitions each index into shards. A shard is an independent Lucene index. Indexes have primary shards (the authoritative copies of partitioned data) and replica shards (full copies placed on different nodes for fault tolerance and read scaling).
Routing works as follows. When you index a document, OpenSearch hashes the document ID (or a custom routing key) modulo the primary shard count to pick a primary shard, writes the document there, then replicates it to each replica of that shard. With 5 primaries and 1 replica per primary, the index has 10 shard copies, so a bulk write whose documents spread across every shard touches all 10. A search request, by contrast, goes to one copy (primary or replica) of each shard, so only 5 shard copies are queried per request, in parallel.
Crucial constraint: primary shard count is immutable after index creation. Replica count, however, is dynamic — you can scale replicas up before a known traffic spike and back down afterwards.
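The routing rule, and the reason the primary count cannot change afterwards, can be sketched in a few lines of Python. This is an illustration only: `zlib.crc32` stands in for OpenSearch's internal Murmur3-based routing hash, and the document IDs are made up.

```python
import zlib

def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    """Pick a primary shard: hash(routing key) mod primary shard count.

    zlib.crc32 is a stand-in hash for illustration (assumption); OpenSearch
    uses its own Murmur3-based routing hash internally.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards

# Hypothetical document IDs.
doc_ids = [f"log-{i}" for i in range(1000)]
with_5 = {d: route_to_shard(d, 5) for d in doc_ids}
with_6 = {d: route_to_shard(d, 6) for d in doc_ids}

# Count how many documents would land on a different shard with 6 primaries.
moved = sum(1 for d in doc_ids if with_5[d] != with_6[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because the shard number is derived from the modulus, changing the primary count would send most existing documents to the wrong shard on lookup, which is why resizing means reindexing (or a split or shrink operation) rather than a settings change.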
Animation 2: Shard Placement and Replica Failover
Six data nodes hosting an index with 3 primaries (P0/P1/P2) and 3 replicas (R0/R1/R2). Watch Node A fail and R0 promote to primary.
Cluster, Node, and Replica Concepts
A cluster is the unit of management. Within it, cluster manager nodes (formerly master) maintain cluster state, data nodes hold shards, and coordinator nodes fan out client requests. Amazon OpenSearch Service abstracts most of this — you pick instance types and counts, AWS picks the role assignments.
Shard sizing has practical bounds. AWS recommends 10-30 GB per shard for search-dominated workloads and up to 50 GB for log workloads. Each open shard consumes file handles, JVM heap, and cluster-state metadata, so AWS suggests fewer than 20 shards per GB of heap, or roughly fewer than 1,000 shards per node. Too few shards and individual shards bloat past 50 GB; too many and the cluster manager spends all its time tracking metadata.
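As a rough sizing aid, the bounds above can be turned into arithmetic. The sketch below assumes a hypothetical 400 GB daily index and a node with 32 GB of JVM heap; the 50 GB target and the 20-shards-per-GB figure come from the guidance quoted in this section.

```python
import math

def primary_shards_for(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Smallest primary shard count that keeps each shard at or below the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

def max_shards_for_heap(heap_gb: float, shards_per_gb_heap: int = 20) -> int:
    """Upper bound on shards per node from the shards-per-GB-of-heap guideline."""
    return int(heap_gb * shards_per_gb_heap)

daily_index_gb = 400  # hypothetical daily log volume
primaries = primary_shards_for(daily_index_gb, target_shard_gb=50)

print(primaries)                   # 8
print(daily_index_gb / primaries)  # 50.0
print(max_shards_for_heap(32))     # 640
```

Replicas count against the per-node shard budget too, so 8 primaries with 1 replica each consume 16 shard slots across the cluster per daily index.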
For log workloads, the best practice is time-based rolling indexes managed by an index template. Daily indexes named logs-2026.05.07, logs-2026.05.08 let you delete old indexes wholesale (cheap) instead of deleting documents from a huge single index (expensive).
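A small Python sketch shows why date-stamped index names make retention simple: the expiry decision needs only the index name, never the documents inside. The helper names and the 7-day retention window are illustrative assumptions.

```python
from datetime import date, timedelta

def daily_index_name(day: date, prefix: str = "logs") -> str:
    """Format an index name like logs-2026.05.07."""
    return f"{prefix}-{day:%Y.%m.%d}"

def expired_indexes(index_names, today: date, retention_days: int, prefix: str = "logs"):
    """Return the daily indexes whose date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        # Strip "<prefix>-" and parse the yyyy.MM.dd suffix.
        y, m, d = name[len(prefix) + 1:].split(".")
        if date(int(y), int(m), int(d)) < cutoff:
            expired.append(name)
    return expired

today = date(2026, 5, 8)
names = [daily_index_name(today - timedelta(days=n)) for n in range(10)]

print(names[0], names[1])                # logs-2026.05.08 logs-2026.05.07
print(expired_indexes(names, today, 7))  # ['logs-2026.04.30', 'logs-2026.04.29']
```

Each expired name maps to a single index deletion, which drops whole shards at once instead of marking individual documents as deleted.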
OpenSearch began as Elasticsearch and Kibana. In January 2021, Elastic changed the licensing from Apache 2.0 to a dual SSPL/Elastic License model that restricted use by certain managed-service providers. AWS forked the last Apache-2.0 versions (7.10.2) and renamed them OpenSearch and OpenSearch Dashboards. OpenSearch 1.0 GA shipped in July 2021 under AWS stewardship; the project moved to Linux Foundation governance in 2024.
Query DSL, document model, REST API, and most plugins are compatible with the Elasticsearch 7.10 era.
The codebases have diverged since 2021: Elasticsearch added ES|QL, while OpenSearch added neural search, ML Commons, and Security Analytics.
New AWS deployments should choose OpenSearch unless a legacy app pins a specific Elasticsearch version.
Figure 12.1: Primary and Replica Shard Topology (6-node cluster)
flowchart TB
subgraph Cluster["OpenSearch Cluster: logs-2026.05.07"]
direction LR
subgraph Primaries["Primary Shards"]
direction LR
NA["Node A P0"]
NB["Node B P1"]
NC["Node C P2"]
end
subgraph Replicas["Replica Shards"]
direction LR
ND["Node D R0"]
NE["Node E R1"]
NF["Node F R2"]
end
end
NA -. replicates to .-> ND
NB -. replicates to .-> NE
NC -. replicates to .-> NF
Client["Indexing Request"] -->|hash mod 3| NA
Client -->|hash mod 3| NB
Client -->|hash mod 3| NC
Search["Search Request"] --> NA
Search --> NE
Search --> NF
Key Points: OpenSearch Fundamentals
Inverted index maps terms to doc IDs: a query is a term-dictionary lookup, not an O(n) scan over documents.
Primary shard count is fixed at index creation; replica count is dynamic.
Primaries and replicas live on different nodes for HA and read scaling.
Target 10-50 GB per shard, fewer than ~20 shards per GB of heap.
Use time-based rolling indexes + index templates for log workloads — deleting an index is cheap; deleting documents is expensive.
OpenSearch is the Apache-2.0 fork of Elasticsearch 7.10 from 2021; new AWS deployments should pick OpenSearch.
Post-Reading Check — OpenSearch Fundamentals
1. What core data structure makes OpenSearch text search fast?
A B-tree on the document ID column
A forward index from doc IDs to terms
An inverted index from terms to doc IDs
A row-store with bitmap filters
2. Which index setting is fixed once an index is created?
Replica count
Primary shard count
Refresh interval
Field mappings
3. Why does OpenSearch place a primary shard and its replica on different nodes?
To save disk space
To enable automatic reindexing
To survive a node failure and continue serving traffic
It is purely cosmetic; placement does not matter
4. Roughly what shard size range does AWS recommend for log workloads?
Less than 1 GB per shard
10 to 50 GB per shard
100 to 500 GB per shard
There is no recommended range
5. OpenSearch is a fork of which projects?
Solr and Lucene
Elasticsearch 7.10 and Kibana
CloudSearch and CloudWatch
PostgreSQL full-text search
Section 2: Log Ingestion and Visualization
Pre-Reading Check — Log Ingestion and Visualization
1. What is OpenSearch Ingestion (OSIS)?
A self-hosted Logstash distribution
An auto-scaling, AWS-managed Data Prepper service
A Kinesis Data Streams replacement
A CloudWatch Logs subscription filter
2. What are the four logical stages of a Data Prepper pipeline?
producer, broker, consumer, sink
source, buffer, processor, sink
extract, transform, load, archive
map, reduce, shuffle, persist
3. When should you choose Firehose-to-OpenSearch over OSIS?
When you need grok parsing of unstructured logs
When you need conditional routing to multiple indexes
When you have already-structured records and want a simple managed delivery path
When you need on-premises ingestion
4. Which OpenSearch Dashboards view is the SRE day-to-day workspace for ad-hoc log exploration?
Visualize
Discover
Dev Tools
Stack Management
5. The Anomaly Detection plugin uses which algorithm?
Linear regression
k-means clustering
Random Cut Forest (unsupervised)
Static threshold over a moving average
OpenSearch Ingestion Service (OSIS)
Amazon OpenSearch Ingestion (OSIS) is the AWS-managed Data Prepper service. You upload a YAML pipeline definition, choose a capacity range in OpenSearch Compute Units (OCUs), and AWS runs the pipeline as a serverless service that auto-scales between your minimum and maximum OCU bounds.
A pipeline has four logical stages:
Source: where data enters — HTTP push, S3 + SQS, OpenTelemetry endpoints, Kafka, or existing OpenSearch indexes for migrations.
Buffer: holds events between stages; in-memory (default) or persistent for at-least-once durability.
Processor: transforms events in flight, for example grok parsing, date parsing, field enrichment, or conditional routing.
Sink: writes to OpenSearch, OpenSearch Serverless, S3, or another pipeline.
Capacity is measured in OCUs. Stateless pipelines scale to 96 OCUs (384 with persistent buffering); stateful pipelines top out at 48 OCUs (192 with buffering). One OCU handles a few thousand events per second of typical log data.
Animation 3: Data Prepper / OSIS Pipeline Flow
Watch a single event traverse source → buffer → processor → sink with each stage activating in turn.
A worked example: ingest application logs from S3 (notified via SQS), parse with grok, ship to OpenSearch.
The index pattern logs-app-%{yyyy.MM.dd} produces daily rolling indexes; OSIS creates them on demand.
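To show what the grok step and the daily index pattern do to a single event, here is a minimal Python sketch. It is not Data Prepper configuration: the log line format, the field names, and the helper functions are hypothetical, and a named-group regular expression stands in for a grok pattern. The sketch derives the index date from the event's own timestamp for simplicity, which may differ from the clock the real sink uses.

```python
import re
from datetime import datetime

# Hypothetical log format: "<ISO timestamp> <LEVEL> <service> <message>"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)"
)

def parse_line(line: str):
    """Grok-style step: turn an unstructured line into named fields, or None."""
    match = LINE_PATTERN.fullmatch(line.strip())
    return match.groupdict() if match else None

def target_index(event: dict, prefix: str = "logs-app") -> str:
    """Mimic the daily index pattern by formatting the event date as yyyy.MM.dd."""
    ts = datetime.fromisoformat(event["timestamp"])
    return f"{prefix}-{ts:%Y.%m.%d}"

event = parse_line("2026-05-07T12:30:00 ERROR checkout payment gateway timeout")

print(event["level"], event["service"])  # ERROR checkout
print(target_index(event))               # logs-app-2026.05.07
```

Lines that do not match return `None`; a production pipeline would typically tag or divert such events rather than drop them silently.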
Self-hosted Data Prepper
Data Prepper is the open-source upstream that powers OSIS. Run it yourself for on-premises sources, custom Java processors, or colocation with another tool. The YAML syntax is identical to OSIS, easing migration in either direction. A pipeline can chain into another via the pipeline source/sink — useful for fan-in deduplication followed by fan-out routing. Up to 10 sub-pipelines can be chained per file.
Firehose-to-OpenSearch Delivery
When logs already flow through Amazon Data Firehose, you can deliver them straight to OpenSearch without a separate pipeline. Firehose buffers records by size or time, optionally invokes a Lambda transformation, and bulk-indexes the result. Records that fail after retries are written to an S3 backup bucket for replay.
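The size-or-time buffering described here can be pictured with a short Python sketch. This models the idea only and is not Firehose's implementation: the class name, thresholds, and callback are hypothetical, and the time limit is checked only when a record arrives.

```python
import time

class SizeOrTimeBuffer:
    """Flush when buffered bytes reach max_bytes or max_seconds have elapsed."""

    def __init__(self, max_bytes, max_seconds, flush, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.flush = flush            # callback that receives a list of records
        self.clock = clock
        self.records = []
        self.size = 0
        self.started = clock()

    def add(self, record: bytes):
        self.records.append(record)
        self.size += len(record)
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.started >= self.max_seconds
        if too_big or too_old:
            self._flush()

    def _flush(self):
        if self.records:
            self.flush(self.records)
        # Start a fresh batch and restart the timer.
        self.records, self.size, self.started = [], 0, self.clock()

batches = []
buf = SizeOrTimeBuffer(max_bytes=10, max_seconds=60, flush=batches.append)
for rec in [b"aaaa", b"bbbb", b"cccc", b"dddd"]:
    buf.add(rec)

print(batches)  # [[b'aaaa', b'bbbb', b'cccc']]
```

The fourth record stays buffered until another threshold is crossed, which is the latency cost of batching: larger buffers mean fewer, bigger bulk requests but slower time-to-search.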
Trade-off vs. OSIS: Firehose is simpler but less expressive. Grok parsing requires a Lambda transformer; conditional routing is unavailable (one stream maps to one index pattern). For pass-through, Firehose wins; for parsing and fan-out, OSIS wins.
| Need | Choose |
| --- | --- |
| Pass-through indexing of structured records | Firehose |
| Grok parsing of unstructured logs | OSIS or Data Prepper |
| Distributed-trace ingestion (OTel) | OSIS or Data Prepper |
| On-premises collector reachability | Self-hosted Data Prepper |
| At-least-once with persistent buffering | OSIS with persistent buffering |
| Single AWS-resident, simple, cheap path | Firehose |
OpenSearch Dashboards
OpenSearch Dashboards is the Kibana fork. It runs as a Node.js application that talks to an OpenSearch domain and provides a browser-based UI for several core workflows:
Discover: the SRE day-to-day workspace. Pick an index pattern (e.g. logs-app-*), pick a time range, run queries in OpenSearch Query DSL or PPL. Shows raw documents, a histogram of event counts, and per-field breakdowns. Killer feature: time-range comparison — "what is traffic now" vs. "what was traffic at this time last week" in two clicks.
Visualize: line, bar, heatmap, geo-map, tag-cloud charts against bucket and metric aggregations.
Dashboards: composed from saved visualizations; teams pin them during incidents.
Dev Tools: REST playground for raw API calls (GET _cluster/health, PUT /_index_template/...).
For multi-tenant environments, fine-grained access control restricts users to specific index patterns, fields, or documents. Application teams typically own a dashboard tenant; platform teams own a global tenant for cross-cutting dashboards.
Alerting and Anomaly Detection
The Alerting plugin turns saved queries into scheduled monitors with:
Inputs: a query against indexes returning documents or aggregations.
Triggers: per-document, per-bucket, or query-level conditions.
Actions: Slack, Teams, email, SNS, webhook, or PagerDuty notifications.
Example: a monitor that fires when level: ERROR events from any single service exceed 50 in 5 minutes. A per-bucket trigger over a terms aggregation on service.name produces a separate alert per offending service.
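The logic of that per-bucket trigger can be sketched in plain Python. This is not the Alerting plugin's monitor syntax; it only mirrors the evaluation (filter to ERROR in the window, bucket by service.name, compare each bucket to the threshold) over an in-memory list of hypothetical events.

```python
from collections import Counter
from datetime import datetime, timedelta

def services_over_threshold(events, now, window=timedelta(minutes=5), threshold=50):
    """Count ERROR events per service in the window; return services above threshold."""
    start = now - window
    counts = Counter(
        e["service.name"]
        for e in events
        if e["level"] == "ERROR" and start <= e["@timestamp"] <= now
    )
    return {svc: n for svc, n in counts.items() if n > threshold}

now = datetime(2026, 5, 7, 12, 0, 0)
# Hypothetical events: 60 checkout errors, 10 search errors, 100 checkout INFO lines.
events = (
    [{"service.name": "checkout", "level": "ERROR", "@timestamp": now - timedelta(seconds=i)}
     for i in range(60)]
    + [{"service.name": "search", "level": "ERROR", "@timestamp": now - timedelta(seconds=i)}
       for i in range(10)]
    + [{"service.name": "checkout", "level": "INFO", "@timestamp": now} for _ in range(100)]
)

print(services_over_threshold(events, now))  # {'checkout': 60}
```

Each key in the result corresponds to one alert, so a quiet service never masks a noisy one the way a single cluster-wide count would.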
The Anomaly Detection plugin layers ML on top using Random Cut Forest — an unsupervised ensemble for time-series anomalies. Detectors learn normal patterns, score new buckets, and emit anomaly grades that adapt to seasonality and drift. Production hierarchy: static thresholds for SLO violations (latency p99 > 500 ms, errors > 1%); anomaly alerts for novel failures; composite monitors combining multiple inputs to suppress flapping.
Trace Analytics and APM
OpenSearch supports distributed-trace analytics through OpenTelemetry ingestion. Spans flow from instrumented services (via OTel Collector or AWS Distro for OpenTelemetry) into an OSIS or Data Prepper pipeline. The pipeline runs otel_trace_raw to flatten spans, service_map_stateful to compute a service graph with edge metrics like latency and error rate, and writes to otel-v1-apm-span-* for raw spans and otel-v1-apm-service-map for graph aggregates.
Analogy: traces are flight-tracking radar. Each request is an aircraft; each span is a leg (taxi, climb, cruise, descent). The trace view shows one flight path; the service map is the airport-pair graph aggregating all flights.
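To make the service-graph idea concrete, the following Python sketch aggregates caller-to-callee edges from a handful of spans. It is a conceptual stand-in, not the processor's algorithm: the span fields are simplified, hypothetical names, and the real pipeline works over windows of streaming spans rather than a complete in-memory list.

```python
from collections import defaultdict

def service_map_edges(spans):
    """Aggregate caller -> callee edges with call count, error rate, and mean latency."""
    by_id = {s["span_id"]: s for s in spans}
    stats = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})
    for span in spans:
        parent = by_id.get(span.get("parent_span_id"))
        if parent is None or parent["service"] == span["service"]:
            continue  # root span or an in-process child: not a cross-service edge
        edge = stats[(parent["service"], span["service"])]
        edge["calls"] += 1
        edge["errors"] += int(span["error"])
        edge["total_ms"] += span["duration_ms"]
    return {
        edge: {
            "calls": s["calls"],
            "error_rate": s["errors"] / s["calls"],
            "avg_ms": s["total_ms"] / s["calls"],
        }
        for edge, s in stats.items()
    }

# Hypothetical spans from one trace.
spans = [
    {"span_id": "a", "parent_span_id": None, "service": "frontend", "duration_ms": 120.0, "error": False},
    {"span_id": "b", "parent_span_id": "a", "service": "checkout", "duration_ms": 80.0, "error": False},
    {"span_id": "c", "parent_span_id": "b", "service": "payments", "duration_ms": 40.0, "error": True},
    {"span_id": "d", "parent_span_id": "a", "service": "checkout", "duration_ms": 60.0, "error": True},
]
edges = service_map_edges(spans)

print(edges[("frontend", "checkout")])
# {'calls': 2, 'error_rate': 0.5, 'avg_ms': 70.0}
```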
For log-trace correlation, instrument services to inject trace_id and span_id into every log line. With logs and traces in the same domain, a Discover query for a trace ID surfaces both, giving full context in one tool.
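One way to inject the IDs, sketched with Python's standard `logging` module, is a filter that stamps every record and a formatter that emits JSON. The `get_context` hook is hypothetical; in a real service it would read the active span from the tracing SDK, whereas here it returns fixed example IDs.

```python
import io
import json
import logging

class TraceContextFilter(logging.Filter):
    """Attach trace_id and span_id to every record."""

    def __init__(self, get_context):
        super().__init__()
        self.get_context = get_context  # hypothetical hook returning (trace_id, span_id)

    def filter(self, record):
        record.trace_id, record.span_id = self.get_context()
        return True

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, including the trace context."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": record.trace_id,
            "span_id": record.span_id,
        })

stream = io.StringIO()  # stands in for stdout or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.addFilter(TraceContextFilter(
    lambda: ("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")  # example IDs
))
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("payment authorized")
print(stream.getvalue().strip())
```

Once both the span documents and these log lines carry the same `trace_id` value, a single filter on that field returns both.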
Figure 12.2: Data Prepper / OSIS Pipeline Stages
flowchart LR
subgraph Sources["Source Plugins"]
S3["S3 + SQS"]
HTTP["HTTP Push"]
OTEL["OTel Endpoint"]
KAFKA["Kafka"]
end
Buffer["Buffer (in-memory or persistent)"]
subgraph Processors["Processor Chain"]
direction LR
P1["grok parse log lines"] --> P2["date parse @timestamp"] --> P3["mutate enrich fields"] --> P4["conditional routing"]
end
subgraph Sinks["Sink Plugins"]
OS["OpenSearch Domain"]
OSS["OpenSearch Serverless"]
S3OUT["S3 Archive"]
PIPE["Another Pipeline"]
end
Sources --> Buffer --> Processors --> Sinks
Figure 12.3: OpenTelemetry Trace Ingestion Flow
sequenceDiagram
participant App as Instrumented App
participant ADOT as ADOT Collector
participant OSIS as OSIS Pipeline
participant Raw as otel-v1-apm-span-*
participant Map as otel-v1-apm-service-map
participant Dash as Trace Analytics
App->>ADOT: Emit spans (trace_id, span_id)
ADOT->>OSIS: Batch OTLP export
OSIS->>OSIS: Flatten spans (otel_trace_raw)
OSIS->>OSIS: Compute edges (service_map_stateful)
par Fan-out
OSIS->>Raw: Index raw spans
and
OSIS->>Map: Index service-graph aggregates
end
Dash->>Raw: Query waterfall by trace_id
Dash->>Map: Query service map
Key Points: Log Ingestion and Visualization
OSIS = managed Data Prepper, billed in OCUs, auto-scales between min/max bounds.
Pipelines are source → buffer → processor → sink; YAML is portable to self-hosted Data Prepper.
Choose Firehose for pass-through, OSIS for parsing and routing, self-hosted Data Prepper for on-premises or custom-code.
Dashboards: Discover for ad-hoc, Visualize/Dashboards for charts, Dev Tools for REST API.
Alerting: monitors with per-doc / per-bucket / query-level triggers and Slack/SNS/PagerDuty actions.
Anomaly Detection: Random Cut Forest, seasonal-aware, perfect for "every service has its own normal."
Trace Analytics: OTel → OSIS → raw spans + service map; correlate with logs via shared trace IDs.
Post-Reading Check — Log Ingestion and Visualization
1. What is OpenSearch Ingestion (OSIS)?
A self-hosted Logstash distribution
An auto-scaling, AWS-managed Data Prepper service
A Kinesis Data Streams replacement
A CloudWatch Logs subscription filter
2. What are the four logical stages of a Data Prepper pipeline?
producer, broker, consumer, sink
source, buffer, processor, sink
extract, transform, load, archive
map, reduce, shuffle, persist
3. When should you choose Firehose-to-OpenSearch over OSIS?
When you need grok parsing of unstructured logs
When you need conditional routing to multiple indexes
When you have already-structured records and want a simple managed delivery path
When you need on-premises ingestion
4. Which OpenSearch Dashboards view is the SRE day-to-day workspace for ad-hoc log exploration?
Visualize
Discover
Dev Tools
Stack Management
5. The Anomaly Detection plugin uses which algorithm?
Linear regression
k-means clustering
Random Cut Forest (unsupervised)
Static threshold over a moving average
Section 3: Cost and Tiering
Pre-Reading Check — Cost and Tiering
1. Which storage tier is read-only and backed by S3 plus a local SSD/memory cache?
Hot
UltraWarm
Cold
Frozen
2. What must you do before querying a cold-storage index?
Restore it from a snapshot to hot storage
Attach it back to UltraWarm
Re-index every document from S3
Cold queries run with no preparation
3. What does ISM stand for, and what does it automate?
4. Why is force_merge to one segment per shard recommended before warm migration?
It encrypts the segment files before upload
It is required by IAM
It removes deleted documents and reduces S3 fetch fan-out on cold cache
It rebuilds the inverted index from scratch
5. What is the main trade-off of OpenSearch Serverless versus a provisioned domain with ISM?
Serverless costs more for spiky workloads
Serverless eliminates cluster sizing but does not support ISM tiering
Serverless requires you to choose hot/warm/cold tiers manually
Serverless cannot run trace analytics
UltraWarm and Cold Storage Tiers
Recent logs need fast queries; quarter-old logs must exist for compliance but rarely get touched. OpenSearch addresses this with three storage tiers and an automation engine.
Hot tier: instance-attached EBS (or NVMe), full IOPS and memory caching. Fastest, sub-second queries. About $0.169/GB-month for EBS plus data-node cost.
UltraWarm: S3-backed with an LRU cache on local SSD and in memory. Migrated indexes become read-only, are force_merged to one segment per shard, and segment files are uploaded to S3. Queries pull segments from S3 into cache on demand. UltraWarm storage is $0.024/GB-month — about 85% cheaper than hot — plus warm node cost.
Cold storage: detaches indexes entirely; only metadata stays in the cluster. You pay S3 standard rates (~$0.0125/GB-month). To query a cold index you must attach it back to UltraWarm first. Cold is for once-a-year data: security audits, regulatory archives, long-tail debugging.
| Tier | Use Case | Storage | Compute | Query Performance |
| --- | --- | --- | --- | --- |
| Hot | Recent (0-7 days) | ~$0.169/GB-mo | Full data-node instances | Sub-second |
| UltraWarm | Historical (7-90 days) | $0.024/GB-mo | $0.238-$2.68/hr per warm node | Interactive (S3 + cache) |
| Cold | Archive (>90 days) | ~$0.0125/GB-mo (S3) | Pay-per-attach | Slow first query, then UltraWarm-like |
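The "slow first query, then fast" behavior of an S3-backed tier with a local LRU cache can be illustrated with a toy Python model. This is only a sketch of the caching idea: a dictionary stands in for S3, the capacity of two segments is arbitrary, and the class is hypothetical rather than anything UltraWarm exposes.

```python
from collections import OrderedDict

class SegmentCache:
    """LRU cache in front of a slow store (a dict stands in for S3 here)."""

    def __init__(self, capacity, remote):
        self.capacity = capacity
        self.remote = remote
        self.cache = OrderedDict()
        self.remote_fetches = 0

    def get(self, segment_id):
        if segment_id in self.cache:
            self.cache.move_to_end(segment_id)  # mark as most recently used
            return self.cache[segment_id]
        self.remote_fetches += 1                # cache miss: read from the remote store
        data = self.remote[segment_id]
        self.cache[segment_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict the least recently used segment
        return data

remote = {f"seg-{i}": f"bytes-{i}" for i in range(5)}
cache = SegmentCache(capacity=2, remote=remote)
for seg in ["seg-0", "seg-1", "seg-0", "seg-2", "seg-1"]:
    cache.get(seg)

print(cache.remote_fetches)  # 4
print(list(cache.cache))     # ['seg-2', 'seg-1']
```

Fewer, larger segments mean fewer distinct objects to fetch on a cold cache, which is the intuition behind merging segments before an index moves to warm storage.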
Worked cost example: 100 TB retained 365 days with 7 days hot, 83 days UltraWarm, 275 days cold.
Total ≈ $21,700/yr versus $202,800/yr for naive all-hot — about a 9x reduction.
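The totals above can be reproduced with a few lines of Python. The breakdown is an assumption on our part: it treats the 100 TB as the steady-state retained volume, gives each tier a share proportional to its days of the 365-day window, uses 1 TB = 1,000 GB, and counts storage only (no hot or warm node charges).

```python
TOTAL_GB = 100_000          # 100 TB, using 1 TB = 1,000 GB (assumption)
RETENTION_DAYS = 365
PRICE_PER_GB_MONTH = {"hot": 0.169, "ultrawarm": 0.024, "cold": 0.0125}
DAYS_IN_TIER = {"hot": 7, "ultrawarm": 83, "cold": 275}

def yearly_storage_cost(days_in_tier):
    """Steady-state storage cost: each tier holds its share of the retained data."""
    cost = {}
    for tier, days in days_in_tier.items():
        gb_in_tier = TOTAL_GB * days / RETENTION_DAYS
        cost[tier] = gb_in_tier * PRICE_PER_GB_MONTH[tier] * 12
    return cost

tiered = yearly_storage_cost(DAYS_IN_TIER)
tiered_total = sum(tiered.values())
all_hot = TOTAL_GB * PRICE_PER_GB_MONTH["hot"] * 12

for tier, dollars in tiered.items():
    print(f"{tier:10s} ${dollars:,.0f}")
print(f"tiered     ${tiered_total:,.0f}")
print(f"all hot    ${all_hot:,.0f}")
print(f"reduction  {all_hot / tiered_total:.1f}x")
```

Under these assumptions the cold tier holds about three quarters of the data yet accounts for roughly half of the tiered bill, so the savings are dominated by how quickly data leaves the hot tier.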
Index State Management (ISM)
Manually moving indexes between tiers does not scale. Index State Management (ISM) is OpenSearch's built-in policy engine. An ISM policy describes a finite-state machine of states (hot, warm, cold, delete), actions performed on state entry, and conditions that trigger transitions.
Animation 4: ISM Hot → UltraWarm → Cold → Delete Lifecycle
An index follows the policy: rolled over after 1d/50GB/50M docs, migrated to warm at 7d, cold at 30d, deleted at 90d.
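A policy with those thresholds might look like the following, written as a Python dictionary and serialized to the JSON body that ISM expects. Treat it as a sketch under assumptions rather than a drop-in policy: the description, the template priority, the `@timestamp` field name, and the ordering of actions inside the warm state are illustrative choices, and `undefined_transition_targets` is a hypothetical sanity check, not part of ISM.

```python
import json

ism_policy = {
    "policy": {
        "description": "Hot -> UltraWarm -> cold -> delete for daily log indexes",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {"rollover": {"min_index_age": "1d", "min_size": "50gb",
                                  "min_doc_count": 50_000_000}}
                ],
                "transitions": [{"state_name": "warm", "conditions": {"min_index_age": "7d"}}],
            },
            {
                "name": "warm",
                # Assumed ordering: shrink replicas and merge segments, then migrate.
                "actions": [
                    {"replica_count": {"number_of_replicas": 1}},
                    {"force_merge": {"max_num_segments": 1}},
                    {"warm_migration": {}},
                ],
                "transitions": [{"state_name": "cold", "conditions": {"min_index_age": "30d"}}],
            },
            {
                "name": "cold",
                "actions": [{"cold_migration": {"timestamp_field": "@timestamp"}}],
                "transitions": [{"state_name": "delete", "conditions": {"min_index_age": "90d"}}],
            },
            {"name": "delete", "actions": [{"cold_delete": {}}], "transitions": []},
        ],
        # Auto-attach to new indexes matching the pattern.
        "ism_template": {"index_patterns": ["logs-*"], "priority": 100},
    }
}

def undefined_transition_targets(policy):
    """Return transition targets that do not name a defined state (should be empty)."""
    states = policy["policy"]["states"]
    names = {s["name"] for s in states}
    return [
        t["state_name"]
        for s in states
        for t in s.get("transitions", [])
        if t["state_name"] not in names
    ]

print(undefined_transition_targets(ism_policy))  # []
print(json.dumps(ism_policy, indent=2))
```

The printed JSON is what you would submit to the ISM policies endpoint (for example from Dev Tools) under a policy ID of your choosing.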
The ism_template block auto-attaches the policy to any new index matching logs-*. Combined with rolling daily indexes from Firehose or OSIS, the lifecycle runs without operator intervention.
The force_merge before warm migration is a critical optimization. Lucene segments accumulate as documents are written; merging to one segment per shard removes deleted documents and consolidates layout, shrinking storage and speeding up cold-cache queries on UltraWarm. Skipping it leaves hundreds of small segments per shard, each requiring a separate S3 fetch.
Serverless OpenSearch Collections
For workloads where you do not want to size a cluster, Amazon OpenSearch Serverless offers collections — managed, auto-scaling endpoints typed as time-series (logs), search, or vector search. AWS provisions and scales OCUs behind the scenes and stores data on S3 by default.
Key differences from provisioned:
No node sizing. You set min/max OCU bounds; AWS scales between them.
Storage on S3 by default. No UltraWarm or cold tier — storage is already S3 — but you also lose explicit hot/warm control.
Pricing. Per OCU-hour for indexing and search separately, plus storage. Cheaper for spiky/small workloads, often costlier than a well-sized provisioned domain at steady high volume.
Feature subset. Trace analytics works; ISM is unsupported because the storage model differs.
Sink configuration. OSIS pipelines targeting Serverless need serverless: true plus index_type: management_disabled.
Choose Serverless for variable/bursty volumes, new workloads, or vector search. Stay provisioned for steady high-volume ingest where ISM tiering is the cost win, plugin-heavy workloads, or strict latency targets.
Figure 12.4: ISM Lifecycle State Machine
stateDiagram-v2
[*] --> hot: Index created (ism_template auto-attach)
hot --> hot: rollover (50GB / 1d / 50M docs)
hot --> warm: min_index_age >= 7d warm_migration, force_merge to 1 segment, replica_count = 1
warm --> cold: min_index_age >= 30d cold_migration (detach to S3)
cold --> delete: min_index_age >= 90d
delete --> [*]: cold_delete (remove metadata)
Key Points: Cost and Tiering
Hot = fast and expensive (~$0.169/GB-mo + instances).