Chapter 15: End-to-End Architecture Scenarios and Exam Synthesis
Learning Objectives
Synthesize multi-domain Cohesity architectures combining DataProtect, SmartFiles, SiteContinuity, FortKnox, DataHawk, and Helios for enterprise-scale scenarios.
Trace explicit business requirements (RPO, RTO, retention, compliance, residency, budget) through to specific Cohesity design choices.
Apply scenario-based reasoning to CCAE-style architecture questions, recognizing the four-option pattern and identifying which constraint each distractor violates.
Build a 30-day study plan keyed to the published domain weights (22 / 35 / 18 / 13 / 12 percent) and execute a disciplined test-day strategy.
The previous fourteen chapters built the vocabulary, mechanics, and design patterns of the Cohesity Data Cloud one layer at a time. The CCAE exam, however, almost never asks about a single layer in isolation. It asks you to assemble layers into a coherent design that satisfies a business problem under hard constraints. This final chapter walks through three full reference architectures end-to-end, decodes the exam blueprint and scenario question pattern in detail, and gives you a 30-day plan plus a test-day playbook so you walk into the proctoring session knowing exactly how to spend your 90 minutes.
Scenario 1: Global Enterprise with Multi-Region DR
Pre-Section Quiz
1. A multinational has three regional data centers and 42 branch sites. Which topology best satisfies "must survive loss of any one regional data center" while keeping branch backup traffic local?
A) Single hub at HQ; all branches replicate directly there
B) Hub-and-spoke per region with one-to-many triangular replication across the three regions
C) All branches archive directly to S3 Glacier; no on-prem clusters
D) Active-active synchronous mirroring between every pair of branches
2. The CIO requires Tier-0 ERP RPO of 15 minutes and RTO of 60 minutes. Which Cohesity capability most directly meets the 60-minute RTO?
A) CloudArchive to S3 Glacier Deep Archive
B) Hourly snapshots written to tape
C) Instant Mass Restore mounting VMs directly from SpanFS
D) Manual rebuild from quarterly base image
3. The security team will not allow production data in third-party SaaS but wants SaaS-based fleet visibility. What is the right answer?
A) Refuse Helios; manage clusters individually
B) Helios SaaS for control plane; CloudArchive to customer-owned S3 buckets
C) Use FortKnox SaaS for both vault and control plane
D) Helios Self-Managed plus DMaaS
The Business Problem
A multinational manufacturer operates three primary data centers (Dallas, Frankfurt, Singapore) plus 42 branch and plant sites across the Americas, EMEA, and APAC. Targets: Tier-0 ERP/MES need RPO 15 min, RTO 60 min; Tier-1 file/VM need RPO 4 h, RTO 4 h; Tier-2 archive/dev need RPO 24 h, RTO 24 h; seven-year retention; cross-region replication must survive loss of any one regional data center; SaaS control plane allowed but no third-party SaaS for production data.
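To make the tier targets concrete, the following minimal sketch encodes them as data and picks the least stringent tier that still meets a workload's RPO and RTO. The `Tier` class and `pick_tier` function are hypothetical names used for illustration only; they are not part of Helios or any Cohesity API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rpo_minutes: int
    rto_minutes: int


# Targets copied from the business problem, ordered from least to most stringent.
TIERS = [
    Tier("Tier-2", 24 * 60, 24 * 60),
    Tier("Tier-1", 4 * 60, 4 * 60),
    Tier("Tier-0", 15, 60),
]


def pick_tier(required_rpo_minutes: int, required_rto_minutes: int) -> Tier:
    """Return the least stringent tier that still meets both targets."""
    for tier in TIERS:
        if (tier.rpo_minutes <= required_rpo_minutes
                and tier.rto_minutes <= required_rto_minutes):
            return tier
    raise ValueError("no tier meets the requested RPO/RTO")
```

A workload that needs a 1-hour RPO lands in Tier-0 even if its RTO is relaxed, because Tier-1's 4-hour RPO would miss the target. The tighter of the two constraints decides the tier.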
Topology Choice: Hub-and-Spoke Per Region with One-to-Many Across Regions
The reference pattern that maps cleanly to this requirement set is hub-and-spoke within each region and one-to-many across regions. Each branch deploys a Cohesity Robo or small Virtual Edition cluster sized for local backups. Spokes replicate inbound to the regional hub. Each regional hub then replicates Tier-0 and Tier-1 data to one of the other two regions in a triangular mesh that survives any single regional loss.
| Topology Element | Decision | Rationale |
| --- | --- | --- |
| Branch protection | Robo Edition / small VE per site | Local backup for fast restore; reduces WAN traffic |
| Branch-to-hub | Many-to-one inbound replication | Centralizes recovery, audit, retention |
| Hub-to-hub | One-to-many triangular replication | Survives loss of any single region |
| Long-term retention | CloudArchive S3 + Glacier lifecycle | 7-year retention without hot capacity bloat |
| Control plane | Helios SaaS (multi-tenant) | Global SLA reporting, capacity prediction, fleet upgrades |
Figure 15.1 - Hub-and-Spoke Global DR with Triangular Cross-Region Replication
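The "survives loss of any one regional data center" requirement can be checked mechanically. The sketch below models hub-to-hub replication as a plain dictionary (a hypothetical representation, not a Cohesity configuration format). It verifies that, whichever region fails, every region's backup data still has a holder among the survivors.

```python
def survives_single_region_loss(replication: dict[str, set[str]]) -> bool:
    """True if every region's data has a surviving holder after any one region fails.

    `replication` maps each regional hub to the set of hubs it replicates to.
    """
    regions = set(replication)
    for failed in regions:
        survivors = regions - {failed}
        for origin, targets in replication.items():
            # Hubs that hold the origin's data and are still standing.
            holders = ({origin} | targets) & survivors
            if not holders:
                return False
    return True


# Triangular pattern from the scenario: each hub replicates to one peer.
ring = {
    "Dallas": {"Frankfurt"},
    "Frankfurt": {"Singapore"},
    "Singapore": {"Dallas"},
}
```

The check passes for the triangular ring and also for a full mesh in which each hub replicates to both peers. It fails as soon as any hub has no replication target.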
For the Tier-0 RTO of 60 minutes the architect leverages Instant Mass Restore at the regional hub. SiteContinuity runbooks orchestrate failover - power-on order, IP re-mapping, dependency groups - and a non-disruptive test failover runs quarterly into an isolated network bubble for audit evidence.
Capacity, Bandwidth, and Cost Sanity Check
A useful analogy: the regional hubs are regional water reservoirs. Each branch is a small upstream tank; water flows downhill into the regional reservoir at low pressure (deduplicated WAN-optimized streams). Reservoirs exchange water across long pipes (cross-region replication) only for the most critical workloads, because long pipes are expensive. The cloud archive is the underground aquifer - slow to recall but cheap and effectively infinite.
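WAN bandwidth is dimensioned around the change rate of Tier-0 and Tier-1 data, not total front-end TB. If Tier-0 produces 200 GB of unique change daily after dedupe, the cross-region stream averages about 18.5 Mbps (200 GB x 8 bits / 86,400 s), so a 25 Mbps committed link absorbs it with roughly a quarter of the link in reserve, provided replication can run around the clock.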
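The sizing rule is simple arithmetic, and it is worth being able to reproduce it quickly. The sketch below uses decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and assumes the change is spread evenly across the replication window. The function names are illustrative.

```python
SECONDS_PER_HOUR = 3600


def required_mbps(daily_change_gb: float, window_hours: float = 24.0) -> float:
    """Average link rate (Mbps) needed to ship daily_change_gb within window_hours."""
    bits = daily_change_gb * 1e9 * 8
    return bits / (window_hours * SECONDS_PER_HOUR) / 1e6


def utilization(daily_change_gb: float, link_mbps: float,
                window_hours: float = 24.0) -> float:
    """Fraction of the link consumed; values above 1.0 mean the link is too small."""
    return required_mbps(daily_change_gb, window_hours) / link_mbps
```

For 200 GB per day over a 24-hour window, `required_mbps` returns about 18.5 Mbps, roughly 74% of a 25 Mbps link. If replication is confined to a 12-hour window, the requirement doubles to about 37 Mbps and the same link no longer suffices, which is why the replication window belongs in the sizing conversation.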
Key Takeaway: A global enterprise design layers hub-and-spoke (intra-region) with one-to-many (inter-region) replication, drives every workload through tiered Helios policies, uses CloudArchive for long-term retention without hot-capacity bloat, and orchestrates recovery through SiteContinuity runbooks.
Key Points
Hub-and-spoke inside each region keeps branch traffic local; triangular one-to-many across regions survives loss of any single regional DC.
Three Helios policy templates (Tier-0/1/2) drive every workload via Protection Groups - the design discipline the exam expects.
Instant Mass Restore mounts VMs directly from SpanFS, hitting 60-minute RTO; SiteContinuity runbooks handle dependency ordering.
CloudArchive to S3 Glacier covers the 7-year retention requirement without consuming hot cluster capacity.
Helios SaaS is the control plane (allowed) while production data stays in customer-owned storage (compliance constraint preserved).
Post-Section Quiz
4. The architect proposes oversubscribing the cross-region WAN link by adding all Tier-2 traffic to it. What constraint does this violate?
A) Compliance / data residency
B) Tier-0 RPO - the link must be sized for Tier-0/Tier-1 change rate, not Tier-2
C) Retention - Tier-2 needs longer retention than Tier-0
D) None - oversubscription is best practice
5. Which Cohesity construct is the right unit of design for SLA enforcement at scale, rather than per-job customization?
A) Manual scripts on each cluster
B) Per-VM tags only
C) Helios policy templates applied through Protection Groups
D) Custom backup binaries per workload
Scenario 2: Healthcare with HIPAA and Ransomware Posture
Pre-Section Quiz
6. A covered entity prohibits PHI from leaving its perimeter but still wants the air-gapped vault pattern. Which Cohesity option fits?
A) FortKnox SaaS in Cohesity-managed AWS region
B) FortKnox Self-Managed inside the covered entity's perimeter
C) S3 Glacier with the same admin credentials as production
D) Tape library managed by a third-party MSP
7. CISO requires "no admin can delete protected backups before retention." Which capability satisfies this?
A) Daily SMTP alerts to the SOC
B) DataLock WORM with quorum-required retention changes
C) RBAC alone, with the cluster admin as super-user
D) Encryption at rest with provider-managed keys
8. An attacker with stolen production admin credentials attempts to delete backup data. Which control is the LAST line of defense?
A) DataHawk anomaly detection
B) DataLock WORM enforced on the FortKnox vault copy
C) MFA on the production cluster
D) Daily Helios SLA report
The Business Problem
A 600-bed health system runs Epic EHR, PACS imaging, lab systems, and clinical research across two on-prem data centers. The CISO requires: PHI never leaves the covered entity (no SaaS vaulting); WORM-immutable backups that compromised admins cannot delete; encryption with customer-managed keys via KMIP; quorum approval for any retention change; ransomware anomaly detection on backup ingestion; clean-room recovery validated quarterly; Helios control plane without outbound SaaS connectivity.
Secondary copy - Cohesity DataProtect cluster on-prem with DataLock WORM. Tier-0 backups every 15 minutes via log-based RPO. KMIP-managed encryption.
Tertiary vault - FortKnox Self-Managed in a logically and physically isolated network segment, behind a separate management VLAN, with an inbound-only transfer window. Production admins do not have credentials for the vault; vault admins do not have credentials for production.
Analogy: FortKnox Self-Managed is the safe-deposit vault inside a bank inside a city. The DataProtect cluster is the bank. The vault sits behind a second locked door whose key is held by a different person, and the door opens only on a published schedule.
Detection and Clean Recovery
DataHawk performs anomaly detection on every backup ingestion, comparing change rates and entropy against historical baselines; an unusual spike fires a Helios alert. Threat scanning then uses curated IOCs to identify which restore points are clean. Quarterly clean-room recoveries restore Epic and PACS into an isolated network bubble for HIPAA contingency-plan audit evidence.
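The idea of comparing a new backup against a historical baseline can be illustrated with a simple z-score test on daily change volume. This is a teaching sketch only: it models change rate but not entropy, and it is not DataHawk's actual algorithm. The function name and the 3-sigma threshold are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history_gb: list[float], latest_gb: float,
                 threshold: float = 3.0) -> bool:
    """Flag latest_gb if it sits more than `threshold` standard deviations
    above the mean of the historical change volumes."""
    if len(history_gb) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history_gb)
    spread = stdev(history_gb)
    if spread == 0:
        # Perfectly flat history: any increase stands out.
        return latest_gb > baseline
    return (latest_gb - baseline) / spread > threshold


# Hypothetical daily change volumes (GB) for one protected source.
history = [48, 52, 50, 49, 51, 50, 53]
```

With this history, a 54 GB day stays within the normal band, while a 400 GB day (the kind of spike mass encryption produces) is flagged.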
Audit, Logging, and HIPAA Alignment
All admin actions stream to Splunk via the Cohesity Data Security Alliance integration. RBAC follows least privilege: Backup Operator (restores, no retention changes), Compliance Officer (legal hold, no delete), Cluster Admin (config, but cannot bypass DataLock). MFA is mandatory; quorum approval is required for retention reduction and DataLock policy changes.
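Quorum approval is easiest to reason about as a rule: a sensitive change proceeds only when enough distinct people other than the requester have approved it. The sketch below expresses that rule. The function name and the default of two approvers are assumptions for illustration, not Cohesity settings.

```python
def quorum_met(requester: str, approvers: set[str], required: int = 2) -> bool:
    """True only if `required` distinct approvers, excluding the requester,
    have signed off on the change."""
    independent = approvers - {requester}
    return len(independent) >= required
```

Excluding the requester from the count is the design point: a single compromised admin account cannot both propose and approve a retention reduction.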
Key Takeaway: Healthcare ransomware-resilient design is a layered stack: DataLock provides immutability, FortKnox Self-Managed provides the air-gapped vault inside the covered entity, DataHawk provides detection and classification, KMIP provides customer-controlled keys, Helios Self-Managed provides the dark-site control plane, and quorum + MFA + RBAC provide segregation of duties. Removing any layer fails one of the requirements.
Key Points
Each HIPAA/security requirement maps to a specific Cohesity component - the design is the layered combination, not any single product.
FortKnox Self-Managed (not SaaS) keeps PHI inside the covered entity while preserving the air-gap pattern.
The 3-2-1 model with DataLock WORM on the secondary plus segregated credentials on the vault defeats the credential-compromise kill chain.
DataHawk detects (anomaly), classifies (PHI/PII), and validates (clean-room) - covering pre-attack, ingestion, and recovery.
Quorum + MFA + RBAC enforce segregation of duties so no single admin can disable the controls.
Post-Section Quiz
9. Which Cohesity feature most directly supports HIPAA's contingency-plan requirement to validate recovery?
A) Quarterly clean-room recovery via DataHawk + SiteContinuity into an isolated bubble
B) Annual third-party penetration test
C) DataLock WORM alone
D) Daily SMTP alert to the SOC
10. Why is FortKnox Self-Managed - and not FortKnox SaaS - the correct choice for this customer?
A) FortKnox SaaS does not support WORM
B) The CISO prohibits PHI from leaving the covered entity, ruling out third-party SaaS storage of production data
C) FortKnox SaaS is more expensive than Self-Managed
D) FortKnox SaaS is not available in the U.S.
Scenario 3: Service Provider Multi-Tenant DMaaS
Pre-Section Quiz
11. An MSP is launching BaaS for 25 tenants growing to 200 in 18 months without standing up its own DataProtect clusters. Which Cohesity model fits?
A) Robo Edition per tenant
B) Cohesity DMaaS consumed by the MSP as a managed-service overlay
C) FortKnox SaaS as the primary protection target
D) Per-tenant isolated tape libraries
12. Which Cohesity construct is the cornerstone of multi-tenant isolation?
A) Protection Groups
B) Organizations
C) Marketplace Apps
D) iris_cli
The Business Problem
An MSP wants to launch BaaS for mid-market customers: 25 initial tenants growing to 200 in 18 months; per-tenant isolation; tenants self-serve via a branded portal; metered chargeback; tenant onboarding under one business day; offboarding with data destruction in 30 days; deliver as DMaaS for the first wave.
Why DMaaS for the Initial Wave
Cohesity DMaaS delivers DataProtect as a SaaS offering with the MSP as a managed-service overlay. The MSP avoids capex and operational burden by consuming Cohesity's regional SaaS instances and picks the region that satisfies tenant data-residency requirements (us-east-1 for US tenants, eu-west-1 for EU tenants).
Tenant Isolation via Organizations
| Dimension | Mechanism | Tenant Impact |
| --- | --- | --- |
| Storage | Per-tenant View Boxes with quotas | Tenant A cannot see Tenant B's data |
| Network | VLAN/VRF per tenant; tenant-scoped VIPs | L2/L3 isolation |
| Identity | Per-tenant SAML IdP federation | Tenant uses its own Azure AD/Okta |
| Encryption | Per-tenant KMIP keys (optional) | Cryptographic separation |
| Roles | Tenant-scoped RBAC | Tenant Admin scoped to its Organization |
| Reporting | Per-tenant SLA + capacity reports | Tenant sees only its consumption |
The MSP retains an MSP-Admin role across Organizations for ops but cannot access tenant data without explicit, audited break-glass procedures.
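The access rule described here can be sketched as a small decision function: tenant principals see only their own Organization, and an MSP-level operator sees tenant data only when a break-glass reference is supplied. The `Principal` class, the `can_access` function, and the ticket string are hypothetical constructs for illustration; they do not mirror the product's RBAC model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    name: str
    organization: Optional[str]  # None marks an MSP-level operator (assumption)


def can_access(principal: Principal, resource_org: str,
               break_glass_ticket: Optional[str] = None) -> bool:
    """Tenant users are confined to their Organization; MSP operators need
    an explicit break-glass reference, which would also be audited."""
    if principal.organization is not None:
        return principal.organization == resource_org
    return bool(break_glass_ticket)
```

In a real deployment the break-glass path would also write an audit record. The sketch models only the allow/deny decision.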
Tenants log into Helios with their own SAML IdP and see only their Organization. They create Protection Groups, attach pre-approved policies (Bronze/Silver/Gold), trigger restores, view SLA dashboards, and download compliance reports. The platform's self-service surface absorbs routine tickets.
| Tier | RPO | Retention | Cloud Archive | Monthly Price (illustrative) |
| --- | --- | --- | --- | --- |
| Bronze | 24 h | 30 days | None | $0.05 / GB |
| Silver | 4 h | 90 days | Quarterly to S3 IA | $0.10 / GB |
| Gold | 15 min | 7 years | Monthly to S3 Glacier | $0.18 / GB |
Chargeback, Onboarding, Offboarding
Helios exposes consumption metrics that the MSP pulls via REST API into its billing system. A monthly cron job joins per-Organization metrics with tier price and emits invoices. The Cohesity Terraform provider versions tenant configs alongside the rest of IaC. Onboarding: Terraform creates the Organization, View Boxes, network bindings, default policies, SAML federation; Ansible registers sources and applies the chosen tier. Offboarding: 30-day grace period with restore-only access, then cryptographic erasure by destroying the per-tenant key.
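The monthly chargeback join is straightforward once usage records are in hand. The sketch below assumes the records have already been pulled from the REST API into `(organization, tier, protected_gb)` tuples. That record shape is hypothetical, and the prices are the illustrative figures from the service-tier table.

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustrative per-GB monthly prices from the service-tier table.
TIER_PRICE_PER_GB = {
    "Bronze": Decimal("0.05"),
    "Silver": Decimal("0.10"),
    "Gold": Decimal("0.18"),
}


def monthly_invoices(usage):
    """Sum per-Organization charges from (organization, tier, protected_gb) records.

    Returns {organization: amount} rounded to cents.
    """
    totals = {}
    for org, tier, gb in usage:
        amount = Decimal(str(gb)) * TIER_PRICE_PER_GB[tier]
        totals[org] = totals.get(org, Decimal("0")) + amount
    cent = Decimal("0.01")
    return {org: total.quantize(cent, rounding=ROUND_HALF_UP)
            for org, total in totals.items()}
```

Using `Decimal` rather than floats keeps invoice totals exact to the cent, which matters once the output feeds a billing system.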
Key Takeaway: A DMaaS MSP design rests on three pillars: Organizations for tenant isolation, Helios for self-service, and APIs (REST/Terraform/Ansible) for metering and lifecycle automation. Service tiers, not custom configurations, are the unit of sale.
Key Points
DMaaS lets the MSP skip the capex and ops of running clusters; Cohesity operates the regional SaaS instances.
Organizations provide cryptographic and operational isolation across storage, network, identity, encryption, RBAC, and reporting.
Service tiers (Bronze/Silver/Gold) are the unit of sale - architects sell catalog items, not custom designs.
Helios self-service absorbs routine tenant operations - the path to scaling tenant count without scaling support headcount.
Lifecycle automation via Terraform + Ansible + REST API keeps onboarding under one business day and makes the 30-day offboarding window auditable.
Post-Section Quiz
13. The MSP wants offboarded tenant data destroyed within 30 days with auditable evidence. What design pattern delivers this?
A) Wait for the natural retention period to expire
B) 30-day restore-only grace period followed by cryptographic erasure (destroy per-tenant KMIP key); audit trail to SIEM
C) Manually delete View Boxes without logging
D) Forward tenant data to FortKnox
CCAE Exam Blueprint and Scenario Question Pattern
Pre-Section Quiz
14. The CCAE has 60 questions in 90 minutes. What is the per-question time budget?
A) 30 seconds
B) 90 seconds
C) 3 minutes
D) 5 minutes
15. Which exam domain carries the largest weight, and what is its share?
A) Domain 1 - Platform Architecture - 22%
B) Domain 2 - Solution Discovery and Design - 35%
C) Domain 3 - Security-Focused Solutions - 18%
D) Domain 5 - Gap Analysis and Troubleshooting - 12%
The Numbers You Must Internalize
| Element | Value |
| --- | --- |
| Exam code | COH500 |
| Duration | 90 minutes |
| Cost | $200 USD |
| Passing score | 60% (~36 of 60 correct) |
| Question count | ~60 scenario MCQs |
| Format | 4-option scenario MCQ |
| Validity | 2 years |
| Retake window | 14 days |
Ninety minutes for sixty questions is a 90-second budget per question. Memorize platform vocabulary so reading time is short and decision time is long.
Every scenario question follows the same pattern:
A scenario paragraph with constraints (RPO/RTO, budget, bandwidth, compliance, residency).
Four candidate designs, each plausible at first glance.
The correct answer satisfies all stated constraints.
Each distractor fails exactly one constraint.
| Distractor Archetype | Optimizes For | Sacrifices |
| --- | --- | --- |
| The Cheap Option | Lowest CapEx/OpEx | RTO, RPO, resilience |
| The Fortress | Maximum security | Operational simplicity, cost |
| The Performance Demon | Lowest RTO/RPO | Cost, retention, compliance |
| The Status Quo | Minimal change | Future scale, modern features |
The correct answer is almost always the option that balances competing objectives while applying the right Cohesity feature for the constraint set.
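The elimination technique amounts to checking each option against every stated constraint and discarding any option that fails one. The sketch below shows that reasoning on a made-up four-option question. The option attributes and constraint labels are hypothetical and exist only to demonstrate the method.

```python
def violated(design: dict, constraints: dict) -> list[str]:
    """Labels of the constraints this design fails."""
    return [label for label, check in constraints.items() if not check(design)]


def eliminate(options: dict, constraints: dict) -> list[str]:
    """Names of the options that satisfy every stated constraint."""
    return [name for name, design in options.items()
            if not violated(design, constraints)]


# Hypothetical scenario: tight RPO, vault must stay in perimeter, backups immutable.
constraints = {
    "RPO <= 15 min": lambda d: d["rpo_min"] <= 15,
    "vault inside perimeter": lambda d: d["vault_in_perimeter"],
    "immutable backups": lambda d: d["immutable"],
}
options = {
    "A": {"rpo_min": 15, "vault_in_perimeter": False, "immutable": True},
    "B": {"rpo_min": 15, "vault_in_perimeter": True, "immutable": True},
    "C": {"rpo_min": 1440, "vault_in_perimeter": True, "immutable": True},
    "D": {"rpo_min": 15, "vault_in_perimeter": True, "immutable": False},
}
```

Here each distractor fails a different constraint and only option B survives. On the exam you run the same loop mentally: name the constraint each option breaks, then keep what is left.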
Decision Criteria Checklist
Have I mapped every requirement to a measurable SLA (RPO, RTO, retention, residency, compliance)?
Does the protection technique match the workload class - VM snapshot, application-aware, agent, cloud-native, SaaS?
Is the hardware/edition appropriate for the performance, footprint, and refresh cycle?
Is the design policy-driven via Helios rather than hand-configured per job?
Are encryption, RBAC/MFA, immutability, and vaulting answered explicitly, not bolted on?
Is the design validated by a PoC artifact or as-built vs as-used review?
Key Points
COH500 is 60 scenario MCQs in 90 minutes at 60% passing - 90 seconds per question.
Domain 2 (Design) is 35% of the exam, more than a third - it cannot be skipped.
Distractors are drawn from four archetypes; eliminate every option that violates a stated constraint.
The correct answer balances objectives; it is rarely the cheapest, the most secure, the fastest, or the most familiar.
Run the six-point decision checklist on every scenario before picking an option.
Post-Section Quiz
16. A scenario asks for the lowest-cost design and offers an option using consumer-grade NAS with no replication. What distractor archetype is this?
A) The Fortress
B) The Cheap Option
C) The Performance Demon
D) The Status Quo
30-Day Study Plan and Test-Day Strategy
Pre-Section Quiz
17. How should study time be allocated across the five CCAE domains?
A) Equal time per domain
B) In proportion to published domain weights (22/35/18/13/12)
C) Most time on the smallest domain
D) Random rotation
18. During the exam, two options remain after eliminating constraint-violating distractors. Which do you pick?
A) The one that maximizes a single axis
B) The one that balances competing objectives
C) The cheapest
D) The one most familiar from your environment
Figure 15.5 - 30-Day CCAE Study Plan
Phase Allocation
| Phase | Days | Focus | Domain |
| --- | --- | --- | --- |
| Foundations | 1-7 | SpanFS, hardware, networking, Helios | 1 (22%) |
| Solution Design | 8-18 | Sizing, workloads, hybrid/multi-cloud, PoC | 2 (35%) |
| Security | 19-23 | DataLock, FortKnox, DataHawk, DSA | 3 (18%) |
| Integration | 24-26 | REST API, Terraform, Organizations | 4 (13%) |
| Gap Analysis | 27-28 | Helios capacity, Siren, pre-checks | 5 (12%) |
| Synthesis | 29-30 | Two timed practice exams + remediation | All |
Analogy: a marathon training schedule, not a sprint. Domain 2 is the long-run portion of the week. Domain 5 is the cooldown. The synthesis weekend is the taper; the exam is race day.
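If you want to adapt the plan to a different number of study days, a strictly proportional split is easy to compute. The sketch below uses the largest-remainder method so the day counts are whole numbers that add up exactly. The function name is illustrative, and the 28-day total assumes the final two days stay reserved for synthesis.

```python
def allocate_days(weights: dict[str, int], total_days: int) -> dict[str, int]:
    """Split total_days in proportion to integer weights (largest-remainder method)."""
    total_weight = sum(weights.values())
    shares = {k: w * total_days for k, w in weights.items()}
    days = {k: s // total_weight for k, s in shares.items()}
    leftover = total_days - sum(days.values())
    # Hand the remaining days to the domains with the largest fractional share.
    by_remainder = sorted(weights, key=lambda k: shares[k] % total_weight,
                          reverse=True)
    for k in by_remainder[:leftover]:
        days[k] += 1
    return days


DOMAIN_WEIGHTS = {
    "Domain 1": 22, "Domain 2": 35, "Domain 3": 18,
    "Domain 4": 13, "Domain 5": 12,
}
```

For 28 days the strict split is 6 / 10 / 5 / 4 / 3. The chapter's plan of 7 / 11 / 5 / 3 / 2 therefore leans slightly further toward Domains 1 and 2 at the expense of Domains 4 and 5. That is a reasonable bias if platform fundamentals and design are your weaker areas.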
timeline
    title 30-Day CCAE Study Plan
    section Week 1 Foundation
        Days 1-7 : SpanFS internals
                 : Hardware editions
                 : Networking and Helios
                 : Domain 1 (22%)
    section Week 2 Design
        Days 8-18 : Sizing and workload patterns
                  : Hybrid / multi-cloud
                  : PoC architectures
                  : Domain 2 (35%)
    section Week 3 Security
        Days 19-23 : DataLock and FortKnox
                   : DataHawk and DSA partners
                   : Domain 3 (18%)
    section Week 4 Integration and Practice
        Days 24-26 : REST API and Terraform
                   : Organizations and multi-tenancy
                   : Domain 4 (13%)
        Days 27-28 : Gap analysis and Siren
                   : Capacity prediction
                   : Domain 5 (12%)
        Days 29-30 : Two timed practice exams
                   : Error analysis and remediation
Test-Day Strategy
The Day Before
Confirm proctoring system check is green, ID is ready, environment is private.
Re-read your one-page distilled summary of domain weights, RPO/RTO patterns, and the FortKnox + DataLock + DataHawk stack.
Sleep eight hours. Do not cram.
During the Exam
First 60 seconds: scan for constraints (RPO, RTO, retention, compliance, residency, budget, existing infrastructure). Underline them mentally.
Next 30 seconds: eliminate distractors that violate any one constraint.
If two options remain, pick the one that balances rather than maximizes a single axis.
Flag any question taking more than two minutes; return on a second pass.
Reserve the final 10 minutes for flagged questions and any blanks. No penalty for guessing - never leave blank.
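The timing rules above interact, so it helps to run the numbers once before test day. This sketch uses the exam figures from the blueprint table; the function names are illustrative.

```python
EXAM_MINUTES = 90
QUESTIONS = 60
PASS_PERCENT = 60


def seconds_per_question(reserve_minutes: int = 0) -> float:
    """Average first-pass time per question after setting aside a reserve."""
    return (EXAM_MINUTES - reserve_minutes) * 60 / QUESTIONS


def questions_needed_to_pass() -> int:
    """Smallest number of correct answers that reaches the passing percentage."""
    return (PASS_PERCENT * QUESTIONS + 99) // 100  # integer ceiling
```

With no reserve the budget is 90 seconds per question. Holding back the final 10 minutes lowers the first-pass average to 80 seconds, so the 60-second read plus 30-second elimination is a ceiling rather than a target. Quick questions have to bank time for the slow ones. At a 60% pass mark you need 36 correct answers, which leaves room to miss 24.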
After the Exam
Pass: claim your two-year credential and start logging continuing-education hours.
Fail: use the 14-day window deliberately. Run a domain-by-domain post-mortem; target the weakest domain for one focused week, then retake.
Key Takeaway: The exam is 90 minutes for 60 scenario MCQs at 60% passing - 90 seconds per question. Win by reading constraints first, eliminating distractors that violate any one constraint, and picking the balanced option. Allocate study time in proportion to domain weights, prioritizing Domain 2.
Key Points
The 30-day plan is weighted: the Solution Design phase (days 8-18) is the longest because Domain 2 is 35% of the exam.
Synthesis weekend = two timed 60-question practice exams + targeted remediation on the weakest domain.
Test-day discipline: 60s reading constraints, 30s eliminating, balance > maximization, flag and return.
Never leave a blank - there is no penalty for guessing.
If you fail, do not retake immediately. Use the 14-day window for focused remediation on the weakest domain.
Post-Section Quiz
19. You finish reading a question and notice it took 2.5 minutes. Which is the right move?
A) Spend another 3 minutes to be sure
B) Flag it, pick a best guess, move on, and return on a second pass
C) Skip and leave blank
D) Retake the exam tomorrow
20. A scenario gives both a tight RPO and a strict residency requirement. Which option is most likely correct?
A) Fastest design that ignores residency
B) Cheapest design with no replication
C) Balanced design satisfying both RPO and residency, even if not best-in-class on either single axis
D) Status-quo design that keeps the existing tape library
Chapter Summary
This chapter pulled the entire CCAE curriculum into three end-to-end architectures and one exam strategy. The global enterprise scenario showed how hub-and-spoke replication inside each region combines with one-to-many replication across regions to deliver multi-region survival. The healthcare scenario showed how DataLock, FortKnox Self-Managed, DataHawk, KMIP, and Helios Self-Managed combine into a HIPAA-aligned defense-in-depth stack. The MSP scenario showed how Organizations, Helios self-service, and REST/Terraform/Ansible automation deliver a multi-tenant DMaaS offering with metered chargeback. The exam blueprint section decoded COH500 and made the domain weights actionable. If you can defend each design in the three scenarios above, recognize the four distractor archetypes on sight, and finish a 60-question timed practice with at least 80% under exam conditions, you are ready to sit for the CCAE.