Chapter 1: CCAE Exam Overview and Cohesity Platform Architecture
Learning Objectives
Describe the CCAE exam blueprint, domains, weightings, and prerequisites.
Explain the high-level architecture of the Cohesity DataPlatform and its core services.
Differentiate between Cohesity DataProtect, SmartFiles, SiteContinuity, and Helios.
Identify how the CCAE role fits within the Cohesity certification track (CCSE, CCPE, CCAE).
Map physical, virtual, and cloud form factors to specific architectural use cases.
Section 1: CCAE Exam Blueprint and Study Strategy
Architects who pursue the Cohesity Certified Architect Expert (CCAE) credential are signaling that they can do more than operate a backup platform — they can size, design, and defend a Cohesity Data Cloud deployment in front of customers, security teams, and CIOs. This section unpacks what the exam tests, who it is for, and how to study for it efficiently.
Pre-Quiz: Section 1
1. Which CCAE exam domain carries the largest weight, and what does that emphasize about the exam's character?
Domain 1 (Platform Architecture, 22%) — the exam is about memorizing services and APIs.
Domain 2 (Solution Discovery and Design, 35%) — the exam is design-oriented, not CLI-trivia.
Domain 3 (Security-Focused Solutions, 18%) — the exam is primarily a security certification.
Domain 4 (Third-Party Integration, 13%) — the exam focuses on API and connector mastery.
2. An architect with deep DataProtect operations background but limited security knowledge is planning a 30-day study plan. What is the most defensible re-allocation of study time?
Spend the bulk of time on Domain 1 since it most aligns with their operational background.
Spread time evenly across the four domains regardless of background.
Invert the time allocation slightly and over-invest in Domain 3 (security) where scenarios are unfamiliar.
Skip Domain 4 entirely because it carries the smallest weight.
3. Which combination most accurately describes the CCAE exam delivery facts?
Key Points — Section 1
CCAE sits at the architect/expert tier above CCA, CCSE, and CCPE; it tests why a design is chosen, whereas CCPE tests how to operate the platform.
Allocate study time roughly proportional to domain weight, then over-invest in your weakest domain.
The most reliable lab pairs a Helios sandbox tenant with a Virtual Edition or Cloud Edition cluster for hands-on practice.
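The allocation rule above (proportional to domain weight, then over-invest in the weakest domain) can be sketched as a small calculation. This is only an illustration: the weights are the ones quoted in this section's quiz options, which sum to 88, so the sketch normalizes them instead of guessing what the remainder covers. The function name, the 60-hour budget, and the 1.5x boost are assumptions chosen for the example.

```python
# Domain weights as quoted in this section's quiz options (percent).
# They sum to 88, so the sketch normalizes them rather than assuming
# what the remaining share covers.
DOMAIN_WEIGHTS = {
    "Domain 1: Platform Architecture": 22,
    "Domain 2: Solution Discovery and Design": 35,
    "Domain 3: Security-Focused Solutions": 18,
    "Domain 4: Third-Party Integration": 13,
}


def allocate_study_hours(total_hours, weights, weakest, boost=1.5):
    """Split total_hours in proportion to weight, boosting one domain.

    `boost` is a hypothetical multiplier applied to the weakest domain's
    weight before normalizing; 1.5 is an arbitrary illustration.
    """
    adjusted = {
        name: weight * (boost if name == weakest else 1.0)
        for name, weight in weights.items()
    }
    total_weight = sum(adjusted.values())
    return {
        name: round(total_hours * weight / total_weight, 1)
        for name, weight in adjusted.items()
    }


if __name__ == "__main__":
    plan = allocate_study_hours(
        total_hours=60,  # assumed budget for a 30-day plan
        weights=DOMAIN_WEIGHTS,
        weakest="Domain 3: Security-Focused Solutions",
    )
    for name, hours in plan.items():
        print(f"{name}: {hours} h")
```

With these inputs the security domain gains hours at the expense of the others, while Domain 2 still receives the largest share because of its weight.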
Post-Quiz: Section 1
1. Which CCAE exam domain carries the largest weight, and what does that emphasize about the exam's character?
Domain 1 (Platform Architecture, 22%) — the exam is about memorizing services and APIs.
Domain 2 (Solution Discovery and Design, 35%) — the exam is design-oriented, not CLI-trivia.
Domain 3 (Security-Focused Solutions, 18%) — the exam is primarily a security certification.
Domain 4 (Third-Party Integration, 13%) — the exam focuses on API and connector mastery.
2. An architect with deep DataProtect operations background but limited security knowledge is planning a 30-day study plan. What is the most defensible re-allocation of study time?
Spend the bulk of time on Domain 1 since it most aligns with their operational background.
Spread time evenly across the four domains regardless of background.
Invert the time allocation slightly and over-invest in Domain 3 (security) where scenarios are unfamiliar.
Skip Domain 4 entirely because it carries the smallest weight.
3. Which combination most accurately describes the CCAE exam delivery facts?
Every CCAE exam scenario eventually traces back to four architectural pillars: a single distributed file system, a hyperconverged scale-out node model, MapReduce-style background services, and strict consistency. If you internalize these pillars, you can derive most design answers from first principles.
Pre-Quiz: Section 2
1. SpanFS supports unlimited snapshots and clones with effectively zero performance penalty. Which underlying technology delivers this property?
RAID-6 with periodic full clones.
SnapTree, which implements Distributed Redirect-on-Write (D-ROW) on a B+ tree metadata store.
VMware vSphere CBT replicated across nodes.
Tape-out staging with linear read-only locks.
2. Why do typical Cohesity clusters require a minimum of three or four nodes depending on form factor?
To physically fit the chassis backplane.
Because Paxos-style quorum requires a majority of nodes to remain healthy to accept writes; smaller clusters cannot tolerate a single-node loss.
Because each protocol (NFS, SMB, S3) requires its own dedicated node.
Because Apollo MapReduce jobs require exactly three workers.
3. A customer protects 500 Windows VMs averaging 80 GB each (FETB ~40 TB) with significant OS overlap. Which cluster behavior most accurately explains why effective stored capacity lands near 5-7 TB?
Bridge applies fixed-block local dedupe per node.
Apollo’s MapReduce post-process re-dedupe and garbage collection sustain global variable-length dedupe (4-6x) plus inline compression (1.5-2x).
Magneto compresses VMs at the source agent before transmit only.
Iris caches duplicate blocks at the management layer.
4. What does “strict consistency” in SpanFS mean for client behavior?
Clients must always connect to the master metadata node for the latest data.
Any node can serve any I/O for any object and clients always see the latest committed state.
Reads are eventually consistent; only writes are strict.
Clients see committed data only after a 30-second propagation delay.
5. Compared with a traditional two-tier backup architecture, which property is unique to a Cohesity hyperconverged scale-out cluster?
A dedicated metadata controller decouples capacity and compute scaling.
Add-capacity workflows are re-rack and migrate operations.
Throughput and metadata capacity scale linearly with node count, with no privileged master node.
A single-node failure typically halts the entire array.
SpanFS Layered Subsystems
Access Layer — exposes NFS, SMB, S3 (plus OST and DirectIO for NetBackup) on the same volumes via virtual IPs, with no master node and no protocol-specific choke point.
I/O Engine — chunks data, performs variable-length global dedupe (inline or post-process), compresses, encrypts, indexes, and tiers blocks across SSD, HDD, and cloud.
Metadata Management — distributed key-value store on a patented B+ tree, replicated and sharded consistently. SnapTree delivers Distributed Redirect-on-Write (D-ROW) for unlimited snaps and clones.
Storage and Distribution — fully distributed across hyperconverged x86 nodes, dynamically rebalanced, protected by erasure coding or replication.
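The I/O Engine's dedupe and compression stages are what turn front-end capacity into stored capacity. The sketch below works through the figures from this section's quiz scenario (500 VMs at 80 GB each, 4-6x dedupe, 1.5-2x compression). It assumes the two ratios simply multiply and ignores erasure-coding or replication overhead and retention growth, so treat it as back-of-the-envelope arithmetic, not a sizing tool. The function name is hypothetical.

```python
def stored_capacity_tb(front_end_tb, dedupe_ratio, compression_ratio):
    """Estimate stored capacity, assuming the two ratios multiply."""
    return front_end_tb / (dedupe_ratio * compression_ratio)


if __name__ == "__main__":
    # 500 VMs x 80 GB each, using decimal units (1 TB = 1000 GB).
    front_end_tb = 500 * 80 / 1000

    # Low end of both quoted ranges gives the largest stored footprint.
    conservative = stored_capacity_tb(front_end_tb, 4, 1.5)
    # High end of both quoted ranges gives the smallest stored footprint.
    optimistic = stored_capacity_tb(front_end_tb, 6, 2.0)

    print(f"front-end: {front_end_tb:.0f} TB")
    print(f"stored:    {optimistic:.1f} to {conservative:.1f} TB")
```

Under these assumptions the combined reduction is between 6x and 12x, which puts 40 TB of front-end data at roughly 3.3 to 6.7 TB stored.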
Hyperconverged vs. Traditional
| Property | Traditional Two-Tier Backup | Cohesity Hyperconverged |
| --- | --- | --- |
| Compute / storage scaling | Independent, often imbalanced | Coupled, balanced per node |
| Metadata controller | Dedicated server, bottleneck risk | Distributed across all nodes |
| Add-capacity workflow | Re-rack, re-license, migrate | Add ReadyNode, auto-rebalance |
| Failure blast radius | Often whole-array | Single node, EC-bounded |
Key Points — Section 2
SpanFS combines a distributed-metadata B+ tree, multi-protocol access, SnapTree D-ROW snapshots, and global dedupe on one tier.
The hyperconverged scale-out node model means there is no privileged master — throughput and metadata grow with node count.
Apollo’s MapReduce-style background jobs — not Bridge — sustain dedupe ratios over time via post-process re-dedupe and garbage collection.
Strict consistency lets any node serve any I/O while always reflecting the latest committed state.
Paxos-style quorum drives the three- or four-node cluster minimums you see in every sizing question.
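The quorum point above follows from simple majority arithmetic. The sketch below is generic majority-quorum math, not a description of Cohesity's actual implementation; the function names are hypothetical.

```python
def majority(node_count):
    """Smallest number of nodes that forms a strict majority."""
    return node_count // 2 + 1


def tolerable_failures(node_count):
    """Nodes that can fail while a majority is still healthy."""
    return node_count - majority(node_count)


if __name__ == "__main__":
    print("nodes  majority  tolerable failures")
    for n in range(1, 6):
        print(f"{n:>5}  {majority(n):>8}  {tolerable_failures(n):>18}")
```

Under this rule one- and two-node clusters tolerate zero failures, and three nodes is the smallest size that survives a single-node loss. A fourth node raises the majority to three without raising the tolerance above one.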
Post-Quiz: Section 2
1. SpanFS supports unlimited snapshots and clones with effectively zero performance penalty. Which underlying technology delivers this property?
RAID-6 with periodic full clones.
SnapTree, which implements Distributed Redirect-on-Write (D-ROW) on a B+ tree metadata store.
VMware vSphere CBT replicated across nodes.
Tape-out staging with linear read-only locks.
2. Why do typical Cohesity clusters require a minimum of three or four nodes depending on form factor?
To physically fit the chassis backplane.
Because Paxos-style quorum requires a majority of nodes to remain healthy to accept writes; smaller clusters cannot tolerate a single-node loss.
Because each protocol (NFS, SMB, S3) requires its own dedicated node.
Because Apollo MapReduce jobs require exactly three workers.
3. A customer protects 500 Windows VMs averaging 80 GB each (FETB ~40 TB) with significant OS overlap. Which cluster behavior most accurately explains why effective stored capacity lands near 5-7 TB?
Bridge applies fixed-block local dedupe per node.
Apollo’s MapReduce post-process re-dedupe and garbage collection sustain global variable-length dedupe (4-6x) plus inline compression (1.5-2x).
Magneto compresses VMs at the source agent before transmit only.
Iris caches duplicate blocks at the management layer.
4. What does “strict consistency” in SpanFS mean for client behavior?
Clients must always connect to the master metadata node for the latest data.
Any node can serve any I/O for any object and clients always see the latest committed state.
Reads are eventually consistent; only writes are strict.
Clients see committed data only after a 30-second propagation delay.
5. Compared with a traditional two-tier backup architecture, which property is unique to a Cohesity hyperconverged scale-out cluster?
A dedicated metadata controller decouples capacity and compute scaling.
Add-capacity workflows are re-rack and migrate operations.
Throughput and metadata capacity scale linearly with node count, with no privileged master node.
A single-node failure typically halts the entire array.
Section 3: Core Services and Software Stack
Beneath the CCAE exam’s scenario language is a handful of cooperating services. Memorizing what they do — and what they don’t do — is one of the highest-yield activities for Domain 1.
Pre-Quiz: Section 3
1. A scenario states: “The cluster reclaims unused capacity overnight, runs post-process dedupe across all data, and rebuilds file analytics indices.” Which service owns this behavior?
Bridge.
Apollo — cluster-wide MapReduce-style background services.
Magneto — data protection orchestration.
Iris — UI and RBAC.
2. Which service is responsible for chunking, dedupe, compression, encryption, erasure coding, tiering, and serving the NFS/SMB/S3 protocol stacks?
Magneto.
Yoda.
Bridge — the SpanFS data path.
ScribeStore.
3. A customer wants to find every PDF named contract-2024.pdf across 14 clusters in 8 regions and have results returned in seconds. Which service makes this possible?
Iris — per-cluster UI search.
ScribeStore — key-value metadata.
Yoda — the global indexing and search service surfaced through Helios.
Bridge — data path scan on each cluster.
4. Which statement about Helios is correct?
Helios stores all customer backup data centrally for SaaS access.
Helios is the SaaS multicloud control and insight plane; it does not host customer backup data, only control and observability.
Helios runs only inside a customer’s data center as on-premises software with no SaaS option.
Helios is a backup agent installed on protected workloads.
5. The Cohesity App Framework runs containerized third-party apps on cluster nodes. What is its primary architectural advantage?
It moves data off-cluster to dedicated analytics servers.
It exploits data gravity — apps run where the data already lives, avoiding petabyte-scale network transfers, and are sandboxed away from the data path.
It bypasses RBAC for performance reasons.
It replaces Bridge for protocol serving.
Layered Cohesity DataPlatform — Hardware up to Helios
Key Points — Section 3
Bridge = data path. Apollo = background MapReduce. Magneto = orchestration. Iris = UI/API/RBAC.
ScribeStore is the underlying KV metadata store; Yoda powers global cross-cluster search through Helios.
Helios is SaaS control + insight only — it does not host customer backup data; Helios Self-Managed serves dark sites.
The App Framework runs sandboxed containers on cluster nodes for analytics, AV, eDiscovery — data gravity, not data movement.
Most exam scenarios reduce to identifying which service owns a given behavior.
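Since most scenarios reduce to "which service owns this behavior", the ownership map from this section can be written down as a lookup table for self-testing. The behavior strings below are paraphrased from this section's text; the dictionary layout and the `owner_of` helper are hypothetical study aids, not a Cohesity API.

```python
# Service -> behaviors, paraphrased from this section's text.
SERVICE_BEHAVIORS = {
    "Bridge": {
        "chunking", "dedupe", "compression", "encryption",
        "erasure coding", "tiering", "nfs/smb/s3 protocol serving",
    },
    "Apollo": {
        "garbage collection", "post-process dedupe",
        "file analytics index rebuild",
    },
    "Magneto": {"data protection orchestration"},
    "Iris": {"ui", "api", "rbac"},
    "ScribeStore": {"key-value metadata store"},
    "Yoda": {"global cross-cluster search"},
}


def owner_of(behavior):
    """Return the service that owns a behavior, or None if unlisted."""
    key = behavior.strip().lower()
    for service, behaviors in SERVICE_BEHAVIORS.items():
        if key in behaviors:
            return service
    return None


if __name__ == "__main__":
    for probe in ("Garbage collection", "RBAC", "Tiering",
                  "Global cross-cluster search"):
        print(f"{probe}: {owner_of(probe)}")
```

Note that "dedupe" on the data path and "post-process dedupe" are kept as separate entries, which mirrors the Bridge versus Apollo distinction the section emphasizes.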
Post-Quiz: Section 3
1. A scenario states: “The cluster reclaims unused capacity overnight, runs post-process dedupe across all data, and rebuilds file analytics indices.” Which service owns this behavior?
Bridge.
Apollo — cluster-wide MapReduce-style background services.
Magneto — data protection orchestration.
Iris — UI and RBAC.
2. Which service is responsible for chunking, dedupe, compression, encryption, erasure coding, tiering, and serving the NFS/SMB/S3 protocol stacks?
Magneto.
Yoda.
Bridge — the SpanFS data path.
ScribeStore.
3. A customer wants to find every PDF named contract-2024.pdf across 14 clusters in 8 regions and have results returned in seconds. Which service makes this possible?
Iris — per-cluster UI search.
ScribeStore — key-value metadata.
Yoda — the global indexing and search service surfaced through Helios.
Bridge — data path scan on each cluster.
4. Which statement about Helios is correct?
Helios stores all customer backup data centrally for SaaS access.
Helios is the SaaS multicloud control and insight plane; it does not host customer backup data, only control and observability.
Helios runs only inside a customer’s data center as on-premises software with no SaaS option.
Helios is a backup agent installed on protected workloads.
5. The Cohesity App Framework runs containerized third-party apps on cluster nodes. What is its primary architectural advantage?
It moves data off-cluster to dedicated analytics servers.
It exploits data gravity — apps run where the data already lives, avoiding petabyte-scale network transfers, and are sandboxed away from the data path.
It bypasses RBAC for performance reasons.
It replaces Bridge for protocol serving.
Section 4: Hardware, Cloud, and Virtual Edition Form Factors
Domain 2 routinely asks you to choose a form factor. The wrong hardware decision can sink an otherwise-correct architecture, so understand the trade-offs across Cohesity-branded appliances, ReadyNodes, Virtual Edition, Cloud Edition, and Robo Edition.
Pre-Quiz: Section 4
1. An insurance company runs 18 branch offices, each with ~2 TB of data, no local IT, and a need to replicate to a regional hub. Which Cohesity form factor is the architect-grade pick?
A pair of physical ReadyNodes per branch with full local IT staffing.
Robo Edition replicating back to the regional hub, centrally managed by Helios.
Cloud Edition in AWS us-east-1 for each branch.
Virtual Edition on a single laptop per branch.
2. A regulated, classified, dark-site customer cannot use any SaaS dependency. Which combination is appropriate?
Cloud Edition in AWS GovCloud + Helios SaaS.
Physical cluster + Helios SaaS only.
Physical cluster + Helios Self-Managed for on-prem fleet management.
Virtual Edition on a public cloud hypervisor + Helios SaaS.
3. Which statement about Cloud Edition is most accurate?
Cloud Edition runs the same Bridge/Apollo/Magneto stack as physical clusters but uses cloud-provider block storage; it is the foundation of CloudReplicate and CloudSpin patterns.
Cloud Edition replaces SpanFS with native AWS S3 to avoid software dependencies.
Cloud Edition is a SaaS-only offering with no customer-controlled VMs.
Cloud Edition runs only on-premises with cloud archive disabled.
4. A Cisco UCS shop wants Cohesity but must align procurement with their existing OEM partnership. Which form factor fits?
Key Points — Section 4
ReadyNodes are the workhorse physical form factor — OEM hardware (Cisco, HPE, Dell, Lenovo) running Cohesity software.
Virtual Edition handles lab, ROBO, dark-site, and management clusters where physical hardware is uneconomical.
Cloud Edition runs natively in AWS/Azure/GCP and is the foundation of CloudReplicate and CloudSpin patterns.
Robo Edition is a small-footprint variant for branch offices, replicated to a primary cluster and centrally managed via Helios.
Form-factor choice is always downstream of RPO/RTO, sovereignty, and branch IT capability — never the starting point.
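The form-factor guidance above can be sketched as a small decision function. This is a deliberately simplified illustration of the section's scenarios: the field names, the order of the checks, and the returned labels are assumptions made for the example, and a real design would start from RPO/RTO and sovereignty requirements, as the last key point says.

```python
from dataclasses import dataclass


@dataclass
class SiteRequirements:
    """Hypothetical inputs; real discovery covers far more than this."""
    saas_allowed: bool
    runs_in_public_cloud: bool
    is_branch_office: bool
    physical_hardware_economical: bool


def suggest_form_factor(req):
    """Return (platform, management plane) for a simplified scenario."""
    # Dark sites cannot depend on SaaS, so they use the self-managed option.
    management = "Helios (SaaS)" if req.saas_allowed else "Helios Self-Managed"

    if req.runs_in_public_cloud:
        platform = "Cloud Edition"
    elif req.is_branch_office:
        platform = "Robo Edition"
    elif not req.physical_hardware_economical:
        platform = "Virtual Edition"
    else:
        platform = "Physical cluster (appliance or ReadyNode)"
    return platform, management


if __name__ == "__main__":
    branch = SiteRequirements(True, False, True, False)
    dark_site = SiteRequirements(False, False, False, True)
    print("branch office:", suggest_form_factor(branch))
    print("dark site:    ", suggest_form_factor(dark_site))
```

The two sample inputs correspond to the insurance-branch and dark-site quiz scenarios in this section and return the combinations those questions point to.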
Post-Quiz: Section 4
1. An insurance company runs 18 branch offices, each with ~2 TB of data, no local IT, and a need to replicate to a regional hub. Which Cohesity form factor is the architect-grade pick?
A pair of physical ReadyNodes per branch with full local IT staffing.
Robo Edition replicating back to the regional hub, centrally managed by Helios.
Cloud Edition in AWS us-east-1 for each branch.
Virtual Edition on a single laptop per branch.
2. A regulated, classified, dark-site customer cannot use any SaaS dependency. Which combination is appropriate?
Cloud Edition in AWS GovCloud + Helios SaaS.
Physical cluster + Helios SaaS only.
Physical cluster + Helios Self-Managed for on-prem fleet management.
Virtual Edition on a public cloud hypervisor + Helios SaaS.
3. Which statement about Cloud Edition is most accurate?
Cloud Edition runs the same Bridge/Apollo/Magneto stack as physical clusters but uses cloud-provider block storage; it is the foundation of CloudReplicate and CloudSpin patterns.
Cloud Edition replaces SpanFS with native AWS S3 to avoid software dependencies.
Cloud Edition is a SaaS-only offering with no customer-controlled VMs.
Cloud Edition runs only on-premises with cloud archive disabled.
4. A Cisco UCS shop wants Cohesity but must align procurement with their existing OEM partnership. Which form factor fits?