Study Guide: Data Protection: Sources, Policies, and Protection Groups

Learning Objectives

Register and protect heterogeneous sources, including VMware vSphere, Microsoft Hyper-V, Nutanix AHV, physical Linux/Windows hosts, NAS systems via SMB/NFS/NDMP, and database engines such as Oracle, SQL Server, SAP HANA, and Exchange.
Design protection policies that align with explicit RPO, RTO, and retention SLAs, using GFS-style hierarchical retention and a tiered Gold/Silver/Bronze model.
Build Protection Groups using static membership, container-based auto-protection, and vSphere tag-based auto-protection — and choose between them based on operational maturity and audit requirements.
Optimize backup performance and reduce production impact using SmartCopy storage-snapshot integration, Changed Block Tracking (CBT), proxy distribution, and per-datastore stream throttling.
Differentiate application-consistent from crash-consistent backups and select the appropriate quiescing path per workload.

If the previous chapters built the platform — clusters, networks, identity — this chapter is where Cohesity finally earns its keep. Data Protection is the day-job: pulling backups from a sprawling, heterogeneous estate of hypervisors, file servers, databases, and SaaS tenants; storing those copies efficiently; and proving, at 3 a.m. on the worst day of someone's career, that the data can come back.

The CCAE exam tests three intersecting constructs: Sources (what you protect), Policies (how often, how long, where copies go), and Protection Groups (the binding object that stitches the two together). Master those three nouns, and most of the data-protection blueprint falls into place.

Figure 7.1: End-to-end data protection object model

Section 1: Source Registration and Discovery

Before Cohesity can protect anything, it must know that the source exists, hold credentials to talk to it, and understand its API surface. Source registration is the moment a "production system" becomes a "discoverable, protectable inventory" inside Cohesity.

vCenter, SCVMM, and Nutanix Prism Integration

For VMware environments, the primary handshake is at the vCenter level. You register vCenter once, and Cohesity walks the entire managed inventory — datacenters, clusters, hosts, resource pools, folders, datastores, tags, and individual VMs. Registration requires a service account with sufficient privileges to read inventory, snapshot VMs, and (for some restore paths) attach virtual disks. Most architects create a dedicated svc-cohesity account in vCenter rather than reusing a domain admin.

Critically, registration is the moment to set per-datastore stream caps. After Cohesity discovers all datastores, you can override global stream limits by enabling a Cap and setting a maximum number of concurrent backup streams per datastore. This is one of the most commonly overlooked exam topics: a small, hot all-flash datastore hosting tier-1 transactional workloads should not be saturated by a 32-stream backup job hammering its queues.

For Microsoft Hyper-V, registration goes through SCVMM (or directly to standalone Hyper-V hosts), and Cohesity uses Resilient Change Tracking (RCT) instead of CBT for incremental detection. Nutanix AHV registers via Prism Element or Prism Central; Cohesity then uses Nutanix's native snapshot APIs.
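The per-datastore stream cap described above can be pictured as a simple override of the global limit. The sketch below is only a mental model, not Cohesity's API: the `Datastore` class, the `effective_streams` function, the datastore names, and the assumption that a cap can only lower (never raise) the global limit are all hypothetical. The 32-stream figure comes from the example in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

GLOBAL_STREAM_LIMIT = 32  # illustrative global default, taken from the 32-stream example


@dataclass
class Datastore:
    name: str
    stream_cap: Optional[int] = None  # None means no per-datastore override


def effective_streams(ds: Datastore, global_limit: int = GLOBAL_STREAM_LIMIT) -> int:
    """Return how many concurrent backup streams may hit this datastore."""
    if ds.stream_cap is None:
        return global_limit
    # Assumption: a cap only ever lowers the limit below the global value.
    return min(ds.stream_cap, global_limit)


def plan_streams(datastores: List[Datastore]) -> Dict[str, int]:
    """Map each datastore name to its effective stream limit."""
    return {ds.name: effective_streams(ds) for ds in datastores}


inventory = [
    Datastore("bulk-capacity-01"),             # no cap: global limit applies
    Datastore("flash-oltp-01", stream_cap=4),  # small, hot tier-1 datastore
]
print(plan_streams(inventory))
```

The design point is that the cap is a property of the datastore, not of the job: every Protection Group that touches `flash-oltp-01` inherits the same ceiling.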

Physical Agent for Linux and Windows

Not everything is virtualized, and Cohesity's physical agent handles the rest. The Cohesity Agent is a lightweight binary that runs on Linux or Windows and provides three modes:

File-based backup for individual filesets and directories.
Volume-based (block) backup for full-system imaging, including bare-metal recovery.
Application-aware backup for SQL Server, Oracle, Exchange, SharePoint, and Active Directory, where the agent coordinates with the application's quiescing or backup interface (VSS on Windows, RMAN on Oracle, VDI on SQL Server, Backint on SAP HANA).

NAS Sources via SMB/NFS and NDMP

NAS protection has two main flavors. For modern NAS (NetApp ONTAP, Dell PowerScale/Isilon, Pure FlashBlade, generic Linux NFS exporters, Windows file servers), Cohesity registers the share over SMB or NFS and walks the namespace. For legacy or large enterprise NAS where snapshot-and-stream is preferable, Cohesity drives backups via NDMP — talking directly to the array's tape-out protocol but redirecting the stream into Cohesity instead of physical tape.

Database Sources: Oracle, SQL, SAP HANA, Exchange

Oracle: The Cohesity agent integrates with RMAN as a media-management library.
SQL Server: Registration uses VDI (Virtual Device Interface) and is aware of Always On Availability Groups (AAGs).
SAP HANA: Registration plugs into the Backint API.
Exchange: Registration uses the VSS Exchange writer for application-consistent mailbox backups.

Key Points

Register vCenter once and Cohesity auto-discovers the entire managed inventory; use a dedicated svc-cohesity service account.
Set per-datastore stream caps at registration time — small all-flash datastores with tier-1 workloads should not be saturated by 32-stream jobs.
VMware uses CBT; Hyper-V uses RCT; AHV uses Nutanix's native snapshot APIs.
The Cohesity Agent supports file-based, volume-based, and application-aware modes — plan rollout via Ansible/SCCM/Puppet for fleet scale.
Modern NAS uses SMB/NFS; legacy/large-enterprise NAS often uses NDMP for array-friendly snapshot streaming.

Section 2: Policies and Schedules

A Cohesity Protection Policy is the SLA expressed in code. It encapsulates how often a backup runs (RPO), how long copies are kept (retention), where copies go (replication and archival targets), and any immutability rules. Critically, a single policy can express the entire lifecycle of a backup — from the first snapshot on local cluster storage all the way through replication to a DR site and archival to S3 Glacier seven years later.

Frequency, Retention, and Lock Attributes

The minimum RPO Cohesity can express in a standard policy is 15 minutes for hypervisor-based backups using Redirect-on-Write (RoW) snapshots. Tighter RPOs — sub-minute, even continuous — are achievable when integrated with primary array snapshots through SmartCopy.

Retention is configured as "Keep for N days/weeks/months/years" and supports DataLock attributes for compliance and ransomware resilience. Two flavors exist: Compliance Lock (truly immutable, legally enforceable) and Governance Lock (soft-immutable, can be overridden by a quorum of admins). For SOX, HIPAA, and PCI workloads, Compliance Lock is mandatory.
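The difference between the two lock flavors is easiest to see as a delete-permission check. The following is a minimal sketch under stated assumptions: `LockMode`, `Snapshot`, and `can_delete` are hypothetical names, and the quorum size of 2 is invented for illustration. It models only the behavior described above (Compliance Lock cannot be overridden before retention expires; Governance Lock can be overridden by a quorum of admins).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class LockMode(Enum):
    NONE = "none"
    GOVERNANCE = "governance"
    COMPLIANCE = "compliance"


@dataclass
class Snapshot:
    created: datetime
    retention: timedelta
    lock: LockMode


def can_delete(snap: Snapshot, now: datetime, approvals: int = 0, quorum: int = 2) -> bool:
    """Decide whether a snapshot may be deleted at time `now`."""
    if now >= snap.created + snap.retention:
        return True  # retention has elapsed, so the lock no longer applies
    if snap.lock is LockMode.COMPLIANCE:
        return False  # no override path, regardless of approvals
    if snap.lock is LockMode.GOVERNANCE:
        return approvals >= quorum  # soft-immutable: a quorum can override
    return True  # unlocked snapshots can be removed early
```

Note that extra approvals never help in the Compliance branch, which is the property the compliance-driven tiers rely on.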

Hierarchical Retention (GFS)

Cohesity policies natively support the Grandfather-Father-Son (GFS) retention model — you can promote the first snapshot of each day, week, month, and year into longer-retention buckets. Combined with global variable-length deduplication, the storage cost of a 7-year monthly retention is far lower than naive arithmetic suggests, because unchanged blocks are stored once across the entire chain.
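To make the promotion rule concrete, here is a small sketch of "first snapshot of each day, week, month, and year". It is an illustration only: `gfs_labels` is a hypothetical helper, weeks are assumed to be ISO weeks, and a snapshot that qualifies for several buckets is assumed to take the longest-lived one.

```python
from datetime import datetime
from typing import Dict, List

TIERS = ("yearly", "monthly", "weekly", "daily")  # longest retention first


def gfs_labels(snapshots: List[datetime]) -> Dict[datetime, str]:
    """Label each snapshot with the longest-retention bucket it is first in."""
    seen = {tier: set() for tier in TIERS}
    labels: Dict[datetime, str] = {}
    for ts in sorted(snapshots):
        iso = ts.isocalendar()
        period = {
            "yearly": ts.year,
            "monthly": (ts.year, ts.month),
            "weekly": (iso[0], iso[1]),  # ISO year and ISO week number
            "daily": ts.date(),
        }
        label = "regular"
        for tier in TIERS:
            if period[tier] not in seen[tier]:
                if label == "regular":
                    label = tier  # keep only the longest-lived bucket
                seen[tier].add(period[tier])  # mark the period as covered
        labels[ts] = label
    return labels


runs = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 12, 0),
    datetime(2024, 1, 2, 0, 0),
    datetime(2024, 1, 8, 0, 0),
    datetime(2024, 2, 1, 0, 0),
]
for ts, label in gfs_labels(runs).items():
    print(ts.isoformat(), label)
```

In this sample the first run of the year is promoted to the yearly bucket, the second run on the same day stays a regular snapshot, and later runs are promoted to daily, weekly, and monthly as each new period begins.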

Policy Templates and Re-Use

A core architectural principle: one policy per SLA tier, not per workload. If you have 50 SQL servers, 200 file shares, and 1,200 VMs all in the "Gold" tier, they should all reference the same Gold policy. When the SLA changes — and it will — you edit one object instead of 1,450.

The SLA Analogy: A protection policy is the SLA contract; a Protection Group is the customer roster. The same Gold contract can be sold to a hundred customers (Protection Groups), and changing the contract terms automatically updates all subscribers.
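The contract-and-roster relationship can be sketched as plain object references. This is a toy model, not the Cohesity object schema: `ProtectionPolicy`, `ProtectionGroup`, the field names, and the group and member names are hypothetical. What it shows is that groups hold a reference to one shared policy, so a single edit reaches every subscriber.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProtectionPolicy:
    name: str
    rpo_minutes: int
    retention_days: int


@dataclass
class ProtectionGroup:
    name: str
    policy: ProtectionPolicy  # a reference to the shared policy, not a copy
    members: List[str] = field(default_factory=list)


gold = ProtectionPolicy("Gold", rpo_minutes=15, retention_days=30)

groups = [
    ProtectionGroup("sql-prod", gold, ["sql01", "sql02"]),
    ProtectionGroup("files-prod", gold, ["share-finance"]),
    ProtectionGroup("vm-prod", gold, ["vm0001", "vm0002"]),
]

# The SLA changes: edit the one policy object.
gold.retention_days = 45

# Every group sees the new terms without being touched.
print([(g.name, g.policy.retention_days) for g in groups])
```

Had each group carried its own copy of the policy, the same SLA change would have required one edit per group.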

Reference SLA Tier Design (Gold/Silver/Bronze)

Tier	Frequency (RPO)	Local Retention	Replication	Archive	Target RTO
Gold	Every 15 min via SmartCopy	30 days, app-consistent	Async every cycle to DR	Monthly, 7+ yrs, Compliance Lock	Minutes (Instant Mass Restore)
Silver	Every 4–6 hrs, CBT/RCT	14–30 days	Async daily to DR	Monthly, 3–5 yrs	< 1 hour
Bronze	Daily, crash-consistent OK	7–14 days	None or weekly	Quarterly, 1 yr	< 4 hours
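One way to read the table is as a lookup: pick the cheapest tier whose RPO and RTO both meet the workload's requirement. The sketch below encodes that reading. The numeric bounds are assumptions derived from the table (Silver's RPO uses the 6-hour end of its range, and Gold's "Minutes" RTO is arbitrarily taken as 15 minutes), and `pick_tier` is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    rpo_minutes: int  # worst-case data loss the tier delivers
    rto_minutes: int  # worst-case restore time the tier targets


# Ordered cheapest first, so the first match is the least expensive fit.
TIERS = [
    Tier("Bronze", rpo_minutes=24 * 60, rto_minutes=4 * 60),
    Tier("Silver", rpo_minutes=6 * 60, rto_minutes=60),
    Tier("Gold", rpo_minutes=15, rto_minutes=15),  # RTO value is an assumption
]


def pick_tier(required_rpo_minutes: int, required_rto_minutes: int) -> Optional[str]:
    """Return the cheapest tier that satisfies both requirements, or None."""
    for tier in TIERS:
        if (tier.rpo_minutes <= required_rpo_minutes
                and tier.rto_minutes <= required_rto_minutes):
            return tier.name
    return None  # tighter than Gold: needs a bespoke design


print(pick_tier(24 * 60, 4 * 60))  # a workload that tolerates a day of loss
print(pick_tier(8 * 60, 2 * 60))   # needs better than Bronze
print(pick_tier(30, 30))           # needs Gold
```

A `None` result signals a requirement that none of the three reference tiers meets, for example an RPO below 15 minutes.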

Figure 7.2: Tiered policy decision tree

Pre-Quiz: Policies and Schedules

3. Which DataLock flavor is required for SOX, HIPAA, and PCI workloads where backup deletion must be impossible even with admin override?

A. Governance Lock B. Compliance Lock C. Quorum Lock D. SnapLock Enterprise

4. An enterprise has 50 SQL servers, 200 file shares, and 1,200 VMs all classified as Gold-tier. What is the recommended policy design?

A. Create one policy per workload type (3 policies) B. Create one policy per object (1,450 policies) C. Create a single Gold policy and reference it from many Protection Groups D. Use auto-generated policies from Helios

5. What is the minimum RPO Cohesity supports in a standard policy for hypervisor-based backups?

A. 1 minute B. 5 minutes C. 15 minutes D. 1 hour

Post-Quiz: Policies and Schedules

3. Which DataLock flavor is required for SOX, HIPAA, and PCI workloads where backup deletion must be impossible even with admin override?

A. Governance Lock B. Compliance Lock C. Quorum Lock D. SnapLock Enterprise

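4. An enterprise has 50 SQL servers, 200 file shares, and 1,200 VMs all classified as Gold-tier. What is the recommended policy design?

A. Create one policy per workload type (3 policies) B. Create one policy per object (1,450 policies) C. Create a single Gold policy and reference it from many Protection Groups D. Use auto-generated policies from Helios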

5. What is the minimum RPO Cohesity supports in a standard policy for hypervisor-based backups?

A. 1 minute B. 5 minutes C. 15 minutes D. 1 hour

Section 3: Protection Groups

A Protection Group is the binding object that connects a set of source objects to a single policy. It also holds operational settings such as proxy assignment, indexing options, pre/post scripts, application-quiesce flags, and exclude lists. The architectural decision that dominates Protection Group design is how membership is determined: statically, by container, or by tag.

Static Membership vs. Tag-Based Auto-Protection

Static membership means the administrator hand-picks individual objects at job-creation time. The Protection Group's scope never changes unless someone edits it. This is highly deterministic and auditable — but the risk is silent under-protection: a new VM provisioned by a junior engineer in a regulated environment may not appear in any Protection Group for weeks until someone notices.

Auto-protect covers any new VM added to a selected parent object (a datacenter, folder, cluster, host, or resource pool) and supports vSphere tags for inclusion and exclusion. A VM added to a protected container is picked up automatically in the next backup run.

Auto-Protect with vSphere Tags — The AND/OR Quirk

Tag-based auto-protect has a non-obvious behavior that frequently appears on the CCAE exam:

Adding tags one at a time, with the exclude option selected, produces an OR operation: Cohesity excludes any VM that carries any of the listed tags.
Adding multiple tags together in a single operation produces an AND operation: Cohesity excludes only VMs that carry all of the listed tags.

If you want to exclude everything tagged dev or lab, add them one-by-one (OR). If you want to exclude only VMs that are both dev and decommissioned, add them together (AND). Misreading this distinction has caused architects to either over-protect (tagging an entire dev fleet into Gold) or under-protect (silently excluding production workloads).
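The AND/OR behavior described above maps neatly onto a list of tag sets: each "add" operation contributes one set, tags inside a set are ANDed, and the sets themselves are ORed. The sketch below models that logic only. `is_excluded` and the data layout are hypothetical and do not reflect how Cohesity stores the filter.

```python
from typing import FrozenSet, List, Set


def is_excluded(vm_tags: Set[str], exclude_groups: List[FrozenSet[str]]) -> bool:
    """True if the VM carries every tag of at least one exclusion group."""
    # AND within a group (subset test), OR across groups (any).
    return any(group <= vm_tags for group in exclude_groups)


# Two separate add operations: one group per tag, so the result is OR.
one_by_one = [frozenset({"dev"}), frozenset({"lab"})]

# One add operation with two tags: a single group, so the result is AND.
together = [frozenset({"dev", "decommissioned"})]

print(is_excluded({"dev", "web"}, one_by_one))           # carries dev
print(is_excluded({"dev", "web"}, together))             # dev but not decommissioned
print(is_excluded({"dev", "decommissioned"}, together))  # carries both
```

Under this model, a VM tagged only `dev` is excluded by the one-by-one filter but still protected by the combined filter, which is exactly the trap the exam question probes.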

Figure 7.3: Auto-protect via vSphere tags — dynamic membership update flow

When to Use Each Membership Model

Membership Model	Best For	Risks	Audit Posture
Static	PCI/HIPAA/SOX-scoped VMs; small, slow-changing high-value sets	New VMs silently unprotected	Strongest — explicit list
Container auto-protect	Well-organized vSphere with folder-per-business-unit	Folder reorganization can shift scope	Good — provided folder hygiene
Tag auto-protect	Cross-cutting concerns (tier=gold, app=sql) where folder structure is contested	Tag drift, AND/OR confusion	Moderate — requires tag governance

App-Consistent vs. Crash-Consistent Backups

A crash-consistent backup captures the disk state as if the system had been suddenly powered off. An app-consistent backup pauses the application briefly so it can flush buffers, checkpoint state, and quiesce I/O before the snapshot is taken. On Windows, this is VSS; for databases, it is the engine's own quiesce API (RMAN, VDI, Backint). Architects should default to app-consistent for any VM running a database or transactional system.

Pre-Quiz: Protection Groups

6. An architect adds two exclusion tags dev and lab to a Cohesity Protection Group's auto-protect filter, in two separate "add" operations. What is the resulting exclusion logic?

A. AND — only VMs tagged with both dev AND lab are excluded B. OR — VMs tagged with dev OR lab are excluded C. NAND — VMs tagged with neither are excluded D. The order of addition does not matter; result is always AND

7. A regulated PCI environment requires explicit, auditable proof that every cardholder-data VM is included in a Protection Group. Which membership model best meets this requirement?

A. Tag-based auto-protect with include filter B. Container auto-protect on the entire datacenter C. Static membership with explicit VM list D. Resource-pool auto-protect

8. A Protection Group backs up a VM hosting a SQL Server transactional database. The architect should configure the backup as:

A. Crash-consistent only — quiesce overhead is unacceptable B. App-consistent via VSS quiesce coordinated by the agent C. Filesystem-only — exclude SQL data files D. Disable indexing and rely on database native backups

Post-Quiz: Protection Groups

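6. An architect adds two exclusion tags dev and lab to a Cohesity Protection Group's auto-protect filter, in two separate "add" operations. What is the resulting exclusion logic?

A. AND — only VMs tagged with both dev AND lab are excluded B. OR — VMs tagged with dev OR lab are excluded C. NAND — VMs tagged with neither are excluded D. The order of addition does not matter; result is always AND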

7. A regulated PCI environment requires explicit, auditable proof that every cardholder-data VM is included in a Protection Group. Which membership model best meets this requirement?

A. Tag-based auto-protect with include filter B. Container auto-protect on the entire datacenter C. Static membership with explicit VM list D. Resource-pool auto-protect

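8. A Protection Group backs up a VM hosting a SQL Server transactional database. The architect should configure the backup as:

A. Crash-consistent only — quiesce overhead is unacceptable B. App-consistent via VSS quiesce coordinated by the agent C. Filesystem-only — exclude SQL data files D. Disable indexing and rely on database native backups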

Section 4: Performance and Concurrency

A perfectly designed policy is worthless if backups run hot enough to crash production. Performance tuning sits at the intersection of source impact, network bandwidth, proxy capacity, and cluster ingest throughput.

SmartCopy and Storage Snapshot Integration

SmartCopy is Cohesity's snapshot-based copy and replication mechanism that integrates directly with primary storage arrays — most prominently Pure Storage FlashArray, but also NetApp, HPE Nimble/Primera, and Dell PowerStore via partner integrations. Rather than running a hypervisor-side or in-guest backup that competes with production I/O, Cohesity drives the array's own snapshot APIs and ingests data from those snapshots.

The architecture flow:

Discovery — Register the Pure FlashArray as a source.
Policy assignment — Assign a Protection Policy with snapshot frequency, retention on the array, retention on Cohesity, replication, and archive.
Snapshot creation — At schedule time, Cohesity calls the Pure REST API. Optional pre/post scripts quiesce SQL/Oracle/Exchange.
Mount and read — Cohesity mounts the snapshot via iSCSI, reads only changed blocks, and ingests through inline dedupe and compression.
Retention tiering — Recent snapshots remain on Pure for instant restore at flash speed; older snapshots tier to Cohesity for long-term recovery.
Recovery — Volume-level restore back to any Pure FlashArray, file-level via SmartFiles mount, or cross-platform to native cloud VMs.

The exam-relevant point: SmartCopy enables sub-15-minute RPOs with zero hypervisor overhead and is the canonical Gold-tier mechanism for transactional databases sitting on Pure.

Figure 7.4: SmartCopy with Pure FlashArray — orchestration sequence

sequenceDiagram
    participant App as SQL/Oracle App
    participant Cohesity as Cohesity Cluster
    participant Pure as Pure FlashArray
    participant Archive as CloudArchive (S3)
    Cohesity->>App: Pre-script: Quiesce (VSS/RMAN)
    App-->>Cohesity: Quiesce ACK
    Cohesity->>Pure: REST API: Take Snapshot
    Pure-->>Pure: Native array snapshot created
    Pure-->>Cohesity: Snapshot ID
    Cohesity->>App: Post-script: Release quiesce
    Cohesity->>Pure: Mount snapshot (iSCSI)
    Pure-->>Cohesity: Stream changed blocks only
    Cohesity-->>Cohesity: Inline dedupe + compression
    Note over Pure: Recent snapshots retained on flash for instant restore
    Note over Cohesity: Older snapshots tier to Cohesity DataPlatform
    Cohesity->>Archive: Tier monthly to S3 Glacier

CBT/RCT and Incremental Forever

For non-array-integrated VM backups, Cohesity uses CBT on VMware and RCT on Hyper-V. The hypervisor maintains a bitmap of changed blocks since the last backup; Cohesity reads only those changed blocks. Combined with global variable-length deduplication on ingest, this delivers an Incremental Forever model: a single full backup at job inception, then deltas only.
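The changed-block bitmap and the "one full, then deltas only" pattern can be simulated in a few lines. This is a conceptual sketch, not the VMware or Hyper-V API: `TrackedDisk` and `backup` are hypothetical, the "bitmap" is modeled as a set of block indexes, and deduplication is left out.

```python
from typing import Dict, List, Set


class TrackedDisk:
    """A toy virtual disk that records which blocks changed since the last backup."""

    def __init__(self, blocks: List[bytes]) -> None:
        self.blocks = list(blocks)
        # Before the first backup every block counts as changed (the initial full).
        self.changed: Set[int] = set(range(len(self.blocks)))

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.changed.add(index)  # the hypervisor flips the bit for this block


def backup(disk: TrackedDisk, repo: Dict[int, bytes]) -> int:
    """Copy only changed blocks into the repository; return how many were read."""
    to_read = sorted(disk.changed)
    for index in to_read:
        repo[index] = disk.blocks[index]
    disk.changed.clear()  # reset tracking for the next incremental
    return len(to_read)


disk = TrackedDisk([b"a", b"b", b"c", b"d"])
repo: Dict[int, bytes] = {}

print(backup(disk, repo))  # first run reads every block (the single full)
disk.write(2, b"X")
print(backup(disk, repo))  # next run reads only the one changed block
print(backup(disk, repo))  # nothing changed, nothing read
```

The repository always holds a complete current image even though only the first run read the whole disk, which is the essence of Incremental Forever.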

Throttling and QoS

Cohesity supports time-windowed bandwidth throttling for replication and archive traffic — for example, capping replication to 200 Mbps during business hours and lifting the cap overnight. Per-policy QoS lets you mark Gold backups higher-priority than Bronze on a shared cluster.
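A time-windowed throttle amounts to a schedule lookup. The sketch below uses the 200 Mbps figure from the example above. The 08:00 to 18:00 business-hours window, the `replication_cap_mbps` name, and the convention that `None` means unthrottled are all assumptions, and windows that wrap past midnight are not handled.

```python
from datetime import time
from typing import List, Optional, Tuple

# (window start, window end exclusive, cap in Mbps)
Window = Tuple[time, time, int]

# Assumed business hours; only the 200 Mbps value comes from the text.
BUSINESS_HOURS: List[Window] = [(time(8, 0), time(18, 0), 200)]


def replication_cap_mbps(now: time, windows: List[Window] = BUSINESS_HOURS) -> Optional[int]:
    """Return the bandwidth cap in effect at `now`, or None if unthrottled."""
    for start, end, cap in windows:
        if start <= now < end:
            return cap
    return None  # outside every window: the cap is lifted


print(replication_cap_mbps(time(9, 30)))  # inside business hours
print(replication_cap_mbps(time(23, 0)))  # overnight
```

Keeping the windows as data rather than code makes it straightforward to add further windows, such as a tighter cap during a month-end batch run.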

Pre-Quiz: Performance and Concurrency

9. A Gold-tier SQL database lives on a Pure FlashArray and requires a 5-minute RPO with minimal hypervisor overhead. Which Cohesity mechanism best meets this requirement?

A. Hypervisor-level CBT backup with 5-minute scheduling B. SmartCopy integrated with Pure's native snapshot API C. Daily NDMP backup with intermediate transaction logs D. CloudArchive with 5-minute lifecycle

10. In a SmartCopy + Pure deployment, where do the most recent snapshots reside, and why?

A. On the Cohesity cluster, for inline dedupe before serving restores B. On the Pure array, so restores happen at flash speed without rehydration C. In S3 Glacier, for the lowest cost D. In NVRAM on the Cohesity nodes

11. A small all-flash datastore hosting a tier-1 OLTP database is being saturated by a 32-stream Cohesity backup job. What is the most appropriate fix?

A. Disable CBT and fall back to full reads B. Add more Cohesity proxies C. Configure a per-datastore stream cap on that datastore D. Move the backup to crash-consistent mode

Post-Quiz: Performance and Concurrency

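9. A Gold-tier SQL database lives on a Pure FlashArray and requires a 5-minute RPO with minimal hypervisor overhead. Which Cohesity mechanism best meets this requirement?

A. Hypervisor-level CBT backup with 5-minute scheduling B. SmartCopy integrated with Pure's native snapshot API C. Daily NDMP backup with intermediate transaction logs D. CloudArchive with 5-minute lifecycle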

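10. In a SmartCopy + Pure deployment, where do the most recent snapshots reside, and why?

A. On the Cohesity cluster, for inline dedupe before serving restores B. On the Pure array, so restores happen at flash speed without rehydration C. In S3 Glacier, for the lowest cost D. In NVRAM on the Cohesity nodes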

11. A small all-flash datastore hosting a tier-1 OLTP database is being saturated by a 32-stream Cohesity backup job. What is the most appropriate fix?

A. Disable CBT and fall back to full reads B. Add more Cohesity proxies C. Configure a per-datastore stream cap on that datastore D. Move the backup to crash-consistent mode

Chapter Summary

This chapter unpacked the trio of objects that drive Cohesity data protection: Sources, Policies, and Protection Groups. You learned how to register vCenter, SCVMM, Prism, physical hosts, NAS, and databases, and why per-datastore stream caps should be set at registration time. You walked through GFS hierarchical retention, DataLock immutability, and the tiered Gold/Silver/Bronze SLA model. You compared static membership against container- and tag-based auto-protect, including the AND/OR tag logic that catches careless architects on the exam. And you traced how SmartCopy with Pure FlashArray enables sub-15-minute RPOs without hypervisor overhead.

Hold the analogy in mind: a policy is the SLA contract; a Protection Group is the customer roster.
