Chapter 7: Data Protection: Sources, Policies, and Protection Groups
Learning Objectives
Register and protect heterogeneous sources, including VMware vSphere, Microsoft Hyper-V, Nutanix AHV, physical Linux/Windows hosts, NAS systems via SMB/NFS/NDMP, and database engines such as Oracle, SQL Server, SAP HANA, and Exchange.
Design protection policies that align with explicit RPO, RTO, and retention SLAs, using GFS-style hierarchical retention and a tiered Gold/Silver/Bronze model.
Build Protection Groups using static membership, container-based auto-protection, and vSphere tag-based auto-protection — and choose between them based on operational maturity and audit requirements.
Optimize backup performance and reduce production impact using SmartCopy storage-snapshot integration, Changed Block Tracking (CBT), proxy distribution, and per-datastore stream throttling.
Differentiate application-consistent from crash-consistent backups and select the appropriate quiescing path per workload.
If the previous chapters built the platform (clusters, networks, identity), this chapter is where Cohesity finally earns its keep. Data protection is the day job: pulling backups from a sprawling, heterogeneous estate of hypervisors, file servers, databases, and SaaS tenants; storing those copies efficiently; and proving, at 3 a.m. on the worst day of someone's career, that the data can come back.
The CCAE exam tests three intersecting constructs: Sources (what you protect), Policies (how often, how long, where copies go), and Protection Groups (the binding object that stitches the two together). Master those three nouns, and most of the data-protection blueprint falls into place.
Figure 7.1: End-to-end data protection object model
flowchart TD
A[Source vCenter / Hyper-V / Prism / Physical / NAS / DB] --> B[Protection Group Membership: Static / Container / Tag]
B --> C[Policy SLA Contract]
C --> D[Schedule Frequency / RPO]
C --> E[Retention GFS Hierarchy + DataLock]
D --> F[Snapshots Local Cluster Storage]
E --> F
F --> G[Replication DR Cluster]
F --> H[Archive CloudArchive / S3 Glacier]
style A fill:#1f6feb,color:#fff
style C fill:#238636,color:#fff
style F fill:#8957e5,color:#fff
Animation 1: Data Flow — Source through Protection Group, Policy, Snapshots, and beyond
Section 1: Source Registration and Discovery
Before Cohesity can protect anything, it must know that the source exists, hold credentials to talk to it, and understand its API surface. Source registration is the moment a "production system" becomes a "discoverable, protectable inventory" inside Cohesity.
vCenter, SCVMM, and Nutanix Prism Integration
For VMware environments, the primary handshake is at the vCenter level. You register vCenter once, and Cohesity walks the entire managed inventory — datacenters, clusters, hosts, resource pools, folders, datastores, tags, and individual VMs. Registration requires a service account with sufficient privileges to read inventory, snapshot VMs, and (for some restore paths) attach virtual disks. Most architects create a dedicated svc-cohesity account in vCenter rather than reusing a domain admin.
Critically, registration is the moment to set per-datastore stream caps. After Cohesity discovers all datastores, you can override global stream limits by enabling a Cap and setting a maximum number of concurrent backup streams per datastore. This is one of the most commonly overlooked exam topics: a small, hot all-flash datastore hosting tier-1 transactional workloads should not be saturated by a 32-stream backup job hammering its queues.
For Microsoft Hyper-V, registration goes through SCVMM (or directly to standalone Hyper-V hosts), and Cohesity uses Resilient Change Tracking (RCT) instead of CBT for incremental detection. Nutanix AHV registers via Prism Element or Prism Central; Cohesity then uses Nutanix's native snapshot APIs.
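The per-datastore cap described above can be pictured as a simple admission gate. The sketch below is illustrative Python, not Cohesity code: the `StreamGate` class, the datastore name `ds-flash-tier1`, and the cap of 4 are all assumptions made for the example.

```python
from collections import Counter


class StreamGate:
    """Admission control for backup streams, keyed by datastore name."""

    def __init__(self, global_limit: int, datastore_caps: dict[str, int]):
        self.global_limit = global_limit
        self.datastore_caps = dict(datastore_caps)
        self.active = Counter()  # streams currently running per datastore

    def limit_for(self, datastore: str) -> int:
        # A per-datastore cap, when set, overrides the global limit.
        return self.datastore_caps.get(datastore, self.global_limit)

    def try_acquire(self, datastore: str) -> bool:
        """Start one more stream if the datastore is under its limit."""
        if self.active[datastore] >= self.limit_for(datastore):
            return False
        self.active[datastore] += 1
        return True

    def release(self, datastore: str) -> None:
        """Mark one stream on the datastore as finished."""
        if self.active[datastore] > 0:
            self.active[datastore] -= 1


# Hypothetical datastore name and limits, for illustration only.
gate = StreamGate(global_limit=32, datastore_caps={"ds-flash-tier1": 4})
admitted = sum(gate.try_acquire("ds-flash-tier1") for _ in range(32))
```

With these numbers, a job that asks for 32 streams against the capped datastore is admitted for only 4 of them, while uncapped datastores still fall back to the global limit.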
Physical Agent for Linux and Windows
Not everything is virtualized, and Cohesity's physical agent handles the rest. The Cohesity Agent is a lightweight binary that runs on Linux or Windows and provides three modes:
File-based backup for individual filesets and directories.
Volume-based (block) backup for full-system imaging, including bare-metal recovery.
Application-aware backup for SQL, Oracle, Exchange, SharePoint, and Active Directory, where the agent coordinates with the application's quiescing API (VSS on Windows, RMAN on Oracle, VDI on SQL Server, Backint on SAP HANA).
NAS Sources via SMB/NFS and NDMP
NAS protection has two main flavors. For modern NAS (NetApp ONTAP, Dell PowerScale/Isilon, Pure FlashBlade, generic Linux NFS exporters, Windows file servers), Cohesity registers the share over SMB or NFS and walks the namespace. For legacy or large enterprise NAS where snapshot-and-stream is preferable, Cohesity drives backups via NDMP — talking directly to the array's tape-out protocol but redirecting the stream into Cohesity instead of physical tape.
Database Sources: Oracle, SQL, SAP HANA, Exchange
Oracle: The Cohesity agent integrates with RMAN as a media-management library.
SQL Server: Registration uses VDI (Virtual Device Interface) and is AAG-aware (Always On Availability Groups).
SAP HANA: Registration plugs into the Backint API.
Exchange: Registration uses the VSS Exchange writer for application-consistent mailbox backups.
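The engine-to-interface pairings listed above can be kept as a small lookup table, which is handy in runbooks or pre-flight checks. This is a minimal sketch; the `DB_INTEGRATION` dictionary and `integration_for` helper are hypothetical names, and the values simply restate the list above.

```python
# Integration interface per engine, as listed in this section.
DB_INTEGRATION = {
    "oracle": "RMAN",
    "sql server": "VDI",
    "sap hana": "Backint",
    "exchange": "VSS Exchange writer",
}


def integration_for(engine: str) -> str:
    """Return the integration interface for a database engine name."""
    key = " ".join(engine.lower().split())  # normalize case and spacing
    try:
        return DB_INTEGRATION[key]
    except KeyError:
        raise ValueError(f"no integration listed for engine: {engine!r}") from None
```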
Key Points
Register vCenter once and Cohesity auto-discovers the entire managed inventory; use a dedicated svc-cohesity service account.
Set per-datastore stream caps at registration time — small all-flash datastores with tier-1 workloads should not be saturated by 32-stream jobs.
The Cohesity Agent supports file-based, volume-based, and application-aware modes — plan rollout via Ansible/SCCM/Puppet for fleet scale.
Modern NAS uses SMB/NFS; legacy/large-enterprise NAS often uses NDMP for array-friendly snapshot streaming.
Pre-Quiz: Source Registration
1. An architect is registering a vCenter source on Cohesity. Which configuration step at registration time is most commonly overlooked but materially impacts production I/O?
2. Which change-tracking mechanism does Cohesity use for Microsoft Hyper-V incremental backups?
Section 2: Policies and Schedules
A Cohesity Protection Policy is the SLA expressed in code. It encapsulates how often a backup runs (RPO), how long copies are kept (retention), where copies go (replication and archival targets), and any immutability rules. Critically, a single policy can express the entire lifecycle of a backup — from the first snapshot on local cluster storage all the way through replication to a DR site and archival to S3 Glacier seven years later.
Frequency, Retention, and Lock Attributes
The minimum RPO Cohesity can express in a standard policy is 15 minutes for hypervisor-based backups using Redirect-on-Write (RoW) snapshots. Tighter RPOs — sub-minute, even continuous — are achievable when integrated with primary array snapshots through SmartCopy.
Retention is configured as "Keep for N days/weeks/months/years" and supports DataLock attributes for compliance and ransomware resilience. Two flavors exist: Compliance Lock (truly immutable, legally enforceable) and Governance Lock (soft-immutable, can be overridden by a quorum of admins). For SOX, HIPAA, and PCI workloads, Compliance Lock is mandatory.
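The difference between the two lock flavors comes down to who, if anyone, can delete a snapshot before its retention expires. The sketch below models that rule in Python under stated assumptions: the `LockMode`, `Snapshot`, and `can_delete` names are hypothetical, and the quorum size of 2 is an illustrative value, not a documented default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class LockMode(Enum):
    NONE = "none"
    GOVERNANCE = "governance"
    COMPLIANCE = "compliance"


@dataclass(frozen=True)
class Snapshot:
    taken_at: datetime
    retention: timedelta
    lock: LockMode

    @property
    def expires_at(self) -> datetime:
        return self.taken_at + self.retention


def can_delete(snap: Snapshot, now: datetime, approvals: int = 0, quorum: int = 2) -> bool:
    """Decide whether a snapshot may be deleted at time `now`."""
    if now >= snap.expires_at:
        return True  # retention has elapsed, so the lock no longer applies
    if snap.lock is LockMode.COMPLIANCE:
        return False  # no override path before expiry
    if snap.lock is LockMode.GOVERNANCE:
        return approvals >= quorum  # assumed quorum-of-admins override
    return True  # unlocked snapshots can be removed early
```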
Hierarchical Retention (GFS)
Cohesity policies natively support the Grandfather-Father-Son (GFS) retention model — you can promote the first snapshot of each day, week, month, and year into longer-retention buckets. Combined with global variable-length deduplication, the storage cost of a 7-year monthly retention is far lower than naive arithmetic suggests, because unchanged blocks are stored once across the entire chain.
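To make the promotion rule concrete, the following sketch labels each snapshot with the longest-retention bucket it would be promoted into, taking the first snapshot of each day, week, month, and year. It is a simplified model, not Cohesity's scheduler: the `gfs_labels` function is a hypothetical name, and weeks are assumed to be ISO weeks starting on Monday.

```python
from datetime import datetime

# Ordered from the shortest to the longest retention bucket.
BUCKET_KEYS = {
    "daily": lambda t: (t.year, t.month, t.day),
    "weekly": lambda t: tuple(t.isocalendar()[:2]),  # (ISO year, ISO week)
    "monthly": lambda t: (t.year, t.month),
    "yearly": lambda t: (t.year,),
}


def gfs_labels(snapshots):
    """Map each snapshot time to the longest bucket it is the first member of."""
    seen = {bucket: set() for bucket in BUCKET_KEYS}
    labels = {}
    for snap in sorted(snapshots):
        label = "none"
        for bucket, key_of in BUCKET_KEYS.items():
            key = key_of(snap)
            if key not in seen[bucket]:
                seen[bucket].add(key)
                label = bucket  # longer buckets overwrite shorter ones
        labels[snap] = label
    return labels


snaps = [
    datetime(2024, 1, 1, 0, 0),   # Monday: first of day, week, month, and year
    datetime(2024, 1, 1, 12, 0),  # second snapshot of the same day
    datetime(2024, 1, 2, 0, 0),   # first of a new day only
    datetime(2024, 1, 8, 0, 0),   # first of a new ISO week
    datetime(2024, 2, 1, 0, 0),   # first of a new month
]
labels = gfs_labels(snaps)
```

Snapshots labeled `none` simply age out on the base retention; the others carry forward into their longer bucket.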
Policy Templates and Re-Use
A core architectural principle: one policy per SLA tier, not per workload. If you have 50 SQL servers, 200 file shares, and 1,200 VMs all in the "Gold" tier, they should all reference the same Gold policy. When the SLA changes — and it will — you edit one object instead of 1,450.
The SLA Analogy: A protection policy is the SLA contract; a Protection Group is the customer roster. The same Gold contract can cover a hundred rosters (Protection Groups), and changing the contract terms automatically updates every roster that references it.
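The one-policy-per-tier principle is easiest to see as shared references: every Protection Group points at the same policy object, so one edit reaches all of them. The sketch below uses hypothetical `Policy` and `ProtectionGroup` classes and group names, and the change from 30 to 45 days is an invented example of an SLA revision.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    rpo_minutes: int
    local_retention_days: int


@dataclass
class ProtectionGroup:
    name: str
    policy: Policy  # a reference to the shared policy, not a copy
    object_count: int


gold = Policy(name="Gold", rpo_minutes=15, local_retention_days=30)
groups = [
    ProtectionGroup("pg-sql", gold, 50),
    ProtectionGroup("pg-file-shares", gold, 200),
    ProtectionGroup("pg-vms", gold, 1200),
]

# One edit to the policy object changes the SLA for every group that references it.
gold.local_retention_days = 45
covered_objects = sum(g.object_count for g in groups)
```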
Reference SLA Tier Design (Gold/Silver/Bronze)
| Tier | Frequency (RPO) | Local Retention | Replication | Archive | Target RTO |
|---|---|---|---|---|---|
| Gold | Every 15 min via SmartCopy | 30 days, app-consistent | Async every cycle to DR | Monthly, 7+ yrs, Compliance Lock | Minutes (Instant Mass Restore) |
| Silver | Every 4–6 hrs, CBT/RCT | 14–30 days | Async daily to DR | Monthly, 3–5 yrs | < 1 hour |
| Bronze | Daily, crash-consistent OK | 7–14 days | None or weekly | Quarterly, 1 yr | < 4 hours |
Figure 7.2: Tiered policy decision tree
graph TD
A[Workload SLA Requirement] --> B{RPO needed?}
B -->|<= 15 min| C{RTO needed?}
B -->|4-6 hours| D[Silver Tier]
B -->|>= 24 hours| E[Bronze Tier]
C -->|Minutes Instant Mass Restore| F[Gold Tier]
C -->|< 1 hour| D
F --> G[SmartCopy + Pure 30d local + DR replication 7y archive + Compliance Lock]
D --> H[CBT/RCT Hypervisor 14-30d local + DR daily 3-5y archive]
E --> I[Daily crash-consistent 7-14d local 1y archive]
style F fill:#d4af37,color:#000
style D fill:#c0c0c0,color:#000
style E fill:#cd7f32,color:#fff
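A tier assignment like the one in the reference table can also be expressed as a small function that returns the least demanding tier meeting both the RPO and RTO requirement. This is a sketch under assumptions: the `Tier` tuple and `select_tier` helper are hypothetical, Silver's RPO is taken as its 6-hour worst case, and Gold's "minutes" RTO is modeled as 15 minutes purely for illustration.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    rpo_minutes: int  # worst-case gap between backups
    rto_minutes: int  # restore time the tier is designed to meet


# Ordered from least to most demanding, so the first match is the cheapest fit.
TIERS = [
    Tier("Bronze", rpo_minutes=24 * 60, rto_minutes=4 * 60),
    Tier("Silver", rpo_minutes=6 * 60, rto_minutes=60),
    Tier("Gold", rpo_minutes=15, rto_minutes=15),  # RTO value is an assumption
]


def select_tier(required_rpo_minutes: int, required_rto_minutes: int) -> Optional[str]:
    """Return the first tier whose RPO and RTO both satisfy the requirement."""
    for tier in TIERS:
        if (tier.rpo_minutes <= required_rpo_minutes
                and tier.rto_minutes <= required_rto_minutes):
            return tier.name
    return None  # nothing in the standard tiers fits; needs a custom design
```

A `None` result is a useful signal in its own right: the workload needs a tighter mechanism than the standard tiers provide.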
Key Points
The minimum RPO for hypervisor-based backups is 15 minutes; tighter RPOs require SmartCopy and primary-array snapshot integration.
Compliance Lock is truly immutable (legally enforceable); Governance Lock can be overridden by an admin quorum.
Use GFS hierarchical retention to retain daily/weekly/monthly/yearly snapshots without policy explosion.
Define one policy per SLA tier (Gold/Silver/Bronze), not per workload — reuse across many Protection Groups.
Schedules run in the cluster's local time zone; document blackout windows and confirm RPO is still achievable around suspensions.
Pre-Quiz: Policies and Schedules
3. Which DataLock flavor is required for SOX, HIPAA, and PCI workloads where backup deletion must be impossible even with admin override?
4. An enterprise has 50 SQL servers, 200 file shares, and 1,200 VMs all classified as Gold-tier. What is the recommended policy design?
5. What is the minimum RPO Cohesity supports in a standard policy for hypervisor-based backups?
Section 3: Protection Groups
A Protection Group is the binding object that connects a set of source objects to a single policy. It also holds operational settings such as proxy assignment, indexing options, pre/post scripts, application-quiesce flags, and exclude lists. The architectural decision that dominates Protection Group design is how membership is determined: statically, by container, or by tag.
Static Membership vs. Tag-Based Auto-Protection
Static membership means the administrator hand-picks individual objects at job-creation time. The Protection Group's scope never changes unless someone edits it. This is highly deterministic and auditable — but the risk is silent under-protection: a new VM provisioned by a junior engineer in a regulated environment may not appear in any Protection Group for weeks until someone notices.
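The silent under-protection risk described above is straightforward to audit: compare the discovered inventory against the union of all Protection Group members. The sketch below assumes inventory and group membership have already been exported as sets of VM names; the `unprotected_vms` function and the sample names are hypothetical.

```python
def unprotected_vms(inventory: set[str], groups: dict[str, set[str]]) -> set[str]:
    """Return VMs in the inventory that no Protection Group covers."""
    protected = set().union(*groups.values())
    return inventory - protected


# Hypothetical inventory and static group membership.
inventory = {"sql-01", "sql-02", "web-01", "pci-new-07"}
groups = {
    "pg-gold-sql": {"sql-01", "sql-02"},
    "pg-silver-web": {"web-01"},
}
gaps = unprotected_vms(inventory, groups)
```

Running a check like this on a schedule turns weeks of unnoticed exposure into a same-day alert.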
Auto-protect takes the opposite approach. You select parent objects (datacenters, folders, clusters, hosts, or resource pools), optionally with vSphere tags for inclusion and exclusion, and any new VM that lands in a selected container is automatically swept into the next backup run.
Auto-Protect with vSphere Tags — The AND/OR Quirk
Tag-based auto-protect has a non-obvious behavior that frequently appears on the CCAE exam:
Adding tags one-by-one, each with the exclude option selected, produces an OR operation: Cohesity excludes any VM that carries any of the listed tags.
Adding multiple tags simultaneously in a single operation produces an AND operation: Cohesity excludes only VMs that carry all of the listed tags.
If you want to exclude everything tagged dev or lab, add the tags one-by-one (OR). If you want to exclude only VMs that are both dev and decommissioned, add them together (AND). Misreading this distinction leads to either over-protection (an entire dev fleet swept into Gold because an intended OR exclusion was entered as AND) or under-protection (production workloads silently excluded because an intended AND exclusion was entered as OR).
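The AND/OR behavior can be modeled by treating each "add" operation as one rule: tags inside a rule are ANDed, and separate rules are ORed. The sketch below is a plain-Python model of that logic, not Cohesity's implementation; `is_excluded` and the sample tag names are illustrative.

```python
def is_excluded(vm_tags: set[str], exclude_rules: list[set[str]]) -> bool:
    """Return True if any rule's tags are all present on the VM.

    Each rule represents one 'add' operation. Tags within a rule are ANDed;
    the rules themselves are ORed.
    """
    return any(rule and rule <= vm_tags for rule in exclude_rules)


# Tags added one-by-one: two rules, so OR.
one_by_one = [{"dev"}, {"lab"}]
# Tags added together in one operation: a single rule, so AND.
together = [{"dev", "decommissioned"}]

dev_vm = {"dev", "tier=gold"}
retired_dev_vm = {"dev", "decommissioned"}
```

Under `one_by_one`, `dev_vm` is excluded; under `together`, it stays protected and only `retired_dev_vm` is excluded.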
flowchart LR
A[vCenter Tag tier=gold] --> B[Cohesity Inventory Sync]
B --> C{Tag Filter Include / Exclude}
C -->|Match| D[Dynamic Membership Update]
C -->|No Match| E[Excluded from PG]
D --> F[Protection Group pg-gold-auto]
F --> G[Next Backup Run New VMs Auto-Swept]
H[New VM Provisioned + Tagged] -.-> A
I[Untagged VM Removed] -.-> E
style A fill:#1f6feb,color:#fff
style D fill:#238636,color:#fff
style F fill:#8957e5,color:#fff
Animation 2: Auto-Protect via vSphere Tags — VM appears, tag detected, filter matched, included in PG
| Membership model | Best fit | Main risk | Auditability |
|---|---|---|---|
| Container auto-protect | Well-organized vSphere with folder-per-business-unit | Folder reorganization can shift scope | Good, provided folder hygiene |
| Tag auto-protect | Cross-cutting concerns (tier=gold, app=sql) where folder structure is contested | Tag drift, AND/OR confusion | Moderate, requires tag governance |
App-Consistent vs. Crash-Consistent Backups
A crash-consistent backup captures the disk state as if the system had been suddenly powered off. An app-consistent backup pauses the application briefly so it can flush buffers, checkpoint state, and quiesce I/O before the snapshot is taken. On Windows, this is VSS; for databases, it is the engine's own quiesce API (RMAN, VDI, Backint). Architects should default to app-consistent for any VM running a database or transactional system.
Key Points
Static membership is auditable but brittle; auto-protect is operationally elegant but requires tag/folder governance.
Tag exclude logic: tags added one-by-one = OR; tags added simultaneously = AND. This is a CCAE exam favorite.
Pair static membership with regulated workloads (PCI/HIPAA/SOX) and auto-protect with everything else.
Indexing enables global file search and self-service restore — disable it for block-only DB volumes; enable it for file servers and end-user VMs.
Default to app-consistent for any VM running a database or transactional system; crash-consistent only where quiesce overhead is unacceptable.
Pre-Quiz: Protection Groups
6. An architect adds two exclusion tags dev and lab to a Cohesity Protection Group's auto-protect filter, in two separate "add" operations. What is the resulting exclusion logic?
7. A regulated PCI environment requires explicit, auditable proof that every cardholder-data VM is included in a Protection Group. Which membership model best meets this requirement?
8. A Protection Group backs up a VM hosting a SQL Server transactional database. The architect should configure the backup as:
Section 4: Performance and Concurrency
A perfectly designed policy is worthless if backups run hot enough to crash production. Performance tuning sits at the intersection of source impact, network bandwidth, proxy capacity, and cluster ingest throughput.
SmartCopy and Storage Snapshot Integration
SmartCopy is Cohesity's snapshot-based copy and replication mechanism that integrates directly with primary storage arrays — most prominently Pure Storage FlashArray, but also NetApp, HPE Nimble/Primera, and Dell PowerStore via partner integrations. Rather than running a hypervisor-side or in-guest backup that competes with production I/O, Cohesity drives the array's own snapshot APIs and ingests data from those snapshots.
The architecture flow:
1. Discovery: Register the Pure FlashArray as a source.
2. Policy assignment: Assign a Protection Policy with snapshot frequency, retention on the array, retention on Cohesity, replication, and archive.
3. Snapshot creation: At schedule time, Cohesity calls the Pure REST API. Optional pre/post scripts quiesce SQL/Oracle/Exchange.
4. Mount and read: Cohesity mounts the snapshot via iSCSI, reads only changed blocks, and ingests through inline dedupe and compression.
5. Retention tiering: Recent snapshots remain on Pure for instant restore at flash speed; older snapshots tier to Cohesity for long-term recovery.
6. Recovery: Volume-level restore back to any Pure FlashArray, file-level via SmartFiles mount, or cross-platform to native cloud VMs.
The exam-relevant point: SmartCopy enables sub-15-minute RPOs with zero hypervisor overhead and is the canonical Gold-tier mechanism for transactional databases sitting on Pure.
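The ordering of the snapshot step matters: the application is quiesced only for the instant the array snapshot is taken, and it is released before any data is read. The sketch below illustrates that ordering with a stand-in array object. `FakeArray`, `quiesced`, and `run_backup` are hypothetical names, and nothing here calls a real Pure or Cohesity API.

```python
from contextlib import contextmanager


class FakeArray:
    """Stand-in for a storage array; not a real vendor client."""

    def __init__(self):
        self.snapshots = []

    def take_snapshot(self, volume: str) -> str:
        snap_id = f"{volume}.snap{len(self.snapshots) + 1}"
        self.snapshots.append(snap_id)
        return snap_id

    def changed_blocks(self, snap_id: str) -> list[bytes]:
        return [b"block-a", b"block-b"]  # canned data for the example


@contextmanager
def quiesced(app_name: str, log: list[str]):
    """Pre-script on entry, post-script on exit, even if the snapshot fails."""
    log.append(f"quiesce {app_name}")
    try:
        yield
    finally:
        log.append(f"release {app_name}")


def run_backup(array: FakeArray, volume: str, app_name: str) -> list[str]:
    log: list[str] = []
    with quiesced(app_name, log):
        snap_id = array.take_snapshot(volume)
        log.append(f"snapshot {snap_id}")
    # The application is already released; reading happens from the snapshot.
    blocks = array.changed_blocks(snap_id)
    log.append(f"ingest {len(blocks)} blocks")
    return log


steps = run_backup(FakeArray(), "sqlvol", "sql01")
```

Using a context manager for the quiesce window is a design choice in this sketch: it guarantees the release step runs even when the snapshot call raises.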
Figure 7.4: SmartCopy with Pure FlashArray — orchestration sequence
sequenceDiagram
participant App as SQL/Oracle App
participant Cohesity as Cohesity Cluster
participant Pure as Pure FlashArray
participant Archive as CloudArchive (S3)
Cohesity->>App: Pre-script: Quiesce (VSS/RMAN)
App-->>Cohesity: Quiesce ACK
Cohesity->>Pure: REST API: Take Snapshot
Pure-->>Pure: Native array snapshot created
Pure-->>Cohesity: Snapshot ID
Cohesity->>App: Post-script: Release quiesce
Cohesity->>Pure: Mount snapshot (iSCSI)
Pure-->>Cohesity: Stream changed blocks only
Cohesity-->>Cohesity: Inline dedupe + compression
Note over Pure: Recent snapshots retained on flash for instant restore
Note over Cohesity: Older snapshots tier to Cohesity DataPlatform
Cohesity->>Archive: Tier monthly to S3 Glacier
Animation 3: SmartCopy — snapshot stays on Pure flash for fast restore, then tiers to Cohesity
CBT/RCT and Incremental Forever
For non-array-integrated VM backups, Cohesity uses CBT on VMware and RCT on Hyper-V. The hypervisor maintains a bitmap of changed blocks since the last backup; Cohesity reads only those changed blocks. Combined with global variable-length deduplication on ingest, this delivers an Incremental Forever model: a single full backup at job inception, then deltas only.
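The changed-block bitmap and the incremental-forever model can be illustrated with a toy disk that records which blocks were written since the last backup. This is a conceptual sketch only: `TrackedDisk` and `apply_delta` are hypothetical, and real CBT/RCT tracking lives in the hypervisor.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last read."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.dirty = set()  # indexes of blocks written since the last backup

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.dirty.add(index)

    def read_changed(self) -> dict[int, bytes]:
        """Return only the changed blocks, then reset the tracking set."""
        delta = {i: self.blocks[i] for i in sorted(self.dirty)}
        self.dirty.clear()
        return delta


def apply_delta(backup: list[bytes], delta: dict[int, bytes]) -> list[bytes]:
    """Build the next restore point from the previous one plus a delta."""
    result = list(backup)
    for index, data in delta.items():
        result[index] = data
    return result


disk = TrackedDisk([b"A", b"B", b"C", b"D"])
full = list(disk.blocks)      # single full backup at job inception
disk.write(1, b"X")           # production keeps writing
delta = disk.read_changed()   # incremental: only block 1 is read
latest = apply_delta(full, delta)
```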
Throttling and QoS
Cohesity supports time-windowed bandwidth throttling for replication and archive traffic — for example, capping replication to 200 Mbps during business hours and lifting the cap overnight. Per-policy QoS lets you mark Gold backups higher-priority than Bronze on a shared cluster.
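A time-windowed throttle reduces to one question: is the current time inside the capped window? The sketch below uses the 200 Mbps figure from the example above; the `replication_cap_mbps` function and the 08:00 to 18:00 business-hours window are assumptions made for illustration.

```python
from datetime import time
from typing import Optional


def replication_cap_mbps(
    now: time,
    window_start: time = time(8, 0),   # assumed start of business hours
    window_end: time = time(18, 0),    # assumed end of business hours
    cap_mbps: int = 200,
) -> Optional[int]:
    """Return the bandwidth cap in Mbps, or None when traffic is unthrottled."""
    if window_start <= now < window_end:
        return cap_mbps
    return None
```

This version assumes the window does not cross midnight; an overnight window would need the comparison inverted.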
Key Points
SmartCopy drives the array's native snapshot APIs (Pure REST, NetApp ONTAP, etc.) — sub-15-minute RPOs with zero hypervisor overhead.
Recent snapshots stay on the primary array (flash speed restore); older snapshots tier to Cohesity for long-term retention and archive.
CBT (VMware) and RCT (Hyper-V) enable incremental-forever — only changed blocks are read after the initial seed.
Cap streams at the source (per-datastore) and tune concurrency iteratively rather than maximizing on day one.
Use bandwidth throttling windows and per-policy QoS to keep Gold backups fast and Bronze backups polite on a shared cluster.
Pre-Quiz: Performance and Concurrency
9. A Gold-tier SQL database lives on a Pure FlashArray and requires a 5-minute RPO with minimal hypervisor overhead. Which Cohesity mechanism best meets this requirement?
10. In a SmartCopy + Pure deployment, where do the most recent snapshots reside, and why?
11. A small all-flash datastore hosting a tier-1 OLTP database is being saturated by a 32-stream Cohesity backup job. What is the most appropriate fix?
Chapter Summary
This chapter unpacked the trio of objects that drive Cohesity data protection: Sources, Policies, and Protection Groups. You learned how to register vCenter, SCVMM, Prism, physical hosts, NAS, and databases, and why per-datastore stream caps should be set at registration time. You walked through GFS hierarchical retention, DataLock immutability, and the tiered Gold/Silver/Bronze SLA model. You compared static membership against container- and tag-based auto-protect, including the AND/OR tag logic that catches careless architects on the exam. And you traced how SmartCopy with Pure FlashArray enables sub-15-minute RPOs without hypervisor overhead.
Hold the analogy in mind: a policy is the SLA contract; a Protection Group is the customer roster.