Study Guide: Chapter 10 — Cloud Integration: Archive, Tier, Replicate, and Spin

Learning Objectives

Differentiate CloudArchive, CloudArchive Direct, CloudTier, CloudReplicate, and CloudSpin by purpose, data movement model, and recovery semantics.
Configure External Targets to AWS S3 (including Glacier and Deep Archive), Azure Blob (including Archive tier), GCP, and S3-compatible object stores.
Apply storage-class lifecycle policies correctly — keeping bucket-side rules and cluster-side retention separated and aligned.
Estimate egress, retrieval-request, and rehydration charges, and design retention and recall scenarios that avoid surprise invoices.
Recognize the IAM and RBAC minimums required for each target — the CloudFormation Template for AWS, Storage Blob Data Contributor for Azure, and the role of setBlobTier/action in tier transitions.

Section 10.1 — CloudArchive: Long-Term Retention to Object Storage

Pre-Section Check (10.1)

1. A regulator requires 7-year retention of monthly fulls, but the cluster must remain the authoritative recovery tier for fast restores. Which Cohesity feature fits best?

CloudTier — moves cold data to free local capacity

CloudArchive — copies to object storage while keeping the local copy

CloudReplicate — replicates the cluster to a Cloud Edition

CloudSpin — converts a backup to a native cloud VM

2. Why does CloudArchive let an operator browse and search archived snapshots without paying Glacier retrieval fees?

Glacier serves search queries for free

Cohesity caches all chunks locally indefinitely

The Yoda index stays on the cluster; only chunk data lives in Glacier

AWS waives request fees for backup vendors

10.1.1 Long-Term Retention to Object Storage

CloudArchive is a copy-out mechanism. The cluster keeps a complete local snapshot (chunk files, blob files, and metadata) and additionally writes a deduplicated, compressed copy to a registered external target. The local cluster remains authoritative for indices and metadata, so catalog operations (browse and search) are answered locally, and restores read from the cloud only when the local snapshot is no longer available.

CloudArchive is driven by Protection Policies: each policy may attach one or more Archival actions, each referencing an External Target with its own retention horizon. A typical pattern: daily incrementals retained 30 days on cluster, weekly fulls 90 days on cluster, and monthly fulls retained 7 years in S3 Glacier Deep Archive via CloudArchive. Only the third tier leaves the cluster.
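The typical pattern above can be sketched as a small data model. This is only an illustration: `RetentionRule`, `POLICY`, and the target name `s3-deep-archive` are hypothetical names, not the Cohesity API, and the 90-day local retention for monthly fulls is an assumption the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RetentionRule:
    """One retention line of a protection policy (hypothetical model)."""
    snapshot_type: str                     # "daily", "weekly", or "monthly"
    local_days: int                        # days kept on the cluster
    archive_target: Optional[str] = None   # External Target name, if archived
    archive_days: int = 0                  # days kept on the External Target


POLICY = [
    RetentionRule("daily", local_days=30),
    RetentionRule("weekly", local_days=90),
    # Assumption: monthly fulls also stay 90 days locally before only the
    # archived copy remains.
    RetentionRule("monthly", local_days=90,
                  archive_target="s3-deep-archive", archive_days=7 * 365),
]


def rules_leaving_cluster(policy: List[RetentionRule]) -> List[str]:
    """Return the snapshot types that have an Archival action attached."""
    return [r.snapshot_type for r in policy if r.archive_target is not None]


print(rules_leaving_cluster(POLICY))  # ['monthly']
```

Only the rule with an archive target produces off-cluster copies, which mirrors the statement that only the third tier leaves the cluster.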

10.1.2 Encryption and Immutability Options

CloudArchive honors the cluster's encryption posture end-to-end. Data leaves the cluster over TLS 1.2+ and is encrypted at rest with AES-256. When the target is AWS S3, Cohesity can also use SSE-S3 or SSE-KMS server-side encryption, the latter requiring kms:Encrypt, kms:Decrypt, and kms:GenerateDataKey permissions on the customer-managed key.

Immutability for ransomware-resistant archives uses S3 Object Lock (AWS) or the equivalent Immutable Blob Storage (Azure). Cohesity sets per-object retention via the s3:PutObjectRetention API as objects are written. Object Lock must be enabled at bucket creation — it cannot be retrofitted without contacting AWS Support.
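To make the per-object retention idea concrete, here is a minimal sketch that computes a retain-until timestamp for one object. The helper name `object_lock_retention` is hypothetical; the returned dictionary follows the shape of the `Retention` argument accepted by boto3's `put_object_retention`, but how Cohesity issues the call internally is not described in this guide.

```python
from datetime import datetime, timedelta, timezone


def object_lock_retention(written_at: datetime, retention_days: int,
                          mode: str = "COMPLIANCE") -> dict:
    """Build a retention setting for one archived object.

    S3 Object Lock supports two modes: COMPLIANCE and GOVERNANCE.
    """
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError(f"unsupported Object Lock mode: {mode}")
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "Mode": mode,
        "RetainUntilDate": written_at + timedelta(days=retention_days),
    }


written = datetime(2024, 1, 1, tzinfo=timezone.utc)
setting = object_lock_retention(written, retention_days=30)
print(setting["Mode"], setting["RetainUntilDate"].date())  # COMPLIANCE 2024-01-31
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `Retention=` to `put_object_retention` together with `Bucket` and `Key`.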

10.1.3 Indexing for Cloud-Archived Snapshots

The Cohesity index (Yoda) stays on the cluster. Two consequences:

You can browse and search archived snapshots without paying retrieval fees. The metadata is local; only the actual chunk data lives in Glacier.
If the originating cluster is destroyed, the index must be rebuilt from cloud-resident metadata before fast recovery is possible. This is one of the principal differences between CloudArchive (cluster-authoritative) and FortKnox cyber vaulting (Cohesity-managed).

10.1.4 Direct Archive vs. Archive on Policy

CloudArchive (standard): the default; the archive copy is created via a Protection Policy.
CloudArchive Direct: the streaming variant. Data flows through the cluster but is not retained as a full local copy; only metadata and the index stay on the cluster, while bulk data is streamed directly to the external target. Appropriate for decommissioned apps retained only for compliance.

Post-Section Mastery (10.1)

1. A customer needs ransomware-immune archive copies of nightly backups. They forgot to enable S3 Object Lock when the bucket was created. What is the correct architectural response?

Use a bucket policy Deny Delete — equivalent to Object Lock

Create a new bucket with Object Lock enabled at creation, point the External Target there, and migrate retention

Enable Object Lock from the S3 console retroactively

Switch to CloudArchive Direct — it includes immutability automatically

2. What is the architectural drawback of CloudArchive Direct compared to standard CloudArchive?

It does not support encryption at rest

There is no full local copy on the cluster, so restores always pay retrieval and rehydration costs

It cannot be combined with Glacier Deep Archive

It requires CloudReplicate as a prerequisite

Section 10.2 — CloudTier: Capacity Extension for Cold Blocks

10.2.1 Capacity Tiering for Cold Blocks

CloudTier is a move operation, not a copy. The cluster's tiering engine continuously profiles block heat: blocks not accessed for the configured threshold are migrated out to cloud object storage, freeing local capacity. The cluster retains a pointer (a stub) so the namespace appears unchanged — when the data is needed, it is rehydrated transparently.

Because tiering moves data, CloudTier is irreversible without a recall operation. If you tier 500 TB out to S3 Standard-IA, you cannot simply "untier" by toggling a setting — you must recall the data, which counts as a full read against the object store and incurs egress and request charges.

10.2.2 Tiering Thresholds and Recall Behavior

Age-based: "tier any block not read in 90 days."
Capacity-based: "begin tiering when the View Box exceeds 80% utilization."
Hybrid: tier opportunistically by age, urgently by capacity pressure.
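The three threshold styles can be sketched as one decision function. This is a simplified model, not the cluster's tiering engine: the function name and the `urgent_age_days` floor (the minimum age a block must have before capacity pressure may tier it) are assumptions added for illustration.

```python
def should_tier(days_since_read: int, utilization_pct: float,
                age_threshold_days: int = 90,
                capacity_threshold_pct: float = 80.0,
                urgent_age_days: int = 30) -> bool:
    """Hybrid rule: tier by age opportunistically, by capacity urgently."""
    # Age-based: the block has been cold for the full threshold.
    if days_since_read >= age_threshold_days:
        return True
    # Capacity-based: under pressure, accept younger (but not hot) blocks.
    if utilization_pct > capacity_threshold_pct:
        return days_since_read >= urgent_age_days
    return False


print(should_tier(100, 50.0))  # True  (cold long enough)
print(should_tier(45, 50.0))   # False (not cold enough, no pressure)
print(should_tier(45, 85.0))   # True  (capacity pressure)
print(should_tier(10, 95.0))   # False (too recently read to tier)
```

Keeping a floor under the capacity-driven branch is one way to avoid tiering blocks that a predictable workload will read again soon.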

10.2.3 Performance Impact Considerations

Aggressive thresholds (e.g., "tier anything older than 7 days") tier blocks that will be read by month-end backups, causing recall storms.
Disaster recall storms — rehydrating 100 TB of tiered data through WAN bandwidth and cloud egress quotas can dominate RTO. Plan this scenario explicitly.
Cluster cache effect — recalled blocks repopulate the local tier, so recurring workloads stop hitting the cloud after the first recall.

10.2.4 Tier vs. Archive Trade-offs

Dimension	CloudTier	CloudArchive
Data movement	Move (single copy)	Copy (local + cloud)
Local footprint	Reduced	Unchanged
Reversibility	Recall required	Local copy still authoritative
Typical destination class	S3 Standard-IA, Azure Cool	S3 Glacier, Azure Archive
Driver	Cluster running out of capacity	Compliance / LTR retention
Recall cost exposure	High: any read of tiered blocks pulls from cloud	Low: only on disaster

The two patterns are complementary, not mutually exclusive. A common enterprise design tiers cold backup blocks to S3 Standard-IA (CloudTier) and copies monthly fulls to Glacier Deep Archive (CloudArchive).

Post-Section Mastery (10.2)

4. A View Box hits 92% capacity and the team enables CloudTier with a 30-day age threshold. Two days later, the monthly synthetic full reads blocks 45 days old, triggering massive recalls. What is the most architecturally sound fix?

Disable CloudTier entirely

Switch to CloudArchive Direct

Raise the age threshold past the longest predictable read pattern (e.g., 100 days), and add capacity headroom

Move tiered data to Glacier Deep Archive to reduce storage cost

5. A customer wants both capacity relief AND a 7-year compliance copy. Which approach is correct?

Use CloudTier alone — Standard-IA is durable enough for compliance

Use CloudArchive alone — set retention to 7 years

Combine CloudTier (to S3 Standard-IA) for capacity relief AND CloudArchive (to Glacier Deep Archive) for compliance

Use CloudReplicate — both problems resolve themselves

Section 10.3 — CloudReplicate and CloudSpin: Cloud as a DR Plane

10.3.1 CloudReplicate to a Cohesity Cloud Edition

CloudReplicate is conceptually identical to cluster-to-cluster replication, except the destination cluster is a Cohesity Cloud Edition running inside AWS, Azure, or GCP. The replicated data lands on a fully functional Cohesity cluster, so all DataPlatform features (instant mass restore, indexing, granular search) are available on the cloud side.

CloudReplicate is the right answer when:

The DR strategy requires a functioning Cohesity control plane in the cloud.
RTO requirements exclude a slow rehydration from object storage.
Compliance permits the cluster's normal feature set in the cloud (DataLock, indexing).

The cost profile is meaningfully higher than CloudArchive — you are paying for cluster compute (EC2 or Azure VMs running Cohesity), local SSD/EBS, and replication network — but the recovery posture is dramatically better.

10.3.2 Converting Backups to Native Cloud VMs (CloudSpin)

CloudSpin converts an on-prem VM backup into a native cloud VM — EC2, Azure VM, or GCE instance — without requiring a Cohesity cluster on the destination side. The flow:

Reads the VM backup from local cluster (or recalls from CloudArchive if needed).
Converts the disk format (VMDK or VHDX → AMI for AWS, managed disk for Azure).
Boots the VM into the target VPC/VNet with the chosen instance shape.

CloudSpin use cases:

Test/dev cloud bursting — spin a copy of a production VM in the cloud for a stress test, then destroy it.
Forensic investigation — boot a known-clean snapshot in an isolated VPC for malware analysis.
Cloud migration trial runs — validate a workload runs in EC2 before lift-and-shift.

CloudSpin is not a continuous DR replication mechanism — each spin is a discrete conversion job.

10.3.3 Network and IAM Prerequisites

Outbound HTTPS (port 443) to the cloud control plane endpoints.
IAM credentials with permission to create EC2/Azure VM resources, manage EBS/managed disks, and configure VPC/VNet network interfaces.
VPC/VNet design with appropriate subnets, security groups, and route tables.
For CloudReplicate: a registered Cloud Edition cluster as the destination.

The IAM minimums for CloudSpin in AWS include ec2:RunInstances, ec2:CreateVolume, ec2:AttachVolume, ec2:CreateImage, ec2:RegisterImage, iam:PassRole, plus S3 actions if the backup is in object storage.

10.3.4 Test Recovery and Clean-Up

SiteContinuity wraps CloudSpin and CloudReplicate in runbooks supporting test failover, planned failover, and failback. Always include a destroy step in the runbook — spun cloud VMs accrue compute charges as long as they run.
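One way to guarantee the destroy step runs even when a test fails is to wrap the spin in a context manager. The sketch below is generic Python, not SiteContinuity's runbook format; `spin` and `destroy` are hypothetical callables standing in for whatever automation launches and terminates the VM.

```python
from contextlib import contextmanager


@contextmanager
def spun_vm(spin, destroy):
    """Run spin(), yield the VM id, and always call destroy(vm_id) afterwards."""
    vm_id = spin()
    try:
        yield vm_id
    finally:
        # Runs on success and on failure, so the VM never keeps billing.
        destroy(vm_id)


events = []


def fake_spin():
    events.append("spin")
    return "vm-test-1"


def fake_destroy(vm_id):
    events.append(f"destroy {vm_id}")


with spun_vm(fake_spin, fake_destroy) as vm:
    events.append(f"validate {vm}")

print(events)  # ['spin', 'validate vm-test-1', 'destroy vm-test-1']
```

Because the teardown sits in a `finally` clause, an exception raised during validation still triggers the destroy call before it propagates.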

Key Points — Section 10.3

CloudReplicate targets a Cohesity Cloud Edition — full feature parity, hourly compute cost, fastest cloud RTO.
CloudSpin converts a single backup into a native EC2/Azure/GCE VM — for bursting, testing, and forensic isolation.
CloudSpin is not continuous DR — each invocation is a discrete conversion job.
IAM minimums for CloudSpin in AWS: ec2:RunInstances, ec2:CreateVolume, ec2:AttachVolume, ec2:CreateImage, ec2:RegisterImage, iam:PassRole.
Always include a destroy step in CloudSpin runbooks — running cloud VMs accrue charges until terminated.

Section 10.4 — The Decision Matrix

The single most exam-relevant artifact in this chapter is the decision matrix below. Memorize it.

Capability	CloudArchive	CloudArchive Direct	CloudTier	CloudReplicate	CloudSpin
Primary purpose	LTR copy	Streaming archive	Capacity extension	Cloud DR	Native cloud VM
Data movement	Copy	Stream	Move	Cluster-to-cluster	Convert + boot
Local copy?	Yes	No (metadata only)	No (stub)	Yes	Yes
Reversible?	N/A	Limited	Only via recall	N/A	N/A
Destination class	Glacier / Archive	Glacier / Archive	Std-IA / Cool	EC2/Azure VM	EC2/Azure VM
Recovery speed	Hours	Hours	Sec (warm); recall	Seconds	Minutes
Cost profile	Lowest $/GB	Even lower	Mid	Highest	Per-spin + VM hourly

Exam tip: "Cluster at 85% capacity, customer wants cold backups online 90 more days, budget tight" → CloudTier. "7-year compliance, ransomware-resistant copies off-cluster" → CloudArchive with Object Lock. "Quarterly DR test in AWS without standing up another Cohesity cluster" → CloudSpin.

Figure 10.1: Cloud option decision tree (Mermaid)

flowchart TD
    Start([What is the primary driver?]) --> Q1{Cluster running<br/>out of capacity?}
    Q1 -->|Yes| Tier[CloudTier<br/>Move cold blocks<br/>S3 Std-IA / Azure Cool]
    Q1 -->|No| Q2{Long-term<br/>retention copy<br/>required?}
    Q2 -->|Yes| Q3{Need local copy<br/>for fast restore?}
    Q3 -->|Yes| Archive[CloudArchive<br/>Copy to Glacier/Archive]
    Q3 -->|No| Direct[CloudArchive Direct<br/>Stream, metadata only]
    Q2 -->|No| Q4{Need warm cluster<br/>in cloud for DR?}
    Q4 -->|Yes| Replicate[CloudReplicate<br/>To Cohesity Cloud Edition]
    Q4 -->|No| Q5{Need native<br/>cloud VM from backup?}
    Q5 -->|Yes| Spin[CloudSpin<br/>Convert to EC2/Azure VM]
    Q5 -->|No| Reassess([Reassess requirements])
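For self-testing, the decision tree in Figure 10.1 can be written as a plain function. This is a study aid that mirrors the figure's branch order; the function and parameter names are made up for the sketch.

```python
def choose_cloud_option(out_of_capacity: bool,
                        needs_ltr_copy: bool = False,
                        needs_local_copy: bool = False,
                        needs_warm_cloud_cluster: bool = False,
                        needs_native_cloud_vm: bool = False) -> str:
    """Walk the Figure 10.1 questions in order and return the first match."""
    if out_of_capacity:
        return "CloudTier"
    if needs_ltr_copy:
        # A local copy for fast restore separates standard from Direct.
        return "CloudArchive" if needs_local_copy else "CloudArchive Direct"
    if needs_warm_cloud_cluster:
        return "CloudReplicate"
    if needs_native_cloud_vm:
        return "CloudSpin"
    return "Reassess requirements"


print(choose_cloud_option(True))                                       # CloudTier
print(choose_cloud_option(False, needs_ltr_copy=True,
                          needs_local_copy=True))                      # CloudArchive
print(choose_cloud_option(False, needs_native_cloud_vm=True))          # CloudSpin
```

Like the figure, the function returns a single primary option; real designs often combine several, as Section 10.2.4 notes.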

Section 10.5 — Configuring CloudArchive to AWS S3 Glacier

The AWS Glacier flow is the most exam-tested external target configuration. Cohesity documents a five-step pattern.

Five-step workflow

flowchart LR
    A[1. Register<br/>External Target<br/>Purpose Archival] --> B[2. IAM via CFT<br/>least-privilege role<br/>+ KMS policy]
    B --> C[3. Bucket Policy<br/>Allow Cohesity role<br/>Object Lock enabled]
    C --> D[4. Lifecycle Rule<br/>Std → IA → Glacier<br/>→ Deep Archive]
    D --> E[5. Bind to<br/>Protection Policy<br/>Validate + run]
    style A fill:#1f6feb,color:#fff
    style B fill:#1f6feb,color:#fff
    style C fill:#1f6feb,color:#fff
    style D fill:#1f6feb,color:#fff
    style E fill:#238636,color:#fff

10.5.1 Step 1 — Register the External Target

Navigate to Inventory > External Targets > Register External Target. Configure name, Purpose: Archival (not Tiering; this single field determines the target's entire behavior), Provider (AWS > S3), bucket, region, access key, secret key, and storage class.

10.5.2 Step 2 — IAM via the Cohesity CloudFormation Template

The CFT provisions a least-privilege IAM role. Required actions:

s3:PutObject
s3:GetObject
s3:DeleteObjectVersion
s3:RestoreObject              ← required to recall from Glacier
s3:PutLifecycleConfiguration
s3:GetLifecycleConfiguration
s3:GetBucketObjectLockConfiguration
s3:PutObjectRetention         ← required for Object Lock / WORM
iam:SimulatePrincipalPolicy
kms:Encrypt
kms:Decrypt
kms:GenerateDataKey           ← required for SSE-KMS

The CFT-generated role is the right answer on the exam — never grant s3:* or ec2:*.
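To show what a scoped policy built from that action list might look like, the sketch below groups the actions by service and attaches each group to a specific resource. It is not the output of the Cohesity CloudFormation Template; the function name, the grouping, and the example ARNs are assumptions for illustration.

```python
import json

REQUIRED_ACTIONS = [
    "s3:PutObject", "s3:GetObject", "s3:DeleteObjectVersion",
    "s3:RestoreObject", "s3:PutLifecycleConfiguration",
    "s3:GetLifecycleConfiguration", "s3:GetBucketObjectLockConfiguration",
    "s3:PutObjectRetention", "iam:SimulatePrincipalPolicy",
    "kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey",
]


def build_policy(bucket: str, kms_key_arn: str, role_arn: str) -> dict:
    """Build an IAM policy document with one scoped statement per service."""
    by_service = {}
    for action in REQUIRED_ACTIONS:
        if "*" in action:
            raise ValueError(f"wildcard action not allowed: {action}")
        by_service.setdefault(action.split(":")[0], []).append(action)
    resources = {
        # Bucket ARN for bucket-level actions, /* for object-level actions.
        "s3": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "iam": [role_arn],
        "kms": [kms_key_arn],
    }
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": resources[svc]}
            for svc, actions in by_service.items()
        ],
    }


policy = build_policy(
    "your-bucket",
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id",  # placeholder
    "arn:aws:iam::111122223333:role/example-cohesity-role",   # placeholder
)
print(json.dumps(policy, indent=2))
```

The wildcard check encodes the exam rule directly: any `s3:*` or `ec2:*` entry would raise before a policy is produced.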

10.5.3 Step 3 — Bucket Policy

Authorize the Cohesity role explicitly:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::<ACC>:role/<COHESITY-ROLE>"},
    "Action": ["s3:PutObject","s3:GetObject","s3:RestoreObject","s3:PutObjectRetention"],
    "Resource": "arn:aws:s3:::your-bucket/*"
  }]
}

10.5.4 Step 4 — S3 Lifecycle Rule

The lifecycle rule lives on the bucket, not in Cohesity. Cohesity manages retention; the bucket lifecycle manages storage class.

Respect the minimum storage durations: Standard-IA 30 days, Glacier 90 days, Deep Archive 180 days. Deleting or transitioning an object before its minimum elapses incurs a pro-rated charge for the remaining days.
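The early-deletion charge can be estimated with a few lines of arithmetic. The sketch below assumes the charge is the normal storage rate pro-rated over the remaining days on a 30-day month, and uses the approximate Deep Archive price quoted in this chapter; treat the result as an estimate, not a billing calculation.

```python
MIN_DAYS = {"STANDARD_IA": 30, "GLACIER": 90, "DEEP_ARCHIVE": 180}


def early_deletion_charge(storage_class: str, days_stored: int,
                          size_gb: float, price_per_gb_month: float) -> float:
    """Estimate the charge for deleting before the minimum storage duration."""
    remaining_days = max(0, MIN_DAYS[storage_class] - days_stored)
    # Assumption: remaining days are billed at the monthly rate / 30.
    return size_gb * price_per_gb_month * remaining_days / 30


# 10 TB (10,000 GB) deleted after 60 days in Deep Archive at ~$0.00099/GB-mo.
charge = early_deletion_charge("DEEP_ARCHIVE", 60, 10_000, 0.00099)
print(round(charge, 2))  # 39.6

# Deleting after the minimum has elapsed costs nothing extra.
print(early_deletion_charge("DEEP_ARCHIVE", 200, 10_000, 0.00099))  # 0.0
```

At these prices the charge per terabyte is small, but it scales linearly with archive size and with how early the deletion happens.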

10.5.5 Step 5 — Bind to a Protection Policy

Edit a Protection Policy, add an Archival action targeting the new external target, define retention, attach to a Protection Group, validate, and run.

10.5.6 Glacier Pricing and Minimum Retention

Class	$/GB-mo	Min Retention	Retrieval (Std)	Use case
S3 Standard	~$0.023	none	n/a	Active backups
S3 Standard-IA	~$0.0125	30 days	n/a	CloudTier dest
Glacier Instant Retrieval	~$0.004	90 days	milliseconds	Rare-but-fast recall
Glacier Flexible Retrieval	~$0.0036	90 days	3–5 hours	Default Glacier
Glacier Deep Archive	~$0.00099	180 days	12 hours	Multi-year compliance

Key Points — Section 10.5

The five-step pattern: Register target → CFT IAM → Bucket policy → Lifecycle rule → Bind to policy.
Cohesity owns retention; the bucket lifecycle owns storage class. Don't conflate them.
Use the Cohesity-published CloudFormation Template for least-privilege IAM — never s3:*.
s3:RestoreObject is required to rehydrate from Glacier; s3:PutObjectRetention is required for Object Lock.
Respect minimum storage durations: Standard-IA 30d, Glacier 90d, Deep Archive 180d. Early deletion = early-deletion charge.

Post-Section Mastery (10.5)

11. A Protection Policy retains monthly fulls 7 years and the bucket lifecycle transitions objects to Deep Archive at day 30. After 60 days, an operator shortens the Protection Policy retention to 90 days. What happens when Cohesity expires the objects at day 90?

Cohesity moves the objects back to Standard automatically

Deletion of objects in Deep Archive triggers an early-deletion charge equal to the remainder of the 180-day minimum

The bucket lifecycle prevents Cohesity from deleting at all

The objects remain forever — Cohesity cannot manage Deep Archive

12. Which is the BEST IAM approach when configuring a new S3 bucket for CloudArchive?

Grant s3:* on all buckets — simplifies operations

Run the Cohesity-published CloudFormation Template, which creates the least-privilege role and bucket policy

Embed the AWS account's root credentials in Cohesity's external target config

Use an EC2 instance profile from another workload that already has S3 access

Section 10.6 — Configuring CloudArchive to Azure Blob Storage

10.6.1 The Minimum Role — Storage Blob Data Contributor

The single role a CCAE candidate must remember is Storage Blob Data Contributor. It grants:

Read, write, and delete on blobs.
The setBlobTier/action data action — required for tier transitions to Archive.
Sufficient for backup, archive, list, and restore against the target container.

What it does not grant: control-plane operations like creating storage accounts. That is Storage Account Contributor — a control-plane role that does NOT grant data-plane access. Assigning Reader or Storage Account Contributor alone causes confusing 403 errors when the cluster tries to write objects.

10.6.2 Authentication Model — Entra ID Service Principals Preferred

Microsoft Entra ID service principal with RBAC — the recommended pattern. Use a managed identity if the cluster runs in Azure.
Shared Access Signature (SAS) — time-limited tokens; rotation risk. Avoid in production.
Storage Account Access Keys — easiest, full account access, hardest to revoke.

10.6.3 setBlobTier and the Azure Archive Tier

To move a blob to Azure's Archive tier, the principal must have:

Microsoft.Storage/storageAccounts/blobServices/containers/blobs/setBlobTier/action

This data action is included in Storage Blob Data Contributor. Rehydration from the Archive tier takes up to 15 hours at Standard priority, or roughly 1 hour at High priority (at extra cost).

10.6.4 Private Endpoints — Production Networking

Provision an Azure Private Endpoint on the storage account's blob sub-resource.
Approve the connection in Networking > Private endpoint connections.
Ensure the cluster VNet resolves privatelink.blob.core.windows.net via Azure Private DNS.
Lock the storage account firewall to Cohesity public IPs / VNet / service tags if any public access is permitted.

Common pitfall: private endpoint configured but DNS pointed at the public Blob endpoint — traffic falls back, gets blocked, archives fail.
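A quick way to catch this pitfall is to check what the storage hostname resolves to from inside the cluster's network. The sketch below is a generic check using the Python standard library, not a Cohesity tool; the account name `examplearchive` is a placeholder, and the check assumes a working private endpoint should resolve only to private addresses.

```python
import ipaddress
import socket
from typing import Iterable, List


def resolve(host: str) -> List[str]:
    """Return the IP addresses the local resolver gives for host (port 443)."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def resolves_privately(addresses: Iterable[str]) -> bool:
    """True only if at least one address is returned and all are private."""
    ips = [ipaddress.ip_address(a) for a in addresses]
    return bool(ips) and all(ip.is_private for ip in ips)


# Offline illustration with literal addresses:
print(resolves_privately(["10.1.2.4"]))    # True  (private endpoint IP)
print(resolves_privately(["20.60.1.1"]))   # False (public address)

# On a host in the cluster's VNet, one might run (placeholder account name):
# print(resolves_privately(resolve("examplearchive.blob.core.windows.net")))
```

A `False` result from inside the VNet suggests DNS is still handing out the public Blob endpoint rather than the private endpoint address.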

Key Points — Section 10.6

Storage Blob Data Contributor is the data-plane minimum role. Storage Account Contributor alone is not enough.
Always prefer Entra ID service principals (or managed identities for in-Azure clusters) over SAS tokens or account keys.
setBlobTier/action is required to move blobs into the Archive tier — it is included in Storage Blob Data Contributor.
Production networking: Azure Private Endpoint + Private DNS resolution to privatelink.blob.core.windows.net.
Azure Archive rehydration: up to 15h Standard, ~1h High priority (extra cost).

Post-Section Mastery (10.6)

14. A customer assigned the Cohesity service principal "Storage Account Contributor" but archives are failing with 403 errors. What is the fix?

Add "Storage Blob Data Contributor" — Storage Account Contributor is control-plane only

Switch to a SAS token instead of a service principal

Disable Private Endpoint so traffic uses the public endpoint

Remove all bucket-side lifecycle rules

15. Which authentication method is the recommended production pattern for Cohesity to Azure Blob?

Storage account access keys hardcoded in the External Target

A 90-day SAS token rotated manually

An Entra ID service principal (or managed identity) with Storage Blob Data Contributor

No authentication — rely on VNet isolation

Section 10.7 — Storage Classes and Cost Modeling

10.7.1 S3 and Azure Class Mapping

Use case	AWS S3	Azure Blob	Min retention	Latency
Active backups, fast recovery	S3 Standard	Hot	none	ms
Cold backups (CloudTier target)	S3 Standard-IA	Cool	30 days	ms
Archive: fast occasional recall	Glacier Instant Retrieval	Cold	90 days	ms
Archive: multi-hour recall ok	Glacier Flexible Retrieval	(no exact equivalent)	90 days	3–5h
Deepest archive: multi-year	Glacier Deep Archive	Archive	180 days	12h (AWS) / up to 15h (Azure)

10.7.2 The Three Cost Components

Storage — $/GB-month at rest.
Requests — per-API-call PUT, GET, RESTORE. Glacier and Archive request fees can dominate when many small objects are archived.
Egress — outbound data transfer. AWS charges roughly $0.09/GB egress to Internet. Cross-region ~$0.02/GB. Within-region to AWS services typically free.

10.7.3 Worked Example — 100 TB Disaster Recall

Scenario: 500 TB archived to Glacier Deep Archive over 4 years. Ransomware destroys the production environment. Customer must recall 100 TB to a new on-prem cluster.

Component	Calculation	Cost
Storage at rest (upper bound: full 500 TB held for all 48 mo)	500 TB × $0.00099/GB-mo × 48 mo	~$23,760
Restore (Std retrieval)	$0.02/GB × 100 TB	~$2,000
Restore requests	~10M objects × $0.025/1k	~$250
Egress to Internet	$0.09/GB × 100 TB	~$9,000
Total recall event		~$11,250
Plus 4 yrs storage already paid		~$23,760

Architectural lessons:

Egress dominates the recall event: about $9,000 of the ~$11,250 total. Recalling to a destination in the same AWS region makes egress effectively free.
Storage is cheap; recall is not. Design recall destinations to keep traffic in-region.
Bandwidth may be the real constraint. 100 TB over 1 Gbps WAN ≈ 9 days; AWS Snowball Edge ships 80 TB devices for offline recall.
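The figures in the worked example can be reproduced with a short calculation. The sketch below uses the per-unit prices quoted in the table and decimal units (1 TB = 1,000 GB); the function names are illustrative, and actual prices vary by region and over time.

```python
GB_PER_TB = 1000  # decimal units, as in the worked example


def recall_cost(recall_tb: float, objects: int,
                retrieval_per_gb: float = 0.02,
                request_per_1k: float = 0.025,
                egress_per_gb: float = 0.09) -> dict:
    """Break a recall event into retrieval, request, and egress charges."""
    gb = recall_tb * GB_PER_TB
    parts = {
        "retrieval": gb * retrieval_per_gb,
        "requests": objects / 1000 * request_per_1k,
        "egress": gb * egress_per_gb,
    }
    parts["total"] = sum(parts.values())
    return parts


def transfer_days(tb: float, link_gbps: float) -> float:
    """Days to move tb terabytes over a fully utilized link of link_gbps."""
    bits = tb * 1e12 * 8
    return bits / (link_gbps * 1e9) / 86_400


costs = recall_cost(100, 10_000_000)
print({k: round(v) for k, v in costs.items()})
# {'retrieval': 2000, 'requests': 250, 'egress': 9000, 'total': 11250}

print(round(transfer_days(100, 1), 1))  # 9.3

# Same recall kept in-region: egress drops out of the total.
print(round(recall_cost(100, 10_000_000, egress_per_gb=0.0)["total"]))  # 2250
```

Setting the egress rate to zero shows the effect of an in-region recall destination: the event cost falls from about $11,250 to about $2,250.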

10.7.4 Lifecycle and Retention Best Practices

Match Cohesity retention to bucket lifecycle expiration, with a small safety margin so orphans get cleaned but Cohesity-managed objects are never prematurely deleted.
Stage transitions: Standard → IA at 30d → Glacier at 90d → Deep Archive at 180d. Avoids early-deletion charges.
Test rehydration windows quarterly. A 12h rehydration on a 50 TB recall is real RTO cost — put it in the runbook.
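The staged-transition practice can be checked mechanically before a rule is applied. The sketch below builds a lifecycle configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration` and rejects any stage that would be left before its minimum storage duration. The function name, rule ID, and the 30-day safety margin are assumptions for illustration.

```python
MIN_DAYS = {"STANDARD_IA": 30, "GLACIER": 90, "DEEP_ARCHIVE": 180}


def staged_lifecycle(transitions, expire_after_days: int) -> dict:
    """Validate (day, storage_class) stages, then build the lifecycle config.

    transitions must be sorted by day; expire_after_days is the bucket-side
    expiration (cluster retention plus a safety margin).
    """
    next_boundaries = [day for day, _ in transitions[1:]] + [expire_after_days]
    for (day, storage_class), next_day in zip(transitions, next_boundaries):
        if next_day - day < MIN_DAYS[storage_class]:
            raise ValueError(
                f"{storage_class} held {next_day - day}d, "
                f"minimum is {MIN_DAYS[storage_class]}d")
    return {
        "Rules": [{
            "ID": "staged-archive",          # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},        # apply to the whole bucket
            "Transitions": [{"Days": d, "StorageClass": c}
                            for d, c in transitions],
            "Expiration": {"Days": expire_after_days},
        }]
    }


cluster_retention_days = 7 * 365
config = staged_lifecycle(
    [(30, "STANDARD_IA"), (90, "GLACIER"), (180, "DEEP_ARCHIVE")],
    expire_after_days=cluster_retention_days + 30,  # assumed safety margin
)
print(config["Rules"][0]["Expiration"])  # {'Days': 2585}
```

A rule such as `staged_lifecycle([(30, "DEEP_ARCHIVE")], expire_after_days=90)` raises `ValueError`, because objects would leave Deep Archive after only 60 days.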
