Chapter 10 — Cloud Integration: Archive, Tier, Replicate, and Spin

CCAE Certification Exam Preparation Guide · Chapter 10 of 15

Learning Objectives

Analogy: Garage cleanup vs. offsite storage rental. Think of your cluster as a two-car garage. CloudTier is the garage cleanup: you mark anything older than 90 days and haul those boxes to a self-storage unit. The boxes are gone from the garage — you got the floor space back — but you can drive over and fetch them when needed. CloudArchive is the offsite records vault: you photocopy critical documents and ship the copies to an archival facility. The originals stay in the garage; the archival facility is the durable, regulator-friendly second copy.

Section 10.1 — CloudArchive: Long-Term Retention to Object Storage

Pre-Section Check (10.1)

1. A regulator requires 7-year retention of monthly fulls, but the cluster must remain the authoritative recovery tier for fast restores. Which Cohesity feature fits best?

CloudTier — moves cold data to free local capacity
CloudArchive — copies to object storage while keeping the local copy
CloudReplicate — replicates the cluster to a Cloud Edition
CloudSpin — converts a backup to a native cloud VM

2. Why does CloudArchive let an operator browse and search archived snapshots without paying Glacier retrieval fees?

Glacier serves search queries for free
Cohesity caches all chunks locally indefinitely
The Yoda index stays on the cluster; only chunk data lives in Glacier
AWS waives request fees for backup vendors

10.1.1 Long-Term Retention to Object Storage

CloudArchive is a copy-out mechanism. The cluster keeps a complete local snapshot (chunk files, blob files, and metadata) and additionally writes a deduplicated, compressed copy to a registered external target. The local cluster remains authoritative for indices and metadata, so browse and search are answered from the cluster, and cloud objects are rehydrated only when a restore actually needs archived data.

CloudArchive is driven by Protection Policies: each policy may attach one or more Archival actions, each referencing an External Target with its own retention horizon. A typical pattern: daily incrementals retained 30 days on cluster, weekly fulls 90 days on cluster, and monthly fulls retained 7 years in S3 Glacier Deep Archive via CloudArchive. Only the third tier leaves the cluster.
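The typical pattern above can be written down as plain data to make it obvious which tier leaves the cluster. This is only an illustrative sketch: `RetentionTier`, `EXAMPLE_POLICY`, and `tiers_leaving_cluster` are hypothetical names, not Cohesity API objects, and 7 years is approximated as 7 × 365 days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionTier:
    """One retention rule in a protection policy (hypothetical model)."""
    name: str
    retain_days: int
    location: str  # "cluster" or the label of an external target


# The example pattern from the text: only the monthly fulls are archived.
EXAMPLE_POLICY = [
    RetentionTier("daily incremental", 30, "cluster"),
    RetentionTier("weekly full", 90, "cluster"),
    RetentionTier("monthly full", 7 * 365, "S3 Glacier Deep Archive"),
]


def tiers_leaving_cluster(policy):
    """Return the names of tiers whose copies are written off-cluster."""
    return [tier.name for tier in policy if tier.location != "cluster"]


print(tiers_leaving_cluster(EXAMPLE_POLICY))
```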

10.1.2 Encryption and Immutability Options

CloudArchive honors the cluster's encryption posture end-to-end. Data leaves the cluster over TLS 1.2+ and is encrypted at rest with AES-256. When the target is AWS S3, Cohesity can also use SSE-S3 or SSE-KMS server-side encryption; SSE-KMS requires kms:Encrypt, kms:Decrypt, and kms:GenerateDataKey permissions on the customer-managed key.

Immutability for ransomware-resistant archives uses S3 Object Lock (AWS) or the equivalent Immutable Blob Storage (Azure). Cohesity sets per-object retention through the S3 PutObjectRetention API (IAM action s3:PutObjectRetention) as objects are written. Object Lock must be enabled at bucket creation; it cannot be retrofitted without contacting AWS Support.
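To make per-object retention concrete, the sketch below builds the arguments for a retention request in the shape that boto3's S3 `put_object_retention` call accepts. It is an assumption-laden illustration, not Cohesity's implementation: the helper name, bucket, and key are hypothetical, and nothing is sent to AWS.

```python
from datetime import datetime, timedelta, timezone


def object_lock_retention_args(bucket, key, written_at, retain_days,
                               mode="COMPLIANCE"):
    """Build keyword arguments shaped like boto3's put_object_retention.

    The helper only computes the retain-until date; the caller would pass
    the result to an S3 client. Bucket and key names are placeholders.
    """
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError("mode must be COMPLIANCE or GOVERNANCE")
    if retain_days <= 0:
        raise ValueError("retain_days must be positive")
    return {
        "Bucket": bucket,
        "Key": key,
        "Retention": {
            "Mode": mode,
            "RetainUntilDate": written_at + timedelta(days=retain_days),
        },
    }


written = datetime(2024, 1, 1, tzinfo=timezone.utc)
args = object_lock_retention_args(
    "example-archive-bucket", "example/object", written, 7 * 365)
print(args["Retention"]["Mode"], args["Retention"]["RetainUntilDate"])
```

COMPLIANCE mode is used as the default here because it cannot be shortened or removed by any user; GOVERNANCE mode allows privileged overrides.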

10.1.3 Indexing for Cloud-Archived Snapshots

The Cohesity index (Yoda) stays on the cluster. Two consequences:

  1. You can browse and search archived snapshots without paying retrieval fees. The metadata is local; only the actual chunk data lives in Glacier.
  2. If the originating cluster is destroyed, the index must be rebuilt from cloud-resident metadata before archived snapshots can be browsed and restored quickly. This is one of the principal differences between CloudArchive (cluster-authoritative) and FortKnox cyber vaulting (Cohesity-managed).

10.1.4 Direct Archive vs. Archive on Policy

Animation 1 — Cloud Integration Decision Tree

Branching paths reveal as you walk through the architect's decision: capacity vs. retention vs. DR vs. spin.

Primary driver?
  Cluster running out of capacity?
    Yes: CloudTier (move cold blocks; S3 Std-IA / Azure Cool)
    No: Long-term retention copy required?
      Yes: Need local copy for fast restore?
        Yes: CloudArchive (copy, local copy retained; Glacier / Azure Archive)
        No: CloudArchive Direct (stream, metadata local; no full local copy)
      No: Need warm cluster in cloud for DR?
        Yes: CloudReplicate (Cohesity Cloud Edition)
        No: CloudSpin (native EC2/Azure VM)
Each path solves a different architectural problem; memorize the branching question at every node.

Key Points — Section 10.1

Post-Section Mastery (10.1)

1. A customer needs ransomware-immune archive copies of nightly backups. They forgot to enable S3 Object Lock when the bucket was created. What is the correct architectural response?

Use a bucket policy Deny Delete — equivalent to Object Lock
Create a new bucket with Object Lock enabled at creation, point the External Target there, and migrate retention
Enable Object Lock from the S3 console retroactively
Switch to CloudArchive Direct — it includes immutability automatically

2. What is the architectural drawback of CloudArchive Direct compared to standard CloudArchive?

It does not support encryption at rest
There is no full local copy on the cluster, so restores always pay retrieval and rehydration costs
It cannot be combined with Glacier Deep Archive
It requires CloudReplicate as a prerequisite

Section 10.2 — CloudTier: Capacity Extension for Cold Blocks

Pre-Section Check (10.2)

3. Which statement about CloudTier is TRUE?

It always creates a second copy of every backup
It moves cold blocks to cloud and leaves only stubs on the cluster
It is intended for compliance-driven long-term retention
It uses Glacier Deep Archive as the default destination class

10.2.1 Capacity Tiering for Cold Blocks

CloudTier is a move operation, not a copy. The cluster's tiering engine continuously profiles block heat: blocks not accessed for the configured threshold are migrated out to cloud object storage, freeing local capacity. The cluster retains a pointer (a stub) so the namespace appears unchanged — when the data is needed, it is rehydrated transparently.

Because tiering moves data, CloudTier is irreversible without a recall operation. If you tier 500 TB out to S3 Standard-IA, you cannot simply "untier" by toggling a setting — you must recall the data, which counts as a full read against the object store and incurs egress and request charges.
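A rough estimate shows why an unplanned recall is expensive. The sketch below assumes a Standard-IA retrieval fee of $0.01/GB (an assumed list price; check current regional pricing) and the roughly $0.09/GB Internet egress figure used in Section 10.7.2. Request charges are ignored, and the function name is hypothetical.

```python
GB_PER_TB = 1000  # decimal terabytes, as used in the chapter's cost examples


def tier_recall_cost(recall_tb, retrieval_per_gb=0.01, egress_per_gb=0.09):
    """Estimate the cost of recalling tiered data back to the cluster.

    Both prices are assumptions for illustration, not quoted rates.
    """
    gb = recall_tb * GB_PER_TB
    retrieval = gb * retrieval_per_gb
    egress = gb * egress_per_gb
    return {
        "retrieval": round(retrieval, 2),
        "egress": round(egress, 2),
        "total": round(retrieval + egress, 2),
    }


# Recalling the 500 TB from the example above.
print(tier_recall_cost(500))
```

Under these assumptions the egress charge dominates, which is why recall volume matters more than the storage price when sizing a CloudTier design.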

10.2.2 Tiering Thresholds and Recall Behavior

10.2.3 Performance Impact Considerations

10.2.4 Tier vs. Archive Trade-offs

| Dimension | CloudTier | CloudArchive |
| --- | --- | --- |
| Data movement | Move (single copy) | Copy (local + cloud) |
| Local footprint | Reduced | Unchanged |
| Reversibility | Recall required | Local copy still authoritative |
| Typical destination class | S3 Standard-IA, Azure Cool | S3 Glacier, Azure Archive |
| Driver | Cluster running out of capacity | Compliance / LTR retention |
| Recall cost exposure | High (every restore pulls from cloud) | Low (only on disaster) |

The two patterns are complementary, not mutually exclusive. A common enterprise design tiers cold backup blocks to S3 Standard-IA (CloudTier) and copies monthly fulls to Glacier Deep Archive (CloudArchive).

Key Points — Section 10.2

Post-Section Mastery (10.2)

4. A View Box hits 92% capacity and the team enables CloudTier with a 30-day age threshold. Two days later, the monthly synthetic full reads blocks 45 days old, triggering massive recalls. What is the most architecturally sound fix?

Disable CloudTier entirely
Switch to CloudArchive Direct
Raise the age threshold past the longest predictable read pattern (e.g., 100 days), and add capacity headroom
Move tiered data to Glacier Deep Archive to reduce storage cost

5. A customer wants both capacity relief AND a 7-year compliance copy. Which approach is correct?

Use CloudTier alone — Standard-IA is durable enough for compliance
Use CloudArchive alone — set retention to 7 years
Combine CloudTier (to S3 Standard-IA) for capacity relief AND CloudArchive (to Glacier Deep Archive) for compliance
Use CloudReplicate — both problems resolve themselves

Section 10.3 — CloudReplicate and CloudSpin: Cloud as a DR Plane

Pre-Section Check (10.3)

6. Which statement best distinguishes CloudReplicate from CloudSpin?

CloudReplicate is a continuous replication to a Cohesity Cloud Edition; CloudSpin is a discrete conversion of a backup to a native cloud VM
CloudReplicate uses Glacier; CloudSpin uses Standard
CloudReplicate is for AWS only; CloudSpin is for Azure only
CloudReplicate has no cluster compute cost; CloudSpin runs continuously

10.3.1 CloudReplicate to a Cohesity Cloud Edition

CloudReplicate is conceptually identical to cluster-to-cluster replication, except the destination cluster is a Cohesity Cloud Edition running inside AWS, Azure, or GCP. The replicated data lands on a fully functional Cohesity cluster, so all DataPlatform features (instant mass restore, indexing, granular search) are available on the cloud side.

CloudReplicate is the right answer when:

The cost profile is meaningfully higher than CloudArchive — you are paying for cluster compute (EC2 or Azure VMs running Cohesity), local SSD/EBS, and replication network — but the recovery posture is dramatically better.

10.3.2 Converting Backups to Native Cloud VMs (CloudSpin)

CloudSpin converts an on-prem VM backup into a native cloud VM — EC2, Azure VM, or GCE instance — without requiring a Cohesity cluster on the destination side. The flow:

  1. Reads the VM backup from the local cluster (or recalls it from CloudArchive if needed).
  2. Converts the disk format (VMDK or VHDX → AMI for AWS, managed disk for Azure).
  3. Boots the VM into the target VPC/VNet with the chosen instance shape.

CloudSpin use cases:

CloudSpin is not a continuous DR replication mechanism — each spin is a discrete conversion job.

10.3.3 Network and IAM Prerequisites

The IAM minimums for CloudSpin in AWS include ec2:RunInstances, ec2:CreateVolume, ec2:AttachVolume, ec2:CreateImage, ec2:RegisterImage, iam:PassRole, plus S3 actions if the backup is in object storage.
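The actions listed above can be assembled into a policy document for review. This is a minimal sketch rather than the policy Cohesity ships: the statement IDs and the role ARN placeholder are hypothetical, `"Resource": "*"` on the EC2 statement is a stand-in that should be scoped down, and the S3 actions are omitted because the text does not enumerate them.

```python
import json

# EC2 actions named in the text. S3 actions for object-storage backups are
# not enumerated there, so this sketch leaves them out.
CLOUDSPIN_EC2_ACTIONS = [
    "ec2:RunInstances",
    "ec2:CreateVolume",
    "ec2:AttachVolume",
    "ec2:CreateImage",
    "ec2:RegisterImage",
]


def cloudspin_policy(pass_role_arn):
    """Return an IAM policy dict covering the CloudSpin minimums.

    iam:PassRole is limited to one role ARN instead of a wildcard resource.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CloudSpinEc2",
                "Effect": "Allow",
                "Action": list(CLOUDSPIN_EC2_ACTIONS),
                "Resource": "*",  # placeholder; restrict in production
            },
            {
                "Sid": "CloudSpinPassRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": pass_role_arn,
            },
        ],
    }


policy = cloudspin_policy("arn:aws:iam::<ACC>:role/<CLOUDSPIN-ROLE>")
print(json.dumps(policy, indent=2))
```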

10.3.4 Test Recovery and Clean-Up

SiteContinuity wraps CloudSpin and CloudReplicate in runbooks supporting test failover, planned failover, and failback. Always include a destroy step in the runbook — spun cloud VMs accrue compute charges as long as they run.

Animation 2 — CloudSpin Transformation

From cluster-resident backup, through format conversion, to a running native cloud VM.

  1. Read backup: Cohesity cluster VM backup (VMDK / VHDX, deduplicated and compressed).
  2. Format conversion: VMDK → AMI / managed disk.
  3. Boot native VM: EC2 / Azure VM booted and running in the target VPC/VNet.
No Cohesity cluster is needed at the destination.
CloudSpin is a discrete conversion job, not continuous replication — each spin is its own pipeline run.

Key Points — Section 10.3

Post-Section Mastery (10.3)

7. The DR strategy requires a fully-functional Cohesity cluster in AWS so that recovery can use Instant Mass Restore directly into VMware Cloud on AWS. Which option fits?

CloudArchive to Glacier Deep Archive
CloudTier to S3 Standard-IA
CloudReplicate to a Cohesity Cloud Edition
CloudSpin to EC2

8. A SecOps team needs to boot a malware-suspected VM in an isolated AWS VPC for analysis. What is the right Cohesity feature?

CloudArchive Direct
CloudTier with capacity-based threshold
CloudSpin into the isolated VPC
CloudReplicate with all networking disabled

Section 10.4 — The Decision Matrix

The single most exam-relevant artifact in this chapter is the decision matrix below. Memorize it.

| Capability | CloudArchive | CloudArchive Direct | CloudTier | CloudReplicate | CloudSpin |
| --- | --- | --- | --- | --- | --- |
| Primary purpose | LTR copy | Streaming archive | Capacity extension | Cloud DR | Native cloud VM |
| Data movement | Copy | Stream | Move | Cluster-to-cluster | Convert + boot |
| Local copy? | Yes | No (metadata only) | No (stub) | Yes | Yes |
| Reversible? | N/A | Limited | No (recall required) | N/A | N/A |
| Destination class | Glacier / Archive | Glacier / Archive | Std-IA / Cool | EC2/Azure VM | EC2/Azure VM |
| Recovery speed | Hours | Hours | Sec (warm); recall | Seconds | Minutes |
| Cost profile | Lowest $/GB | Even lower | Mid | Highest | Per-spin + VM hourly |

Exam tip: "Cluster at 85% capacity, customer wants cold backups online 90 more days, budget tight" → CloudTier. "7-year compliance, ransomware-resistant copies off-cluster" → CloudArchive with Object Lock. "Quarterly DR test in AWS without standing up another Cohesity cluster" → CloudSpin.

Figure 10.1: Cloud option decision tree (Mermaid)

flowchart TD
    Start([What is the primary driver?]) --> Q1{Cluster running<br/>out of capacity?}
    Q1 -->|Yes| Tier[CloudTier<br/>Move cold blocks<br/>S3 Std-IA / Azure Cool]
    Q1 -->|No| Q2{Long-term<br/>retention copy<br/>required?}
    Q2 -->|Yes| Q3{Need local copy<br/>for fast restore?}
    Q3 -->|Yes| Archive[CloudArchive<br/>Copy to Glacier/Archive]
    Q3 -->|No| Direct[CloudArchive Direct<br/>Stream, metadata only]
    Q2 -->|No| Q4{Need warm cluster<br/>in cloud for DR?}
    Q4 -->|Yes| Replicate[CloudReplicate<br/>To Cohesity Cloud Edition]
    Q4 -->|No| Q5{Need native<br/>cloud VM from backup?}
    Q5 -->|Yes| Spin[CloudSpin<br/>Convert to EC2/Azure VM]
    Q5 -->|No| Reassess([Reassess requirements])
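The decision tree in Figure 10.1 can also be sketched as a small function, which is a convenient way to drill the branching order. The function and parameter names are hypothetical; the branch order follows the figure exactly, so the capacity question is always asked first.

```python
def choose_cloud_option(out_of_capacity=False, ltr_copy_required=False,
                        need_local_copy=False, need_warm_cluster=False,
                        need_native_vm=False):
    """Walk the Figure 10.1 decision tree and return the matching feature."""
    if out_of_capacity:
        return "CloudTier"
    if ltr_copy_required:
        # A local copy for fast restore separates the two archive modes.
        return "CloudArchive" if need_local_copy else "CloudArchive Direct"
    if need_warm_cluster:
        return "CloudReplicate"
    if need_native_vm:
        return "CloudSpin"
    return "Reassess requirements"


# 7-year compliance copy with fast local restores.
print(choose_cloud_option(ltr_copy_required=True, need_local_copy=True))
# Quarterly DR test without a second Cohesity cluster.
print(choose_cloud_option(need_native_vm=True))
```

Because the tree returns a single answer, it does not model combined designs such as CloudTier plus CloudArchive; treat it as a memory aid for single-driver exam questions.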

Section 10.5 — Configuring CloudArchive to AWS S3 Glacier

Pre-Section Check (10.5)

9. In the five-step AWS configuration, where does the storage-class lifecycle rule live?

In the Cohesity Protection Policy
On the S3 bucket — managed by AWS, not Cohesity
In the IAM role assigned by the CloudFormation Template
In the Cohesity Helios global settings

10. Why does the CloudArchive workflow include s3:RestoreObject in the IAM policy?

It is needed to upload objects to S3
It enables Cohesity to rehydrate Glacier objects on recall
It enables MFA Delete
It is required only when Object Lock is enabled

The AWS Glacier flow is the most exam-tested external target configuration. Cohesity documents a five-step pattern.

Five-step workflow

flowchart LR
    A[1. Register<br/>External Target<br/>Purpose Archival] --> B[2. IAM via CFT<br/>least-privilege role<br/>+ KMS policy]
    B --> C[3. Bucket Policy<br/>Allow Cohesity role<br/>Object Lock enabled]
    C --> D[4. Lifecycle Rule<br/>Std → IA → Glacier<br/>→ Deep Archive]
    D --> E[5. Bind to<br/>Protection Policy<br/>Validate + run]
    style A fill:#1f6feb,color:#fff
    style B fill:#1f6feb,color:#fff
    style C fill:#1f6feb,color:#fff
    style D fill:#1f6feb,color:#fff
    style E fill:#238636,color:#fff

10.5.1 Step 1 — Register the External Target

Navigate to Inventory > External Targets > Register External Target. Configure the name, Purpose: Archival (not Tiering; this single selection defines the target's whole behavior), Provider (AWS > S3), bucket, region, access key, secret key, and storage class.

10.5.2 Step 2 — IAM via the Cohesity CloudFormation Template

The CFT provisions a least-privilege IAM role. Required actions:

s3:PutObject
s3:GetObject
s3:DeleteObjectVersion
s3:RestoreObject              ← required to recall from Glacier
s3:PutLifecycleConfiguration
s3:GetLifecycleConfiguration
s3:GetBucketObjectLockConfiguration
s3:PutObjectRetention         ← required for Object Lock / WORM
iam:SimulatePrincipalPolicy
kms:Encrypt
kms:Decrypt
kms:GenerateDataKey           ← required for SSE-KMS

The CFT-generated role is the right answer on the exam — never grant s3:* or ec2:*.
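A quick way to review a role against this list is to compare the granted actions with the required set and flag any wildcard. The sketch below is an illustrative check, not a Cohesity or AWS tool; `audit_actions` is a hypothetical helper, and it treats a wildcard as a finding rather than expanding it.

```python
# The twelve actions listed above for the CloudArchive role.
REQUIRED_ACTIONS = {
    "s3:PutObject",
    "s3:GetObject",
    "s3:DeleteObjectVersion",
    "s3:RestoreObject",
    "s3:PutLifecycleConfiguration",
    "s3:GetLifecycleConfiguration",
    "s3:GetBucketObjectLockConfiguration",
    "s3:PutObjectRetention",
    "iam:SimulatePrincipalPolicy",
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:GenerateDataKey",
}


def audit_actions(granted):
    """Compare granted IAM actions with the required list.

    Wildcards are reported separately and are not counted as satisfying
    any required action, because the goal is an explicit allow list.
    """
    granted = set(granted)
    wildcards = {action for action in granted if "*" in action}
    return {
        "wildcards": sorted(wildcards),
        "missing": sorted(REQUIRED_ACTIONS - granted),
        "extra": sorted(granted - REQUIRED_ACTIONS - wildcards),
    }


print(audit_actions(["s3:*", "kms:Decrypt"]))
```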

10.5.3 Step 3 — Bucket Policy

Authorize the Cohesity role explicitly:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::<ACC>:role/<COHESITY-ROLE>"},
    "Action": ["s3:PutObject","s3:GetObject","s3:RestoreObject","s3:PutObjectRetention"],
    "Resource": "arn:aws:s3:::your-bucket/*"
  }]
}

10.5.4 Step 4 — S3 Lifecycle Rule

The lifecycle rule lives on the bucket, not in Cohesity. Cohesity manages retention; the bucket lifecycle manages storage class.
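As an illustration of a bucket-side rule, the sketch below builds a lifecycle configuration in the dictionary shape accepted by boto3's `put_bucket_lifecycle_configuration` (its `LifecycleConfiguration` argument). The transition days follow the day-30 and day-180 example used in this chapter; the rule ID and function name are hypothetical, and no expiration is set because retention is left to Cohesity.

```python
def glacier_lifecycle_configuration(glacier_after_days=30,
                                    deep_archive_after_days=180,
                                    prefix=""):
    """Build an S3 lifecycle configuration with two storage-class transitions.

    Only the storage class is managed here; object expiry is deliberately
    omitted so that the backup product's retention stays authoritative.
    """
    if not 0 <= glacier_after_days < deep_archive_after_days:
        raise ValueError("transition days must be increasing")
    return {
        "Rules": [
            {
                "ID": "archive-to-deep-archive",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days,
                     "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


config = glacier_lifecycle_configuration()
print(config["Rules"][0]["Transitions"])
```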

Respect the minimum storage durations: Standard-IA 30 days, Glacier 90 days, Deep Archive 180 days. Deleting or transitioning an object before its minimum elapses triggers an early-deletion charge equal to the storage cost of the remaining days.
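The early-deletion charge can be estimated with simple arithmetic. The sketch below uses the minimum durations stated above and assumes the charge is pro-rated on a 30-day month; the Deep Archive price of $0.00099/GB-month is the approximate figure used elsewhere in this chapter, and the function name is hypothetical.

```python
# Minimum storage durations in days, as stated in the text.
MIN_STORAGE_DAYS = {"STANDARD_IA": 30, "GLACIER": 90, "DEEP_ARCHIVE": 180}


def early_deletion_charge(size_gb, price_per_gb_month, storage_class,
                          days_in_class, days_per_month=30):
    """Estimate the charge for deleting before the minimum duration.

    Assumes a pro-rated charge for the remaining days; returns 0 once the
    minimum has been met.
    """
    remaining_days = max(0, MIN_STORAGE_DAYS[storage_class] - days_in_class)
    return size_gb * price_per_gb_month * remaining_days / days_per_month


# 1,000 GB deleted after 30 days in Deep Archive: 150 days remain.
charge = early_deletion_charge(1000, 0.00099, "DEEP_ARCHIVE", 30)
print(f"${charge:.2f}")
```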

10.5.5 Step 5 — Bind to a Protection Policy

Edit a Protection Policy, add an Archival action targeting the new external target, define retention, attach to a Protection Group, validate, and run.

10.5.6 Glacier Pricing and Minimum Retention

| Class | $/GB-mo | Min Retention | Retrieval (Std) | Use case |
| --- | --- | --- | --- | --- |
| S3 Standard | ~$0.023 | none | n/a | Active backups |
| S3 Standard-IA | ~$0.0125 | 30 days | n/a | CloudTier dest |
| Glacier Instant Retrieval | ~$0.004 | 90 days | milliseconds | Rare-but-fast recall |
| Glacier Flexible Retrieval | ~$0.0036 | 90 days | 3–5 hours | Default Glacier |
| Glacier Deep Archive | ~$0.00099 | 180 days | 12 hours | Multi-year compliance |

Animation 3 — S3 Glacier Lifecycle Block Slide

A backup block enters S3 Standard, transitions to Glacier at day 30, and lands in Deep Archive at day 180.

CloudArchive write → Standard → Glacier → Deep Archive
  Day 0: Cohesity cluster writes the object with PutObject (TLS).
  Days 0–30: S3 Standard, $0.023/GB-mo, ms latency.
  Days 30–180: Glacier, $0.0036/GB-mo, 3–5 h recall (lifecycle transition at 30 d).
  180 d → 7 yrs: Deep Archive, $0.00099/GB-mo, 12 h recall (lifecycle transition at 180 d).
  Day 2555: expire.
Bucket-side lifecycle drives storage class; Cohesity owns retention.
The yellow block is a single archive object; the highlighted tier shows where it lives at each lifecycle phase.
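The lifecycle in Animation 3 implies a total storage cost per object that is easy to estimate. The following sketch assumes the approximate prices from Section 10.5.6, 30-day months, and no request or transition fees; the phase table and function name are hypothetical.

```python
# (phase name, start day, end day, approximate $/GB-month)
LIFECYCLE_PHASES = [
    ("S3 Standard", 0, 30, 0.023),
    ("Glacier", 30, 180, 0.0036),
    ("Deep Archive", 180, 2555, 0.00099),
]


def lifecycle_storage_cost(size_gb, phases=LIFECYCLE_PHASES,
                           days_per_month=30):
    """Return (total, per-phase breakdown) of storage cost over the lifecycle."""
    breakdown = {}
    for name, start_day, end_day, price in phases:
        months = (end_day - start_day) / days_per_month
        breakdown[name] = size_gb * price * months
    return sum(breakdown.values()), breakdown


total, breakdown = lifecycle_storage_cost(1000)
for name, cost in breakdown.items():
    print(f"{name}: ${cost:.2f}")
print(f"Total for 1,000 GB over 7 years: ${total:.2f}")
```

Under these assumptions most of the spend still lands in Deep Archive simply because the object lives there for about 79 of its 85 months.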

Key Points — Section 10.5

Post-Section Mastery (10.5)

11. A Protection Policy retains monthly fulls 7 years and the bucket lifecycle transitions objects to Deep Archive at day 30. After 60 days, an operator shortens the Protection Policy to 1 year. What happens?

Cohesity moves the objects back to Standard automatically
Deletion of objects in Deep Archive triggers an early-deletion charge equal to the remainder of the 180-day minimum
The bucket lifecycle prevents Cohesity from deleting at all
The objects remain forever — Cohesity cannot manage Deep Archive

12. Which is the BEST IAM approach when configuring a new S3 bucket for CloudArchive?

Grant s3:* on all buckets — simplifies operations
Run the Cohesity-published CloudFormation Template, which creates the least-privilege role and bucket policy
Embed the cluster's root credentials in Cohesity's external target config
Use an EC2 instance profile from another workload that already has S3 access

Section 10.6 — Configuring CloudArchive to Azure Blob Storage

Pre-Section Check (10.6)

13. What is the minimum Azure RBAC role for Cohesity to write, read, delete, and tier blobs?

Reader
Storage Account Contributor
Storage Blob Data Contributor
Owner

10.6.1 The Minimum Role — Storage Blob Data Contributor

The single role a CCAE candidate must remember is Storage Blob Data Contributor. It grants:

What it does not grant: control-plane operations like creating storage accounts. That is Storage Account Contributor — a control-plane role that does NOT grant data-plane access. Assigning Reader or Storage Account Contributor alone causes confusing 403 errors when the cluster tries to write objects.

10.6.2 Authentication Model — Entra ID Service Principals Preferred

  1. Microsoft Entra ID service principal with RBAC — the recommended pattern. Use a managed identity if the cluster runs in Azure.
  2. Shared Access Signature (SAS) — time-limited tokens; rotation risk. Avoid in production.
  3. Storage Account Access Keys — easiest, full account access, hardest to revoke.

10.6.3 setBlobTier and the Azure Archive Tier

To move a blob to Azure's Archive tier, the principal must have:

Microsoft.Storage/storageAccounts/blobServices/containers/blobs/setBlobTier/action

This action is included in Storage Blob Data Contributor. Rehydration from the Archive tier takes up to 15 hours at Standard priority or up to 1 hour at High priority.

10.6.4 Private Endpoints — Production Networking

  1. Provision an Azure Private Endpoint on the storage account's blob sub-resource.
  2. Approve the connection in Networking > Private endpoint connections.
  3. Ensure the cluster VNet resolves privatelink.blob.core.windows.net via Azure Private DNS.
  4. Lock the storage account firewall to Cohesity public IPs / VNet / service tags if any public access is permitted.

Common pitfall: private endpoint configured but DNS pointed at the public Blob endpoint — traffic falls back, gets blocked, archives fail.
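The DNS pitfall can be caught with a simple check that the blob hostname resolves only to private addresses from inside the cluster VNet. This is a minimal sketch using the Python standard library; the function names are hypothetical, and `lookup_addresses` must run from a host that shares the cluster's DNS settings to be meaningful.

```python
import ipaddress
import socket


def lookup_addresses(hostname, port=443):
    """Resolve a hostname and return the distinct IP addresses as strings."""
    infos = socket.getaddrinfo(hostname, port)
    return sorted({info[4][0] for info in infos})


def resolves_privately(addresses):
    """Return True only if every resolved address is in a private range."""
    parsed = [ipaddress.ip_address(address) for address in addresses]
    return bool(parsed) and all(address.is_private for address in parsed)


# Offline examples: a private endpoint address vs. a public address.
print(resolves_privately(["10.1.2.4"]))
print(resolves_privately(["20.60.1.1"]))
```

In practice the check would be `resolves_privately(lookup_addresses("<account>.blob.core.windows.net"))`, with the storage account name filled in by the operator.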

Key Points — Section 10.6

Post-Section Mastery (10.6)

14. A customer assigned the Cohesity service principal "Storage Account Contributor" but archives are failing with 403 errors. What is the fix?

Add "Storage Blob Data Contributor" — Storage Account Contributor is control-plane only
Switch to a SAS token instead of a service principal
Disable Private Endpoint so traffic uses the public endpoint
Remove all bucket-side lifecycle rules

15. Which authentication method is the recommended production pattern for Cohesity to Azure Blob?

Storage account access keys hardcoded in the External Target
A 90-day SAS token rotated manually
An Entra ID service principal (or managed identity) with Storage Blob Data Contributor
No authentication — rely on VNet isolation

Section 10.7 — Storage Classes and Cost Modeling

10.7.1 S3 and Azure Class Mapping

| Use case | AWS S3 | Azure Blob | Min retention | Latency |
| --- | --- | --- | --- | --- |
| Active backups, fast recovery | S3 Standard | Hot | none | ms |
| Cold backups (CloudTier target) | S3 Standard-IA | Cool | 30 days | ms |
| Archive: fast occasional recall | Glacier Instant Retrieval | (no exact equivalent) | 90 days | ms |
| Archive: multi-hour recall ok | Glacier Flexible Retrieval | Cold (preview) | 90 days | 3–5h |
| Deepest archive: multi-year | Glacier Deep Archive | Archive | 180 days | up to 15h |

10.7.2 The Three Cost Components

  1. Storage — $/GB-month at rest.
  2. Requests — per-API-call PUT, GET, RESTORE. Glacier and Archive request fees can dominate when many small objects are archived.
  3. Egress — outbound data transfer. AWS charges roughly $0.09/GB egress to Internet. Cross-region ~$0.02/GB. Within-region to AWS services typically free.

10.7.3 Worked Example — 100 TB Disaster Recall

Scenario: 500 TB archived to Glacier Deep Archive over 4 years. Ransomware destroys the production environment. Customer must recall 100 TB to a new on-prem cluster.

| Component | Calculation | Cost |
| --- | --- | --- |
| Storage at rest | 500 TB × $0.00099/GB-mo × 48 mo | ~$23,760 |
| Restore (Std retrieval) | $0.02/GB × 100 TB | ~$2,000 |
| Restore requests | ~10M objects × $0.025/1k | ~$250 |
| Egress to Internet | $0.09/GB × 100 TB | ~$9,000 |
| Total recall event | | ~$11,250 |
| Plus 4 yrs storage already paid | | ~$23,760 |
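The table's arithmetic can be reproduced with a few lines of Python, which also makes it easy to rerun the scenario with different volumes. The rates are the ones shown in the table (treated here as assumptions, not quoted prices), terabytes are decimal (1 TB = 1,000 GB), the storage line assumes the full 500 TB is held for all 48 months, and the function names are hypothetical.

```python
GB_PER_TB = 1000  # decimal terabytes, matching the table's arithmetic


def storage_paid(stored_tb, price_per_gb_month, months):
    """Storage cost, assuming the full volume is held for every month."""
    return stored_tb * GB_PER_TB * price_per_gb_month * months


def recall_event_cost(recall_tb, object_count, retrieval_per_gb=0.02,
                      request_per_1k=0.025, egress_per_gb=0.09):
    """Break a disaster recall into retrieval, request, and egress costs."""
    gb = recall_tb * GB_PER_TB
    costs = {
        "retrieval": gb * retrieval_per_gb,
        "requests": object_count / 1000 * request_per_1k,
        "egress": gb * egress_per_gb,
    }
    costs["total"] = sum(costs.values())
    return costs


recall = recall_event_cost(recall_tb=100, object_count=10_000_000)
stored = storage_paid(stored_tb=500, price_per_gb_month=0.00099, months=48)
for name, cost in recall.items():
    print(f"{name}: ${cost:,.0f}")
print(f"storage already paid: ${stored:,.0f}")
```

Egress accounts for about 80% of the recall event in this scenario, so the recall destination matters more than the retrieval tier.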

Architectural lessons:

10.7.4 Lifecycle and Retention Best Practices

Key Points — Section 10.7


Answer Explanations