Chapter 12: SmartFiles — Files, Objects, and Unstructured Data Services

Learning Objectives

For most enterprises, the largest pool of "data sprawl" is unstructured. Cohesity SmartFiles is the product that turns the same DataPlatform you already use as a backup target into a primary, multi-protocol unstructured-data service. SmartFiles is not a separate appliance to learn — it is a consumption mode of the cluster you have already designed.

1. SmartFiles Architecture

Pre-Section Quiz — SmartFiles Architecture

1. A View in SmartFiles is best described as:

A separate physical volume per protocol (one for SMB, one for NFS, one for S3).
A logical container in SpanFS that can be exposed simultaneously as SMB3, NFSv3/v4, and S3 against the same data.
A protection group definition for backup workloads only.
A network VIP used to load-balance NAS clients.

2. The View Box (Storage Domain) primarily defines:

The DNS name a client uses to mount.
The Active Directory forest used for authentication.
Storage efficiency, resiliency (RF/EC), encryption, tiering, and default quotas for the Views inside it.
The replication target cluster.

3. How does SmartFiles handle the "writable S3 clone" pattern (parallel writes from S3 against a live NFS/SMB View)?

By taking byte-range locks on the S3 side that block NFS/SMB writers.
By spawning an instant zero-copy SnapTree clone that is exposed as a separate writable S3 bucket.
By making the live View read-only across all protocols whenever S3 writes occur.
It is not supported — S3 writes always block NFS access.

From SpanFS to View: One File System, Many Faces

SmartFiles is not a separate product riding on top of Cohesity — it is a way of consuming SpanFS, the same distributed file system that holds backup snapshots, archived databases, and replicated VMs. SpanFS exposes a single global namespace across every node in the cluster with strict consistency. That single-namespace property is what lets the same logical object be a file to an NFS client, a share to an SMB client, and an object to an S3 client at the same time, without copying data into protocol-specific silos.

The central SmartFiles construct is the View. A View is a logical container that lives inside a View Box (Storage Domain) and that can be exposed simultaneously as SMB3 Windows shares, NFSv3/NFSv4 UNIX mounts, and S3 object buckets, all against the same data.

flowchart LR
    SpanFS["SpanFS<br/>Distributed File System<br/>Single Global Namespace"]
    VB["View Box /<br/>Storage Domain<br/>Policy Boundary"]
    V["View<br/>Logical Container"]
    SMB["SMB3<br/>Windows Shares"]
    NFS["NFSv3 / NFSv4<br/>UNIX Mounts"]
    S3["S3<br/>Object Buckets"]
    SpanFS --> VB
    VB --> V
    V --> SMB
    V --> NFS
    V --> S3
    SMB -.same data.-> NFS
    NFS -.same data.-> S3
[Animation: SpanFS Layered Architecture, bottom-up reveal. Layers from the bottom: SpanFS (distributed file system, single global namespace); View Box / Storage Domain (policy boundary: dedupe, RF/EC, encryption, tiering, quotas); View (logical multi-protocol container); SMB3, NFSv3/v4, and S3 on top. Protocols ride on top of one View, one View Box, one SpanFS.]

View Boxes / Storage Domains: The Policy Boundary

The View Box — which newer documentation calls a Storage Domain — is the policy container for the Views inside it. The View Box is where you define storage efficiency, resiliency (RF2/RF3 or erasure coding), encryption, tiering policy, and default quotas. A common architectural pattern is to maintain at least two Storage Domains: one tuned for backup landing (high dedupe, erasure-coded, HDD-biased) and one tuned for primary file/object workloads (SSD-biased, lower dedupe ratio target) so that backup ingest cannot starve a busy SmartFiles user share.

+--------------------------------------------------------------+
| Cohesity Cluster (SpanFS, single global namespace)           |
|                                                              |
|  +----------------------+    +-------------------------+     |
|  | Storage Domain:      |    | Storage Domain:         |     |
|  | "BackupTarget"       |    | "SmartFiles-Primary"    |     |
|  | RF2 + EC 4:2         |    | RF2, SSD-biased         |     |
|  | Inline dedupe        |    | Post-process dedupe     |     |
|  |                      |    |                         |     |
|  |  View: vmware-bk     |    |  View: media-projects   |     |
|  |  View: oracle-bk     |    |  View: build-artifacts  |     |
|  |  View: m365-bk       |    |  View: home-dirs        |     |
|  +----------------------+    +-------------------------+     |
+--------------------------------------------------------------+
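The containment shown above can be sketched as a small data model. This is only an illustration of the "policy lives on the Storage Domain, Views inherit it" idea; the class and field names are hypothetical and do not correspond to Cohesity's API or object schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageDomain:
    """Policy boundary: settings here apply to every View inside it."""
    name: str
    resiliency: str   # e.g. "RF2" or "RF2 + EC 4:2", as in the diagram
    dedupe: str       # "inline" or "post-process"
    views: List["View"] = field(default_factory=list)

    def add_view(self, view_name: str) -> "View":
        view = View(name=view_name, domain=self)
        self.views.append(view)
        return view


@dataclass
class View:
    name: str
    # Excluded from repr/compare to avoid recursing back into the parent.
    domain: StorageDomain = field(repr=False, compare=False)

    def effective_policy(self) -> Dict[str, str]:
        # In this model a View carries no resiliency or dedupe setting of
        # its own; it reads them from its parent Storage Domain.
        return {
            "storage_domain": self.domain.name,
            "resiliency": self.domain.resiliency,
            "dedupe": self.domain.dedupe,
        }


backup = StorageDomain("BackupTarget", "RF2 + EC 4:2", "inline")
primary = StorageDomain("SmartFiles-Primary", "RF2", "post-process")
vmware_bk = backup.add_view("vmware-bk")
home_dirs = primary.add_view("home-dirs")
print(home_dirs.effective_policy())
```

Keeping the two domains as separate objects mirrors the design point in the text: backup landing and primary file workloads get different policy without the Views themselves having to know about it.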

Protocol Surface: SMB3, NFSv3/v4, S3

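Mode               | NFS        | SMB        | S3          | Typical Use
-------------------+------------+------------+-------------+----------------------------------
Multi-protocol R/W | R/W        | R/W        | Read-only   | NAS workload + modern S3 readers
File-only          | R/W        | R/W        | Off         | User home dirs / build farm
S3-only            | Off        | Off        | R/W         | Splunk SmartStore, container backups
Writable S3 clone  | R/W (live) | R/W (live) | R/W (clone) | Analytics/ML on writable copy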
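The access modes above can be encoded as a lookup table, which is handy when validating a design document against the intended protocol surface. This is a minimal sketch; the mode keys and helper names are made up for illustration and are not Cohesity settings.

```python
from enum import Enum


class Access(Enum):
    OFF = "off"
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


RW, RO, OFF = Access.READ_WRITE, Access.READ_ONLY, Access.OFF

# One entry per row of the access-mode table, keyed by mode then protocol.
# In "writable-s3-clone", S3 read-write applies to the clone, while NFS and
# SMB keep writing to the live View.
ACCESS_MODES = {
    "multi-protocol": {"nfs": RW, "smb": RW, "s3": RO},
    "file-only": {"nfs": RW, "smb": RW, "s3": OFF},
    "s3-only": {"nfs": OFF, "smb": OFF, "s3": RW},
    "writable-s3-clone": {"nfs": RW, "smb": RW, "s3": RW},
}


def can_read(mode: str, protocol: str) -> bool:
    return ACCESS_MODES[mode][protocol] is not Access.OFF


def can_write(mode: str, protocol: str) -> bool:
    return ACCESS_MODES[mode][protocol] is Access.READ_WRITE


print(can_write("multi-protocol", "s3"))  # S3 is read-only in this mode
```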

Cross-Protocol Identity and Locking

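SMB clients present Windows SIDs, NFS clients present UNIX UIDs/GIDs, and S3 clients present access keys. Cohesity bridges these identities via AD/LDAP integration. Cross-protocol locking is enforced inside SpanFS: an SMB3 oplock blocks conflicting NFS access, following the locking semantics SpanFS exposes.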
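To make the identity bridge concrete, here is a toy sketch of resolving an SMB identity and an NFS identity to the same user. It stands in for what a directory lookup would provide; the `Principal` and `IdentityBridge` names, the SID value, and the UID are all invented for illustration and say nothing about how Cohesity implements the mapping.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Principal:
    username: str
    sid: str  # Windows security identifier, as seen over SMB
    uid: int  # UNIX user ID, as seen over NFS


class IdentityBridge:
    """In-memory stand-in for an AD/LDAP-backed SID <-> UID mapping."""

    def __init__(self, principals: Iterable[Principal]) -> None:
        principals = list(principals)
        self._by_sid: Dict[str, Principal] = {p.sid: p for p in principals}
        self._by_uid: Dict[int, Principal] = {p.uid: p for p in principals}

    def from_sid(self, sid: str) -> Optional[Principal]:
        return self._by_sid.get(sid)

    def from_uid(self, uid: int) -> Optional[Principal]:
        return self._by_uid.get(uid)

    def same_user(self, sid: str, uid: int) -> bool:
        # True only when both identities resolve to one directory entry.
        principal = self._by_sid.get(sid)
        return principal is not None and principal.uid == uid


# Made-up example values.
alice = Principal("alice", "S-1-5-21-1111-2222-3333-1105", 1001)
bridge = IdentityBridge([alice])
print(bridge.same_user(alice.sid, 1001))
```

The point of the sketch is the failure mode: if the directory cannot answer, `from_sid` or `from_uid` returns nothing and cross-protocol permission checks have no common user to evaluate.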

[Animation: Multi-Protocol Simultaneous Access. An SMB3 client (Windows, Alice) writes report.csv while an NFSv4 client (Linux, uid 1001) reads it and an S3 client (cloud app, IAM key) issues a GET for the same object. The AD/LDAP identity bridge maps SID ↔ UID/GID ↔ access key. Three protocols, three clients, one byte sequence in a single View on SpanFS.]

Key Points — SmartFiles Architecture

2. Quotas, QoS, and Tiering

Pre-Section Quiz — Quotas, QoS, and Tiering

1. In SmartFiles, the relationship between quota and alert-limit is best described as:

The alert-limit is the hard cap; the quota is the soft warning.
The quota is the enforced cap; the alert-limit is the soft warning that fires before enforcement.
Both are advisory; SpanFS never enforces either.
The quota only applies to S3 buckets; the alert-limit only applies to NFS/SMB.

2. Why must the QoS policy be selected at View creation time?

Because changing it on a busy View is non-trivial and may require data movement.
Because QoS is a billing-only setting and locks at view creation.
Because Cohesity's UI does not let you create a View without picking one.
Because QoS determines the SMB share name.

3. After a cold block has been tiered to cloud, what happens when an SMB client opens that file?

The client sees an error and must explicitly rehydrate the file.
SmartFiles transparently recalls the block; the client sees the same path, may experience first-read latency, and may incur egress cost.
The file appears truncated to zero bytes until tier-up completes.
The cluster automatically promotes the entire View back to local storage.

4. Which QoS policy best matches an active SMB user share that needs SSD-class latency?

Backup Target Low.
TestAndDev High.
Archive / general-purpose HDD.
CloudArchive Cold.

Quotas: Capacity Governance

SmartFiles supports both per-View quotas and per-user quotas inside a View, with audit logs of usage and Helios REST endpoints (getViewUserQuotas) to drive reporting and chargeback. Storage-Domain defaults are typically configured via the CLI parameters default-view-quota (in GiB) and default-view-quota-alert-limit.

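A subtlety the CCAE exam can test: Cohesity does not sharply distinguish "soft" from "hard" quotas in the NetApp sense. Instead, think of the quota as the enforced cap and the alert-limit as the soft warning that fires before enforcement. Storage Domain defaults apply to every View unless a View overrides them, and per-user quotas sit inside a View.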

graph TD
    SD["Storage Domain<br/>default-view-quota = 10 TiB<br/>alert-limit = 8 TiB"]
    V1["View: home-dirs<br/>inherits domain default"]
    V2["View: media-projects<br/>OVERRIDE: 50 TiB / alert 40 TiB"]
    V3["View: render-scratch<br/>OVERRIDE: 200 TiB / alert 160 TiB"]
    U1["User quota: alice<br/>500 GiB cap"]
    U2["User quota: bob<br/>500 GiB cap"]
    U3["User quota: build-svc<br/>2 TiB cap"]
    SD --> V1
    SD --> V2
    SD --> V3
    V1 --> U1
    V1 --> U2
    V2 --> U3
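The inheritance in the diagram, together with the quota versus alert-limit distinction, can be sketched in a few lines. This is an illustrative model only: the function and class names are hypothetical, and the "at or above" comparison is an assumption rather than documented Cohesity behavior.

```python
from dataclasses import dataclass
from typing import Optional

GIB_PER_TIB = 1024


@dataclass(frozen=True)
class QuotaPolicy:
    quota_gib: int  # enforced cap
    alert_gib: int  # soft warning threshold, below the cap


def effective_quota(domain_default: QuotaPolicy,
                    view_override: Optional[QuotaPolicy] = None) -> QuotaPolicy:
    # A View uses its own override when one is set, else the domain default.
    return view_override if view_override is not None else domain_default


def check_usage(used_gib: int, policy: QuotaPolicy) -> str:
    # Assumption: a threshold counts as reached when usage is at or above it.
    if used_gib >= policy.quota_gib:
        return "quota-reached"
    if used_gib >= policy.alert_gib:
        return "alert"
    return "ok"


domain_default = QuotaPolicy(10 * GIB_PER_TIB, 8 * GIB_PER_TIB)
media_override = QuotaPolicy(50 * GIB_PER_TIB, 40 * GIB_PER_TIB)

home_dirs = effective_quota(domain_default)                  # inherits
media_projects = effective_quota(domain_default, media_override)

nine_tib = 9 * GIB_PER_TIB
print(check_usage(nine_tib, home_dirs))       # past 8 TiB alert, under 10 TiB cap
print(check_usage(nine_tib, media_projects))  # well under the 40 TiB alert
```

The same 9 TiB of usage yields different outcomes depending on which policy the View resolves to, which is why overrides should be documented alongside the domain default.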

QoS Policies: Workload-Aware Placement

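QoS Policy                | Tier Bias     | IO Profile                 | Designed For
--------------------------+---------------+----------------------------+---------------------------------------
Backup Target Low         | HDD-heavy     | Large sequential           | Backup landing zones
TestAndDev High           | SSD-optimized | Transactional, low-latency | Active dev/test, VDI
Archive / general purpose | HDD/cold      | Capacity-oriented          | Cold archives, relaxed-latency shares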

Two CCAE design rules: (1) Pick the QoS at View creation — changing it on a busy View is non-trivial and may require data movement. (2) Match QoS to workload, not to who paid for it. Putting an active SMB user share on "Backup Target Low" tanks latency; putting a backup target on "TestAndDev High" wastes SSD.
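Rule (2) can be expressed as a small selection function that looks only at workload traits. The policy names come from the table above; the two boolean traits and the precedence between them are simplifying assumptions for illustration, not a Cohesity sizing rule.

```python
def pick_qos(latency_sensitive: bool, backup_landing: bool) -> str:
    """Choose a QoS policy from workload traits, never from cost ownership."""
    if latency_sensitive:
        # Active user shares, dev/test, VDI: SSD-optimized placement.
        return "TestAndDev High"
    if backup_landing:
        # Large sequential ingest: HDD-heavy placement is sufficient.
        return "Backup Target Low"
    # Everything else: capacity-oriented placement.
    return "Archive / general purpose"


print(pick_qos(latency_sensitive=True, backup_landing=False))
print(pick_qos(latency_sensitive=False, backup_landing=True))
```

Running this per share during design, before any View exists, fits rule (1): the choice is made once, up front.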

Tiering: Hot/Cold Placement

SmartFiles applies policy-driven tiering across SSD, HDD, and S3-compatible cloud targets. Cold blocks move out without breaking the namespace — clients still see the file or object at the same path, and access triggers a transparent recall. Tiering is configured at the Storage Domain or View level and applies to all protocols simultaneously: a file tiered to S3 that is then read via SMB or NFS or the S3 API behaves identically.

Architects should explicitly model the first-read latency of a transparent recall and the cloud egress cost that recalls can incur.
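A back-of-the-envelope model for the two recall costs, first-read latency and cloud egress, might look like the sketch below. Every number in the example call is a placeholder; real throughput, round-trip time, and egress pricing depend on the cloud target and network path and must be supplied by the architect.

```python
def egress_cost(recalled_gib: float, price_per_gib: float) -> float:
    """Cost of pulling recalled data back from the cloud tier."""
    return recalled_gib * price_per_gib


def first_read_seconds(size_mib: float, throughput_mib_s: float,
                       round_trip_s: float) -> float:
    """Rough first-read latency: one round trip plus transfer time."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    return round_trip_s + size_mib / throughput_mib_s


# Placeholder inputs: 500 GiB recalled per month at an assumed 0.09 per GiB,
# and a 1 GiB file recalled over an assumed 128 MiB/s link with 50 ms RTT.
monthly_cost = egress_cost(500, 0.09)
latency = first_read_seconds(1024, 128, 0.05)
print(f"egress per month: {monthly_cost:.2f}")
print(f"first read of 1 GiB: {latency:.2f} s")
```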

Worked Example: Media-and-Entertainment Workflow

Storage Domain: "Media-Primary"  (SSD-biased, RF2, post-process dedupe)
  +-- View: "edit-projects"
  |     Protocols: SMB3 R/W, NFSv4 R/W, S3 read-only
  |     QoS: TestAndDev High (SSD priority, low latency)
  |     Quota: 50 TB, alert 40 TB
  |     Tiering: cold > 90 days idle -> S3 (Standard-IA)
  |
  +-- View: "render-scratch"
        Protocols: NFSv4 R/W, S3 R/W
        QoS: General-purpose (HDD, throughput)
        Quota: 200 TB, alert 160 TB
        Tiering: cold > 30 days idle -> S3 (Glacier IR)
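The idle-age rule in the worked example reduces to a date comparison. The sketch below assumes tiering eligibility is driven purely by last-access time, which is a simplification; the dictionary and function names are hypothetical.

```python
from datetime import datetime, timedelta

# Idle thresholds taken from the worked example above.
TIERING_IDLE_DAYS = {
    "edit-projects": 90,
    "render-scratch": 30,
}


def is_cold(last_access: datetime, now: datetime, idle_days: int) -> bool:
    """True when the data has been idle for longer than the threshold."""
    return now - last_access > timedelta(days=idle_days)


now = datetime(2024, 6, 1)
last_touched = now - timedelta(days=45)

for view, days in TIERING_IDLE_DAYS.items():
    print(view, is_cold(last_touched, now, days))
```

Data last touched 45 days ago stays local in "edit-projects" but is eligible to tier out of "render-scratch", which matches the intent of giving scratch space the shorter window.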

Key Points — Quotas, QoS, and Tiering

3. Data Protection for SmartFiles

Pre-Section Quiz — Data Protection for SmartFiles

1. ICAP integration in SmartFiles is best described as:

An asynchronous batch scan that runs nightly.
A synchronous protocol that fans out write paths to one or more configured ICAP servers (Trellix, Symantec, Sophos) for AV scanning before commit.
A bolt-on appliance that sits in front of the cluster.
A replacement for AD/LDAP authentication.

2. For SmartFiles DR replication to a remote cluster, which prerequisite is most often overlooked?

Adding extra disk capacity at the source cluster.
Ensuring AD/LDAP is reachable from the DR site so SMB and NFSv4 identities resolve post-failover.
Disabling deduplication on replicated streams.
Removing snapshots before failover.

3. SmartFiles snapshots are powered by:

A traditional copy-on-write block store with capacity penalties.
SnapTree, providing zero-copy snapshots and clones.
The legacy NAS array's hardware snapshot engine.
A separate snapshot service running outside SpanFS.

Snapshots and Policies

Every View can be snapshotted on a schedule using the same Protection Policies covered in Chapter 7 — frequency, hierarchical retention (daily / weekly / monthly / yearly), and lock attributes. SnapTree gives near-zero overhead for snapshots, so retention windows can be aggressive without paying capacity penalties. Snapshots are mountable as read-only Views, which makes "previous versions" workflows straightforward for SMB users.

For ransomware resilience, combine snapshot policies with DataLock (Chapter 11) so snapshot deletes require either time-elapse or quorum approval.
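Hierarchical retention is easier to reason about with a concrete pruning rule. The sketch below implements a generic grandfather-father-son scheme: keep the newest snapshot in each of the most recent N days, weeks, months, and years. It is not Cohesity's retention algorithm, only an illustration of how daily / weekly / monthly / yearly tiers combine.

```python
from datetime import date, timedelta
from typing import Callable, Hashable, Iterable, List, Set


def _newest_per_bucket(snaps_desc: List[date],
                       bucket_of: Callable[[date], Hashable],
                       count: int) -> Set[date]:
    """Keep the newest snapshot from each of the `count` most recent buckets."""
    kept: Set[date] = set()
    seen: List[Hashable] = []
    for snap in snaps_desc:  # newest first
        bucket = bucket_of(snap)
        if bucket in seen:
            continue  # an even newer snapshot already represents this bucket
        if len(seen) == count:
            break
        seen.append(bucket)
        kept.add(snap)
    return kept


def snapshots_to_keep(snapshots: Iterable[date], daily: int, weekly: int,
                      monthly: int, yearly: int) -> Set[date]:
    snaps_desc = sorted(set(snapshots), reverse=True)
    keep: Set[date] = set()
    keep |= _newest_per_bucket(snaps_desc, lambda d: d, daily)
    keep |= _newest_per_bucket(snaps_desc, lambda d: tuple(d.isocalendar())[:2], weekly)
    keep |= _newest_per_bucket(snaps_desc, lambda d: (d.year, d.month), monthly)
    keep |= _newest_per_bucket(snaps_desc, lambda d: d.year, yearly)
    return keep


# Sixty daily snapshots ending 2024-03-31 (so the oldest is 2024-02-01).
end = date(2024, 3, 31)
daily_snaps = [end - timedelta(days=i) for i in range(60)]
kept = snapshots_to_keep(daily_snaps, daily=7, weekly=0, monthly=2, yearly=0)
print(sorted(kept))
```

With seven dailies and two monthlies, the result is the last week of March plus the final February snapshot; everything else is eligible for expiry unless a lock such as DataLock holds it.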

DR Replication for SmartFiles

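Views replicate to a remote cluster using the same replication engine that DataProtect uses. The architectural note most often overlooked: AD/LDAP must be reachable from the DR site so that SMB and NFSv4 identities resolve after failover.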

File Audit Logging

SmartFiles ships native file audit logging that records per-event activity (open, read, write, rename, delete, ACL change) on Views. These events can be pushed to syslog or to SIEM platforms, which removes the need for bolt-on third-party audit appliances. Audit logging is also a control for ransomware detection: an unusual rate of rename-then-delete on a user share is a classic encryption signature.
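The rename-then-delete signal can be approximated with a per-user sliding window over audit events. This is a sketch of the detection idea only: the event tuple layout, window, and threshold are assumptions, and the actual SmartFiles audit record format is not shown here.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

# Hypothetical event shape: (epoch seconds, user, operation).
Event = Tuple[float, str, str]

SUSPICIOUS_OPS = {"rename", "delete"}


def flag_bursts(events: Iterable[Event], window_s: float = 60.0,
                threshold: int = 100) -> List[str]:
    """Return users whose rename/delete count within any window hits the threshold."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    flagged: List[str] = []
    for ts, user, op in sorted(events):
        if op not in SUSPICIOUS_OPS:
            continue
        window = recent[user]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold and user not in flagged:
            flagged.append(user)
    return flagged


# Synthetic data: alice renames and deletes 100 files in about 20 seconds,
# bob does ordinary work.
events: List[Event] = []
for i in range(100):
    events.append((i * 0.2, "alice", "rename"))
    events.append((i * 0.2 + 0.1, "alice", "delete"))
events += [(5.0, "bob", "read"), (6.0, "bob", "delete"), (30.0, "bob", "write")]

print(flag_bursts(events))
```

In practice the threshold would be tuned against normal share activity; bulk cleanups by administrators look similar and would need an allow-list.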

Anti-Virus and ICAP Integration

SmartFiles integrates antivirus scanning natively via ICAP (Internet Content Adaptation Protocol). When ICAP AV is enabled on a View, write paths fan out to one or more configured ICAP servers (Trellix, Symantec, Sophos) for scanning before the data is committed.
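The "scan before commit" ordering is the key property, and it can be shown without any real ICAP traffic. In the sketch below each scanner is a plain callable standing in for an ICAP server. Requiring every configured scanner to pass is an assumption made for illustration (a real deployment may instead load-balance across a pool), and all names are hypothetical.

```python
from typing import Callable, Dict, Iterable

# A scanner returns True when the payload is clean.
Scanner = Callable[[bytes], bool]


class InfectedPayload(Exception):
    """Raised when a scanner rejects the payload; nothing is committed."""


def scan_then_commit(path: str, payload: bytes,
                     scanners: Iterable[Scanner],
                     store: Dict[str, bytes]) -> None:
    scanners = list(scanners)
    if not scanners:
        raise ValueError("at least one scanner is required")
    # Synchronous gate: the write is held until scanning finishes.
    for scan in scanners:
        if not scan(payload):
            raise InfectedPayload(path)
    store[path] = payload  # commit only after every scanner passes


def toy_scanner(payload: bytes) -> bool:
    # Stand-in signature check, not a real AV engine.
    return b"MALWARE" not in payload


store: Dict[str, bytes] = {}
scan_then_commit("/home-dirs/alice/report.csv", b"q1,q2\n1,2\n", [toy_scanner], store)

try:
    scan_then_commit("/home-dirs/alice/bad.bin", b"xxMALWARExx", [toy_scanner], store)
except InfectedPayload as exc:
    print(f"rejected before commit: {exc}")

print(sorted(store))
```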

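For the CCAE exam, remember that ICAP scanning is synchronous: it sits in the write path and completes before commit. It is not an asynchronous nightly batch scan, and it is not a bolt-on appliance in front of the cluster.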

Key Points — Data Protection for SmartFiles

4. Migration and Modernization

Pre-Section Quiz — Migration and Modernization

1. Cohesity's packaged NAS File Migration Service is sized at approximately:

3 TB per migration event.
30 TB per migration event.
300 TB per migration event.
Unlimited per migration event.

2. The "transparent cold-data tiering" path lets architects:

Move only cold blocks off the legacy filer to SmartFiles or cloud while the legacy NAS keeps serving hot data.
Force a full cutover of all data immediately.
Replace the legacy filer's controller hardware in place.
Use S3 as a primary tier for hot data only.

3. The strongest architectural argument SmartFiles makes against NetApp ONTAP and Dell PowerScale is:

SmartFiles uses faster network cards.
SmartFiles is the only platform that supports SMB.
Consolidation: the same platform serves backup, files, objects, archive, and DR with global dedupe and native audit/ICAP.
SmartFiles is free.

Three Migration Paths

  1. NAS File Migration Service — packaged Professional Services engagement covering cluster prep, planning, cutover, and end-state docs, sized at ~30 TB per migration event. For larger estates, plan multiple cutover events.
  2. Transparent cold-data tiering — SmartFiles scans the source NAS, classifies data by access pattern, and policy-tiers cold blocks to Cohesity (or cloud) without rehydration on access. The legacy NAS keeps serving hot data; SmartFiles silently absorbs the cold tail.
  3. Backup-driven cross-filer restore — back up the source NAS to Cohesity, then restore directly into a SmartFiles View (or even into a different NAS array). Especially useful when SMB/NFS permission preservation matters.
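The "plan multiple cutover events" guidance for the first path is simple ceiling arithmetic. The sketch below uses the ~30 TB per-event sizing from the text; the 600 TB estate is a hypothetical example input, and the function name is invented.

```python
import math

TB_PER_MIGRATION_EVENT = 30  # approximate sizing of the packaged service


def cutover_events(estate_tb: float,
                   tb_per_event: float = TB_PER_MIGRATION_EVENT) -> int:
    """Number of migration events needed to move the whole estate."""
    if estate_tb < 0 or tb_per_event <= 0:
        raise ValueError("sizes must be non-negative and event size positive")
    return math.ceil(estate_tb / tb_per_event)


print(cutover_events(600))  # hypothetical 600 TB estate
print(cutover_events(75))   # a partial final event still counts as an event
```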

Cohesity SmartFiles vs. NetApp ONTAP and Dell PowerScale (Isilon)

Capability                        | Cohesity SmartFiles                     | NetApp ONTAP                | Dell PowerScale (Isilon)
----------------------------------+-----------------------------------------+-----------------------------+----------------------------------
Architecture                      | SpanFS, hyperconverged                  | HA pairs (cluster of pairs) | OneFS scale-out
Single namespace                  | Cluster-wide                            | Per SVM                     | Cluster-wide
Multi-protocol same data          | NFSv3/v4 + SMB3 + S3                    | NFS + SMB + S3 (bolt-on)    | NFS + SMB + (S3 via OneFS or ECS)
Global dedupe                     | Yes, cluster-wide                       | Volume/aggregate            | Limited; per-volume
Cold-tier cloud                   | Yes, transparent recall, all protocols  | FabricPool                  | CloudPools
Native ICAP AV                    | Yes                                     | Yes (Vscan)                 | Yes (CAVA / ICAP)
Native audit                      | Yes                                     | FPolicy                     | Audit subsystem
Backup target                     | Native                                  | Possible, not designed for  | Possible, not designed for
Single platform: backup + primary | Yes                                     | No                          | No

Lift-and-Shift Sequence

  1. Discovery. Run SmartFiles analytics against source filers; produce hot/warm/cold map per share.
  2. Decision tree. Per share: tier-off vs. full cutover vs. backup-driven restore.
  3. Identity and AV. Stand up AD/LDAP and ICAP scanner pools before any cutover.
  4. QoS placement. Map each share to a QoS class up front.
  5. Cutover windows. Plan at the 30-TB-per-event sizing; use replication seeding to minimize cutover time.
  6. Decommission. Retire source filer once SmartFiles View is authoritative and snapshots have aged.

flowchart TD
    Legacy["Legacy NAS<br/>NetApp ONTAP / Dell Isilon"]
    Disc["Discovery and Analytics<br/>hot/warm/cold map"]
    Decision{"Decision tree<br/>per share"}
    Cutover["Full Cutover Path<br/>NAS Migration Service<br/>~30 TB per event"]
    Tier["Cold-Data Tier-Off Path<br/>policy-driven tiering<br/>legacy keeps hot data"]
    Backup["Backup-Driven Path<br/>NAS backup + cross-filer restore"]
    Prep["Pre-cutover prep<br/>AD/LDAP + ICAP + QoS"]
    SF["SmartFiles View<br/>SMB3 + NFSv4 + S3"]
    Decom["Decommission legacy"]
    Legacy --> Disc
    Disc --> Decision
    Decision --> Cutover
    Decision --> Tier
    Decision --> Backup
    Cutover --> Prep
    Backup --> Prep
    Tier --> SF
    Prep --> SF
    SF --> Decom
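The per-share decision step in the sequence above can be sketched as a function over the discovery output. The three outcomes come from the text; the inputs, the 0.7 cold-fraction threshold, and the order of the checks are hypothetical choices made for illustration and should be replaced with criteria agreed for the actual estate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareProfile:
    name: str
    cold_fraction: float        # share of data classified cold by discovery
    preserve_permissions: bool  # SMB/NFS permission fidelity is the main concern


def choose_path(share: ShareProfile, cold_threshold: float = 0.7) -> str:
    """Pick one of the three migration paths for a single share."""
    if share.cold_fraction >= cold_threshold:
        # Mostly cold: leave hot data on the legacy NAS and tier off the rest.
        return "tier-off"
    if share.preserve_permissions:
        # Back up the source, then restore into a SmartFiles View.
        return "backup-driven restore"
    return "full cutover"


shares = [
    ShareProfile("archive-2015", cold_fraction=0.95, preserve_permissions=False),
    ShareProfile("finance", cold_fraction=0.30, preserve_permissions=True),
    ShareProfile("build-artifacts", cold_fraction=0.20, preserve_permissions=False),
]
for share in shares:
    print(share.name, "->", choose_path(share))
```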
[Animation: NAS Migration Workflow, legacy NAS to File Migration Service to SmartFiles. A legacy NAS (NetApp / Isilon; SMB / NFS / NDMP; ~600 TB unstructured) feeds the File Migration Service (~30 TB per cutover event, ACL preservation; transfers are chunked, idempotent, and resumable), which lands data in a SmartFiles target View (SMB3 + NFSv4 + S3, global dedupe, tiering). Three paths converge: full cutover, cold-tier absorption, backup-driven restore.]

Key Points — Migration and Modernization

