Chapter 2: DGX OS Software Stack, CUDA Toolkit & Containerized AI Workflows
Learning Objectives
Navigate the DGX OS Ubuntu Linux environment and NVIDIA-optimized kernel components
Configure and use the pre-installed CUDA toolkit, cuDNN, TensorRT, and core AI frameworks for development workflows
Deploy and manage NGC container images for reproducible AI environments on DGX Spark
Implement containerized inference and training pipelines using NVIDIA Container Toolkit and Kubernetes
Section 1: DGX OS — Ubuntu Linux with NVIDIA Optimization
Pre-Quiz: DGX OS Fundamentals
1. What distinguishes DGX OS GPU driver installation from a standard Ubuntu GPU setup?
- DGX OS requires manual DKMS compilation after each kernel update
- DGX OS uses a proprietary non-Linux kernel for GPU support
- DGX OS ships GPU drivers pre-integrated as .deb packages without DKMS, eliminating rebuild steps
- DGX OS only supports GPU access through containers, not natively
2. What is the primary role of DCGM (Data Center GPU Manager) on DGX Spark?
- It replaces nvidia-smi as the only GPU monitoring tool
- It provides programmatic GPU telemetry, health checks, and integration with monitoring dashboards like Prometheus
- It manages Docker container networking for GPU workloads
- It controls GPU clock speeds and voltage for overclocking
3. Why does DGX Spark ship with a custom NVIDIA kernel rather than the stock Ubuntu kernel?
- The stock kernel cannot boot on ARM64 hardware
- The custom kernel is tuned for Grace Blackwell unified memory, NVLink, and GPU scheduling
- NVIDIA legally cannot distribute the standard Ubuntu kernel
- The custom kernel removes all networking support to improve GPU performance
4. How does DGX Spark handle multi-user GPU workload isolation?
- Each user receives a physically separate GPU partition at the hardware level
- Only one user can log in at a time to prevent conflicts
- GPU access is controlled at the container level using the --gpus flag for Docker containers
- A hypervisor creates virtual GPUs for each user session
5. Which diagnostic approach correctly represents the layered diagnostics model on DGX Spark?
- Run nvidia-smi only; it covers all diagnostic needs
- Hardware layer (nvidia-smi -q), driver layer (dmesg | grep nvidia), application layer (CUDA sample programs)
- Check Docker logs first, then reboot the system if errors persist
- Run the management dashboard exclusively; command-line tools are deprecated
DGX OS Base System: Ubuntu LTS with NVIDIA Kernel Modules
DGX OS is built on Ubuntu 24.04 LTS but is far more than a stock installation. NVIDIA ships a custom kernel (6.17.0-1014-nvidia) alongside a Hardware Enablement (HWE) kernel 6.14, both tuned for the Grace Blackwell architecture's unified memory, NVLink interconnects, and GPU scheduling requirements.
The GPU driver package — the nvidia-580-open series — is delivered as a .deb package without DKMS (Dynamic Kernel Module Support). This eliminates the fragile driver rebuild step that plagues standard Linux GPU setups. Think of it this way: if standard Ubuntu with manually installed drivers is like assembling furniture from parts, DGX OS is the factory-assembled version.
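One way to see this integration on a live system is to inspect the kernel and driver packaging directly. A quick sketch — exact package names and kernel versions vary by DGX OS release, so treat the grep patterns as illustrative:

```shell
# Show the running kernel (DGX OS kernels carry an -nvidia suffix)
uname -r

# List the pre-integrated NVIDIA driver packages (.deb, no DKMS)
dpkg -l | grep -i nvidia | head

# Confirm no DKMS-managed NVIDIA module is registered
dkms status | grep -i nvidia || echo "no DKMS-managed NVIDIA modules (expected on DGX OS)"

# Report the loaded driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

On a standard Ubuntu GPU box, `dkms status` would list an nvidia module that gets rebuilt on every kernel update; its absence here is the point.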
| Component | Standard Ubuntu | DGX OS |
| --- | --- | --- |
| Kernel | Generic Linux kernel | Custom NVIDIA kernel (6.17.0-1014-nvidia) |
| GPU Drivers | Manually installed, DKMS-rebuilt | Pre-integrated, .deb packaged, no DKMS |
| AI Libraries | User-installed | Pre-configured CUDA, cuDNN, TensorRT |
| Container Runtime | Docker only | Docker + NVIDIA Container Toolkit |
DGX OS Software Stack Layers
GPU Monitoring with nvidia-smi and DCGM
The moment DGX Spark boots, the GPU driver stack is operational. The primary monitoring interface is nvidia-smi, which reports GPU utilization, memory, temperature, power draw, and running processes. For continuous monitoring, nvidia-smi dmon streams metrics at configurable intervals.
Beyond nvidia-smi, DCGM (Data Center GPU Manager) provides programmatic access to GPU telemetry, health checks, and policy-based monitoring suitable for integration with Prometheus and Grafana dashboards.
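Typical invocations for both tools are sketched below — the DCGM group ID is an example; run dcgmi discovery -l first to see what exists on your system:

```shell
# Stream device metrics once per second (Ctrl-C to stop)
nvidia-smi dmon -d 1

# DCGM: list discovered GPUs and their group membership
dcgmi discovery -l

# DCGM: run a quick health check on GPU group 0
dcgmi health -g 0 -c
```

nvidia-smi answers "what is the GPU doing right now"; dcgmi feeds the same data into automated health policies and exporters.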
Diagnostics on DGX Spark follow a layered model:
Hardware layer: nvidia-smi -q dumps detailed GPU state (clocks, temperature, power, ECC status)
Driver layer: dmesg | grep nvidia surfaces kernel module load errors and runtime faults
Application layer: CUDA sample programs (deviceQuery, bandwidthTest) verify end-to-end functionality
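These layers map to concrete commands. The CUDA sample path below is an assumption — recent CUDA releases distribute the samples via GitHub rather than under /usr/local/cuda, so adjust for your install:

```shell
# Hardware layer: full GPU state dump (first screenful)
nvidia-smi -q | head -40

# Driver layer: recent kernel messages from the NVIDIA modules
sudo dmesg | grep -i nvidia | tail

# Application layer: build and run a CUDA sample (path is illustrative)
cd /usr/local/cuda/samples/1_Utilities/deviceQuery && make && ./deviceQuery
```

Working top-down through the layers localizes a fault: if nvidia-smi -q fails, check the driver layer before blaming the application.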
Multi-User Access and GPU Isolation
DGX Spark supports teams through standard Linux multi-user capabilities enhanced for GPU workload isolation. GPU access is controlled at the container level — each user's Docker containers receive dedicated GPU resources through the --gpus flag. JupyterLab, pre-installed on DGX Spark, provides browser-based development access with per-user sessions.
Key Takeaways
DGX OS is a purpose-built Linux distribution, not simply Ubuntu with NVIDIA drivers bolted on
GPU drivers are kernel-integrated, packaged as .deb without DKMS — no rebuild on kernel updates
nvidia-smi and DCGM provide complementary monitoring (interactive vs. programmatic)
Multi-user isolation is container-based via the --gpus flag, not hardware GPU partitioning
Post-Quiz: DGX OS Fundamentals
1. What distinguishes DGX OS GPU driver installation from a standard Ubuntu GPU setup?
- DGX OS requires manual DKMS compilation after each kernel update
- DGX OS uses a proprietary non-Linux kernel for GPU support
- DGX OS ships GPU drivers pre-integrated as .deb packages without DKMS, eliminating rebuild steps
- DGX OS only supports GPU access through containers, not natively
2. What is the primary role of DCGM (Data Center GPU Manager) on DGX Spark?
- It replaces nvidia-smi as the only GPU monitoring tool
- It provides programmatic GPU telemetry, health checks, and integration with monitoring dashboards like Prometheus
- It manages Docker container networking for GPU workloads
- It controls GPU clock speeds and voltage for overclocking
3. Why does DGX Spark ship with a custom NVIDIA kernel rather than the stock Ubuntu kernel?
- The stock kernel cannot boot on ARM64 hardware
- The custom kernel is tuned for Grace Blackwell unified memory, NVLink, and GPU scheduling
- NVIDIA legally cannot distribute the standard Ubuntu kernel
- The custom kernel removes all networking support to improve GPU performance
4. How does DGX Spark handle multi-user GPU workload isolation?
- Each user receives a physically separate GPU partition at the hardware level
- Only one user can log in at a time to prevent conflicts
- GPU access is controlled at the container level using the --gpus flag for Docker containers
- A hypervisor creates virtual GPUs for each user session
5. Which diagnostic approach correctly represents the layered diagnostics model on DGX Spark?
- Run nvidia-smi only; it covers all diagnostic needs
- Hardware layer (nvidia-smi -q), driver layer (dmesg | grep nvidia), application layer (CUDA sample programs)
- Check Docker logs first, then reboot the system if errors persist
- Run the management dashboard exclusively; command-line tools are deprecated
Section 2: CUDA Toolkit & Core AI Development Libraries
Pre-Quiz: CUDA Toolkit & AI Libraries
1. What does cuDNN provide that the base CUDA toolkit does not?
- A GPU compiler for .cu source files
- GPU-accelerated deep learning primitives like convolutions, pooling, and normalization
- Container runtime hooks for GPU passthrough
- A web-based dashboard for monitoring training progress
2. Why must software compiled natively for DGX Spark target ARM64 rather than x86_64?
- The Blackwell GPU only supports ARM instruction sets
- DGX Spark uses the Grace CPU which is an ARM64 (AArch64) processor
- ARM64 is required by the NVIDIA Container Toolkit licensing terms
- x86_64 binaries are automatically translated, but ARM64 is faster
3. What is TensorRT's primary function in the AI deployment pipeline?
- Training neural networks from scratch with automatic hyperparameter tuning
- Converting trained models into optimized inference engines with layer fusion and precision calibration
- Managing CUDA toolkit version compatibility across different GPU architectures
- Distributing training data across multiple nodes in a cluster
4. How does NCCL improve multi-GPU training performance on DGX Spark?
- It compresses model weights to reduce memory usage per GPU
- It routes collective operations (AllReduce, Broadcast) over high-bandwidth NVLink rather than slower PCIe
- It automatically partitions the model across GPUs using pipeline parallelism
- It replaces cuDNN for communication-heavy operations like attention layers
5. When a developer runs PyTorch on DGX Spark and calls a convolution operation, what library actually performs the GPU computation?
- PyTorch's built-in GPU kernels written in Python
- cuDNN, which selects the fastest algorithm for the specific tensor dimensions and hardware
- TensorRT, which optimizes all operations at runtime
- NCCL, which handles all GPU computations including math operations
CUDA Toolkit: Installation, Versioning, and Configuration
The CUDA toolkit comes pre-installed on DGX Spark, verified for Blackwell hardware compatibility. Versions include CUDA 12.8 and CUDA 13.0.2 depending on the release batch. The toolkit includes:
nvcc — the CUDA compiler for .cu source files
CUDA runtime libraries — the API layer applications link against
cuBLAS, cuFFT, cuSPARSE — GPU-accelerated math libraries
CUDA samples — reference implementations for testing
Environment configuration centers on two variables: PATH must include /usr/local/cuda/bin, and LD_LIBRARY_PATH must include /usr/local/cuda/lib64. On DGX OS, these are set by default.
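A quick sanity check of that configuration — the export lines at the end restate the DGX OS defaults described above, for shells where they are missing:

```shell
# Compiler present and on PATH?
nvcc --version

# Confirm the expected directories appear in the environment
echo "$PATH" | tr ':' '\n' | grep cuda
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep cuda

# If either is missing (e.g., in a minimal non-login shell), set explicitly:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```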
Important: DGX Spark uses the ARM64 (AArch64) architecture via the Grace CPU. Any natively compiled software must target ARM64, and the NGC CLI must be installed from the ARM64 Linux tab.
cuDNN and TensorRT
cuDNN provides GPU-accelerated deep learning primitives: convolutions, pooling, normalization, and activation functions. When PyTorch or TensorFlow execute a convolution, they call cuDNN, which selects the fastest algorithm for the specific tensor dimensions, data type, and hardware.
TensorRT (v10.2 on DGX Spark) is NVIDIA's inference optimization engine. It takes a trained model and produces an optimized execution plan — fusing layers, selecting precision (FP32, FP16, INT8), and calibrating for the target GPU. The workflow: export to ONNX, optimize with trtexec, deploy the .trt engine.
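The three-step workflow looks like this in shell form. The model (a torchvision ResNet-18), file names, and input shape are illustrative placeholders; the trtexec flags are standard, but verify them against your TensorRT version:

```shell
# Step 1: export a trained PyTorch model to ONNX (inline Python for brevity)
python3 -c "
import torch, torchvision
model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, 'model.onnx',
                  input_names=['input'], output_names=['output'])
"

# Step 2: build an optimized inference engine with FP16 precision
trtexec --onnx=model.onnx --saveEngine=model.trt --fp16

# Step 3: benchmark the engine before deploying it
trtexec --loadEngine=model.trt
```

The engine file is specific to the GPU it was built on, which is why the build step runs on DGX Spark itself rather than on a CI machine.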
flowchart LR
A["Trained Model\nPyTorch / TensorFlow"] --> B["Export to ONNX\ntorch.onnx.export()"]
B --> C["TensorRT Optimizer\ntrtexec"]
C --> D{"Precision\nSelection"}
D --> E["FP32\nFull Precision"]
D --> F["FP16\nHalf Precision"]
D --> G["INT8\nQuantized"]
E --> H["Layer Fusion\n& Kernel Selection"]
F --> H
G --> H
H --> I["Optimized TensorRT\nEngine (.trt)"]
I --> J["Deploy for\nInference"]
style A fill:#333,color:#fff
style B fill:#333,color:#fff
style C fill:#76b900,color:#000
style D fill:#005f30,color:#fff
style E fill:#005f30,color:#fff
style F fill:#005f30,color:#fff
style G fill:#005f30,color:#fff
style H fill:#76b900,color:#000
style I fill:#76b900,color:#000
style J fill:#333,color:#fff
| Library | Purpose | When You Use It |
| --- | --- | --- |
| CUDA Toolkit | GPU computation platform | Compiling custom CUDA kernels |
| cuDNN | DL operation primitives | Automatically via PyTorch/TensorFlow |
| TensorRT | Inference optimization | Deploying models to production |
| cuBLAS | Linear algebra on GPU | Matrix ops, automatically via frameworks |
| NCCL | Multi-GPU communication | Distributed training (automatic) |
NCCL and Multi-GPU Communication
NCCL (pronounced "Nickel") handles data transfer between multiple GPUs. It orchestrates operations like AllReduce, Broadcast, and AllGather across GPUs connected via NVLink, exploiting high-bandwidth topology rather than slower PCIe. PyTorch's DistributedDataParallel and TensorFlow's tf.distribute.Strategy call NCCL automatically.
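A minimal launch sketch — train.py is a placeholder for a script using DistributedDataParallel, and NCCL_DEBUG=INFO makes NCCL log which transport (NVLink vs. PCIe) it selected at startup:

```shell
# Report the NCCL version PyTorch was built against
python3 -c "import torch; print(torch.cuda.nccl.version())"

# Launch 2 workers, one per GPU; DDP initializes NCCL automatically
NCCL_DEBUG=INFO torchrun --nproc_per_node=2 train.py
```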
Framework Integration: PyTorch, TensorFlow, JAX
DGX Spark ships with pre-installed versions of major AI frameworks, each compiled against the system's CUDA, cuDNN, and NCCL versions. The pre-installed versions are matched and tested, avoiding version compatibility headaches. For different framework versions, NGC containers provide isolated environments with their own stack.
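The matched versions can be confirmed in one line; these are standard PyTorch APIs:

```shell
python3 -c "
import torch
print('PyTorch:', torch.__version__)
print('CUDA (built against):', torch.version.cuda)
print('cuDNN:', torch.backends.cudnn.version())
print('GPU available:', torch.cuda.is_available())
"
```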
Key Takeaways
CUDA toolkit, cuDNN, TensorRT, and NCCL form a layered acceleration stack, all pre-configured on DGX Spark
cuDNN is called transparently by frameworks — understanding it helps diagnose performance issues
TensorRT converts trained models into optimized inference engines via ONNX export and trtexec
DGX Spark's ARM64 architecture means all native binaries must target AArch64
Post-Quiz: CUDA Toolkit & AI Libraries
1. What does cuDNN provide that the base CUDA toolkit does not?
- A GPU compiler for .cu source files
- GPU-accelerated deep learning primitives like convolutions, pooling, and normalization
- Container runtime hooks for GPU passthrough
- A web-based dashboard for monitoring training progress
2. Why must software compiled natively for DGX Spark target ARM64 rather than x86_64?
- The Blackwell GPU only supports ARM instruction sets
- DGX Spark uses the Grace CPU which is an ARM64 (AArch64) processor
- ARM64 is required by the NVIDIA Container Toolkit licensing terms
- x86_64 binaries are automatically translated, but ARM64 is faster
3. What is TensorRT's primary function in the AI deployment pipeline?
- Training neural networks from scratch with automatic hyperparameter tuning
- Converting trained models into optimized inference engines with layer fusion and precision calibration
- Managing CUDA toolkit version compatibility across different GPU architectures
- Distributing training data across multiple nodes in a cluster
4. How does NCCL improve multi-GPU training performance on DGX Spark?
- It compresses model weights to reduce memory usage per GPU
- It routes collective operations (AllReduce, Broadcast) over high-bandwidth NVLink rather than slower PCIe
- It automatically partitions the model across GPUs using pipeline parallelism
- It replaces cuDNN for communication-heavy operations like attention layers
5. When a developer runs PyTorch on DGX Spark and calls a convolution operation, what library actually performs the GPU computation?
- PyTorch's built-in GPU kernels written in Python
- cuDNN, which selects the fastest algorithm for the specific tensor dimensions and hardware
- TensorRT, which optimizes all operations at runtime
- NCCL, which handles all GPU computations including math operations
Section 3: NGC Container Registry & Containerized Workflows
Pre-Quiz: NGC Containers & Workflows
1. What does the NVIDIA Container Toolkit provide that standard Docker does not?
- Network isolation between containers
- OCI runtime hooks that expose GPU drivers, CUDA libraries, and device files inside containers
- Automatic container image compression for faster pulls
- Built-in container orchestration with load balancing
2. What is the recommended approach for building a custom AI container on DGX Spark?
- Start from a minimal Alpine Linux image and install CUDA from scratch
- Start from an NVIDIA base image (e.g., nvcr.io/nvidia/pytorch) and layer project-specific dependencies
- Copy the host system's /usr/local/cuda directory into the container
- Use a Windows container with CUDA support for maximum compatibility
3. When authenticating Docker with the NGC registry, what value is used as the username?
- Your NVIDIA developer account email
- $oauthtoken (literal string)
- Your NGC organization name
- admin
4. What is the purpose of the --gpus all flag when running a Docker container on DGX Spark?
- It installs GPU drivers inside the container image
- It tells the NVIDIA runtime to expose all available GPUs to the container
- It enables CPU-based GPU emulation for testing
- It restricts the container to use only GPU memory, not system RAM
5. In a Docker Compose file for DGX Spark, how are GPU resources specified for a service?
- Using the gpus: all top-level key
- Using deploy.resources.reservations.devices with driver nvidia and capabilities [gpu]
- Adding --gpus all to the command field
- GPU access is automatic in Docker Compose and needs no configuration
NGC Container Registry
The NGC container registry at nvcr.io hosts hundreds of pre-built, GPU-optimized container images. These include framework containers (PyTorch, TensorFlow, JAX), application containers (Triton, RAPIDS), and model containers (NIM microservices). Each image is tested on NVIDIA hardware.
Setting up NGC access:
Generate an API key at ngc.nvidia.com → Setup → API Key
Authenticate Docker: docker login nvcr.io (username: $oauthtoken, password: your API key)
Install NGC CLI from the ARM64 Linux tab (required for Grace CPU architecture)
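The setup steps in command form — replace <YOUR_API_KEY> with the key generated at ngc.nvidia.com; note that $oauthtoken is the literal username, quoted so the shell does not expand it:

```shell
# Authenticate Docker with the NGC registry
echo "<YOUR_API_KEY>" | docker login nvcr.io --username '$oauthtoken' --password-stdin

# Configure the NGC CLI (interactive: prompts for API key, org, and team)
ngc config set
```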
The NVIDIA Container Toolkit installs OCI runtime hooks that expose GPU drivers, CUDA libraries, and device files inside containers without requiring these to be baked into the image. On DGX Spark, the toolkit is pre-installed and pre-configured.
# Run GPU-enabled container
docker run -it --gpus all nvcr.io/nvidia/pytorch:24.08-py3
# Specify individual GPUs
docker run -it --gpus '"device=0"' nvcr.io/nvidia/pytorch:24.08-py3
# Verify GPU access inside container
docker run --rm --gpus all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
Building Custom Containers
Start from an NVIDIA base image and layer your requirements:
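A sketch of such a Dockerfile, written via heredoc so the example is self-contained — the base image tag, requirements.txt, and train.py are illustrative placeholders; pick the tag that matches your NGC catalog:

```shell
cat > Dockerfile <<'EOF'
# Start from an NGC base image: CUDA, cuDNN, and PyTorch are already inside
FROM nvcr.io/nvidia/pytorch:24.08-py3

# Layer project-specific dependencies on top of the tested stack
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy code last so the dependency layers stay cached across rebuilds
COPY train.py /workspace/train.py
WORKDIR /workspace
CMD ["python", "train.py"]
EOF
```

Ordering dependencies before code is a deliberate cache optimization: editing train.py then triggers only the final COPY layer on rebuild.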
Build and run: docker build -t my-training:v1 ., then docker run --gpus all -v /data:/data my-training:v1
Container Orchestration Patterns
Two orchestration patterns are common on DGX Spark:
Docker Compose for multi-container workflows (training + TensorBoard, etc.) with GPU resource reservations via deploy.resources.reservations.devices
Kubernetes with NVIDIA GPU Operator for larger-scale orchestration with automatic GPU scheduling, resource quotas, and multi-user workload management
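A minimal Compose sketch of the first pattern — service names, image tags, and volume paths are illustrative; the deploy.resources.reservations.devices block is Compose's documented way to reserve GPUs:

```shell
cat > docker-compose.yml <<'EOF'
services:
  train:
    image: my-training:v1            # illustrative image name
    volumes:
      - /data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  tensorboard:
    image: tensorflow/tensorflow:latest   # illustrative; CPU-only TensorBoard
    command: tensorboard --logdir /logs --host 0.0.0.0
    ports:
      - "6006:6006"
    volumes:
      - /data/logs:/logs
EOF
```

Start both services with docker compose up -d; only the train service receives GPU access, while TensorBoard stays on CPU.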
Key Takeaways
NGC containers are pre-optimized for NVIDIA hardware and solve the "works on my machine" problem
The NVIDIA Container Toolkit provides GPU passthrough via OCI runtime hooks — no driver install needed in containers
Custom containers should start from NVIDIA base images, not install CUDA from scratch
Docker Compose and Kubernetes provide orchestration for team-scale workloads
Post-Quiz: NGC Containers & Workflows
1. What does the NVIDIA Container Toolkit provide that standard Docker does not?
- Network isolation between containers
- OCI runtime hooks that expose GPU drivers, CUDA libraries, and device files inside containers
- Automatic container image compression for faster pulls
- Built-in container orchestration with load balancing
2. What is the recommended approach for building a custom AI container on DGX Spark?
- Start from a minimal Alpine Linux image and install CUDA from scratch
- Start from an NVIDIA base image (e.g., nvcr.io/nvidia/pytorch) and layer project-specific dependencies
- Copy the host system's /usr/local/cuda directory into the container
- Use a Windows container with CUDA support for maximum compatibility
3. When authenticating Docker with the NGC registry, what value is used as the username?
- Your NVIDIA developer account email
- $oauthtoken (literal string)
- Your NGC organization name
- admin
4. What is the purpose of the --gpus all flag when running a Docker container on DGX Spark?
- It installs GPU drivers inside the container image
- It tells the NVIDIA runtime to expose all available GPUs to the container
- It enables CPU-based GPU emulation for testing
- It restricts the container to use only GPU memory, not system RAM
5. In a Docker Compose file for DGX Spark, how are GPU resources specified for a service?
- Using the gpus: all top-level key
- Using deploy.resources.reservations.devices with driver nvidia and capabilities [gpu]
- Adding --gpus all to the command field
- GPU access is automatic in Docker Compose and needs no configuration
Section 4: NVIDIA NIM Microservices & AI Enterprise Stack
Pre-Quiz: NIM Microservices & AI Enterprise
1. What is the key advantage of NIM microservices for model deployment?
- NIM trains models faster by using distributed computing automatically
- NIM converts model deployment from a systems engineering challenge into a container orchestration task
- NIM eliminates the need for GPUs during inference by using CPU-only optimization
- NIM provides a graphical interface for non-technical users to deploy models
2. How does Triton Inference Server differ from NIM in its approach to model serving?
- Triton only supports TensorRT models, while NIM supports all frameworks
- Triton provides multi-model, multi-framework serving with fine-grained scheduling, while NIM packages single models as turnkey APIs
- Triton is for training only, while NIM is for inference only
- There is no practical difference; they are the same tool with different names
3. What does Triton's dynamic batching feature accomplish?
- It splits large models across multiple GPUs automatically
- It automatically groups incoming requests to maximize GPU throughput
- It dynamically adjusts model precision based on available memory
- It batches model updates to reduce deployment downtime
4. What does the NVIDIA AI Enterprise license provide beyond the open-source stack?
- Access to CUDA and cuDNN, which are not available in the open-source stack
- Enterprise support with SLA, CVE response guarantees, full NIM catalog, and pre-built Blueprints
- Higher GPU clock speeds unlocked through a software license key
- Access to x86_64 emulation for running legacy workloads on ARM64
5. In a production observability stack for AI services on DGX Spark, what role does the /v2/health/ready endpoint serve?
- It triggers automatic model retraining when accuracy drops
- It enables load balancers and Kubernetes to route traffic away from unhealthy instances
- It exposes GPU temperature data for thermal throttling alerts
- It provides a web dashboard for real-time inference visualization
NIM Microservices: Models as API Endpoints
NVIDIA NIM provides prebuilt, optimized containers that package foundation models as API endpoints. Each NIM container includes the model weights, an inference engine (typically TensorRT-LLM), and an OpenAI-compatible API server.
# Pull and run a NIM container
docker pull nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
docker run --gpus all -p 8000:8000 nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
# Query the model via OpenAI-compatible API
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "meta/llama-3.1-8b-instruct",
"messages": [{"role": "user", "content": "Explain GPU memory hierarchy."}]}'
NIM handles model optimization internally — batch sizes, KV-cache memory management, and TensorRT-LLM optimizations. The key advantage: if you can run a Docker container, you can serve a production-grade LLM.
NIM Microservice Request Flow
NVIDIA AI Enterprise Platform
NVIDIA AI Enterprise is the commercial software layer providing enterprise support, security certifications, API stability guarantees, and validated upgrade paths. DGX Spark includes an AI Enterprise license, unlocking NIM microservices, enterprise support, and NVIDIA Blueprints.
| Capability | Open-Source Stack | AI Enterprise |
| --- | --- | --- |
| CUDA/cuDNN | Included | Included |
| NGC Containers | Public catalog | Full catalog + enterprise images |
| NIM Microservices | Community models | Full model catalog + support |
| Security | Community patches | CVE response SLA |
| Support | Forums | Enterprise support with SLA |
| Blueprints | Not available | Pre-built reference architectures |
Triton Inference Server: Multi-Model Serving
Triton Inference Server serves multiple models simultaneously with fine-grained control over scheduling, batching, and resource allocation. It supports TensorFlow SavedModels, PyTorch TorchScript, ONNX, TensorRT engines, and Python-based models through a unified interface.
Dynamic batching: Automatically groups requests to maximize GPU throughput
Model ensembles: Chains models in a pipeline (tokenizer → LLM → post-processor)
Concurrent execution: Runs different models on the same GPU
Metrics endpoint: Prometheus-compatible metrics for monitoring integration
Triton serves models via HTTP (port 8000), gRPC (port 8001), and metrics (port 8002).
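Launching Triton from its NGC container looks like this — the image tag and the /models host path are placeholders; the three published ports match the defaults above:

```shell
docker run --rm --gpus all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /models:/models \
  nvcr.io/nvidia/tritonserver:24.08-py3 \
  tritonserver --model-repository=/models
```

The model repository is a directory tree where each subdirectory holds one model plus its config; Triton loads everything it finds there at startup.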
flowchart TD
A["Client Requests"] --> B["Triton Inference Server"]
B --> C["Dynamic Batching\nEngine"]
C --> D["Text Classifier\nONNX Model"]
C --> E["Image Encoder\nTensorRT Engine"]
C --> F["LLM Service\nPython Backend"]
B --> G["HTTP Port 8000"]
B --> H["gRPC Port 8001"]
B --> I["Prometheus Metrics\nPort 8002"]
I --> J["Grafana\nDashboard"]
style A fill:#333,color:#fff
style B fill:#76b900,color:#000
style C fill:#005f30,color:#fff
style D fill:#005f30,color:#fff
style E fill:#005f30,color:#fff
style F fill:#005f30,color:#fff
style G fill:#333,color:#fff
style H fill:#333,color:#fff
style I fill:#333,color:#fff
style J fill:#1a1a1a,color:#fff
Monitoring and Observability
Production AI services require continuous monitoring across four layers:
GPU metrics: nvidia-smi and DCGM report utilization, memory, temperature, and power draw
Inference metrics: Triton exposes latency, throughput, queue depth via Prometheus
Container metrics: Docker/Kubernetes provide CPU, memory, network I/O data
Application logs: Structured logging from NIM and Triton for debugging and auditing
Health check endpoints (/v2/health/ready) enable load balancers and Kubernetes to automatically route traffic away from unhealthy instances.
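Probing a running instance from the command line — the localhost ports assume the default mappings above, and the metric grepped for is one of Triton's standard Prometheus counters:

```shell
# Readiness probe: HTTP 200 means the server can accept traffic
curl -sf http://localhost:8000/v2/health/ready && echo "ready"

# Scrape Prometheus metrics and pick out the inference success counter
curl -s http://localhost:8002/metrics | grep nv_inference_request_success
```

The same readiness URL is what a Kubernetes readinessProbe or a load balancer health check would be configured to hit.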
Key Takeaways
NIM provides turnkey single-model deployment as OpenAI-compatible API endpoints
Triton provides flexible multi-model, multi-framework serving with dynamic batching
NVIDIA AI Enterprise adds enterprise support, security SLAs, and pre-built Blueprints
Production monitoring spans GPU, inference, container, and application layers, routed to Prometheus/Grafana
Post-Quiz: NIM Microservices & AI Enterprise
1. What is the key advantage of NIM microservices for model deployment?
- NIM trains models faster by using distributed computing automatically
- NIM converts model deployment from a systems engineering challenge into a container orchestration task
- NIM eliminates the need for GPUs during inference by using CPU-only optimization
- NIM provides a graphical interface for non-technical users to deploy models
2. How does Triton Inference Server differ from NIM in its approach to model serving?
- Triton only supports TensorRT models, while NIM supports all frameworks
- Triton provides multi-model, multi-framework serving with fine-grained scheduling, while NIM packages single models as turnkey APIs
- Triton is for training only, while NIM is for inference only
- There is no practical difference; they are the same tool with different names
3. What does Triton's dynamic batching feature accomplish?
- It splits large models across multiple GPUs automatically
- It automatically groups incoming requests to maximize GPU throughput
- It dynamically adjusts model precision based on available memory
- It batches model updates to reduce deployment downtime
4. What does the NVIDIA AI Enterprise license provide beyond the open-source stack?
- Access to CUDA and cuDNN, which are not available in the open-source stack
- Enterprise support with SLA, CVE response guarantees, full NIM catalog, and pre-built Blueprints
- Higher GPU clock speeds unlocked through a software license key
- Access to x86_64 emulation for running legacy workloads on ARM64
5. In a production observability stack for AI services on DGX Spark, what role does the /v2/health/ready endpoint serve?
- It triggers automatic model retraining when accuracy drops
- It enables load balancers and Kubernetes to route traffic away from unhealthy instances
- It exposes GPU temperature data for thermal throttling alerts
- It provides a web dashboard for real-time inference visualization