Chapter 10: Troubleshooting, Upgrade Workflows, and Exam Strategy
Learning Objectives
Apply structured OSI-based troubleshooting (bottom-up, top-down, divide-and-conquer) and choose the right approach for a given symptom.
Interpret transceiver DOM data, interface counters, and use mirror sessions to capture traffic for deep analysis.
Execute the eight-step AOS-CX firmware upgrade and explain the role of primary/secondary partitions, RSA-3072 signing, and ServiceOS.
Plan a VSX Live Software Upgrade and contrast it with ISSU on CX 6400 and Hitless ISSU on CX 6300 VSF.
Choose the correct recovery tier (password reset, zeroize, USB recovery, checkpoint rollback) for a given failure scenario.
Build an HPE7-A01 study plan using blueprint weights and apply the 60-30-10 time-management rule on exam day.
Section 1: Structured Troubleshooting
When a user reports "the network is broken," the OSI model is your universal triage tool. Bottom-up starts at L1: physical checks are cheap, and every higher layer assumes the one below it is healthy. Top-down is faster when the symptom is clearly application-level. Divide-and-conquer starts in the middle at L3/L4 ("can you ping the gateway?") and narrows up or down from the result.
1.1 OSI Layer Quick Reference
| Layer | Typical Symptoms | First-Look AOS-CX Commands |
|---|---|---|
| L1 Physical | Link down, CRC errors, intermittent flaps | show interface, show interface transceiver |
| L2 Data Link | VLAN mismatch, MAC flapping, STP topology change | show vlan, show mac-address-table, show spanning-tree |
| L3 Network | No route, OSPF down, wrong next hop | show ip route, show ip ospf neighbor |
| L4 Transport | TCP resets, blocked ports, ACL drops | show access-list hitcounts |
| L5-L7 | DNS, AAA, certificate failures | show aaa authentication, RADIUS test, ClearPass logs |
Animation: OSI-Layer Troubleshooting Decision Tree
Figure 10.1 - Walking the OSI Stack from L1 Up
1.2 Transceiver DOM & Mirror Sessions
AOS-CX exposes Digital Optical Monitoring (DOM) on every supported transceiver: temperature, voltage, bias current, TX power, RX power. Healthy 10G multimode optics run TX/RX between -1 dBm and -7 dBm; drifting toward -10 dBm signals a dirty connector or failing laser. Remember: more negative dBm = weaker signal.
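The rule of thumb above can be sketched as a small classifier. This is an illustrative helper, not an AOS-CX feature: the function names are hypothetical, and the thresholds are simply the chapter's 10G multimode figures (real transceivers publish their own alarm and warning limits).

```python
# Thresholds taken from the paragraph above (10G multimode rule of thumb).
# Assumption: real optics report their own alarm/warning limits via DOM.
HEALTHY_MAX_DBM = -1.0
HEALTHY_MIN_DBM = -7.0
TROUBLE_DBM = -10.0


def dbm_to_mw(dbm: float) -> float:
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (dbm / 10)


def classify_power(dbm: float) -> str:
    """Bucket a TX/RX reading using the chapter's rule-of-thumb range."""
    if HEALTHY_MIN_DBM <= dbm <= HEALTHY_MAX_DBM:
        return "healthy"
    if dbm > HEALTHY_MAX_DBM:
        return "above range"
    if dbm > TROUBLE_DBM:
        return "drifting"
    return "trouble"


if __name__ == "__main__":
    for reading in (-3.0, -8.5, -11.0):
        print(reading, "dBm ->", classify_power(reading))
```

Because dBm is logarithmic, a 10 dB drop is a tenfold loss of power: -10 dBm is 0.1 mW, while 0 dBm is 1 mW.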
switch(config)# mirror session 1
switch(config-mirror-1)# source interface 1/1/10 both
switch(config-mirror-1)# destination interface 1/1/48
switch(config-mirror-1)# no shutdown
Sources can be rx, tx, both, or an entire VLAN. Remote mirror (ERSPAN-like) wraps mirrored frames in GRE for distant analyzers. Never mirror a 10G port to a 1G destination - the destination tail-drops and your capture is incomplete.
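The speed-mismatch warning comes down to simple arithmetic, sketched below. The function name is hypothetical; the only assumption is a full-duplex source port, where mirroring both directions can present up to twice the line rate to the destination.

```python
def mirror_oversubscribed(source_gbps: float, direction: str, dest_gbps: float) -> bool:
    """Return True if worst-case mirrored load can exceed the destination speed."""
    if direction not in ("rx", "tx", "both"):
        raise ValueError("direction must be 'rx', 'tx', or 'both'")
    # Full duplex: 'both' copies receive and transmit traffic, so the
    # worst case is twice the source line rate.
    worst_case = source_gbps * (2 if direction == "both" else 1)
    return worst_case > dest_gbps


if __name__ == "__main__":
    print(mirror_oversubscribed(10, "rx", 1))     # 10G source, 1G destination
    print(mirror_oversubscribed(10, "both", 10))  # same speed, both directions
    print(mirror_oversubscribed(10, "tx", 10))    # same speed, one direction
```

Note that even a same-speed destination can drop frames when the source is mirrored in both directions and is busy in each.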
Key Points - Troubleshooting
Bottom-up for L1 symptoms; top-down for app-only symptoms; divide-and-conquer otherwise.
DOM RX/TX between -1 and -7 dBm = healthy 10G multimode; -10 dBm = trouble.
Always run show tech before poking the problem - it gives TAC a baseline.
Pre-Reading Quiz - Troubleshooting
1. A user reports "I can't reach the file server, but I can browse the web." What is the most efficient troubleshooting approach?
Bottom-up - start with cabling and optics
Top-down - the symptom is application-specific
Reload the switch immediately
Run erase zeroize and rebuild
2. A 10G fiber link shows RX power of -11 dBm and intermittent CRC errors. What does this most likely indicate?
Normal operating range, no action needed
A duplex mismatch on the partner switch
Weak signal - dirty connector, bent fiber, or failing laser
An OSPF adjacency issue
3. You need to capture wireless client traffic from an Aruba AP into Wireshark. Which AOS-CX feature is the appropriate tool?
SNMP trap forwarding
Mirror session with the AP-facing switch port as source and a probe port as destination
SFP DOM polling
show tech redirected to TFTP
4. Before escalating a complex issue to HPE TAC, which command produces the comprehensive diagnostic bundle they will request?
show running-config
show events
show tech
show version
Section 2: Image Management and Upgrades
AOS-CX uses a dual-partition firmware architecture: every switch has primary and secondary flash partitions, each capable of holding a complete image. One is active; the other is the parachute. The cardinal rule: always upload new firmware to the non-active partition.
2.1 Image Verification & Boot Order
Every AOS-CX image is signed with RSA-3072 / SHA-256. Signatures are checked twice: at download (rejected if invalid) and at every boot (drops to ServiceOS if invalid). There is no flag to bypass.
Figure 10.2 - Switch Boots Primary, Signature Fails, Falls Back to Secondary
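The boot-time behavior described above and in Figure 10.2 can be modeled as a small decision function. This is a simplified sketch of the chapter's description, not the actual bootloader logic; the function name and the dictionary of check results are hypothetical.

```python
PARTITIONS = ("primary", "secondary")


def select_boot_target(default: str, signature_ok: dict[str, bool]) -> str:
    """Return what the switch ends up running, per the chapter's description.

    Assumption: the default partition is tried first, then the other one,
    and ServiceOS is the last resort when neither image verifies.
    """
    if default not in PARTITIONS:
        raise ValueError("default must be 'primary' or 'secondary'")
    other = "secondary" if default == "primary" else "primary"
    for partition in (default, other):
        if signature_ok.get(partition, False):
            return partition
    return "serviceos"


if __name__ == "__main__":
    print(select_boot_target("primary", {"primary": False, "secondary": True}))
    print(select_boot_target("primary", {"primary": False, "secondary": False}))
```

The second call corresponds to the "both images corrupt" case, which is where USB recovery from ServiceOS comes in.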
2.2 The Eight-Step Traditional Upgrade
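| Step | Action | Command |
|---|---|---|
| 1 | Identify active partition | show version, show images |
| 2 | Transfer image to inactive partition | copy sftp://... secondary vrf mgmt |
| 3 | Verify signature | verify signature flash secondary |
| 4 | Set boot partition | boot set-default secondary |
| 5 | Save config | copy running-config startup-config |
| 6 | Reboot | boot system secondary |
| 7 | Validate | show version, show ip ospf neighbor |
| 8 | Sync partitions | copy secondary primary |

The commands assume the switch is running from primary, so secondary is the inactive partition. Step 8 copies from the partition that now holds the validated new image to the other one; reversing the direction would overwrite the new image with the old.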
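The partition logic behind the staging and sync steps can be sketched as follows. The helper names are hypothetical, and the image URL is passed through exactly as given (the chapter elides it). The sync direction is derived from which partition holds the new image: the copy goes from the freshly staged partition to the previously active one.

```python
PARTITIONS = ("primary", "secondary")


def inactive_partition(active: str) -> str:
    """Return the partition that is NOT currently running."""
    if active not in PARTITIONS:
        raise ValueError("active must be 'primary' or 'secondary'")
    return "secondary" if active == "primary" else "primary"


def staging_plan(active: str, image_url: str) -> dict[str, str]:
    """Build the staging and sync command strings for an upgrade.

    Assumption: the management VRF is named 'mgmt', as in the table above.
    """
    target = inactive_partition(active)
    return {
        # New firmware always goes to the non-active partition.
        "stage": f"copy {image_url} {target} vrf mgmt",
        # After validation, copy the new image over the old one.
        "sync": f"copy {target} {active}",
    }


if __name__ == "__main__":
    plan = staging_plan("primary", "sftp://...")
    for step, command in plan.items():
        print(f"{step}: {command}")
```

Generating the commands from the active partition, rather than hard-coding "secondary", avoids the classic mistake of overwriting the running image when the switch happens to be booted from secondary.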
2.3 Unsafe Updates & ServiceOS
Some releases include bootloader, PoE controller, or PHY firmware updates. Interrupting them can permanently brick the component, so they are disabled by default. Enable a 30-minute window with allow-unsafe-updates 30 and never remove power during one - the switch may reboot multiple times.
ServiceOS is a recovery OS reached by pressing 0 during the boot menu countdown over a 9600/8-N-1 console. From it you can boot, password reset, erase zeroize, or copy usb to recover a switch.
2.4 VSX Live Software Upgrade
For redundant pairs (CX 8100/8300/8325/8400), VSX Live Upgrade orchestrates a hitless transition. Headline number: 12-19 ms of measurable impact during the active/standby switchover.
Animation: VSX Live Upgrade Timeline
Figure 10.3 - VSX Live Upgrade: Standby First, Role Swap, Active Second
| Platform | Hitless Mechanism | Min AOS-CX |
|---|---|---|
| CX 6400 chassis | ISSU (redundant mgmt + line-card hot patch) | 10.10 |
| CX 8300/8325/8400 (VSX pair) | VSX Live Upgrade (LACP drain) | 10.06+ |
| CX 6300 stack (VSF) | Enhanced Software Upgrade (ESU) | 10.11 |
| CX 6300 stack (VSF) | Hitless ISSU (no conductor reboot) | 10.13 |
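A lookup over this table is a handy way to check which mechanisms a given release unlocks. The sketch below copies the rows above into a list; the function names are hypothetical. One detail worth showing: release numbers must be compared as (major, minor) pairs, because as decimals 10.10 would sort below 10.6.

```python
# (platform, mechanism, minimum AOS-CX release) rows copied from the table above.
MECHANISMS = [
    ("CX 6400 chassis", "ISSU", (10, 10)),
    ("CX 8300/8325/8400 (VSX pair)", "VSX Live Upgrade", (10, 6)),
    ("CX 6300 stack (VSF)", "Enhanced Software Upgrade (ESU)", (10, 11)),
    ("CX 6300 stack (VSF)", "Hitless ISSU", (10, 13)),
]


def parse_release(text: str) -> tuple[int, int]:
    """Turn '10.06' or '10.13.1000' into a comparable (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def available_mechanisms(platform: str, release: str) -> list[str]:
    """List the mechanisms whose minimum release is met on this platform."""
    running = parse_release(release)
    return [
        mechanism
        for name, mechanism, minimum in MECHANISMS
        if name == platform and running >= minimum
    ]


if __name__ == "__main__":
    print(available_mechanisms("CX 6300 stack (VSF)", "10.12"))
    print(available_mechanisms("CX 6300 stack (VSF)", "10.13"))
```

On 10.12 a CX 6300 VSF stack qualifies only for ESU; from 10.13 it qualifies for Hitless ISSU as well.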
Key Points - Image Management
Always stage new image to the non-active partition.
RSA-3072 / SHA-256 signatures verified at download and at every boot - no bypass.
Eight-step upgrade ends by syncing the validated image to the other partition (copy secondary primary when the new image was staged on secondary).
VSX Live Upgrade = 12-19 ms; ISSU on CX 6400 and Hitless ISSU on CX 6300 VSF (10.13+) are zero-impact.
allow-unsafe-updates 30 opens a 30-min window for bootloader/PoE/PHY updates - never pull power.
Pre-Reading Quiz - Image Management
5. A switch is currently running from the primary partition. You need to upgrade firmware. Where should you stage the new image?
Primary partition (overwrite the running image)
Secondary partition (the non-active one)
USB drive only
Either - it makes no difference
6. What cryptographic algorithm signs every AOS-CX image, and when is it verified?
MD5 - verified only at download
SHA-1 - verified once at install
RSA-3072 / SHA-256 - verified at download AND every boot
AES-256 - verified by ServiceOS only
7. During a VSX Live Upgrade on a CX 8325 pair, what is the approximate measurable traffic impact?
Zero impact - completely hitless
12-19 milliseconds during the LACP role swap
2-3 seconds during reboot
30+ seconds while OSPF reconverges
8. A release notes entry warns the upgrade includes a PoE controller firmware update. Which configuration is required and what must you avoid?
allow-unsafe-updates 30 - never remove power during the upgrade
boot system primary force - reboot twice quickly
No special config - PoE updates are always safe
erase zeroize first to clear PoE state
Section 3: Recovery Procedures
AOS-CX exposes four escalating recovery tiers. Picking the right one is the difference between a 5-minute fix and a wiped switch.
flowchart TD
    Start([Switch in trouble]) --> Q1{What is broken?}
    Q1 -->|Forgot admin password| Tier1["Tier 1: Password Recovery<br/>ServiceOS: password reset<br/>config preserved"]
    Q1 -->|"Bad config / change, switch boots OK"| Tier4["Tier 4: Checkpoint Rollback<br/>checkpoint rollback name<br/>no reboot required"]
    Q1 -->|"Re-deploying / wipe needed"| Tier2["Tier 2: Factory Reset<br/>ServiceOS: erase zeroize<br/>config + certs wiped"]
    Q1 -->|Both images corrupt| Tier3["Tier 3: USB Recovery<br/>FAT32 USB with .swi<br/>ServiceOS: copy usb ... primary"]
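The same decision can be written as a lookup table, which is a convenient way to drill the four tiers. The symptom keys below are hypothetical labels invented for this sketch; the tier descriptions are taken from the flowchart.

```python
# Hypothetical symptom keys mapped to the tiers in the flowchart above.
RECOVERY_TIERS = {
    "forgot_password": "Tier 1: ServiceOS password reset (config preserved)",
    "redeploy": "Tier 2: ServiceOS erase zeroize (config + certs wiped)",
    "both_images_corrupt": "Tier 3: USB recovery (FAT32 USB with .swi)",
    "bad_config": "Tier 4: checkpoint rollback (no reboot required)",
}


def recovery_tier(symptom: str) -> str:
    """Return the least destructive tier that fits the symptom."""
    try:
        return RECOVERY_TIERS[symptom]
    except KeyError:
        raise ValueError(f"unknown symptom: {symptom!r}") from None


if __name__ == "__main__":
    for symptom in RECOVERY_TIERS:
        print(f"{symptom:>20} -> {recovery_tier(symptom)}")
```

The point of the exercise is the mapping itself: a forgotten password never justifies erase zeroize, and a bad configuration change on a switch that still boots never requires ServiceOS at all.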
3.1 Password Recovery (Tier 1) - Config Preserved
Console cable at 9600/8-N-1.
Reboot; press 0 during boot menu countdown.
At the ServiceOS login: prompt, enter admin with no password.
Run password reset, confirm, then reload.
Boot, log in as admin/blank, immediately set a new password.
VLANs, routes, certificates - everything is preserved.
3.2 Factory Reset (Tier 2) - Config and Certificates Wiped
ServiceOS> erase zeroize
This will erase all configuration including certificates and management files.
Continue? (y/n): y
erase zeroize wipes startup config, certificates, RADIUS shared secrets, SSH keys, and management state. Firmware images are preserved. The switch reboots into ZTP-ready state with DHCP on the management interface and admin/blank credentials. This is the right command before redeployment - the wrong command if you only need a password reset.
3.3 USB Image Recovery (Tier 3) - Both Images Dead
ServiceOS> dir usb
ServiceOS> copy usb /ArubaOS-CX_6300_10_14_0001.swi primary
ServiceOS> boot
USB must be FAT32. The image is RSA-3072-verified before being written - a tampered file cannot revive a switch.
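3.4 Checkpoint Rollback (Tier 4) - No Reboot Required
Checkpoints act as a Time Machine for switch configuration: checkpoint rollback restores a named checkpoint without a reboot. Checkpoints also integrate with NAE - every checkpoint creates a purple diamond marker on performance graphs for instant change correlation.
Key Points - Recovery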
ServiceOS access: console 9600/8-N-1, press 0 during boot menu countdown.
USB recovery requires FAT32 + valid RSA-3072 signature.
Checkpoint rollback is reversible without a reboot.
Pre-Reading Quiz - Recovery
9. A junior engineer locked everyone out by changing the admin password. The switch is in production and configuration must be preserved. Which recovery tier?
Tier 2: erase zeroize
Tier 1: ServiceOS password reset
Tier 3: USB image recovery
Reload from startup-config
10. How do you enter ServiceOS on an AOS-CX switch?
SSH to the switch and type service
Connect via HTTPS to /serviceos
Console at 9600/8-N-1, press 0 during boot menu countdown
Hold the reset button for 30 seconds
11. A switch is being redeployed at a different customer site. You need to wipe configuration AND certificates but keep the firmware images. Which command?
password reset
erase startup-config
erase zeroize from ServiceOS
checkpoint rollback factory
12. Both flash partitions failed signature verification after a power blip. The switch sits at a ServiceOS prompt. What is the recovery path?
Run password reset and reboot
FAT32 USB with .swi file, copy usb ... primary, then boot
RMA the switch immediately - no recovery is possible
checkpoint rollback running-config
Section 4: HPE7-A01 Exam Strategy
4.1 Exam Specifications
| Item | Value |
|---|---|
| Exam code | HPE7-A01 |
| Format | Multiple-choice + scenario/simulation |
| Questions / Time | 75 questions / 120 minutes |
| Passing score | 68% (~51/75 correct) |
| Cost | USD 350 (USD 195 emerging markets) |
| Average pace | 96 seconds per question |
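The pace and passing figures follow directly from the specifications. The short sketch below reproduces them; it assumes the passing score is a plain percentage of questions answered correctly, which the chapter itself marks as approximate.

```python
QUESTIONS = 75
MINUTES = 120
PASS_PERCENT = 68


def seconds_per_question(questions: int = QUESTIONS, minutes: int = MINUTES) -> float:
    """Average time budget per question, in seconds."""
    return minutes * 60 / questions


def correct_needed(questions: int = QUESTIONS, pass_percent: int = PASS_PERCENT) -> int:
    """Smallest whole number of correct answers that reaches the pass mark."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-questions * pass_percent // 100)


if __name__ == "__main__":
    print(seconds_per_question())  # 96.0
    print(correct_needed())        # 51
```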
4.2 Blueprint Weights Drive Study Time
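| Domain | Weight | 60-hr plan (hours) | 120-hr plan (hours) |
|---|---|---|---|
| WLAN | 17% | 10.2 | 20.4 |
| Switching | 14% | 8.4 | 16.8 |
| Routing | 13% | 7.8 | 15.6 |
| Security | 9% | 5.4 | 10.8 |
| Auth & Authz | 8% | 4.8 | 9.6 |
| Resiliency / Virt | 8% | 4.8 | 9.6 |
| Connectivity | 8% | 4.8 | 9.6 |
| Performance | 7% | 4.2 | 8.4 |
| Mgmt & Monitoring | 6% | 3.6 | 7.2 |
| Troubleshooting | 6% | 3.6 | 7.2 |
| Network Stack | 4% | 2.4 | 4.8 |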
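The hour columns are just the blueprint weight multiplied by the total study budget, so the plan can be regenerated for any budget. A minimal sketch, with the weights copied from the table above and a hypothetical function name:

```python
# Blueprint weights in percent, copied from the table above.
BLUEPRINT = {
    "WLAN": 17,
    "Switching": 14,
    "Routing": 13,
    "Security": 9,
    "Auth & Authz": 8,
    "Resiliency / Virt": 8,
    "Connectivity": 8,
    "Performance": 7,
    "Mgmt & Monitoring": 6,
    "Troubleshooting": 6,
    "Network Stack": 4,
}


def study_hours(total_hours: int) -> dict[str, float]:
    """Split a study budget across domains in proportion to their weight."""
    return {
        domain: round(total_hours * weight / 100, 1)
        for domain, weight in BLUEPRINT.items()
    }


if __name__ == "__main__":
    for domain, hours in study_hours(60).items():
        print(f"{domain:<18} {hours:>5.1f} h")
```

For example, an 80-hour budget gives WLAN 13.6 hours and Network Stack 3.2 hours.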
4.3 The 60-30-10 Time-Management Rule
| Phase | Time | Goal |
|---|---|---|
| Pass 1 | ~72 min (60%) | Answer every easy/medium. Flag & skip >2 min items. |
| Pass 2 | ~36 min (30%) | Tackle flagged hard questions one by one. |
| Pass 3 | ~12 min (10%) | Review flagged answers. Don't change without concrete reason. |
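The three time boxes are a straight percentage split of the exam clock, as this short sketch shows. The function name is hypothetical; the 60/30/10 shares are the rule above.

```python
def split_exam_time(total_minutes: int, shares: tuple[int, ...] = (60, 30, 10)) -> list[float]:
    """Divide the exam clock into passes according to percentage shares."""
    if sum(shares) != 100:
        raise ValueError("shares must add up to 100")
    return [total_minutes * share / 100 for share in shares]


if __name__ == "__main__":
    for number, minutes in enumerate(split_exam_time(120), start=1):
        print(f"Pass {number}: {minutes:.0f} min")
```

The same split applies to practice exams of other lengths: a 90-minute mock gives 54, 27, and 9 minutes.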
4.4 Question-Attack Tactics
Eliminate, don't pick. Cross out two clearly-wrong answers first.
Watch for absolutes. Answers containing "always," "never," or "only" are usually wrong; answers hedged with "typically" are usually right.
Read the last sentence first in long scenarios - it tells you what's actually being asked.
For sims, run the simplest verification first - show vsx status, show ip ospf neighbor.
Key Points - Exam Strategy
75 Q / 120 min = 96 seconds average per question; pass = 68%.
Allocate study hours proportional to blueprint weights (WLAN 17% gets the most).