Chapter 10: Troubleshooting, Upgrade Workflows, and Exam Strategy

Section 1: Structured Troubleshooting

When a user reports "the network is broken," the OSI model is your universal triage tool. Bottom-up starts at L1: the checks are cheap, and every higher layer assumes the one below is healthy. Top-down is faster when the symptom is clearly application-level. Divide-and-conquer starts in the middle at L3/L4 ("can you ping the gateway?") and narrows up or down from there.

1.1 OSI Layer Quick Reference

Layer | Typical Symptoms | First-Look AOS-CX Commands
L1 Physical | Link down, CRC errors, intermittent flaps | show interface, show interface transceiver
L2 Data Link | VLAN mismatch, MAC flapping, STP topology change | show vlan, show mac-address-table, show spanning-tree
L3 Network | No route, OSPF down, wrong next hop | show ip route, show ip ospf neighbor
L4 Transport | TCP resets, blocked ports, ACL drops | show access-list hitcounts
L5-L7 | DNS, AAA, certificate failures | show aaa authentication, RADIUS test, ClearPass logs

Figure 10.1 - Walking the OSI Stack from L1 Up
[Figure content: User reports problem -> L1 Physical (show interface transceiver, DOM dBm) -> L2 Data Link (show vlan, show mac-address-table, show spanning-tree) -> L3 Network (show ip route, show ip ospf neighbor, ping gateway) -> L4-L7 Transport/App (ACL hits, DNS, AAA, show tech). Bottom-up tip: cheap-to-check problems live at the bottom; higher layers assume lower layers are healthy.]

1.2 Transceiver DOM & Mirror Sessions

AOS-CX exposes Digital Optical Monitoring (DOM) on every supported transceiver: temperature, voltage, bias current, TX power, RX power. Healthy 10G multimode optics run TX/RX between -1 dBm and -7 dBm; drifting toward -10 dBm signals a dirty connector or failing laser. Remember: more negative dBm = weaker signal.
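
The dBm figures above are logarithmic, which is why a few dB of drift matters. As a rough illustration, the sketch below converts dBm to milliwatts with the standard formula P(mW) = 10^(dBm/10) and buckets a reading using this chapter's rule-of-thumb thresholds. The function names and bucket labels are hypothetical, and the thresholds are the chapter's figures, not values from any transceiver datasheet.

```python
# Rule-of-thumb thresholds taken from the paragraph above (assumption:
# they apply to 10G multimode optics only; check the optic's datasheet).
HEALTHY_MAX_DBM = -1.0
HEALTHY_MIN_DBM = -7.0
WEAK_DBM = -10.0


def dbm_to_mw(dbm: float) -> float:
    """Convert optical power from dBm to milliwatts: P(mW) = 10 ** (dBm / 10)."""
    return 10 ** (dbm / 10)


def classify_rx_power(dbm: float) -> str:
    """Bucket an RX power reading (hypothetical helper, illustrative labels)."""
    if dbm > HEALTHY_MAX_DBM:
        return "above-range"  # stronger than the stated healthy window
    if dbm >= HEALTHY_MIN_DBM:
        return "healthy"
    if dbm > WEAK_DBM:
        return "marginal"     # drifting toward -10 dBm
    return "weak"             # dirty connector, bent fiber, or failing laser


if __name__ == "__main__":
    for reading in (-2.5, -8.4, -11.0):
        print(f"{reading:6.1f} dBm = {dbm_to_mw(reading):.3f} mW -> "
              f"{classify_rx_power(reading)}")
```

Every 3 dB drop is roughly a halving of optical power, so a link at -10 dBm is receiving about one eighth of the power of a link at -1 dBm.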

switch(config)# mirror session 1
switch(config-mirror-1)# source interface 1/1/10 both
switch(config-mirror-1)# destination interface 1/1/48
switch(config-mirror-1)# enable

A source interface can be mirrored in the rx, tx, or both directions, and an entire VLAN can also serve as a source. Remote mirroring (ERSPAN-like) wraps mirrored frames in GRE for distant analyzers. Never mirror a 10G port to a 1G destination - the destination tail-drops and your capture is incomplete.
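
The oversubscription warning can be checked with simple arithmetic before a session is configured. The sketch below is a worst-case estimate only: it assumes the source port runs at full line rate, and the function names are hypothetical rather than part of any AOS-CX tooling.

```python
def mirrored_load_gbps(source_gbps: float, direction: str) -> float:
    """Worst-case mirrored traffic for one full-duplex source port."""
    if direction not in ("rx", "tx", "both"):
        raise ValueError(f"unknown direction: {direction!r}")
    # 'both' copies ingress and egress, so the worst case is twice line rate.
    return source_gbps * (2 if direction == "both" else 1)


def destination_can_keep_up(source_gbps: float, direction: str,
                            dest_gbps: float) -> bool:
    """True if the destination port can carry the worst-case mirrored load."""
    return mirrored_load_gbps(source_gbps, direction) <= dest_gbps


if __name__ == "__main__":
    # The chapter's example: 10G source, both directions, 1G destination.
    print(destination_can_keep_up(10, "both", 1))
```

Real traffic is usually well below line rate, so a failed check means the capture may drop frames under load, not that it always will.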

Pre-Reading Quiz - Structured Troubleshooting

1. A user reports "I can't reach the file server, but I can browse the web." What is the most efficient troubleshooting approach?

Bottom-up - start with cabling and optics
Top-down - the symptom is application-specific
Reload the switch immediately
Run erase zeroize and rebuild

2. A 10G fiber link shows RX power of -11 dBm and intermittent CRC errors. What does this most likely indicate?

Normal operating range, no action needed
A duplex mismatch on the partner switch
Weak signal - dirty connector, bent fiber, or failing laser
An OSPF adjacency issue

3. You need to capture wireless client traffic from an Aruba AP into Wireshark. Which AOS-CX feature is the appropriate tool?

SNMP trap forwarding
Mirror session with the AP's switch port as source and a probe port as destination
SFP DOM polling
show tech redirected to TFTP

4. Before escalating a complex issue to HPE TAC, which command produces the comprehensive diagnostic bundle they will request?

show running-config
show events
show tech
show version

Section 2: Image Management and Upgrades

AOS-CX uses a dual-partition firmware architecture: every switch has primary and secondary flash partitions, each capable of holding a complete image. One is active; the other is the parachute. The cardinal rule: always upload new firmware to the non-active partition.
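
The cardinal rule is easy to encode in any upgrade script. A minimal sketch, assuming the active partition name has already been read from the switch (for example from show images output); the helper name is hypothetical.

```python
PARTITIONS = ("primary", "secondary")


def staging_partition(active: str) -> str:
    """Return the non-active partition, which is where new firmware belongs."""
    if active not in PARTITIONS:
        raise ValueError(f"unknown partition: {active!r}")
    return "secondary" if active == "primary" else "primary"


if __name__ == "__main__":
    active = "primary"  # assumption: value parsed elsewhere
    print(f"Running from {active}; stage new image on {staging_partition(active)}")
```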

2.1 Image Verification & Boot Order

Every AOS-CX image is signed with RSA-3072 / SHA-256. Signatures are checked twice: at download (rejected if invalid) and at every boot (drops to ServiceOS if invalid). There is no flag to bypass.

switch# copy sftp://admin@10.0.0.5/ArubaOS-CX_6300_10_14_0001.swi secondary vrf mgmt
switch# verify signature flash secondary
switch# boot set-default secondary
switch# copy running-config startup-config
switch# boot system secondary

Figure 10.2 - Switch Boots Primary, Signature Fails, Falls Back to Secondary
[Figure content: Power On -> Primary FL.10.14 (corrupt), signature FAIL -> ServiceOS, boot secondary -> Secondary FL.10.13 (known good), signature OK, running. Cardinal rule: always upload new firmware to the non-active partition; rollback is one boot away.]

2.2 The Eight-Step Traditional Upgrade

Step | Action | Command
1 | Identify active partition | show version, show images
2 | Transfer image to inactive partition | copy sftp://... secondary vrf mgmt
3 | Verify signature | verify signature flash secondary
4 | Set default boot partition | boot set-default secondary
5 | Save config | copy running-config startup-config
6 | Reboot | boot system secondary
7 | Validate | show version, show ip ospf neighbor
8 | Sync partitions (new image to the other partition) | copy secondary primary

2.3 Unsafe Updates & ServiceOS

Some releases include bootloader, PoE controller, or PHY firmware updates. Interrupting them can permanently brick the component, so they are disabled by default. Enable a 30-minute window with allow-unsafe-updates 30 and never remove power during one - the switch may reboot multiple times.

ServiceOS is a recovery OS. To reach it, connect a 9600/8-N-1 console and press 0 during the boot menu countdown. From ServiceOS you can run boot, password reset, erase zeroize, or copy usb to recover a switch.

2.4 VSX Live Software Upgrade

For redundant pairs (CX 8100/8300/8325/8400), VSX Live Upgrade orchestrates a hitless transition. Headline number: 12-19 ms of measurable impact during the active/standby switchover.

Figure 10.3 - VSX Live Upgrade: Standby First, Role Swap, Active Second
[Figure content: timeline from t=0 to complete for a VSX pair joined by an ISL. VSX standby goes FL.10.13 -> FL.10.14 (upgraded first); VSX active goes FL.10.13 -> FL.10.14 (upgraded second). Steps: 1 stage image to standby; 2 standby reboots on new code; 3 role swap with LACP traffic drain (12-19 ms); 4 active reboots on new code; 5 both on new code.]

Platform | Hitless Mechanism | Min AOS-CX
CX 6400 chassis | ISSU (redundant mgmt + line-card hot patch) | 10.10
CX 8300/8325/8400 (VSX pair) | VSX Live Upgrade (LACP drain) | 10.06+
CX 6300 stack (VSF) | Enhanced Software Upgrade (ESU) | 10.11
CX 6300 stack (VSF) | Hitless ISSU (no conductor reboot) | 10.13

Pre-Reading Quiz - Image Management

5. A switch is currently running from the primary partition. You need to upgrade firmware. Where should you stage the new image?

Primary partition (overwrite the running image)
Secondary partition (the non-active one)
USB drive only
Either - it makes no difference

6. What cryptographic algorithm signs every AOS-CX image, and when is it verified?

MD5 - verified only at download
SHA-1 - verified once at install
RSA-3072 / SHA-256 - verified at download AND every boot
AES-256 - verified by ServiceOS only

7. During a VSX Live Upgrade on a CX 8325 pair, what is the approximate measurable traffic impact?

Zero impact - completely hitless
12-19 milliseconds during the LACP role swap
2-3 seconds during reboot
30+ seconds while OSPF reconverges

8. A release notes entry warns the upgrade includes a PoE controller firmware update. Which configuration is required and what must you avoid?

allow-unsafe-updates 30 - never remove power during the upgrade
boot system primary force - reboot twice quickly
No special config - PoE updates are always safe
erase zeroize first to clear PoE state

Section 3: Recovery Procedures

AOS-CX exposes four escalating recovery tiers. Picking the right one is the difference between a 5-minute fix and a wiped switch.

flowchart TD
    Start([Switch in trouble]) --> Q1{What is broken?}
    Q1 -->|Forgot admin password| Tier1["Tier 1: Password Recovery<br/>ServiceOS: password reset<br/>config preserved"]
    Q1 -->|"Bad config / change<br/>switch boots OK"| Tier4["Tier 4: Checkpoint Rollback<br/>checkpoint rollback name<br/>no reboot required"]
    Q1 -->|Re-deploying / wipe needed| Tier2["Tier 2: Factory Reset<br/>ServiceOS: erase zeroize<br/>config + certs wiped"]
    Q1 -->|Both images corrupt| Tier3["Tier 3: USB Recovery<br/>FAT32 USB with .swi<br/>ServiceOS: copy usb ... primary"]
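
The decision flow above is a plain lookup from symptom to tier. As a memory aid, it can be sketched as a small table in code; the enum names and the helper are hypothetical, while the tier numbers, procedures, and side effects are copied from the flow itself.

```python
from enum import Enum


class Problem(Enum):
    FORGOT_ADMIN_PASSWORD = "forgot admin password"
    BAD_CONFIG_CHANGE = "bad config change, switch boots OK"
    REDEPLOY_WIPE = "re-deploying, wipe needed"
    BOTH_IMAGES_CORRUPT = "both images corrupt"


# (tier, procedure, note) for each branch of the decision flow.
RECOVERY_TIERS = {
    Problem.FORGOT_ADMIN_PASSWORD: (1, "ServiceOS: password reset", "config preserved"),
    Problem.BAD_CONFIG_CHANGE: (4, "checkpoint rollback", "no reboot required"),
    Problem.REDEPLOY_WIPE: (2, "ServiceOS: erase zeroize", "config + certs wiped"),
    Problem.BOTH_IMAGES_CORRUPT: (3, "ServiceOS: copy usb ... primary", "FAT32 USB with .swi"),
}


def pick_recovery(problem: Problem) -> str:
    """Return a one-line summary of the recovery tier for a given problem."""
    tier, procedure, note = RECOVERY_TIERS[problem]
    return f"Tier {tier}: {procedure} ({note})"


if __name__ == "__main__":
    for problem in Problem:
        print(f"{problem.value:40s} -> {pick_recovery(problem)}")
```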

3.1 Password Recovery (Tier 1) - Config Preserved

  1. Console cable at 9600/8-N-1.
  2. Reboot; press 0 during boot menu countdown.
  3. At ServiceOS login:, type admin with no password.
  4. Run password reset, confirm, then reload.
  5. Boot, log in as admin/blank, immediately set a new password.

VLANs, routes, certificates - everything is preserved.

3.2 Factory Default - Zeroize (Tier 2) - Config Wiped

ServiceOS> erase zeroize
This will erase all configuration including certificates and management files.
Continue? (y/n): y

erase zeroize wipes startup config, certificates, RADIUS shared secrets, SSH keys, and management state. Firmware images are preserved. The switch reboots into ZTP-ready state with DHCP on the management interface and admin/blank credentials. This is the right command before redeployment - the wrong command if you only need a password reset.

3.3 USB Image Recovery (Tier 3) - Both Images Dead

ServiceOS> dir usb
ServiceOS> copy usb /ArubaOS-CX_6300_10_14_0001.swi primary
ServiceOS> boot

USB must be FAT32. The image is RSA-3072-verified before being written - a tampered file cannot revive a switch.

3.4 Checkpoint Rollback (Tier 4) - No Reboot

switch# copy running-config checkpoint pre-ospf-redesign
switch# checkpoint diff running-config pre-ospf-redesign
switch# checkpoint rollback pre-ospf-redesign

Checkpoints act as a Time Machine for switch configuration. They also integrate with NAE: every checkpoint creates a purple diamond marker on performance graphs for instant change correlation.

Pre-Reading Quiz - Recovery

9. A junior engineer locked everyone out by changing the admin password. The switch is in production and configuration must be preserved. Which recovery tier?

Tier 2: erase zeroize
Tier 1: ServiceOS password reset
Tier 3: USB image recovery
Reload from startup-config

10. How do you enter ServiceOS on an AOS-CX switch?

SSH to the switch and type service
Connect via HTTPS to /serviceos
Console at 9600/8-N-1, press 0 during boot menu countdown
Hold the reset button for 30 seconds

11. A switch is being redeployed at a different customer site. You need to wipe configuration AND certificates but keep the firmware images. Which command?

password reset
erase startup-config
erase zeroize from ServiceOS
checkpoint rollback factory

12. Both flash partitions failed signature verification after a power blip. The switch sits at a ServiceOS prompt. What is the recovery path?

Run password reset and reboot
FAT32 USB with .swi file, copy usb ... primary, then boot
RMA the switch immediately - no recovery is possible
checkpoint rollback running-config

Section 4: HPE7-A01 Exam Strategy

4.1 Exam Specifications

Exam code | HPE7-A01
Format | Multiple-choice + scenario/simulation
Questions / Time | 75 questions / 120 minutes
Passing score | 68% (~51/75 correct)
Cost | USD 350 (USD 195 emerging markets)
Average pace | 96 seconds per question
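
The pace and passing-count figures follow directly from the exam specifications. A quick check, assuming the 68% passing score maps to a simple count of correct answers (the chapter's own approximation); integer percentages are used to avoid floating-point rounding surprises.

```python
import math

QUESTIONS = 75
MINUTES = 120
PASS_PERCENT = 68  # integer percent keeps the arithmetic exact


def seconds_per_question(minutes: int, questions: int) -> float:
    """Average time available per question, in seconds."""
    return minutes * 60 / questions


def correct_answers_needed(questions: int, pass_percent: int) -> int:
    """Smallest whole number of correct answers that meets the pass mark."""
    return math.ceil(questions * pass_percent / 100)


if __name__ == "__main__":
    print(seconds_per_question(MINUTES, QUESTIONS), "seconds per question")
    print(correct_answers_needed(QUESTIONS, PASS_PERCENT), "correct answers needed")
```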

4.2 Blueprint Weights Drive Study Time

Domain | Weight | 60-hr plan (hours) | 120-hr plan (hours)
WLAN | 17% | 10.2 | 20.4
Switching | 14% | 8.4 | 16.8
Routing | 13% | 7.8 | 15.6
Security | 9% | 5.4 | 10.8
Auth & Authz | 8% | 4.8 | 9.6
Resiliency / Virt | 8% | 4.8 | 9.6
Connectivity | 8% | 4.8 | 9.6
Performance | 7% | 4.2 | 8.4
Mgmt & Monitoring | 6% | 3.6 | 7.2
Troubleshooting | 6% | 3.6 | 7.2
Network Stack | 4% | 2.4 | 4.8
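
The hour columns are each domain's weight multiplied by the total study budget. The sketch below reproduces them for any budget; the weights are copied from the table, and the function name is hypothetical.

```python
# Blueprint weights in percent, copied from the table above.
BLUEPRINT_WEIGHTS = {
    "WLAN": 17,
    "Switching": 14,
    "Routing": 13,
    "Security": 9,
    "Auth & Authz": 8,
    "Resiliency / Virt": 8,
    "Connectivity": 8,
    "Performance": 7,
    "Mgmt & Monitoring": 6,
    "Troubleshooting": 6,
    "Network Stack": 4,
}


def study_hours(total_hours: float) -> dict:
    """Split a study budget across domains in proportion to blueprint weight."""
    return {domain: round(total_hours * weight / 100, 1)
            for domain, weight in BLUEPRINT_WEIGHTS.items()}


if __name__ == "__main__":
    for domain, hours in study_hours(60).items():
        print(f"{domain:20s} {hours:5.1f} h")
```

Calling study_hours(90) or any other budget gives the same proportional split without recomputing the table by hand.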

4.3 The 60-30-10 Time-Management Rule

Phase | Time | Goal
Pass 1 | ~72 min (60%) | Answer every easy/medium. Flag & skip >2 min items.
Pass 2 | ~36 min (30%) | Tackle flagged hard questions one by one.
Pass 3 | ~12 min (10%) | Review flagged answers. Don't change without concrete reason.
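
The minute figures are the 60/30/10 percentages applied to the 120-minute limit. A small sketch that reproduces the split for any exam length; the function name is hypothetical.

```python
PASS_SPLIT_PERCENT = (60, 30, 10)  # Pass 1, Pass 2, Pass 3


def time_budget(total_minutes: int) -> tuple:
    """Return minutes for each pass under the 60-30-10 rule."""
    return tuple(total_minutes * pct / 100 for pct in PASS_SPLIT_PERCENT)


if __name__ == "__main__":
    pass1, pass2, pass3 = time_budget(120)
    print(f"Pass 1: {pass1:.0f} min, Pass 2: {pass2:.0f} min, Pass 3: {pass3:.0f} min")
```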

4.4 Question-Attack Tactics

Pre-Reading Quiz - Exam Strategy

13. The HPE7-A01 exam has 75 questions in 120 minutes. What is the average pace, and what time-management rule should you apply?

120 sec/question; spend equal time on each
96 sec/question; apply the 60-30-10 rule (60% easy, 30% flagged, 10% review)
60 sec/question; skip everything you don't know on first pass
240 sec/question; deep-dive every problem

14. WLAN carries the highest blueprint weight. What percentage, and what does that mean for your study plan on a 60-hour budget?

8% / about 5 hours
17% / about 10 hours
25% / about 15 hours
14% / about 8 hours

15. Which question-attack tactic is most reliable on the HPE7-A01 exam?

Always pick the longest answer
Pick options containing absolutes like "always" or "never"
Eliminate clearly wrong answers first; absolutes are usually wrong
Change all flagged answers in the final pass
