Study Guide: Troubleshooting, Upgrade Workflows, and Exam Strategy

When a user reports "the network is broken," the OSI model is your universal triage tool. Bottom-up starts at L1 because it is cheap to check and every higher layer assumes the one below is healthy. Top-down is faster when the symptom is clearly application-level. Divide-and-conquer starts in the middle of the stack, usually L3 ("can I ping the gateway?"): if that test passes, the fault is higher up; if it fails, the fault is at L3 or below.
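The three strategies differ only in where the walk starts and which direction it moves. A minimal Python sketch, assuming each layer has a hypothetical pass/fail check (on a real switch, the commands in the 1.1 quick-reference table) and that a fault at one layer makes every check above it fail as well:

```python
from typing import Callable, Dict, Optional

LAYERS = ["L1 Physical", "L2 Data Link", "L3 Network", "L4 Transport", "L5-L7"]

# Hypothetical per-layer health checks: True means the layer looks healthy.
Checks = Dict[str, Callable[[], bool]]


def bottom_up(checks: Checks) -> Optional[str]:
    """Return the lowest failing layer, or None if every layer passes."""
    for layer in LAYERS:
        if not checks[layer]():
            return layer  # higher layers depend on this one, so stop here
    return None


def divide_and_conquer(checks: Checks, start: int = 2) -> Optional[str]:
    """Probe a middle layer (L3 by default), then search down or up from it."""
    if checks[LAYERS[start]]():
        # The probe passed, so everything below it works: search upward.
        for layer in LAYERS[start + 1:]:
            if not checks[layer]():
                return layer
        return None
    # The probe failed: the root cause is at the probe layer or below it.
    for layer in LAYERS[: start + 1]:
        if not checks[layer]():
            return layer
    return None


def simulate(fault_index: int) -> Checks:
    """Build checks where a fault at one layer also breaks every layer above it."""
    return {layer: (lambda i=i: i < fault_index) for i, layer in enumerate(LAYERS)}


print(bottom_up(simulate(1)))           # L2 Data Link (e.g. a VLAN mismatch)
print(divide_and_conquer(simulate(3)))  # L4 Transport (e.g. an ACL drop)
```

Both functions find the same root cause; divide-and-conquer simply runs fewer checks when the fault sits far from L1.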

1.1 OSI Layer Quick Reference

Layer	Typical Symptoms	First-Look AOS-CX Commands
L1 Physical	Link down, CRC errors, intermittent flaps	`show interface`, `show interface transceiver`
L2 Data Link	VLAN mismatch, MAC flapping, STP topology change	`show vlan`, `show mac-address-table`, `show spanning-tree`
L3 Network	No route, OSPF down, wrong next hop	`show ip route`, `show ip ospf neighbor`
L4 Transport	TCP resets, blocked ports, ACL drops	`show access-list hitcounts`
L5-L7	DNS, AAA, certificate failures	`show aaa authentication`, RADIUS test, ClearPass logs

Figure 10.1 - Walking the OSI Stack from L1 Up (animation: OSI-layer troubleshooting decision tree)

1.2 Transceiver DOM & Mirror Sessions

AOS-CX exposes Digital Optical Monitoring (DOM) data for transceivers that support it: temperature, voltage, bias current, TX power, and RX power. Healthy 10G multimode optics run TX/RX between -1 dBm and -7 dBm; a reading drifting toward -10 dBm signals a dirty connector or failing laser. Remember: more negative dBm = weaker signal.
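Because dBm is logarithmic, a few dB of drift is a large change in absolute power. A minimal Python sketch of the rule of thumb above; the "marginal" band between -7 and -10 dBm and the band labels are assumptions made for illustration, not vendor limits:

```python
def dbm_to_mw(dbm: float) -> float:
    """Convert optical power from dBm to milliwatts: P_mW = 10 ** (dBm / 10)."""
    return 10 ** (dbm / 10)


def classify_rx(dbm: float) -> str:
    """Rough health band for a 10G multimode optic, using this guide's rule of thumb.

    The thresholds are illustrative assumptions; the optic's data sheet gives
    its actual transmit range and receive sensitivity.
    """
    if dbm > -1.0:
        return "above typical range"
    if dbm >= -7.0:
        return "healthy"
    if dbm > -10.0:
        return "marginal"  # drifting: inspect and clean the connector
    return "trouble"       # -10 dBm or weaker: dirty connector, bent fiber, or failing laser


for reading in (-3.0, -8.5, -11.0):
    print(f"{reading} dBm = {dbm_to_mw(reading):.3f} mW -> {classify_rx(reading)}")
```

Every 10 dB is a factor of ten, so -10 dBm (0.1 mW) carries one-tenth the power of 0 dBm (1 mW).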

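To capture the frames themselves, not just optic health, configure a mirror session that copies a source port's traffic to an analyzer port:

switch(config)# mirror session 1
switch(config-mirror-1)# source interface 1/1/10 both
switch(config-mirror-1)# destination interface 1/1/48
switch(config-mirror-1)# enable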

A source can be mirrored in the rx, tx, or both direction, and an entire VLAN can be a source. Remote mirroring (ERSPAN-like) encapsulates mirrored frames in GRE so a distant analyzer can receive them. Never mirror a 10G port to a 1G destination - the destination tail-drops and your capture is incomplete.
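The 10G-to-1G warning is simple arithmetic: the destination must carry everything the session copies, and a both-direction source copies receive and transmit traffic onto one port. A hedged Python sketch; peak_utilization is an assumed input for illustration, not a value AOS-CX reports under that name:

```python
def mirrored_load_gbps(source_gbps: float, direction: str,
                       peak_utilization: float = 1.0) -> float:
    """Worst-case rate the mirror destination must absorb.

    direction is "rx", "tx", or "both"; "both" copies two streams to one port.
    peak_utilization (0.0-1.0) is an assumed fraction of the source's line rate.
    """
    if direction not in ("rx", "tx", "both"):
        raise ValueError(f"unknown direction: {direction!r}")
    streams = 2 if direction == "both" else 1
    return source_gbps * peak_utilization * streams


def mirror_fits(source_gbps: float, direction: str, dest_gbps: float,
                peak_utilization: float = 1.0) -> bool:
    """True if the destination port can carry the mirrored load without tail-drop."""
    return mirrored_load_gbps(source_gbps, direction, peak_utilization) <= dest_gbps


print(mirror_fits(10, "both", 1))    # False: up to 20 Gb/s into a 1G port
print(mirror_fits(10, "both", 10))   # False: rx + tx together can exceed 10G
print(mirror_fits(10, "rx", 10))     # True
```

This checks average rates only; short bursts can still overflow the destination's buffers even when the average fits.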

Key Points - Troubleshooting

Bottom-up for L1 symptoms; top-down for app-only symptoms; divide-and-conquer otherwise.
DOM RX/TX between -1 and -7 dBm = healthy 10G multimode; -10 dBm = trouble.
Always run show tech before poking the problem - it gives TAC a baseline.
CRC errors = cabling or optic; runts = duplex mismatch; broadcast storms = L2 loop.

Pre-Reading Quiz - Structured Troubleshooting

1. A user reports "I can't reach the file server, but I can browse the web." What is the most efficient troubleshooting approach?

Bottom-up - start with cabling and optics

Top-down - the symptom is application-specific

Reload the switch immediately

Run erase zeroize and rebuild

2. A 10G fiber link shows RX power of -11 dBm and intermittent CRC errors. What does this most likely indicate?

Normal operating range, no action needed

A duplex mismatch on the partner switch

Weak signal - dirty connector, bent fiber, or failing laser

An OSPF adjacency issue

3. You need to capture wireless client traffic from an Aruba AP into Wireshark. Which AOS-CX feature is the appropriate tool?

SNMP trap forwarding

Mirror session with the AP's switch port as source and a probe port as destination

SFP DOM polling

show tech redirected to TFTP

4. Before escalating a complex issue to HPE TAC, which command produces the comprehensive diagnostic bundle they will request?

show running-config

show events

show tech

show version

Section 2: Image Management and Upgrades

AOS-CX uses a dual-partition firmware architecture: every switch has primary and secondary flash partitions, each capable of holding a complete image. One is active; the other is the parachute. The cardinal rule: always upload new firmware to the non-active partition.

2.1 Image Verification & Boot Order

Every AOS-CX image is signed with RSA-3072 / SHA-256. Signatures are checked twice: at download (rejected if invalid) and at every boot (drops to ServiceOS if invalid). There is no flag to bypass.

switch# copy sftp://admin@10.0.0.5/ArubaOS-CX_6300_10_14_0001.swi secondary vrf mgmt
switch# verify signature flash secondary
switch# boot set-default secondary
switch# copy running-config startup-config
switch# boot system secondary

Figure 10.2 - Switch Boots Primary, Signature Fails, Falls Back to Secondary (animation: primary/secondary boot and fallback)
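The boot-time signature behavior reduces to a short decision. A simplified Python model, assuming a failed signature on the default partition triggers a fallback to the other partition and ServiceOS is reached only when both fail; it is a mental model for study, not switch firmware:

```python
from typing import Dict

PARTITIONS = ("primary", "secondary")


def select_boot(default: str, signature_ok: Dict[str, bool]) -> str:
    """Pick what the switch boots, given per-partition signature results."""
    if default not in PARTITIONS:
        raise ValueError(f"unknown partition: {default!r}")
    other = "secondary" if default == "primary" else "primary"
    for partition in (default, other):
        if signature_ok.get(partition, False):  # a missing result counts as a failure
            return partition
    return "ServiceOS"  # no valid image left: recover from USB (Tier 3)


print(select_boot("primary", {"primary": False, "secondary": True}))   # secondary
print(select_boot("primary", {"primary": False, "secondary": False}))  # ServiceOS
```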

2.2 The Eight-Step Traditional Upgrade

Step	Action	Command
1	Identify active partition	`show version`, `show images`
2	Transfer image to inactive partition	`copy sftp://... secondary vrf mgmt`
3	Verify signature	`verify signature flash secondary`
4	Set default boot partition	`boot set-default secondary`
5	Save config	`copy running-config startup-config`
6	Reboot into new image	`boot system secondary`
7	Validate	`show version`, `show ip ospf neighbor`
8	Sync partitions (only after validation; overwrites the old fallback image)	`copy secondary primary`
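The partition bookkeeping behind the eight steps fits in a few lines. This Python toy model is hypothetical (the DualPartitionSwitch class and its methods are not an AOS-CX API); it enforces the stage-to-inactive rule and shows the direction of the final sync: from the validated, now-active partition onto the inactive one.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DualPartitionSwitch:
    """Hypothetical model of primary/secondary image handling (not a real API)."""
    active: str = "primary"
    images: Dict[str, str] = field(
        default_factory=lambda: {"primary": "10.13", "secondary": "10.13"})

    @property
    def inactive(self) -> str:
        return "secondary" if self.active == "primary" else "primary"

    def stage(self, partition: str, version: str) -> None:
        """Step 2: write the new image, but never over the running one."""
        if partition == self.active:
            raise ValueError("refusing to overwrite the active partition")
        self.images[partition] = version

    def reboot_into(self, partition: str) -> None:
        """Steps 4 and 6: make the staged partition the running one."""
        self.active = partition

    def sync(self) -> None:
        """Step 8: copy the validated active image onto the inactive partition.

        This overwrites the old fallback image, so run it only after step 7 passes.
        """
        self.images[self.inactive] = self.images[self.active]


sw = DualPartitionSwitch()
sw.stage(sw.inactive, "10.14")  # stage on secondary while primary runs
sw.reboot_into("secondary")
sw.sync()                       # same direction as copying secondary to primary
print(sw.active, sw.images)
```

Skipping sync() is a valid choice too: it keeps the old release on the inactive partition as a fallback at the cost of running mismatched images.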

2.3 Unsafe Updates & ServiceOS

Some releases include bootloader, PoE controller, or PHY firmware updates. Interrupting them can permanently brick the component, so they are disabled by default. Open a 30-minute window with allow-unsafe-updates 30, and never remove power while such an update runs - the switch may reboot multiple times.

ServiceOS is a recovery OS reached over a 9600/8-N-1 console by pressing 0 during the boot menu countdown. From it you can boot an image, run password reset or erase zeroize, or copy an image from USB to recover a switch.

2.4 VSX Live Software Upgrade

For VSX redundant pairs (CX 8100/8300/8325/8400), VSX Live Upgrade orchestrates a near-hitless upgrade: the standby peer upgrades first, the roles swap, and the other peer upgrades second. Headline number: 12-19 ms of measurable traffic impact during the role swap.

Figure 10.3 - VSX Live Upgrade: Standby First, Role Swap, Active Second (animation: VSX Live Upgrade timeline)

Platform	Hitless Mechanism	Minimum AOS-CX Release
CX 6400 chassis	ISSU (redundant mgmt + line-card hot patch)	10.10
CX 8300/8325/8400 (VSX pair)	VSX Live Upgrade (LACP drain)	10.06
CX 6300 stack (VSF)	Enhanced Software Upgrade (ESU)	10.11
CX 6300 stack (VSF)	Hitless ISSU (no conductor reboot)	10.13
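Release gates like these are easy to get wrong when compared as strings ("10.9" sorts after "10.10"). A hedged Python sketch of a lookup over the table above, comparing releases as (major, minor) integer tuples; it assumes the gate applies to the release currently running and that the newer mechanism is preferred when two apply:

```python
from typing import List, Optional, Tuple


def parse_release(release: str) -> Tuple[int, int]:
    """Turn '10.13' or '10.13.1000' into a comparable (major, minor) tuple."""
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


# (platform, mechanism, minimum release), copied from the table above.
# Hitless ISSU is listed before ESU so the newer mechanism wins on 6300 VSF.
HITLESS: List[Tuple[str, str, str]] = [
    ("CX 6400 chassis", "ISSU", "10.10"),
    ("CX 8300/8325/8400 (VSX pair)", "VSX Live Upgrade", "10.06"),
    ("CX 6300 stack (VSF)", "Hitless ISSU", "10.13"),
    ("CX 6300 stack (VSF)", "Enhanced Software Upgrade (ESU)", "10.11"),
]


def best_mechanism(platform: str, running: str) -> Optional[str]:
    """Best hitless option for a platform, or None (use the eight-step upgrade)."""
    current = parse_release(running)
    for name, mechanism, minimum in HITLESS:
        if name == platform and current >= parse_release(minimum):
            return mechanism
    return None


print(best_mechanism("CX 6300 stack (VSF)", "10.12"))  # Enhanced Software Upgrade (ESU)
print(best_mechanism("CX 6400 chassis", "10.9"))       # None
```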

Key Points - Image Management

Always stage new image to the non-active partition.
RSA-3072 / SHA-256 signatures verified at download and at every boot - no bypass.
Eight-step upgrade ends with copy secondary primary, syncing the validated image onto the old partition.
VSX Live Upgrade = 12-19 ms; ISSU on CX 6400 and Hitless ISSU on CX 6300 VSF (10.13+) are zero-impact.
allow-unsafe-updates 30 opens a 30-min window for bootloader/PoE/PHY updates - never pull power.

Pre-Reading Quiz - Image Management

5. A switch is currently running from the primary partition. You need to upgrade firmware. Where should you stage the new image?

Primary partition (overwrite the running image)

Secondary partition (the non-active one)

USB drive only

Either - it makes no difference

6. What cryptographic algorithm signs every AOS-CX image, and when is it verified?

MD5 - verified only at download

SHA-1 - verified once at install

RSA-3072 / SHA-256 - verified at download AND every boot

AES-256 - verified by ServiceOS only

7. During a VSX Live Upgrade on a CX 8325 pair, what is the approximate measurable traffic impact?

Zero impact - completely hitless

12-19 milliseconds during the LACP role swap

2-3 seconds during reboot

30+ seconds while OSPF reconverges

8. A release notes entry warns the upgrade includes a PoE controller firmware update. Which configuration is required and what must you avoid?

allow-unsafe-updates 30 - never remove power during the upgrade

boot system primary force - reboot twice quickly

No special config - PoE updates are always safe

erase zeroize first to clear PoE state

Section 3: Recovery Procedures

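AOS-CX provides four recovery tiers. They are numbered, not strictly escalating: Tier 4 (checkpoint rollback) is the least disruptive because it needs no reboot. Picking the right one is the difference between a 5-minute fix and a wiped switch.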

Recovery decision tree - switch in trouble: what is broken?
Forgot admin password -> Tier 1: Password Recovery (ServiceOS: password reset; config preserved)
Bad config / change, switch boots OK -> Tier 4: Checkpoint Rollback (checkpoint rollback <name>; no reboot required)
Re-deploying / wipe needed -> Tier 2: Factory Reset (ServiceOS: erase zeroize; config + certs wiped)
Both images corrupt -> Tier 3: USB Recovery (FAT32 USB with .swi; ServiceOS: copy usb ... primary)
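The decision tree is effectively a lookup from symptom to tier. A minimal Python encoding of it, using only the four branches shown; the Problem and Recovery names are hypothetical labels chosen for this sketch:

```python
from enum import Enum
from typing import NamedTuple


class Problem(Enum):
    FORGOT_PASSWORD = "forgot admin password"
    BAD_CONFIG = "bad config change, switch boots OK"
    REDEPLOY = "re-deploying / wipe needed"
    BOTH_IMAGES_CORRUPT = "both images corrupt"


class Recovery(NamedTuple):
    tier: int
    name: str
    action: str
    note: str


DECISION_TREE = {
    Problem.FORGOT_PASSWORD: Recovery(1, "Password Recovery", "ServiceOS: password reset", "config preserved"),
    Problem.BAD_CONFIG: Recovery(4, "Checkpoint Rollback", "checkpoint rollback <name>", "no reboot required"),
    Problem.REDEPLOY: Recovery(2, "Factory Reset", "ServiceOS: erase zeroize", "config + certs wiped"),
    Problem.BOTH_IMAGES_CORRUPT: Recovery(3, "USB Recovery", "ServiceOS: copy usb ... primary", "FAT32 USB with .swi"),
}


def choose_recovery(problem: Problem) -> Recovery:
    """Return the recovery tier for a symptom, per the decision tree."""
    return DECISION_TREE[problem]


r = choose_recovery(Problem.FORGOT_PASSWORD)
print(f"Tier {r.tier}: {r.name} - {r.action} ({r.note})")
```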

3.1 Password Recovery (Tier 1) - Config Preserved

Console cable at 9600/8-N-1.
Reboot; press 0 during boot menu countdown.
At the ServiceOS login: prompt, type admin with no password.
Run password reset, confirm, then reload.
After the switch boots, log in as admin with a blank password and immediately set a new password.

Only the admin password is cleared; VLANs, routes, and certificates are all preserved.

3.2 Factory Default - Zeroize (Tier 2) - Config Wiped

ServiceOS> erase zeroize
This will erase all configuration including certificates and management files.
Continue? (y/n): y

erase zeroize wipes the startup config, certificates, RADIUS shared secrets, SSH keys, and management state; firmware images are preserved. The switch reboots into a ZTP-ready state with DHCP on the management interface and admin with a blank password. It is the right command before redeployment and the wrong command if you only need a password reset.

3.3 USB Image Recovery (Tier 3) - Both Images Dead

ServiceOS> dir usb
ServiceOS> copy usb /ArubaOS-CX_6300_10_14_0001.swi primary
ServiceOS> boot

The USB drive must be formatted FAT32. The image is RSA-3072-verified before being written, so a tampered file cannot revive a switch.

3.4 Checkpoint Rollback (Tier 4) - No Reboot

switch# copy running-config checkpoint pre-ospf-redesign
switch# checkpoint diff running-config pre-ospf-redesign
switch# checkpoint rollback pre-ospf-redesign

Think of checkpoints as Time Machine for switch config. They integrate with the Network Analytics Engine (NAE): every checkpoint creates a purple diamond marker on NAE performance graphs for instant change correlation.
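checkpoint diff answers "what changed since the checkpoint?" The same idea can be illustrated off-box with Python's standard difflib module. This is an analogy, not how AOS-CX computes or formats its diff, and both config snippets are hypothetical:

```python
import difflib
from typing import List


def config_diff(old: str, new: str, old_name: str, new_name: str) -> List[str]:
    """Unified diff of two config texts; '-' lines are old, '+' lines are new."""
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                     fromfile=old_name, tofile=new_name,
                                     lineterm=""))


# Hypothetical checkpoint and running config around an OSPF area change.
checkpoint = """router ospf 1
    router-id 10.0.0.1
    area 0.0.0.0
interface 1/1/1
    ip ospf 1 area 0.0.0.0"""

running = checkpoint.replace("0.0.0.0", "0.0.0.1")

for line in config_diff(checkpoint, running, "pre-ospf-redesign", "running-config"):
    print(line)
```

Reviewing the diff before a rollback confirms exactly which lines the rollback will revert.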

Key Points - Recovery

Password reset = config preserved; zeroize = config wiped, images kept.
ServiceOS access: console 9600/8-N-1, press 0 during boot menu countdown.
USB recovery requires FAT32 + valid RSA-3072 signature.
Checkpoint rollback reverts configuration without a reboot.

Pre-Reading Quiz - Recovery

9. A junior engineer locked everyone out by changing the admin password. The switch is in production and configuration must be preserved. Which recovery tier?

Tier 2: erase zeroize

Tier 1: ServiceOS password reset

Tier 3: USB image recovery

Reload from startup-config

10. How do you enter ServiceOS on an AOS-CX switch?

SSH to the switch and type service

Connect via HTTPS to /serviceos

Console at 9600/8-N-1, press 0 during boot menu countdown

Hold the reset button for 30 seconds

11. A switch is being redeployed at a different customer site. You need to wipe configuration AND certificates but keep the firmware images. Which command?

password reset

erase startup-config

erase zeroize from ServiceOS

checkpoint rollback factory

12. Both flash partitions failed signature verification after a power blip. The switch sits at a ServiceOS prompt. What is the recovery path?

Run password reset and reboot

FAT32 USB with .swi file, copy usb ... primary, then boot

RMA the switch immediately - no recovery is possible

checkpoint rollback running-config

Section 4: HPE7-A01 Exam Strategy

4.1 Exam Specifications

Exam code	HPE7-A01
Format	Multiple-choice + scenario/simulation
Questions / Time	75 questions / 120 minutes
Passing score	68% (~51/75 correct)
Cost	USD 350 (USD 195 emerging markets)
Average pace	96 seconds per question
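The pace and pass-mark figures follow from simple arithmetic. A Python check, assuming the 68% applies to raw correct answers (the table's "~51" suggests the exact cut may differ):

```python
import math

QUESTIONS = 75
MINUTES = 120
PASS_PERCENT = 68

seconds_per_question = MINUTES * 60 / QUESTIONS          # 7200 s / 75 questions
min_correct = math.ceil(QUESTIONS * PASS_PERCENT / 100)  # round up to a whole question

print(f"{seconds_per_question:.0f} s per question; pass needs {min_correct}/{QUESTIONS}")
# 96 s per question; pass needs 51/75
```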

4.2 Blueprint Weights Drive Study Time

Domain	Weight	60-hr plan	120-hr plan
WLAN	17%	10.2	20.4
Switching	14%	8.4	16.8
Routing	13%	7.8	15.6
Security	9%	5.4	10.8
Auth & Authz	8%	4.8	9.6
Resiliency / Virt	8%	4.8	9.6
Connectivity	8%	4.8	9.6
Performance	7%	4.2	8.4
Mgmt & Monitoring	6%	3.6	7.2
Troubleshooting	6%	3.6	7.2
Network Stack	4%	2.4	4.8
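The hour columns are each domain's weight applied to the total budget. A Python sketch that reproduces them for any budget; the 0.1-hour rounding is an assumption that happens to keep the 60- and 120-hour totals exact, while other budgets may drift slightly after rounding:

```python
WEIGHTS = {  # blueprint weight in percent, from the table above
    "WLAN": 17, "Switching": 14, "Routing": 13, "Security": 9,
    "Auth & Authz": 8, "Resiliency / Virt": 8, "Connectivity": 8,
    "Performance": 7, "Mgmt & Monitoring": 6, "Troubleshooting": 6,
    "Network Stack": 4,
}


def study_plan(total_hours: float) -> dict:
    """Split a study budget across domains in proportion to blueprint weight."""
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("blueprint weights should total 100%")
    return {domain: round(total_hours * pct / 100, 1) for domain, pct in WEIGHTS.items()}


for hours in (60, 120):
    plan = study_plan(hours)
    print(hours, "h budget -> WLAN", plan["WLAN"], "h, Network Stack", plan["Network Stack"], "h")
```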

4.3 The 60-30-10 Time-Management Rule

Phase	Time	Goal
Pass 1	~72 min (60%)	Answer every easy/medium question. Flag and skip anything likely to take more than 2 minutes.
Pass 2	~36 min (30%)	Work through the flagged hard questions one by one.
Pass 3	~12 min (10%)	Review flagged answers. Don't change an answer without a concrete reason.
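The minute budgets above are the 60/30/10 split of the 120-minute session. A small Python sketch of that split plus the Pass 1 two-minute flag rule; PASS1_SKIP_SECONDS and should_flag are hypothetical names for this illustration:

```python
def time_budget(total_minutes: float = 120, split=(60, 30, 10)) -> list:
    """Minutes for each pass of the 60-30-10 rule."""
    if sum(split) != 100:
        raise ValueError("split percentages must total 100")
    return [total_minutes * pct / 100 for pct in split]


PASS1_SKIP_SECONDS = 120  # flag and move on once a question passes 2 minutes


def should_flag(seconds_on_question: float) -> bool:
    """Pass 1 rule: flag anything that has taken more than two minutes."""
    return seconds_on_question > PASS1_SKIP_SECONDS


print(time_budget())  # [72.0, 36.0, 12.0]
```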

4.4 Question-Attack Tactics

Eliminate, don't pick. Cross out the two clearly wrong answers first.
Watch for absolutes. Options with "always," "never," or "only" are usually wrong; hedged options with "typically" are usually right.
Read the last sentence first in long scenarios - it tells you what's actually being asked.
For sims, run the simplest verification first - show vsx status, show ip ospf neighbor.

Key Points - Exam Strategy

75 Q / 120 min = 96 seconds average per question; pass = 68%.
Allocate study hours proportional to blueprint weights (WLAN 17% gets the most).
60-30-10 rule: 60% easy/medium first pass, 30% flagged hard, 10% review.
The day before the exam (Day -1) is for logistics, not learning: confirm Pearson VUE check-in, your ID, and a quiet room.

Pre-Reading Quiz - Exam Strategy

13. The HPE7-A01 exam has 75 questions in 120 minutes. What is the average pace, and what time-management rule should you apply?

120 sec/question; spend equal time on each

96 sec/question; apply the 60-30-10 rule (60% easy, 30% flagged, 10% review)

60 sec/question; skip everything you don't know on first pass

240 sec/question; deep-dive every problem

14. WLAN carries the highest blueprint weight. What percentage, and what does that mean for your study plan on a 60-hour budget?

8% / about 5 hours

17% / about 10 hours

25% / about 15 hours

14% / about 8 hours

15. Which question-attack tactic is most reliable on the HPE7-A01 exam?

Always pick the longest answer

Pick options containing absolutes like "always" or "never"

Eliminate clearly wrong answers first; absolutes are usually wrong

Change all flagged answers in the final pass

Chapter 10: Troubleshooting, Upgrade Workflows, and Exam Strategy

Learning Objectives

Section 1: Structured Troubleshooting