Study Guide: Chapter 5 — Link Aggregation, LACP, and VSX Multi-Chassis LAG

A single uplink between two switches is a single point of failure and a single bottleneck. Modern campus and data center designs eliminate both problems by bonding multiple physical links into one logical pipe and by pairing two physical chassis so they appear as one logical forwarding entity. On Aruba AOS-CX, the building blocks for that design are Link Aggregation Groups (LAGs), the Link Aggregation Control Protocol (LACP), and Virtual Switching Extension (VSX).

Section 1 — LAG Fundamentals

Pre-Reading Check — LAG Fundamentals

1. Which AOS-CX command converts a static LAG to a dynamic (LACP) LAG?

interface lag 1 dynamic

lacp mode active

enable lacp

protocol lacp 802.3ad

2. What happens if both ends of a link are configured with LACP passive mode?

The LAG comes up but with reduced bandwidth

The LAG comes up only after a 30-second hold timer

The LAG never comes up because neither side initiates LACPDUs

The LAG falls back to static mode automatically

3. Two 10G members in a LAG. How much throughput can a single TCP flow achieve?

20G — bandwidth aggregates per flow

10G — a single flow rides one member due to consistent hashing

15G — frames load-balance approximately evenly

5G — half of aggregate to allow headroom

4. What problem does lacp fallback-static solve?

It allows mismatched MTU between members

It bypasses hashing for high-priority traffic

It lets one member forward traffic when the LACP partner is silent (e.g., a server during PXE boot)

It forces the LAG to use static mode permanently

Think of a LAG like a multi-lane highway built between two cities. One lane (a single physical link) might carry the load most days, but during rush hour, or when a lane is closed for maintenance, the highway needs more capacity and redundancy. A LAG bundles multiple physical interfaces into one logical highway: bandwidth scales (close to) linearly, and a single lane closure doesn't shut down the road.

Static vs LACP

AOS-CX supports two LAG flavors:

Static LAG: both ends are configured manually. There is no negotiation. If you wire it correctly and the configs match, traffic flows. If a member is misconfigured or the wrong port is patched, the switch happily forwards into a black hole.
Dynamic LAG (LACP, originally IEEE 802.3ad, now maintained as IEEE 802.1AX): both ends exchange LACPDUs (Link Aggregation Control Protocol Data Units) to verify the partner, agree on which member ports are eligible, and detect mis-cabling before forwarding starts.

Analogy: A static LAG is like a handshake agreement between two contractors: fast, but if either side forgets, the work doesn't line up. LACP is the same handshake plus a signed contract that is re-checked on every LACPDU exchange (every 30 s by default, every 1 s with lacp rate fast).

In AOS-CX, every LAG starts as static. You opt in to LACP by adding lacp mode active (or passive) under the LAG interface.

Static LAG configuration

switch# configure terminal
switch(config)# interface lag 1
switch(config-lag-if)# description "Static LAG to Peer"
switch(config-lag-if)# no shutdown
switch(config-lag-if)# exit
switch(config)# interface 1/1/1
switch(config-if)# lag 1
switch(config-if)# no shutdown
switch(config-if)# interface 1/1/2
switch(config-if)# lag 1
switch(config-if)# no shutdown

Convert to LACP

switch(config)# interface lag 1
switch(config-lag-if)# description "LACP LAG to Peer"
switch(config-lag-if)# lacp mode active
switch(config-lag-if)# lacp rate fast
switch(config-lag-if)# no shutdown

lacp rate fast shortens the LACPDU interval from 30 s to 1 s. Because a partner is declared timed out after three missed LACPDUs, failure detection drops from roughly 90 s to roughly 3 s.
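The arithmetic behind that detection time can be sketched in a few lines of Python. This is an illustrative calculation, not switch code: it assumes the standard LACP behavior of timing a partner out after three missed LACPDUs, and the function name lacp_timeout_seconds is made up for this example.

```python
# Illustrative only: LACP times a partner out after three consecutive
# LACPDU intervals pass without a received LACPDU.
LACPDU_INTERVAL_S = {"slow": 30, "fast": 1}  # seconds between LACPDUs
MISSED_PDUS_BEFORE_TIMEOUT = 3


def lacp_timeout_seconds(rate: str) -> int:
    """Return the approximate failure-detection time for an LACP rate."""
    return LACPDU_INTERVAL_S[rate] * MISSED_PDUS_BEFORE_TIMEOUT


for rate in ("slow", "fast"):
    print(f"lacp rate {rate}: detection in about {lacp_timeout_seconds(rate)} s")
```

The slow rate works out to about 90 s and the fast rate to about 3 s, which is why fast is the usual choice on switch-to-switch links.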

LACP Active vs Passive

Mode	Behavior
Active	Sends LACPDUs continuously. Initiates negotiation.
Passive	Only responds to LACPDUs. Will not bring up the LAG without an active partner.

The rule: at least one side must be active. Passive-on-passive is the silent-treatment failure mode.
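The active/passive rule reduces to a one-line truth table. The sketch below is a hypothetical helper (lag_negotiates is not an AOS-CX API) that only models who initiates LACPDUs; it ignores every other condition a real negotiation checks.

```python
VALID_MODES = {"active", "passive"}


def lag_negotiates(local_mode: str, partner_mode: str) -> bool:
    """A dynamic LAG can form only if at least one end initiates LACPDUs."""
    modes = {local_mode, partner_mode}
    if not modes <= VALID_MODES:
        raise ValueError("mode must be 'active' or 'passive'")
    return "active" in modes


for pair in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
    print(pair, "->", "LAG can form" if lag_negotiates(*pair) else "LAG stays down")
```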

LAG Hash Algorithms

The switch decides which member port carries each frame using a hash over packet fields (typically src/dst MAC, IP, and L4 ports). Consistency matters because reordering frames within a flow breaks TCP performance. AOS-CX uses an L2/L3/L4 hash by default. Two 10G members yield 20G aggregate but still 10G max per flow.
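A minimal sketch of why a single flow never exceeds one member's speed. It uses CRC32 over the 5-tuple purely as a stand-in for the switch's hardware hash, which is not documented here; the member names, addresses, and the pick_member helper are assumptions for illustration.

```python
import zlib

MEMBERS = ["1/1/1", "1/1/2"]  # two 10G members (hypothetical port names)
MEMBER_SPEED_GBPS = 10


def pick_member(src_ip, dst_ip, proto, src_port, dst_port):
    """Map a flow's 5-tuple to one member port (stand-in for the ASIC hash)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return MEMBERS[zlib.crc32(key) % len(MEMBERS)]


flow = ("10.0.0.5", "10.0.1.9", "tcp", 49152, 443)

# Every frame of the same flow hashes to the same member, so frame order holds.
chosen = {pick_member(*flow) for _ in range(1000)}
assert len(chosen) == 1

aggregate = MEMBER_SPEED_GBPS * len(MEMBERS)
print(f"aggregate {aggregate}G, single-flow ceiling {MEMBER_SPEED_GBPS}G")
```

Because the mapping is deterministic per flow, extra members add capacity only when there are many distinct flows to spread across them.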

Member Port Consistency

Speed and duplex — cannot mix 1G and 10G members.
VLAN membership — all members carry the same trunk/access VLAN configuration.
MTU and storm control — applied at the LAG level, inherited by members.
L2 vs L3 mode — a routed LAG must have all members in routed mode.

Always configure VLAN/L3 settings on the LAG interface, never on individual members.
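The consistency rules above lend themselves to a pre-change sanity check. The following is a sketch under stated assumptions: MemberPort and consistency_errors are hypothetical names, and only the speed and L2/L3 rules are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberPort:
    name: str
    speed_gbps: int
    routed: bool  # True = L3 (routed) port, False = L2 port


def consistency_errors(members):
    """Return the reasons, if any, these ports should not share one LAG."""
    errors = []
    if len({m.speed_gbps for m in members}) > 1:
        errors.append("mixed speeds")
    if len({m.routed for m in members}) > 1:
        errors.append("mixed L2/L3 mode")
    return errors


ports = [MemberPort("1/1/1", 10, False), MemberPort("1/1/2", 1, False)]
print(consistency_errors(ports))
```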

LACP Fallback-Static

From AOS-CX 10.02, lacp fallback-static lets one member port forward traffic if the LACP partner is unresponsive — the classic PXE-boot fix.

switch(config)# interface lag 30
switch(config-lag-if)# lacp mode active
switch(config-lag-if)# lacp fallback-static

Verification

switch# show interface lag 1
switch# show lacp interfaces

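A healthy member shows the state flags ALFNCD: Active, Long-timeout, aggregable (F), in-sync (N), Collecting, Distributing. With lacp rate fast configured, the second flag reads S (Short-timeout) instead of L.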

Section 2 — VSX Architecture

Pre-Reading Check — VSX Architecture

1. How many control planes does a VSX pair operate with?

One shared control plane elected from both peers

Two independent control planes that synchronize state

A single primary control plane; the secondary is passive

No control plane — VSX is purely data plane

2. What is the role of the VSX keepalive?

Carry user data when the ISL is down

Synchronize MAC tables between peers

Detect peer aliveness independently of the ISL to prevent split-brain

Provide management access to the secondary peer

3. Why does VSX active-gateway eliminate FHRP failover delay?

It elects a single primary that pre-warms its ARP cache

Both peers always answer for the same virtual IP/MAC simultaneously

It uses GARP every 10 ms

It is faster VRRP — sub-second hello timers

4. Which keyword on the LAG interface creates a multi-chassis (VSX) LAG whose members span both peers?

vsx-isl

multi-chassis

peer-link

cluster-link

A LAG handles redundancy within one chassis. But what if the chassis fails? Stacking shares a single control plane (one bug, everyone falls down). VSX takes a different approach.

VSX (Virtual Switching Extension) clusters exactly two AOS-CX switches with independent control planes that synchronize state over a dedicated link. From a downstream device's perspective, the pair acts as one switch — but each peer runs its own copy of OSPF, BGP, STP, and management processes. If one peer reboots or hits a bug, the other keeps forwarding.

Analogy: Stacking is one brain controlling two bodies — efficient until the brain has a stroke. VSX is two pilots in a cockpit, each fully qualified, sharing notes constantly. If one passes out, the other already knows the plan.

ISL — Inter-Switch Link

The ISL is a regular LAG (not a multi-chassis LAG) between the two VSX peers carrying:

L2 control sync — MAC table, ARP, IGMP snooping
Configuration sync — anything tagged vsx-sync
Data fallback — traffic destined to a peer-only egress

Sizing rule: ISL bandwidth should equal the largest single VSX LAG capacity, or higher.
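The sizing rule can be written as a simple comparison. This sketch assumes capacities are expressed in Gbps; the function name and the example figures are hypothetical.

```python
def isl_meets_sizing_rule(isl_gbps, vsx_lag_gbps):
    """True if the ISL is at least as large as the biggest single VSX LAG."""
    return isl_gbps >= max(vsx_lag_gbps)


# Hypothetical pair: VSX LAGs of 20G, 40G and 20G.
vsx_lags = [20, 40, 20]
print(isl_meets_sizing_rule(50, vsx_lags))  # 50G ISL vs largest LAG of 40G
print(isl_meets_sizing_rule(20, vsx_lags))  # 20G ISL is undersized
```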

interface lag 100
   description ISL
   no shutdown
   no routing
   vlan trunk native none
   vlan trunk allowed all
   lacp mode active

interface 1/1/4
   lag 100
interface 1/1/5
   lag 100

vsx
   inter-switch-link lag 100

Keepalive

The keepalive is a dedicated L3 link (not the ISL) used to detect peer aliveness. If the ISL goes down, both peers consult the keepalive. If the keepalive shows the peer is still alive, the secondary disables its VSX LAG member ports and the primary keeps forwarding alone. If both the ISL and the keepalive are down, each peer assumes its partner is dead and keeps forwarding independently.

vrf ka

interface 1/1/6
   vrf attach ka
   ip address 192.168.0.1/30
   no shutdown

vsx
   keepalive peer 192.168.0.2 source 192.168.0.1 vrf ka

Primary vs Secondary Role

vsx
   system-mac 02:01:00:00:01:00
   inter-switch-link lag 100
   role primary
   keepalive peer ...

Both peers actively forward data. The role matters for: configuration sync direction, tie-breaking, and LSU orchestration. A shared system-mac is required so downstream LACP partners see one System ID across both chassis.
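Why the shared system-mac matters can be seen from the downstream device's side: LACP only bundles links whose partner advertises the same System ID. The sketch below is a simplified model of that grouping, not an implementation of LACP; the port names and the two non-shared MAC addresses are invented for illustration.

```python
from collections import defaultdict


def aggregate_by_partner(links):
    """Group local ports by the LACP System ID the partner advertises."""
    groups = defaultdict(list)
    for local_port, partner_system_id in links:
        groups[partner_system_id].append(local_port)
    return dict(groups)


SHARED = "02:01:00:00:01:00"  # the system-mac used in the example above

with_shared = aggregate_by_partner([("eth0", SHARED), ("eth1", SHARED)])
without_shared = aggregate_by_partner(
    [("eth0", "aa:aa:aa:00:00:01"), ("eth1", "bb:bb:bb:00:00:02")]
)
print(len(with_shared), "partner seen with a shared system-mac")
print(len(without_shared), "partners seen without one")
```

With one System ID the downstream device can place both uplinks in a single LAG; with two it sees two unrelated partners.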

Active-Gateway and Active-Forwarding

Both peers respond to the same virtual IP and MAC on a VLAN. Any host can route locally through whichever peer it reaches first — no failover delay because there is no failover.

vlan 10
   name employee

interface vlan 10
   description employee
   vsx-sync active-gateways
   ip address 172.17.0.2/24
   active-gateway ip 172.17.0.1 mac 12:00:00:00:00:01

vsx-sync Configuration

Common vsx-sync targets	Why
VLANs	Both peers must know all VLANs traversing the ISL or VSX LAGs
Active-gateways	Both peers must answer the same gateway IP/MAC
ACLs	Symmetric forwarding requires symmetric policy
QoS classifiers and queues	Asymmetric QoS produces asymmetric latency
Route maps and prefix lists	Routing policy must match on both peers

Section 3 — VSX LAG Configuration

Pre-Reading Check — VSX LAG Configuration

1. When the ISL fails but the keepalive remains up, what happens?

Both peers shut down their VSX LAG members

The secondary disables its VSX LAG member ports; the primary continues forwarding alone

The primary reboots to recover the ISL

Traffic continues normally on both peers

2. What does linkup-delay-timer protect against?

Ports flapping under storm-control

A rebooted peer attracting traffic before it has synced state from its partner — black-holing it

The primary becoming secondary by mistake

LACP partners timing out during link bring-up

3. To configure a VSX LAG that spans both peers, you must use:

Different LAG IDs on each peer

The same LAG ID and the multi-chassis keyword on both peers

Static LAG only — LACP is incompatible with VSX

A reserved LAG number above 256

4. Why must the keepalive ride a different physical path than the ISL?

For QoS prioritization

To reduce LACPDU collisions

If they share fiber, a single fiber cut produces split-brain — defeating the design

VSX licensing requires it

A VSX LAG (also called MC-LAG, multi-chassis LAG) is a single logical LAG whose member ports are split across the two VSX peers. The downstream device believes it is talking to one switch with two links.

Analogy: Two phone lines from two different carriers, but a magic phone that lets you publish one number that rings on both.

Defining VSX Peers and ISL — Putting It All Together

A complete minimal VSX bring-up on the primary:

! 1. Define the ISL LAG (a regular LAG, not multi-chassis)
interface lag 100
   no shutdown
   no routing
   vlan trunk native none
   vlan trunk allowed all
   lacp mode active

interface 1/1/49
   lag 100
interface 1/1/50
   lag 100

! 2. Keepalive in its own VRF
vrf ka
interface 1/1/48
   vrf attach ka
   ip address 192.168.0.1/30
   no shutdown

! 3. Bind under VSX
vsx
   system-mac 02:01:00:00:01:00
   inter-switch-link lag 100
   role primary
   keepalive peer 192.168.0.2 source 192.168.0.1 vrf ka
   linkup-delay-timer 180

The linkup-delay-timer prevents a rebooted peer from forwarding on its VSX LAG members until it has fully synced state. Without it, the rebooted peer might attract traffic and black-hole it for 30+ seconds.

Multi-Chassis LAGs — Configuring a VSX LAG

Identical config on both peers, with one member port from each peer:

interface lag 10 multi-chassis
   description "VSX LAG to ToR-1"
   no shutdown
   no routing
   vlan trunk native 1
   vlan trunk allowed 10,20,30
   lacp mode active

Then, on each peer, add a local member:

! Primary
interface 1/1/1
   description "ToR-1 link 1"
   lag 10

! Secondary
interface 1/1/1
   description "ToR-1 link 2"
   lag 10

Split-Brain Prevention

ISL State	Keepalive State	Peer Behavior
Up	Up	Normal active-active forwarding
Down	Up	Secondary disables its VSX LAG member ports. Primary continues forwarding alone.
Up	Down	Warning logged; forwarding continues; admin should fix keepalive immediately
Down	Down	Each peer assumes partner is dead; both forward independently

Crucial: the keepalive must ride a different physical path from the ISL.
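The table above can be expressed as a small decision function. This is a sketch of the documented behavior only; vsx_lag_forwarding and its return keys are hypothetical names, and real switches track more state than two booleans.

```python
def vsx_lag_forwarding(isl_up: bool, keepalive_up: bool) -> dict:
    """Return which peers keep their VSX LAG members forwarding."""
    if not isl_up and keepalive_up:
        # Peer is known to be alive, so the secondary steps aside.
        return {"primary": True, "secondary": False, "split_brain_risk": False}
    if not isl_up and not keepalive_up:
        # Neither peer can tell a dead partner from a cut link.
        return {"primary": True, "secondary": True, "split_brain_risk": True}
    # ISL up: forwarding is normal, with or without the keepalive.
    return {"primary": True, "secondary": True, "split_brain_risk": False}


for isl in (True, False):
    for ka in (True, False):
        print(f"ISL up={isl}, keepalive up={ka}: {vsx_lag_forwarding(isl, ka)}")
```

The only combination that risks split-brain is losing both links at once, which is why the two must not share a physical path.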

VSX State Machine

stateDiagram-v2
    [*] --> Booting
    Booting --> LinkupDelay: chassis powers on
    LinkupDelay --> ActiveActive: timer expires; ISL up; KA up
    ActiveActive --> ISLDown_KAUp: ISL fails
    ActiveActive --> ISLUp_KADown: keepalive fails
    ISLDown_KAUp --> SecondaryShutdownVSXLAGs: secondary disables VSX LAG members
    SecondaryShutdownVSXLAGs --> ActiveActive: ISL restored
    ISLUp_KADown --> ActiveActive: keepalive restored
    ActiveActive --> SplitBrain: ISL and KA both fail
    SplitBrain --> ActiveActive: both restored; renegotiate
    ActiveActive --> [*]: graceful shutdown

Verification

switch# show vsx status
switch# show vsx brief
switch# show vsx config
switch# show vsx config keepalive
switch# show interface lag 10
switch# show lacp interfaces
switch# show lacp aggregates

A healthy show vsx status confirms ISL up, keepalive established, roles primary/secondary, configuration in-sync, and linkup-delay timer expired.

Key Points

VSX LAG = same LAG ID on both peers + multi-chassis keyword + one local member each.
linkup-delay-timer is mandatory — without it, a rebooted peer black-holes traffic.
If the ISL drops but keepalive is up, the secondary voluntarily shuts its VSX LAG member ports.
Keep the keepalive on a separate physical path from the ISL; co-bundling defeats split-brain prevention.
Verify with show vsx status, show vsx config, show lacp interfaces.

Section 4 — VSX Lifecycle Operations

Standing up a VSX pair is the easy part. The real test is upgrading it without taking the network down.

Live Software Upgrades (LSU / ISSU)

A single-command rolling upgrade of a VSX pair, run on the primary:

switch# vsx update-software tftp://10.1.1.5/CX-10.07.swi secondary vrf mgmt

The flow:

Image staging — secondary downloads to its secondary partition; primary's running code is untouched.
Secondary reboots to the new image. Primary handles all traffic via VSX LAG redistribution and ISL (~1–3 min window).
Secondary rejoins with the new code, takes over forwarding.
Primary upgrades and reboots. Newly upgraded secondary now handles traffic.
Primary rejoins with the new code. Roles return to original.
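The point of the flow above is that at least one peer is forwarding in every phase. A rough model of that invariant follows; the phase names paraphrase the steps, and LSU_PHASES and always_forwarding are hypothetical names, not part of any AOS-CX tooling.

```python
# (phase, peers forwarding during that phase), following the flow above
LSU_PHASES = [
    ("image staging", {"primary", "secondary"}),
    ("secondary reboots to new image", {"primary"}),
    ("secondary rejoins", {"primary", "secondary"}),
    ("primary upgrades and reboots", {"secondary"}),
    ("primary rejoins", {"primary", "secondary"}),
]


def always_forwarding(phases):
    """True if no phase leaves the pair with zero forwarding peers."""
    return all(len(forwarders) > 0 for _, forwarders in phases)


for phase, forwarders in LSU_PHASES:
    print(f"{phase}: {', '.join(sorted(forwarders))}")
print("service maintained throughout:", always_forwarding(LSU_PHASES))
```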

Pre-flight Checklist

Check	Command	Why
Both peers in sync	`show vsx status`	LSU refuses to start on an unhealthy pair
Sufficient disk space	`show images`	Secondary partition must have room
Image accessible	Test TFTP/SCP from both peers	Both peers must reach the image server
Linkup-delay-timer set	`show vsx config`	Prevents black-holing on rejoin
Backup config	`copy running-config tftp://...`	Standard change control

LSU is hardware-specific. Virtual AOS-CX (OVA distribution) cannot be live-upgraded — it requires deploying a new VM and migrating config.

VSX Restore and Recovery

To replace a failed peer:

Replace the chassis (identical model required).
Restore configuration from backup.
Reconnect ISL and keepalive cables before powering on, if possible.
The healthy peer detects the new chassis via keepalive and ISL.
vsx-sync pushes synchronized configuration to the new peer automatically.
Verify with show vsx status and confirm both peers are in-sync before declaring complete.

VSX vs Stacking vs VSF — Comparison

Aspect	Traditional Stacking	VSF	VSX
Control planes	One shared	One shared (master elected)	Two independent, synchronized
Member count	Up to 8	2–10	Exactly 2
Failure domain	Stack master fault risks all	Stack master fault risks all	Peer fault contained
Topology	Chain or ring	Chain or ring	Point-to-point (ISL + KA)
Use case	Access layer	Access/aggregation	Core, aggregation, DC
MC-LAG	LAG within stack	LAG within stack	True MC-LAG, active-active
Upgrade impact	Full reboot	Rolling on some platforms	LSU — minimal impact
8320 / 8325	N/A	Not supported	Supported (only option)
6300	N/A	Supported (only option)	Not supported
6400	N/A	Not supported	Supported
Mgmt	Single IP	Single IP	Two IPs (one per peer)

The trade-off: VSF is simpler to manage; VSX is more resilient.

Best Practices Summary

Always use LACP between switches and to dual-homed servers: silent miswires on static LAGs are a common cause of VSX LAG outages.
Always set linkup-delay-timer — without it, a rebooted peer can black-hole traffic for 30+ seconds.
Keepalive on a different physical path than the ISL — same bundle = shared single point of failure.
vsx-sync everything — VLANs, active-gateways, ACLs.
Match models exactly: VSX requires identical hardware. An 8325 cannot peer with an 8320.
Pre-stage images before LSU.
Test failover in lab — pull the ISL, pull the keepalive, kill a peer.
Document the role mapping — knowing which physical chassis is "primary" matters at 3 AM.
