Chapter 5 — Link Aggregation, LACP, and VSX Multi-Chassis LAG
Learning Objectives
Configure static and LACP-based link aggregation groups (LAGs) on AOS-CX, including hash algorithm and member-port consistency considerations.
Explain the VSX architecture, including the Inter-Switch Link (ISL), the keepalive, primary/secondary roles, and active-active forwarding via active-gateway.
Configure VSX LAGs that span two physical switches so a downstream device sees a single logical aggregation.
Plan VSX upgrades using Live Software Upgrades (LSU/ISSU), including prerequisites, rollback considerations, and how VSX compares to traditional stacking and VSF.
A single uplink between two switches is a single point of failure and a single bottleneck. Modern campus and data center designs eliminate both problems by bonding multiple physical links into one logical pipe and by pairing two physical chassis so they appear as one logical forwarding entity. On Aruba AOS-CX, the building blocks for that design are Link Aggregation Groups (LAGs), the Link Aggregation Control Protocol (LACP), and Virtual Switching Extension (VSX).
Section 1 — LAG Fundamentals
Pre-Reading Check — LAG Fundamentals
1. Which AOS-CX command converts a static LAG to a dynamic (LACP) LAG?
interface lag 1 dynamic
lacp mode active
enable lacp
protocol lacp 802.3ad
2. What happens if both ends of a link are configured with LACP passive mode?
The LAG comes up but with reduced bandwidth
The LAG comes up only after a 30-second hold timer
The LAG never comes up because neither side initiates LACPDUs
The LAG falls back to static mode automatically
3. Two 10G members in a LAG. How much throughput can a single TCP flow achieve?
20G — bandwidth aggregates per flow
10G — a single flow rides one member due to consistent hashing
15G — frames load-balance approximately evenly
5G — half of aggregate to allow headroom
4. What problem does lacp fallback-static solve?
It allows mismatched MTU between members
It bypasses hashing for high-priority traffic
It lets one member forward traffic when the LACP partner is silent (e.g., a server during PXE boot)
It forces the LAG to use static mode permanently
Think of a LAG like a multi-lane highway built between two cities. One lane (a single physical link) might carry the load most days, but during rush hour, or when a lane is closed for maintenance, the highway needs more capacity and redundancy. A LAG bundles multiple physical interfaces into one logical highway: bandwidth scales (close to) linearly, and a single lane closure doesn't shut down the road.
Static vs LACP
AOS-CX supports two LAG flavors:
Static LAG: both ends are configured manually. There is no negotiation. If you wire it correctly and the configs match, traffic flows. If a member is misconfigured or the wrong port is patched, the switch happily forwards into a black hole.
Dynamic LAG (LACP, originally IEEE 802.3ad, now maintained as IEEE 802.1AX): both ends exchange LACPDUs (Link Aggregation Control Protocol Data Units) to verify the partner, agree on which member ports are eligible, and detect mis-cabling before forwarding starts.
Analogy: A static LAG is like a handshake agreement between two contractors: fast, but if either side forgets, the work doesn't line up. LACP is the same handshake plus a signed contract that is re-checked with every LACPDU (every 30 s by default, every second with lacp rate fast).
Animation: LACP negotiation handshake. In active-active LACP, both peers send LACPDUs, verify System IDs, then converge to the forwarding state.
In AOS-CX, every LAG starts as static. You opt in to LACP by adding lacp mode active (or passive) under the LAG interface.
Static LAG configuration
switch# configure terminal
switch(config)# interface lag 1
switch(config-lag-if)# description "Static LAG to Peer"
switch(config-lag-if)# no shutdown
switch(config-lag-if)# exit
switch(config)# interface 1/1/1
switch(config-if)# lag 1
switch(config-if)# no shutdown
switch(config)# interface 1/1/2
switch(config-if)# lag 1
switch(config-if)# no shutdown
Convert to LACP
switch(config)# interface lag 20
switch(config-lag-if)# description "LACP LAG to Peer"
switch(config-lag-if)# lacp mode active
switch(config-lag-if)# lacp rate fast
switch(config-lag-if)# no shutdown
lacp rate fast shortens the LACPDU heartbeat from 30 s to 1 s — failure detection in roughly 3 s.
lacp mode active initiates LACPDUs. lacp mode passive only responds to LACPDUs and will not bring up the LAG without an active partner.
The rule: at least one side must be active. Passive-on-passive is the silent-treatment failure mode.
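The active/passive rule and the timer arithmetic above can be sketched in a few lines of Python. This is an illustrative model only, not switch code: the names LacpMode, lag_negotiates, and detection_seconds are hypothetical, and the three-missed-LACPDU timeout is the standard LACP behavior that produces the "roughly 3 s" figure.

```python
from enum import Enum


class LacpMode(Enum):
    ACTIVE = "active"    # initiates LACPDUs
    PASSIVE = "passive"  # only answers LACPDUs it receives


def lag_negotiates(local: LacpMode, partner: LacpMode) -> bool:
    """A LAG can negotiate only if at least one side initiates LACPDUs."""
    return LacpMode.ACTIVE in (local, partner)


def detection_seconds(pdu_interval_s: int, missed_pdus: int = 3) -> int:
    """Time to declare the partner gone after `missed_pdus` silent intervals."""
    return pdu_interval_s * missed_pdus


if __name__ == "__main__":
    for a in LacpMode:
        for b in LacpMode:
            print(f"{a.value:8} + {b.value:8} -> negotiates: {lag_negotiates(a, b)}")
    print("fast rate:", detection_seconds(1), "s")
    print("slow rate:", detection_seconds(30), "s")
```

Only the passive-plus-passive combination fails to negotiate, which matches the rule stated above.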
LAG Hash Algorithms
The switch decides which member port carries each frame using a hash over packet fields (typically src/dst MAC, IP, and L4 ports). Consistency matters because reordering frames within a flow breaks TCP performance. AOS-CX uses an L2/L3/L4 hash by default. Two 10G members yield 20G aggregate but still 10G max per flow.
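To make the per-flow limit concrete, here is a minimal Python sketch of flow-based member selection. It is an assumption-laden illustration: the real hash runs in the switch ASIC and its exact function and field set are platform-specific, so CRC32 over a 5-tuple is only a stand-in, and Flow and pick_member are hypothetical names.

```python
import zlib
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def pick_member(flow: Flow, members: list[str]) -> str:
    """Map a flow to one member port; the same flow always gets the same port."""
    key = "|".join(str(field) for field in flow).encode()
    # CRC32 is a stand-in for the hardware hash (assumption).
    return members[zlib.crc32(key) % len(members)]


if __name__ == "__main__":
    members = ["1/1/1", "1/1/2"]
    backup = Flow("10.0.0.5", "10.0.1.9", 6, 49152, 443)
    # Every packet of this flow lands on the same member, so the flow
    # can never exceed that one member's line rate.
    print(pick_member(backup, members), pick_member(backup, members))
    # Different flows may land on different members, which is where
    # the aggregate bandwidth comes from.
    for port in range(49152, 49160):
        print(port, pick_member(backup._replace(src_port=port), members))
```

Because the mapping is deterministic per flow, frames within a flow stay in order, at the cost of capping any single flow at one member's speed.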
Member Port Consistency
Speed and duplex — cannot mix 1G and 10G members.
VLAN membership — all members carry the same trunk/access VLAN configuration.
MTU and storm control — applied at the LAG level, inherited by members.
L2 vs L3 mode — a routed LAG must have all members in routed mode.
Always configure VLAN/L3 settings on the LAG interface, never on individual members.
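The consistency rules above lend themselves to a pre-change audit. The following Python sketch assumes you have already collected per-port attributes by some means (the MemberPort fields and the inconsistencies helper are hypothetical, not an AOS-CX API) and simply reports any member that differs from the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberPort:
    name: str
    speed_gbps: int
    duplex: str
    routed: bool  # True for an L3 (routed) port, False for L2


def inconsistencies(members: list[MemberPort]) -> list[str]:
    """Compare every member against the first one and list what differs."""
    if not members:
        return []
    ref = members[0]
    problems = []
    for port in members[1:]:
        for attr in ("speed_gbps", "duplex", "routed"):
            if getattr(port, attr) != getattr(ref, attr):
                problems.append(
                    f"{port.name}: {attr}={getattr(port, attr)!r} "
                    f"differs from {ref.name} ({getattr(ref, attr)!r})"
                )
    return problems


if __name__ == "__main__":
    ports = [
        MemberPort("1/1/1", 10, "full", False),
        MemberPort("1/1/2", 1, "full", False),  # 1G mixed with 10G
    ]
    for line in inconsistencies(ports):
        print(line)
```

VLAN, MTU, and storm-control settings are deliberately left out of the check because they belong on the LAG interface and are inherited by members.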
LACP Fallback-Static
From AOS-CX 10.02, lacp fallback-static lets one member port forward traffic if the LACP partner is unresponsive — the classic PXE-boot fix.
switch(config)# interface lag 30
switch(config-lag-if)# lacp mode active
switch(config-lag-if)# lacp fallback-static
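The effect of lacp fallback-static can be modeled as a small decision function. This is a simplified sketch, not the switch's actual logic: forwarding_members is a hypothetical name, and picking the first member as the fallback port is an assumption made only for illustration.

```python
def forwarding_members(
    members: list[str],
    partner_lacpdus_seen: bool,
    fallback_static: bool,
) -> list[str]:
    """Return the member ports that forward traffic in this simplified model."""
    if partner_lacpdus_seen:
        # Normal LACP operation: negotiated members forward.
        return list(members)
    if fallback_static and members:
        # Partner is silent (for example, a server in PXE boot):
        # one member forwards. Which one is an assumption here.
        return [members[0]]
    # Partner is silent and no fallback: nothing forwards.
    return []


if __name__ == "__main__":
    ports = ["1/1/10", "1/1/11"]
    print("PXE boot, fallback on: ", forwarding_members(ports, False, True))
    print("PXE boot, fallback off:", forwarding_members(ports, False, False))
    print("OS up, LACP running:   ", forwarding_members(ports, True, True))
```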
Verification
switch# show interface lag 1
switch# show lacp interfaces
Key Points
Every AOS-CX LAG starts static; lacp mode active|passive opts in to dynamic.
At least one side of an LACP LAG must be active.
Hashing keeps a single flow on a single member — bandwidth aggregates across flows, not within one.
Configure VLAN/L3 settings on the LAG interface, not member ports.
lacp fallback-static rescues PXE-boot scenarios where the server hasn't loaded its LACP driver yet.
Post-Reading Check — LAG Fundamentals
1. Which AOS-CX command converts a static LAG to a dynamic (LACP) LAG?
interface lag 1 dynamic
lacp mode active
enable lacp
protocol lacp 802.3ad
2. What happens if both ends of a link are configured with LACP passive mode?
The LAG comes up but with reduced bandwidth
The LAG comes up only after a 30-second hold timer
The LAG never comes up because neither side initiates LACPDUs
The LAG falls back to static mode automatically
3. Two 10G members in a LAG. How much throughput can a single TCP flow achieve?
20G — bandwidth aggregates per flow
10G — a single flow rides one member due to consistent hashing
15G — frames load-balance approximately evenly
5G — half of aggregate to allow headroom
4. What problem does lacp fallback-static solve?
It allows mismatched MTU between members
It bypasses hashing for high-priority traffic
It lets one member forward traffic when the LACP partner is silent (e.g., a server during PXE boot)
It forces the LAG to use static mode permanently
Section 2 — VSX Architecture
Pre-Reading Check — VSX Architecture
1. How many control planes does a VSX pair operate with?
One shared control plane elected from both peers
Two independent control planes that synchronize state
A single primary control plane; the secondary is passive
No control plane — VSX is purely data plane
2. What is the role of the VSX keepalive?
Carry user data when the ISL is down
Synchronize MAC tables between peers
Detect peer aliveness independently of the ISL to prevent split-brain
Provide management access to the secondary peer
3. Why does VSX active-gateway eliminate FHRP failover delay?
It elects a single primary that pre-warms its ARP cache
Both peers always answer for the same virtual IP/MAC simultaneously
It uses GARP every 10 ms
It is faster VRRP — sub-second hello timers
4. A VSX LAG facing a downstream device is created with which interface lag keyword?
vsx-isl
multi-chassis
peer-link
cluster-link
A LAG protects against the loss of a link, but every member still terminates on one chassis. What if the chassis itself fails? Stacking shares a single control plane (one bug, everyone falls down). VSX takes a different approach.
VSX (Virtual Switching Extension) clusters exactly two AOS-CX switches with independent control planes that synchronize state over a dedicated link. From a downstream device's perspective, the pair acts as one switch — but each peer runs its own copy of OSPF, BGP, STP, and management processes. If one peer reboots or hits a bug, the other keeps forwarding.
Analogy: Stacking is one brain controlling two bodies — efficient until the brain has a stroke. VSX is two pilots in a cockpit, each fully qualified, sharing notes constantly. If one passes out, the other already knows the plan.
Figure: Both peers forward simultaneously (active-active). The ISL syncs state; the keepalive link confirms peer aliveness.
ISL — Inter-Switch Link
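The ISL is a regular LAG between the two VSX peers (configured without the multi-chassis keyword, which is reserved for VSX LAGs facing downstream devices) and bound with inter-switch-link under the vsx context. It carries: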
L2 control sync — MAC table, ARP, IGMP snooping
Configuration sync — anything tagged vsx-sync
Data fallback — traffic destined to a peer-only egress
Sizing rule: ISL bandwidth should equal the largest single VSX LAG capacity, or higher.
! The ISL is a regular LAG; multi-chassis is reserved for VSX LAGs
interface lag 100
    description ISL
    no shutdown
    no routing
    vlan trunk native none
    vlan trunk allowed all
    lacp mode active
interface 1/1/4
    lag 100
interface 1/1/5
    lag 100
vsx
    inter-switch-link lag 100
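The ISL sizing rule is easy to check during design. The sketch below is a hypothetical helper (isl_is_adequate is not a vendor tool) that compares the ISL capacity against the largest VSX LAG; the bandwidth figures are made-up inputs.

```python
def isl_is_adequate(isl_gbps: float, vsx_lag_gbps: dict[str, float]) -> bool:
    """True if the ISL is at least as large as the largest single VSX LAG."""
    if not vsx_lag_gbps:
        return True
    return isl_gbps >= max(vsx_lag_gbps.values())


if __name__ == "__main__":
    # Hypothetical design: 2 x 40G ISL, two downstream VSX LAGs.
    lags = {"lag 10": 20.0, "lag 11": 100.0}
    print("80G ISL adequate: ", isl_is_adequate(80.0, lags))
    print("200G ISL adequate:", isl_is_adequate(200.0, lags))
```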
Keepalive
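The keepalive is a dedicated L3 path (not the ISL) used to detect peer aliveness. If the ISL goes down, both peers consult the keepalive. If the keepalive still answers, the peer is alive and only the ISL is broken, so the secondary disables its VSX LAG member ports while the primary keeps forwarding. If both the ISL and the keepalive are down, each peer assumes its partner is dead and forwards independently (split-brain).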
vrf ka
interface 1/1/6
    vrf attach ka
    ip address 192.168.0.1/30
    no shutdown
vsx
    keepalive peer 192.168.0.2 source 192.168.0.1 vrf ka
Primary vs Secondary Role
vsx
    system-mac 02:01:00:00:01:00
    inter-switch-link lag 100
    role primary
    keepalive peer ...
Both peers actively forward data. The role matters for: configuration sync direction, tie-breaking, and LSU orchestration. A shared system-mac is required so downstream LACP partners see one System ID across both chassis.
Active-Gateway and Active-Forwarding
Both peers respond to the same virtual IP and MAC on a VLAN. Any host can route locally through whichever peer it reaches first — no failover delay because there is no failover.
vlan 10
    name employee
interface vlan 10
    description employee
    vsx-sync active-gateways
    ip address 172.17.0.2/24
    active-gateway ip 172.17.0.1 mac 12:00:00:00:00:01
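Why there is nothing to fail over can be seen in a toy model of ARP handling. This Python sketch is a simplification under stated assumptions: VsxPeer and arp_reply are hypothetical names, each peer's own SVI address is ignored, and only the shared active-gateway IP and MAC from the config above are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VsxPeer:
    name: str
    gateway_ip: str   # shared active-gateway IP
    gateway_mac: str  # shared active-gateway MAC


def arp_reply(peer: VsxPeer, target_ip: str) -> Optional[str]:
    """Return the MAC this peer answers with, or None if it stays silent."""
    if target_ip == peer.gateway_ip:
        return peer.gateway_mac
    return None


PRIMARY = VsxPeer("primary", "172.17.0.1", "12:00:00:00:00:01")
SECONDARY = VsxPeer("secondary", "172.17.0.1", "12:00:00:00:00:01")

if __name__ == "__main__":
    for peer in (PRIMARY, SECONDARY):
        print(peer.name, "answers", arp_reply(peer, "172.17.0.1"))
```

Because both peers give the same answer, a host's ARP entry stays valid no matter which peer is reachable, so losing one peer requires no gateway re-election.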
vsx-sync Configuration
| Common vsx-sync targets | Why |
| --- | --- |
| VLANs | Both peers must know all VLANs traversing the ISL or VSX LAGs |
| Active-gateways | Both peers must answer the same gateway IP/MAC |
| ACLs | Symmetric forwarding requires symmetric policy |
| QoS classifiers and queues | Asymmetric QoS produces asymmetric latency |
| Route maps and prefix lists | Routing policy must match on both peers |
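A quick way to reason about what vsx-sync protects against is to diff the two peers feature by feature. The sketch below is an offline illustration with made-up data; sync_drift is a hypothetical helper, and the feature names are labels chosen for this example rather than CLI keywords.

```python
def sync_drift(
    primary: dict[str, set[str]],
    secondary: dict[str, set[str]],
) -> dict[str, set[str]]:
    """For each feature, return the items present on only one peer."""
    drift = {}
    for feature in primary.keys() | secondary.keys():
        diff = primary.get(feature, set()) ^ secondary.get(feature, set())
        if diff:
            drift[feature] = diff
    return drift


if __name__ == "__main__":
    primary = {"vlans": {"10", "20"}, "acls": {"BLOCK-GUEST"}}
    secondary = {"vlans": {"10"}, "acls": {"BLOCK-GUEST"}}
    # VLAN 20 exists only on the primary, so traffic for it that
    # arrives at the secondary has nowhere to go.
    print(sync_drift(primary, secondary))
```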
Key Points
VSX = exactly two AOS-CX switches with independent control planes.
The ISL is a regular LAG bound with inter-switch-link (the multi-chassis keyword is for VSX LAGs only) and carries L2/MAC/config sync + data fallback.
The keepalive rides a separate physical path (separate VRF or OOBM) — never co-bundled with the ISL.
A shared system-mac makes the pair appear as one LACP System ID to downstream devices.
active-gateway means both peers answer the same virtual IP/MAC — zero FHRP failover delay.
vsx-sync is a per-feature toggle pushing config from primary to secondary.
Post-Reading Check — VSX Architecture
1. How many control planes does a VSX pair operate with?
One shared control plane elected from both peers
Two independent control planes that synchronize state
A single primary control plane; the secondary is passive
No control plane — VSX is purely data plane
2. What is the role of the VSX keepalive?
Carry user data when the ISL is down
Synchronize MAC tables between peers
Detect peer aliveness independently of the ISL to prevent split-brain
Provide management access to the secondary peer
3. Why does VSX active-gateway eliminate FHRP failover delay?
It elects a single primary that pre-warms its ARP cache
Both peers always answer for the same virtual IP/MAC simultaneously
It uses GARP every 10 ms
It is faster VRRP — sub-second hello timers
4. A VSX LAG facing a downstream device is created with which interface lag keyword?
vsx-isl
multi-chassis
peer-link
cluster-link
Section 3 — VSX LAG Configuration
Pre-Reading Check — VSX LAG Configuration
1. When the ISL fails but the keepalive remains up, what happens?
Both peers shut down their VSX LAG members
The secondary disables its VSX LAG member ports; the primary continues forwarding alone
The primary reboots to recover the ISL
Traffic continues normally on both peers
2. What does linkup-delay-timer protect against?
Ports flapping under storm-control
A rebooted peer attracting traffic before it has synced state from its partner — black-holing it
The primary becoming secondary by mistake
LACP partners timing out during link bring-up
3. To configure a VSX LAG that spans both peers, you must use:
Different LAG IDs on each peer
The same LAG ID and the multi-chassis keyword on both peers
Static LAG only — LACP is incompatible with VSX
A reserved LAG number above 256
4. Why must the keepalive ride a different physical path than the ISL?
For QoS prioritization
To reduce LACPDU collisions
If they share fiber, a single fiber cut produces split-brain — defeating the design
VSX licensing requires it
A VSX LAG (also called MC-LAG, multi-chassis LAG) is a single logical LAG whose member ports are split across the two VSX peers. The downstream device believes it is talking to one switch with two links.
Analogy: Two phone lines from two different carriers, but a magic phone that lets you publish one number that rings on both.
Defining VSX Peers and ISL — Putting It All Together
A complete minimal VSX bring-up on the primary:
! 1. Define the ISL LAG (a regular LAG; multi-chassis is reserved for VSX LAGs)
interface lag 100
    no shutdown
    no routing
    vlan trunk native none
    vlan trunk allowed all
    lacp mode active
interface 1/1/49
    lag 100
interface 1/1/50
    lag 100
! 2. Keepalive in its own VRF
vrf ka
interface 1/1/48
    vrf attach ka
    ip address 192.168.0.1/30
    no shutdown
! 3. Bind under VSX
vsx
    system-mac 02:01:00:00:01:00
    inter-switch-link lag 100
    role primary
    keepalive peer 192.168.0.2 source 192.168.0.1 vrf ka
    linkup-delay-timer 180
The linkup-delay-timer prevents a rebooted peer from forwarding on its VSX LAG members until it has fully synced state. Without it, the rebooted peer might attract traffic and black-hole it for 30+ seconds.
Multi-Chassis LAGs — Configuring a VSX LAG
Identical config on both peers, with one member port from each peer:
interface lag 10 multi-chassis
    description "VSX LAG to ToR-1"
    no shutdown
    no routing
    vlan trunk native 1
    vlan trunk allowed 10,20,30
    lacp mode active
Then, on each peer, add a local member:
! Primary
interface 1/1/1
    description "ToR-1 link 1"
    lag 10
! Secondary
interface 1/1/1
    description "ToR-1 link 2"
    lag 10
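The three requirements for a VSX LAG (same LAG ID, multi-chassis on both peers, at least one local member each) can be expressed as a small validation routine. This is a sketch over hand-built data: LagConfig and vsx_lag_problems are hypothetical names, not part of any AOS-CX tooling.

```python
from dataclasses import dataclass, field


@dataclass
class LagConfig:
    lag_id: int
    multi_chassis: bool
    local_members: list[str] = field(default_factory=list)


def vsx_lag_problems(primary: LagConfig, secondary: LagConfig) -> list[str]:
    """List every way the two peers' LAG definitions fail to form a VSX LAG."""
    problems = []
    if primary.lag_id != secondary.lag_id:
        problems.append(f"LAG IDs differ: {primary.lag_id} vs {secondary.lag_id}")
    for role, cfg in (("primary", primary), ("secondary", secondary)):
        if not cfg.multi_chassis:
            problems.append(f"{role}: lag {cfg.lag_id} lacks multi-chassis")
        if not cfg.local_members:
            problems.append(f"{role}: lag {cfg.lag_id} has no local member port")
    return problems


if __name__ == "__main__":
    good = vsx_lag_problems(
        LagConfig(10, True, ["1/1/1"]),
        LagConfig(10, True, ["1/1/1"]),
    )
    bad = vsx_lag_problems(
        LagConfig(10, True, ["1/1/1"]),
        LagConfig(11, False, []),
    )
    print("good pair:", good)
    print("bad pair: ", bad)
```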
Split-Brain Prevention
| ISL State | Keepalive State | Peer Behavior |
| --- | --- | --- |
| Up | Up | Normal active-active forwarding |
| Down | Up | Secondary disables its VSX LAG member ports. Primary continues forwarding alone. |
| Up | Down | Warning logged; forwarding continues; admin should fix keepalive immediately |
| Down | Down | Each peer assumes partner is dead; both forward independently |
Crucial: the keepalive must ride a different physical path from the ISL.
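The four ISL/keepalive combinations above reduce to one branch, which the following Python sketch makes explicit. It is a simplified model of the behavior described in this section, with a hypothetical function name, and it ignores timers and recovery sequencing.

```python
def vsx_lag_forwarding(isl_up: bool, keepalive_up: bool, role: str) -> bool:
    """Return True if this peer keeps its VSX LAG member ports forwarding."""
    if role not in ("primary", "secondary"):
        raise ValueError(f"unknown role: {role!r}")
    if not isl_up and keepalive_up:
        # Peer is known to be alive but state can no longer be synced:
        # only the primary keeps forwarding.
        return role == "primary"
    # Up/Up: normal. Up/Down: warning only. Down/Down: each peer
    # assumes the other is dead and forwards on its own (split-brain).
    return True


if __name__ == "__main__":
    for isl in (True, False):
        for ka in (True, False):
            for role in ("primary", "secondary"):
                state = vsx_lag_forwarding(isl, ka, role)
                print(f"ISL up={isl!s:5} KA up={ka!s:5} {role:9} forwards={state}")
```

The Down/Down row is the dangerous one: both peers forward with no shared state, which is why the keepalive must not share a failure domain with the ISL.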
VSX State Machine
stateDiagram-v2
[*] --> Booting
Booting --> LinkupDelay: chassis powers on
LinkupDelay --> ActiveActive: timer expires; ISL up; KA up
ActiveActive --> ISLDown_KAUp: ISL fails
ActiveActive --> ISLUp_KADown: keepalive fails
ISLDown_KAUp --> SecondaryShutdownVSXLAGs: secondary disables VSX LAG members
SecondaryShutdownVSXLAGs --> ActiveActive: ISL restored
ISLUp_KADown --> ActiveActive: keepalive restored
ActiveActive --> SplitBrain: ISL and KA both fail
SplitBrain --> ActiveActive: both restored; renegotiate
ActiveActive --> [*]: graceful shutdown
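The diagram above can also be written as a transition table, which is handy for walking through failure scenarios. This Python sketch mirrors the diagram's state names; the event names are labels invented for this example, and the graceful-shutdown exit is omitted.

```python
TRANSITIONS = {
    ("Booting", "power_on"): "LinkupDelay",
    ("LinkupDelay", "timer_expired"): "ActiveActive",
    ("ActiveActive", "isl_fail"): "ISLDown_KAUp",
    ("ActiveActive", "keepalive_fail"): "ISLUp_KADown",
    ("ISLDown_KAUp", "secondary_disables_lags"): "SecondaryShutdownVSXLAGs",
    ("SecondaryShutdownVSXLAGs", "isl_restored"): "ActiveActive",
    ("ISLUp_KADown", "keepalive_restored"): "ActiveActive",
    ("ActiveActive", "both_fail"): "SplitBrain",
    ("SplitBrain", "both_restored"): "ActiveActive",
}


def run(events: list[str], state: str = "Booting") -> str:
    """Apply events in order and return the final state."""
    for event in events:
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"no transition from {state} on {event!r}") from None
    return state


if __name__ == "__main__":
    # ISL failure followed by recovery.
    print(run(["power_on", "timer_expired", "isl_fail",
               "secondary_disables_lags", "isl_restored"]))
    # Simultaneous loss of ISL and keepalive.
    print(run(["power_on", "timer_expired", "both_fail"]))
```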
Verification
switch# show vsx status
switch# show vsx brief
switch# show vsx config
switch# show vsx config keepalive
switch# show interface lag 10
switch# show lacp interfaces
switch# show lacp aggregates
A healthy show vsx status confirms ISL up, keepalive established, roles primary/secondary, configuration in-sync, and linkup-delay timer expired.
Key Points
VSX LAG = same LAG ID on both peers + multi-chassis keyword + one local member each.
linkup-delay-timer is mandatory — without it, a rebooted peer black-holes traffic.
If the ISL drops but keepalive is up, the secondary voluntarily shuts its VSX LAG member ports.
Keep the keepalive on a physical path separate from the ISL; co-bundling defeats split-brain prevention.
Verify with show vsx status, show vsx config, show lacp interfaces.
Post-Reading Check — VSX LAG Configuration
1. When the ISL fails but the keepalive remains up, what happens?
Both peers shut down their VSX LAG members
The secondary disables its VSX LAG member ports; the primary continues forwarding alone
The primary reboots to recover the ISL
Traffic continues normally on both peers
2. What does linkup-delay-timer protect against?
Ports flapping under storm-control
A rebooted peer attracting traffic before it has synced state from its partner — black-holing it
The primary becoming secondary by mistake
LACP partners timing out during link bring-up
3. To configure a VSX LAG that spans both peers, you must use:
Different LAG IDs on each peer
The same LAG ID and the multi-chassis keyword on both peers
Static LAG only — LACP is incompatible with VSX
A reserved LAG number above 256
4. Why must the keepalive ride a different physical path than the ISL?
For QoS prioritization
To reduce LACPDU collisions
If they share fiber, a single fiber cut produces split-brain — defeating the design
VSX licensing requires it
Section 4 — VSX Lifecycle Operations
Pre-Reading Check — VSX Lifecycle Operations
1. During an LSU, which peer is upgraded first?
Primary, then secondary
Secondary, then primary
Both simultaneously, with traffic re-routing externally
Whichever has the most uptime
2. Which platform supports VSF but NOT VSX?
AOS-CX 8325
AOS-CX 8320
AOS-CX 6300
AOS-CX 6400
3. What is the maximum number of switches in a VSX cluster?
2
4
8
10
4. When replacing a failed VSX peer chassis, which is NOT required of the replacement?
Identical model
Same software version as the surviving peer
Same system-mac in VSX config
Identical serial number to the failed unit
Standing up a VSX pair is the easy part. The real test is upgrading it without taking the network down.
Live Software Upgrades (LSU / ISSU)
A single-command rolling upgrade of a VSX pair, run on the primary: