Wednesday, September 11, 2013

VLAN failsafe problem for F5 HA new builds

F5 load balancers are powerful devices that support a variety of high availability features. A full list and description can be found in "Configuring High Availability" in the "TMOS Management Guide for BIG-IP Systems" document.

Supported HA features:

System fail-safe
Monitors the switch board component and a set of key system services.
Gateway fail-safe
Monitors traffic between the BIG-IP system and a gateway router.
VLAN fail-safe
Monitors traffic on a VLAN.

Problem

Both F5s in an HA cluster fail over and go into standby mode when fail-safe is enabled on a VLAN that doesn't see any traffic.

Analysis and workaround descriptions

When you are building a new HA cluster, this is unlikely to cause any major issues. New clusters are usually built without any servers behind the LTM devices. If both devices go into standby mode it may be surprising, but a simple ping from one F5 to the other should bring them back into an active/standby pair.

Unfortunately, the issue can also occur in production when you add a new VLAN to a running HA cluster. If the VLAN has fail-safe enabled and there are no devices on it behind the F5 LTMs, the new VLAN may trigger a failover on both nodes. As expected, this is caused by VLAN fail-safe detecting no traffic on the VLAN.
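Before touching a production pair, it can help to check how fail-safe is configured on the new VLAN and what state each unit is in. The lines below are a minimal sketch, assuming they are run from the BIG-IP bash prompt (hence the tmsh prefix) and that the VLAN is called new_vlan, which is a made-up name for illustration.

```shell
# Show whether this unit is currently active or standby.
tmsh show sys failover

# Show the fail-safe settings of the VLAN ("new_vlan" is a hypothetical name).
tmsh list net vlan new_vlan failsafe failsafe-action failsafe-timeout
```

Running the same two commands on both units shows whether fail-safe is enabled on the VLAN and whether both nodes have dropped to standby.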

There are a couple of workarounds that can be applied; two examples are listed below.

Workaround 1

As per SOL13297: Overview of VLAN failsafe (10.x - 11.x), set the LTM database variable failover.vlanfailsafe.resettimeronanyframe to true.
 
modify /sys db failover.vlanfailsafe.resettimeronanyframe value [true|false]
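As a concrete sketch of the syntax above, assuming the commands are run from the BIG-IP bash prompt (hence the tmsh prefix), the variable can be set to true, read back, and then saved. As the name suggests, the intent is that any frame seen on the VLAN resets the fail-safe timer.

```shell
# Set the database variable to true.
tmsh modify sys db failover.vlanfailsafe.resettimeronanyframe value true

# Read the variable back to confirm the new value.
tmsh list sys db failover.vlanfailsafe.resettimeronanyframe

# Save the running configuration so the change survives a reboot.
tmsh save sys config
```

Whether the variable has to be set on each unit separately or is carried over by a config sync is not covered here, so it is worth checking the value on both units.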

Workaround 2

Create a pool containing all self IPs of the affected VLAN so that the LTM detects traffic on the new VLAN, which prevents fail-safe from kicking in.
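A pool like this could be sketched as follows. The pool name, the two self IP addresses, the :0 (any port) member notation and the choice of the gateway_icmp monitor are all assumptions made for illustration; the original workaround only says to put the self IPs into a pool. The monitor is included on the assumption that health checks to the peer's self IP are what generate the traffic on the VLAN.

```shell
# Hypothetical self IPs of the two units on the new VLAN:
#   192.0.2.11 (unit A) and 192.0.2.12 (unit B)
# The ICMP-based monitor is an assumption; it makes each unit probe the pool members.
tmsh create ltm pool vlan_failsafe_keepalive \
    monitor gateway_icmp \
    members add { 192.0.2.11:0 192.0.2.12:0 }

# Confirm the pool exists and check the member status.
tmsh show ltm pool vlan_failsafe_keepalive members
```

The pool does not need to be attached to a virtual server for this purpose, since it only exists to keep traffic flowing on the VLAN.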
