AOS 8 – Mobility Conductor – Layer 2 Redundancy VRRP Troubleshooting

In a previous post I outlined how to configure L2 Mobility Conductor (MCR) redundancy. 

In this post I will break down how Layer 2 VRRP works and how to troubleshoot common issues when it doesn’t work.  This post should be very handy for anyone taking the ACMX v8 exam.

There are a few useful commands when troubleshooting Conductor Redundancy and VRRP.  To validate the configuration and status of VRRP, run “show vrrp <vrrp instance #>”.  Here is the output from my lab:

If you don’t remember which VRRP instance number you used in your configuration, run “show vrrp” and you will see the output for every VRRP instance running on that controller. The MCR should only have one VRRP instance running, so you will get the same output as the previous command.

The show command output gives you the current Admin State, Virtual Router State, priority configured, etc. 

Invalid VRRP Password:

One common issue is fat-fingering the VRRP password.  To validate the password, you can run “encrypt disable” in configuration mode.  This command allows passwords and keys to be displayed in plain text.  Just remember to run “encrypt enable” when you are finished troubleshooting.

As you can see in the output above, the auth data password is now shown in plain text.  To demonstrate the symptoms of an incorrect password, I’ll modify the VRRP password on MM02:

Now if I check the VRRP status, I see that MM02 now thinks it is Master:

Unfortunately, MM01 thinks it is Master as well:

In the output we can see that the Auth data passwords don’t match, but there are a few other commands that we can run to validate the VRRP status as well:

The first command to use is “show vrrp <vrrp instance #> statistics”. This command provides a number of useful stats on the VRRP instance:

In the output you can see that there are a number of Authentication failures. 
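Some context on why a password mismatch leaves both conductors claiming Master: in standard VRRPv2, a router discards advertisements that fail authentication, so the Backup behaves as if the Master had gone silent and promotes itself once its Master Down timer expires. The sketch below computes that timer using the formula from the VRRP RFCs (RFC 2338 / RFC 3768). The function names and the priority and interval values are my own illustrations, and I am assuming AOS follows the RFC timers, which I have not verified.

```python
def skew_time(priority: int) -> float:
    """Skew_Time in seconds per the VRRPv2 RFCs: (256 - Priority) / 256."""
    if not 1 <= priority <= 255:
        raise ValueError("configured VRRP priority must be between 1 and 255")
    return (256 - priority) / 256


def master_down_interval(priority: int, advert_interval_s: float = 1.0) -> float:
    """Seconds a Backup waits without a valid advertisement before it
    transitions to Master: 3 * Advertisement_Interval + Skew_Time."""
    return 3 * advert_interval_s + skew_time(priority)


if __name__ == "__main__":
    # Illustrative priorities only; use the values shown by "show vrrp".
    for prio in (100, 110, 255):
        print(f"priority {prio}: master down after {master_down_interval(prio):.3f} s")
```

With a one second advertisement interval the timer lands a little over three seconds, which is why the second Master shows up almost immediately after the passwords stop matching.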

Another great place to look is the MCR error log.  Run the command “show log errorlog <number of entries>”.

The errorlog quickly points out that there is an authentication data mismatch.

The controller system log can also be filtered to watch VRRP transition between states by running the command “show log system <number of entries> | include VRRP”.

Invalid VMware NIC/vSwitch configuration:

If you are using virtual MCR appliances, another failure scenario is misconfigured vNICs. As I stated in a previous post, the vSwitch or vNIC port group used by the appliances must have Promiscuous Mode = Accept and Forged Transmits = Accept.  This is what it looks like in my lab using a standalone ESXi host. Please consult with your virtualization team on what this should look like in your environment.

If I set both of those settings to Reject (which is the default behavior), VRRP breaks. Both MCR appliances will say they are Master:

This looks like the behavior we saw earlier when the passwords did not match, but as you can see in the output, the passwords now match. The difference when the VM NIC settings are incorrect is that both MCR appliances show they are sending advertisements but not receiving any:

Once you have corrected the VM NIC settings, VRRP should become operational again.  The expected behavior is that the MCR appliance running as Master sends advertisements and the Backup receives them:
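The two dual-Master scenarios above can be told apart from the counters alone. Here is a minimal sketch of that decision process in Python. The class, field names, and return strings are hypothetical; you would type the values in by hand from “show vrrp” and “show vrrp <vrrp instance #> statistics” on each MCR. It does not parse real controller output.

```python
from dataclasses import dataclass


@dataclass
class VrrpSnapshot:
    """Values read by hand from one MCR (hypothetical field names)."""
    state: str              # "MASTER" or "BACKUP"
    adverts_sent: int
    adverts_received: int
    auth_failures: int


def diagnose(a: VrrpSnapshot, b: VrrpSnapshot) -> str:
    """Suggest where to look first when both conductors claim Master.

    Counters are cumulative, so ideally compare two readings taken a few
    seconds apart and pass in the differences.
    """
    both_master = a.state.upper() == "MASTER" and b.state.upper() == "MASTER"
    if not both_master:
        return "no dual-master condition"
    # Advertisements arrive but are rejected: the passwords do not match.
    if a.auth_failures > 0 or b.auth_failures > 0:
        return "check VRRP password"
    # Both sides send but neither hears the other: the L2 path drops them.
    if (a.adverts_sent > 0 and b.adverts_sent > 0
            and a.adverts_received == 0 and b.adverts_received == 0):
        return "check vSwitch/port group security settings"
    return "dual master, cause unclear: check the error and system logs"


if __name__ == "__main__":
    mm01 = VrrpSnapshot("MASTER", adverts_sent=40, adverts_received=0, auth_failures=0)
    mm02 = VrrpSnapshot("MASTER", adverts_sent=38, adverts_received=0, auth_failures=0)
    print(diagnose(mm01, mm02))
```

The password check comes first because authentication failures prove that advertisements are at least reaching the other appliance, which rules out the vSwitch settings.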

Most of the VRRP issues can be resolved using the process above. If you really want to dig deep into the VRRP process on a controller you can enable enhanced logging for the VRRP process:

logging system process vrrp subcat all level debugging

logging system process vrrp level debugging

To view the log run the command: “show log system <number of entries> | include VRRP”

Once you have finished troubleshooting, be sure to disable the additional logging:

no logging system process vrrp subcat all

no logging system process vrrp

I hope this breakdown of the possible L2 redundancy failures is helpful.  
