- ESX host stops when shutting down with the Emulex BladeEngine 10Gb Ethernet Controller be2net driver loaded
- ESX loses network connectivity with Emulex BladeEngine 10Gb Ethernet Controller be2net driver loaded
We followed the recommendation in these articles and updated the be2net driver to version 2.102.554.0. However, the ESXi host still hung and lost network connectivity whenever it was rebooted or had its dvS connections reconfigured.
These hangs were accompanied by VMkernel log messages like this one:
... vmkernel: 10:06:11:06.193 cpu0:4153)WARNING: CpuSched: 939: world 4153(helper11-0) did not yield PCPU 0 for 2993 msec, refCharge=5975 msec, coreCharge=6374 msec,
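To check a saved copy of the log for warnings of this kind, a small script can help. The following is only a minimal sketch in Python: the pattern is modelled on the single sample line above, so other ESX builds may word the message differently, and the function name `find_yield_warnings` is made up for this illustration.

```python
import re

# Pattern derived from the one sample line in this post (an assumption:
# other ESX/ESXi builds may format the warning differently).
YIELD_WARNING = re.compile(
    r"WARNING: CpuSched: \d+: world (?P<world>\d+)\((?P<name>[^)]*)\) "
    r"did not yield PCPU (?P<pcpu>\d+) for (?P<msec>\d+) msec"
)

def find_yield_warnings(lines):
    """Return (world_id, world_name, pcpu, msec) for each matching log line."""
    hits = []
    for line in lines:
        m = YIELD_WARNING.search(line)
        if m:
            hits.append((int(m["world"]), m["name"], int(m["pcpu"]), int(m["msec"])))
    return hits

sample = [
    # The log line quoted above.
    "... vmkernel: 10:06:11:06.193 cpu0:4153)WARNING: CpuSched: 939: "
    "world 4153(helper11-0) did not yield PCPU 0 for 2993 msec, "
    "refCharge=5975 msec, coreCharge=6374 msec,",
    # A made-up non-matching line, for illustration only.
    "... vmkernel: some unrelated line",
]
print(find_yield_warnings(sample))
```

In practice you would pass the lines of a log file copied off the host instead of the `sample` list.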
After opening a support call with VMware, we finally found out that these problems were caused by the be2net driver's improper handling of VLAN hardware offloading, and that they only occur when you are using distributed virtual switches (dvS), as we were.
So, after configuring the blade hosts with virtual standard switches (vSS) the problem went away.
Since then we had been waiting for a fixed be2net driver from Emulex so that we could return to dvS. We really did not want to abandon this option, because it offers some benefits over the standard switch (load-based teaming of the physical uplinks and Network I/O Control).
Today, the waiting finally ended. Emulex has released the fixed driver, and it is available here:
VMware ESX/ESXi 4.x Driver CD for Emulex OneConnect 10Gb Ethernet Controller
Update (18. Jul 2011): In the meantime VMware made two new KB articles available that reference the problems described here and the new driver:
- KB1034748: Connecting or disconnecting the only virtual machine on a ESXi vDS using Emulex OneConnect NICs causes 100% host CPU and network connectivity loss
- KB2001858: Emulex OneConnect loses network connectivity with tagged traffic
The latter also recommends updating the NIC's firmware. The current version (as of today) is available from HP as a bootable ISO file. Thanks to makö for pointing this out in this post's comments.
Thanks makö, I updated my post accordingly.
- Andreas
Ok, there seems to be a lot more to it than just that. This blog features some really interesting posts around all these Emulex issues:
http://www.jmeow.com/
He also describes a vCenter Plugin for managing Emulex CNAs:
http://www.emulex.com/downloads/emulex/vmware/vsphere-41/management.html
Looks like HP released a fix.
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c03005737