Mellanox ConnectX-3 MCX311A-XCAT 10GbE and Solaris 11.3 (WHL #42)
The heat. F*ck me.
I think post #42 deserves something nerdy. More nerdy than usual. Glad to report that I just bought something suitable – two Mellanox SFP+ cards (MCX311A-XCAT), capable of 10GbE over fibre (IEEE 802.3ae) and via DAC cables (not even sure which 802 standard that is – 802.3ak is Twinax as used in DAC, but that's quad-Twinax as used in CX-4 connectors, like Infiniband does). 55.00€ delivered from the Netherlands, as fast network connections are “Neuland” (uncharted territory) in Germany and therefore carry a heavy premium. Slap on 1.5m of compatible FS.com DAC cable for another measly 17.02€ and we have 10 gigabit Ethernet between two desktop computers. Hooray!
tl;dr (also known as “please Google, list this result for other folks looking for confirmation”; and: SPOILER ALERT): Mellanox ConnectX-3 cards do work in Windows, Linux and Solaris operating systems – firmware patching might be required. And a non-Solaris OS might be required to do that.
Except, we don’t. Well, not out of the box on any operating system – which is the reason this became a blog post. But for that, I need to dig a bit deeper first.
Quite a while ago, somebody donated a set of two 20Gb/s single-port and one 10Gb/s dual-port Infiniband (CX-4) cards to me. Gullible me (a youthful 29 at the time) thought of free 10GbE, but it turns out it’s not that easy with the quad-lane Infiniband architecture. First of all, there’s the 8b/10b encoding that is not included in the marketing figures – boom, 20% of the performance gone. Second, one would like to set up an IPoIB (IP over Infiniband) connection so that it behaves just like some run-of-the-mill network connection, say your GBase-T network card. Well, that can hurt performance badly. And third, one needs drivers for these cards. Which is why I see Mellanox as a Wan Hung Lo company. OK, I have to admit that these Infinihost 3 / ConnectX(-1) cards are pretty old and have been EOL for ages, which probably is the reason I got them basically for free. I don’t expect support for these at all, no improvements, no drivers for new operating systems. But I do expect them to offer at least the last version of the driver from when the device was still under support. Nope. They not only wiped their website clean of any drivers, they also asked Oracle to remove already-implemented support from Solaris. Which, given the shithole customer-friendly, non-profit company Oracle has proven to be, they probably did. I just googled to verify my bold claims and found firmware on the Mellanox page, which I didn’t find at the time. Could be my bad googling skills, could be the train wreck of a website. Not sure. Still, the drivers are gone, and you’ll most likely not code your own.
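The 8b/10b haircut from the paragraph above in one line:

```shell
# 8b/10b line code: every 10 bits on the wire carry 8 bits of payload,
# so a "20 Gb/s" DDR 4x Infiniband link moves at most:
echo "$(( 20 * 8 / 10 )) Gb/s of actual data"
```

That prints 16 Gb/s – and that’s before IPoIB and driver overhead take their cut.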
So for Windows, one could use the OpenFabrics OFED drivers – which do work, but I never got even close to 8 (or 16!) Gb/s. It was more like double or triple GBase-T, so 2-3 Gb/s tops. Still a vast improvement, but certainly below what Infiniband should be capable of. AFAIR, Linux worked out of the box, but performance wasn’t much better, so I never found out whether that was just a driver-side thing or a limitation of IPoIB use. And Solaris… well, nope. So recently I decided to donate the set to someone else, just like a lot of other stuff. My post office loves me.
10GBase-T was a hot candidate as a replacement: cheap cabling, works with everything else just fine, consumer-level boards probably available in a few months (TR4/2011 platforms “already” have single boards with 10GBase-T onboard). But when it comes to Solaris, there’s the same driver problem as with the EOL cards: Support? Nope, buy proper (but not too) modern hardware! The Oracle Hardware Compatibility List is pretty… short. And I haven’t heard from anybody who got one of the “new” Aquantia AQC107-based cards (ASUS XG-C100C and lots of others – even Aquantia themselves wanted to release a card and cash in on their popularity) to work with Solaris. A few exotics aside, there’s only one choice, and that is Intel NICs: X540, X550. Don’t get onboard X552 ones, they are too new and not supported either. Thing is: one X540 costs 200€, one X550 even 300€. Two cards are required; Cat6 cabling is basically free (in stark contrast to CX-4 or, to some extent, DAC and fibre). Furthermore, 10GBase-T is much more power-hungry than any other technology here, and one of the machines will run quite a few hours per day. So in the end I decided against RJ45 and in favour of SFP+ ports, even though that will forever be an island solution and never interface with the rest of the network at home. On the other hand, this very connection was never intended to be part of the network. It was isolated from the beginning, keeping the file server as offline as possible. If the NSA wants to get in, they’ll probably have to bitbang over the mains.
The ConnectX-3 cards were a bit of a gamble as well: as seen in the HCL linked above, there’s no entry for them. The same is true for the much more popular (but not that much cheaper) ConnectX-2 cards, which are now phased out, probably gaining the “never heard of that!” EOL status at Mellanox in the near future. Someone at the Hardwareluxx forums confirmed that HP CN1100E cards work out of the box, as they are pretty much identical to the Emulex OCe11102-F (vendor ID 19A2, device ID 0714), which IS listed on the HCL. IBM 49Y7942 is probably the same. He even said these work well in PCIe x4 slots (more on that in a second), so they should be a safe bet.
However, offers on eBay and the like are pretty sparse. So I went with the ConnectX-3 and a matching, non-Mellanox DAC cable purchased separately. I’m still unsure whether I will buy full-size brackets for a tenner, or let someone 3D-print one of the brackets from servethehome. Currently, the cards are just sitting in their PCIe slots, and I’m hoping nothing yanks on the cables, especially not during operation.
While Linux (the Ubuntu 17.10 x64 that was on my multiboot stick) works out of the box, Windows does need drivers. They can be found on the Mellanox site, unless EOLmageddon has hit already (I have a copy on my filer, just ask if you cannot source them any longer). Install, reboot, and we’re live. Even easier than most Realtek Wan Hung Lo networking or audio crap. Solaris… didn’t work. The card was recognized properly:
pci bus 0x0001 cardnum 0x00 function 0x00: vendor 0x15b3 device 0x1003
Mellanox Technologies MT27500 Family [ConnectX-3]
However, dmesg keeps talking about “mcxnex0 no port detected” – yeah, thanks. So I booted Ubuntu on that machine as well, just to see if there was a problem with the PCIe lanes. I have to add that the ConnectX-3 is a PCIe 3.0 device, while the Solaris server runs on a Supermicro X8DTL board: PCIe 2.0 max, and only two x8 slots, which are occupied by two IBM BR10i 8-port SAS storage controllers. The others are x4 – one at 2.0 speeds, one at 1.1. Going with 10GBase-T and the Intel cards, that would have meant a 3.0 x8 card in a 2.0 x4 slot – certainly not ideal and prone to issues. Same with the OCe11102 or ConnectX-2, which are x8 as well. The ConnectX-3 is 3.0 x4, so it “just” needs to slow down to 2.0 speeds while keeping its full link width.
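Under Linux, the negotiated link can be read straight from lspci. The LnkSta line below is a canned sample of what a PCIe 2.0 x4 link looks like – on the live system you would run `sudo lspci -vv -d 15b3:1003 | grep LnkSta` instead:

```shell
# Sample LnkSta line as printed by `lspci -vv` (5GT/s = PCIe 2.0 signalling);
# real query: sudo lspci -vv -d 15b3:1003 | grep LnkSta
lnksta="LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+"
# Extract the negotiated speed and width
echo "$lnksta" | sed -n 's/.*Speed \([^,]*\), Width \(x[0-9]*\).*/link: \1, \2/p'
```

If that reports a lower width than the slot should provide, the card is the least of your problems.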
When Ubuntu worked fine, I knew it was not a hardware compatibility issue but a software problem. And while I fiddled around, slowly accepting that I had messed up by buying the cheaper cards, I remembered that the Windows installer had said the firmware was out of date – but the damn installer failed to patch it. Mellanox offers (easy to install!) firmware updater software for both Windows and Linux, so I updated both cards from 2.10.4290 to 2.42.5000 by executing
flint -d mt4099_pciconf0 -i fw-ConnectX3-rel-2_42_5000_MCX311A-XCA_Ax-FlexBoot-3.4.752.bin burn
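For completeness, the query step that belongs before any burn – a sketch assuming the Mellanox MFT tools are installed and that the card really enumerates as mt4099_pciconf0 (check `mst status` after `mst start` for yours):

```shell
# Assumed device name from `mst status`; adjust for your system
DEV=mt4099_pciconf0

# Show the FW version and PSID currently on the card before burning anything
if command -v flint >/dev/null 2>&1; then
    flint -d "$DEV" query
else
    echo "flint not found - install the Mellanox Firmware Tools (MFT) first"
fi
```

The PSID reported by the query is what ties the card to the correct firmware image, so it’s worth checking before pointing `burn` at a .bin file.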
Guess what? Doing that makes the card available in Solaris. dmesg is clean, and the card shows up as net2. Now that’s pretty shitty, Mellanox and Oracle! I mean, I’ve had hardware before that wasn’t working with current drivers, but all of it SAID that something was wrong and that I needed to patch the firmware. Having to find that out by chance, just before disassembling and reselling the entire set, is abysmal.
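On the Solaris side, a quick sanity check and a static address for the isolated link are a couple of dladm/ipadm calls. This is a sketch, not a transcript from my box – the 10.10.10.1/24 address is made up, and it assumes the card really came up as net2:

```shell
# Solaris 11: verify the link exists and trained at 10000 Mb/s
dladm show-phys net2

# Hypothetical static address for the isolated point-to-point link
ipadm create-ip net2
ipadm create-addr -T static -a 10.10.10.1/24 net2/v4

# Jumbo frames are a link property on Solaris, not an interface flag
dladm set-linkprop -p mtu=9000 net2
```

The other end gets the matching address, and that’s the whole “network”.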
That should answer this archived post over at Oracle forums – yes, it’s working until one of the parties decides that customers need better hardware…
So, this is the network page in the EXCELLENT napp-it system overview:
MTU 9000 is working great; Windows performance is… meh. Occasional peaks of 900 MB/s, but more like 600 MB/s sustained. SMB, so there’s that – limited by the performance of a single core. Geez, it’s 2018, boys.
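For context, a back-of-the-envelope number for what those MB/s figures are up against – assuming 40 bytes of TCP/IP headers inside each jumbo frame and 38 bytes of Ethernet framing overhead (header, FCS, preamble, inter-packet gap) around it:

```shell
# Rough TCP payload ceiling for 10GbE with MTU 9000
awk 'BEGIN {
  mtu     = 9000
  payload = mtu - 40        # TCP payload per frame (minus IP+TCP headers)
  frame   = mtu + 38        # bytes of wire time per frame (framing overhead)
  rate    = 10000 / 8       # 10 Gb/s in MB/s (decimal)
  printf "%.0f MB/s payload ceiling\n", rate * payload / frame
}'
```

Roughly 1.24 GB/s – so the 900 MB/s peaks are close to what the link can actually deliver, and the 600 MB/s sustained figure is an SMB/CPU bottleneck, not a link problem.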
But Ubuntu as a client?
Transfer level is OVER 9000!