Creative Communities of the World Forums

The peer-to-peer support community for media production professionals.

Storage & Archiving Forum

  • Two new Mac Pros, two Thunderbolt 2 RAIDs, one Thunderbolt Bridge …

     12 Members · 35 Posts
  • Neil Smith

    January 4, 2014 at 8:58 am

    Been testing Thunderbolt 2 Bridge networking between two new Mac Pros … a couple of 6-core nMPs with D500s arrived today and I upgraded them both to 64 GB of RAM each – so identical machines with max RAM.

    Attached an ARECA Thunderbolt 2 8-bay 32 TB RAID to one – this unit is still in beta, so expect the driver to be tweaked and these preliminary speeds to go up … but still impressive speeds from eight spindles.

    Attached a Pegasus2 R8 8-bay 24 TB RAID to the other one:

    Tested them both in DAS mode.

    Then I connected a Thunderbolt cable between the two nMPs, set up a fixed-IP Thunderbolt 2 Bridge between them, and tested the Tbolt 2 RAID attached to the other machine.
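    (For anyone who wants to replicate the bridge setup, this is roughly what it takes from Terminal on an admin account – "Thunderbolt Bridge" is the service Mavericks creates automatically, and the 10.0.0.x addresses are just placeholders for whatever you pick:)

    # on nMP #1 – give the bridge a static address; the router field isn't really used on a point-to-point link
    networksetup -setmanual "Thunderbolt Bridge" 10.0.0.1 255.255.255.0 10.0.0.2

    # on nMP #2
    networksetup -setmanual "Thunderbolt Bridge" 10.0.0.2 255.255.255.0 10.0.0.1

    # confirm the link from either machine
    ping -c 4 10.0.0.2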

    Tested the Tbolt Bridge speed first with only one RAID going at a time, and then with both of them going together.

    Used Blackmagic Disk Speed Test for all the testing … the DAS-mode Tbolt 2 RAIDs came in around where I expected them, but the Tbolt Bridge testing was very inconsistent, with erratic throughput … I get the feeling that the SMB/IP stack has not been optimized for Tbolt 2 bridging, or else I’m doing something totally wrong.
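    (For anyone who wants a scriptable cross-check alongside the Blackmagic app, a rough dd run gives repeatable sequential numbers – the /Volumes/ARECA path below is just a placeholder for wherever your RAID mounts:)

    # write ~10 GB of zeros, then read it back; dd prints a bytes/sec figure when it finishes
    dd if=/dev/zero of=/Volumes/ARECA/ddtest bs=1m count=10240
    # flush the RAM cache first, otherwise the read-back on a 64 GB machine comes partly from memory
    sudo purge
    dd if=/Volumes/ARECA/ddtest of=/dev/null bs=1m
    rm /Volumes/ARECA/ddtest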

    Here are some screen grabs that illustrate the DAS-mode speeds and the Tbolt 2 Bridge speeds … we’ll be demoing the two nMPs with Xsan and the Tbolt 2 Bridge networking at our X Pro monthly meeting on Sat Jan 18th, if anyone wants to come along and see the topology in action – details on our website at https://www.lumaforge.com

    1) 6 core nMP in DAS mode attached to ARECA Thunderbolt 2 RAID:

    2) 6 core nMP in DAS mode attached to Promise P2R8 RAID:

    3) 6 core nMP attached through Thunderbolt2 Bridge to ARECA RAID – one BMD test only :

    4) 6 core nMP attached through Thunderbolt2 Bridge to P2R8 RAID – one BMD test only:

    5) both nMPs attached through Tbolt2 Bridge to other RAID – two BMD tests running at the same time:

    But please note, the Thunderbolt 2 Bridge dual-speed tests were all over the place … anywhere from 40 MB/s to 800 MB/s … the throughput rates never settled down … it seemed like there was a quota of IP bandwidth that was somehow parceled out to each nMP depending on disk caching or packet density … we never achieved a steady-state data flow.

    Any explanations for the erratic dual Thunderbolt Bridge transfer speeds? I was rather hoping that you could have two editors sharing projects from their own DAS Tbolt 2 RAIDs with no SAN in the middle.

    We’ll be demoing the above configurations and announcing prices for our DAS, SAN and hybrid Thunderbolt 2 solutions at the Jan 18th event if you want to place orders … ARECA Thunderbolt 2 RAIDs and the new MAGMA Tbolt 2 expansion chassis will start shipping in Feb.

    Cheers,
    Neil

    Neil Smith
    CEO
    LumaForge LLC
    faster storage faster finishing
    323-850-3550
    http://www.lumaforge.com

  • Bob Zelin

    January 4, 2014 at 1:22 pm

    We observed the erratic speeds with Thunderbolt 1 bridging, and I was hoping that this would disappear with Thunderbolt 2. While I have not done a Tbolt 2 to Tbolt 2 bridge network test like you have, I was hoping to be able to provide a super low-cost NAS solution for 2–3 editors that want to share, but without consistent speeds this is pointless. I look forward to doing my own tests, but thank you so much for publishing these results. While 10GbE is a low-cost alternative to a fibre system, as you know it’s STILL not cheap enough for some people, and I was hoping that Tbolt 2 bridging was the answer. I guess not yet!

    Bob Zelin

    Bob Zelin
    Rescue 1, Inc.
    maxavid@cfl.rr.com

  • Neil Smith

    January 4, 2014 at 4:36 pm

    Thanks for the feedback, Bob … for straightforward data transfer between two new Mac Pros, Thunderbolt 2 bridging is very useful … I transferred half a terabyte of files between the ARECA and the P2R8 in under ten minutes, which was a lot quicker than having to copy to a transfer drive and then copy again.

    But for a couple of editors trying to work constantly off HD or 4K files in real time it might be a pain, even though we were getting 500 MB/s to 600 MB/s in both directions at the same time at peak moments – we even hit 800 MB/s in both directions at one point!

    Am going to try FCP X 10.1 running on both nMPs and see if it’s usable … if it is, you could at least have an editor and an assistant working on the same show with different Libraries.

    What’s the root cause of the IP stack being so erratic and inconsistent? Maybe the SMB/IP stack needs some collision-detection code or a ‘Jumbo Frames’ option written for it. The other thing I tested was putting the cables on the same and on different Tbolt buses, but that didn’t seem to make much difference … as you know, there are six Tbolt 2 ports but only three buses … I was wondering if the internal Tbolt 2 switch was adding to the inconsistent data flow?
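    (Bumping the MTU is easy enough to try from Terminal if anyone wants to experiment – assuming the bridge shows up as the "Thunderbolt Bridge" port / bridge0 interface, which is what Mavericks creates by default:)

    # set jumbo frames on the bridge (repeat on both nMPs)
    networksetup -setMTU "Thunderbolt Bridge" 9000

    # check what the interface is actually using
    ifconfig bridge0 | grep mtu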

    Will start testing the new MAGMA Tbolt 2 to PCIe expansion chassis next week with 10 GbE, 8 Gb/s FC and 6 Gb/s SAS cards to see what kind of throughput we get … at the moment we’re capped by the 800 MB/s limit of Tbolt 1, but Tbolt 2 should take that up to over 1200 MB/s.

    Interesting times for sure in the Apple world … I love the quietness of the nMPs … even with two of them on the desk right in front of you, you can hardly hear them purr … but I have to say, when you have six Thunderbolt cables plugged into the I/O ports it really is fiddly to take them in and out … and with the slightest bit of tension they pop out … lost a couple of renders yesterday when the RAIDs dismounted unexpectedly.

    Neil

    Neil Smith
    CEO
    LumaForge LLC
    faster storage faster finishing
    323-850-3550
    http://www.lumaforge.com

  • Alex Gerulaitis

    January 4, 2014 at 8:13 pm

    Neil,

    Thanks so much for doing the tests – very interesting.

    [Neil Smith] “Any explanations for the erratic dual Thunderbolt Bridge transfer speeds?”

    Don’t know, but I can only guess the bridging / device drivers aren’t optimized for such high speeds. I’ve seen similar behavior with 10GbE links (speeds way under the 10 Gb/s ceiling), and that was mostly attributable to NIC drivers and the OS TCP/IP stack not being optimized for sequential transfers.

    Any chance of getting some more granular speed measurements – perhaps with iPerf/jPerf?

    Would be interesting to get bandwidth-over-time graphs similar to the ones in this image, along with testing various packet sizes, and to measure network performance separately from storage.

    There’s a good chance this will pinpoint the bottleneck.
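    (Something along these lines, assuming iPerf 2 is installed on both machines and using the fixed bridge IPs from your setup – the 10.0.0.x addresses are just placeholders:)

    # on nMP #1 – run the server
    iperf -s

    # on nMP #2 – 30-second test, report every second, larger TCP window
    iperf -c 10.0.0.1 -t 30 -i 1 -w 512k

    # then vary the window and add parallel streams, e.g. -w 64k, -w 1m, -P 4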

    P.S. Clicking on the images in your post results in “404 Not Found” – wanted to see the top two in full glory and couldn’t. Any chance of fixing it… in post?

  • Neil Smith

    January 4, 2014 at 10:23 pm

    We can fix everything in post, Alex!

    Will try and do some more detailed testing but I’m off to CES on Monday for the week and up to my neck in transcoding footage for UHDTVs … apparently 4K is the next “big thing”.

    Just did a quick-and-dirty test running FCP X 10.1 on both nMPs, with each one accessing the Tbolt 2 RAID on the other nMP and playing 4K ProRes 4444 files across the Thunderbridge link … either one would play well, or the other would, but not both at the same time … strange, methinks.

    1) 2 x nMPs playing 4K ProRes4444 files over Thunderbridge link with 4K monitoring in background:

    2) Close-up of single Thunderbolt cable connecting 2 x nMPs:

    So I disconnected the two nMPs, put a Thunderbolt 2 MBP laptop in the middle and mounted both RAIDs … ran the BMD speed test and got reasonable speeds to each RAID separately:

    3) MBP Thunderbolt 2 laptop connected to 2 x nMPs:

    N.B. the numbers shown in the BMD tests varied greatly when testing Thunderbridge connectivity … please don’t take them as gospel … besides which, ARECA is busy tweaking their drivers based on real-life testing … but I still think the real issue is how Apple implemented the SMB/IP stack in Mavericks.

    4) MBP connected to ARECA Tbolt 2 RAID through the Thunderbridge link to the nMP:

    5) MBP connected to PROMISE Tbolt 2 RAID through the Thunderbridge link to the nMP:

    6) FCP X 10.1 playing 4K ProRes4444 files from both Tbolt2 RAIDs with dropped frames:

    So it looks like, even though there is plenty of bandwidth available in Tbolt 2 (theoretically 20 Gb/s), we’re not going to be able to utilize IP over Tbolt 2 until some smart cookie comes up with some nifty software to optimize sustained throughput and reduce packet contention … over to Quantum, Alex.

    Neil Smith
    CEO
    LumaForge LLC
    faster storage faster finishing
    323-850-3550
    http://www.lumaforge.com

  • Alex Gerulaitis

    January 5, 2014 at 1:40 am

    Thanks Neil,

    I still think isolating network performance from storage and measuring it separately might help zero in on the bottleneck.

    Let me know if you’d like me to hop over and do some iPerf tests on your setup.

  • Neil Smith

    January 5, 2014 at 3:10 pm

    Yes, agree on the importance of separating Tbolt 2 networking from Tbolt 2 drive performance to get a better understanding of where the real bottleneck is … like you, I suspect that the underlying issue is in how Apple implemented the IP stack in Mavericks, and maybe the internal Tbolt 2 bus switch in the nMPs.

    And yes, it would be great to have you come over and put iPerf through its paces and see what we find … I’m off to CES tomorrow for the week (much joy), so maybe the week after, when I’m back, you can come over to The Lot and we’ll roll up our sleeves and see what we can suss out.

    Will also have a 16-bay Tbolt 2 RAID to test by then, which should saturate the Tbolt 2 bandwidth even more … it was still good to see that even on 8-bay arrays I was getting peak I/O of over 800 MB/s read/write using the BMD speed test between two 6-core nMPs … which means that if we can find a way to smooth out the IP traffic, then IP over Thunderbolt Bridge will be a viable way to connect a small group of Tbolt 2 editors together.

    For basic file transfer between the two nMPs, Tbolt 2 bridging works very well – transferred half a terabyte of 4K files from one RAID to the other in under ten minutes … but for editorial work, where we need consistent real-time playback off the timeline, some optimization still needs to be done.

    See you in a week’s time, assuming I survive CES … it’s going to be interesting to see where all this 4K content we’re producing on these spiffing nMPs is going to end up … if 4K UHDTV delivery into the home takes off, then the consumer market for 4K content will be more significant than the DCI cinema 4K opportunity.

    One thing’s for sure: if we do move to 4K workflows, the demand for storage and bandwidth is only going to grow rapidly.

    Cheers,
    Neil

    Neil Smith
    CEO
    LumaForge LLC
    faster storage faster finishing
    323-850-3550
    http://www.lumaforge.com

  • Chris Murphy

    January 6, 2014 at 6:00 am

    Re: 2) 6-core nMP in DAS mode attached to Promise P2R8 RAID – 1077 MB/s writes vs 817 MB/s reads. I see the screen shots, but the numbers seem reversed. Anyone have an explanation?

    As for why IP over Thunderbolt is erratic, that’s a matter of how Apple’s implementing it. If they’re emulating ethernet, or something like InfiniBand/RDMA, in software, it’s probably a huge CPU and memory hog. It’s a PCIe bus, so imagine taking a plain PCIe cable from computer to computer. First I’d kind of expect that to fry one or both logic boards, but aside from that, there’s no mechanism for them to communicate anything this way, so that has to be coded somehow. That comes well before SMB, and if it is SMB being used, then it sounds like they’re emulating ethernet in software. Very expensive to do that. It’s not like these ethernet cards have junk on them, or general-purpose chips. They’re specialized, and the 10 GigE ones have heat sinks on some of those chips.

    Can you repeat this bridge test and take a screen shot of Activity Monitor set to All Processes with the %CPU column clicked on? Or, even better, open Terminal and use:

    top -s10 -ocpu

    Wait at least 10 seconds for it to update – the initial display is not sorted – then take a screen shot. BTW, I see the screen shots above, but when I click on them I get a 404 error, so I can’t see them bigger than they are inline. Any ideas?
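    (Or, to get something that can be pasted into the thread rather than screen-shotted, top’s logging mode works too – the output path is just an example:)

    # two samples 10 seconds apart, sorted by CPU, top 20 processes; only the second sample is meaningful
    top -l 2 -s 10 -o cpu -n 20 > ~/Desktop/bridge_test_top.txt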

  • Chris Murphy

    January 6, 2014 at 6:13 am

    Maybe one of the network gurus knows whether Apple’s gigabit ethernet hardware uses TCP offload. In any case the physical and link layers have been dealt with in hardware for some time. In data centers much of TCP/IP is also crunched, or at least “pre-digested” by the ethernet card. Otherwise the (general purpose) CPU has to do a whole lot more work.

    It’s a totally different implementation, but you can kind of see an ethernet card as the network equivalent of what a GPU does for graphics. If we didn’t have a GPU, the CPU would have to do all of the graphics rendering, and that would be very costly even if today’s CPUs could keep up (they can’t). The analogy isn’t exact, in that running the whole network stack in software on a general-purpose CPU is doable, but my expectation is that it’s just super expensive CPU-wise, and the reason it’s erratic is that it’s subject to being pre-empted by the kernel for other time-sensitive tasks vying for CPU time.
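    (One way to at least see what the OS and the interfaces claim to support – though what they advertise doesn’t prove the offload path is actually being used:)

    # is TCP segmentation offload enabled in the kernel?
    sysctl net.inet.tcp.tso

    # what offload features does the built-in gigabit port advertise?
    ifconfig en0 | grep options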

  • Neil Smith

    January 6, 2014 at 9:20 am

    I think you’re onto something, Chris, and we’re getting closer to an explanation of the erratic behavior of ‘IP over Thunderbolt’ … just found an insightful article by Iljitsch van Beijnum on Ars Technica, written back in October 2013, where he highlights the “choppy” Tbolt Bridge throughput issue:

    ” …. The Thunderbolt network interface also indicates that it supports TCP segmentation offloading for both IPv4 and IPv6 (TSO4 and TSO6), but presumably, there’s no actual network hardware in the Thunderbolt interface that could perform this function. The idea behind TSO is that the network software creates one large packet or segment, and the networking hardware splits that packet into pieces that conform to the MTU limit. This allows gigabit-scale networks to operate without using excessive amounts of CPU time. What seems to be happening here is that the system maintains an outward appearance of using the standard MTU size so nothing unexpected happens, but then simply transmits the large TCP segment over Thunderbolt without bothering with the promised segmentation. ….”

    Here’s the link to the full article – worth reading for the detailed analysis that Iljitsch provides on the inconsistent throughput of IP over Tbolt:

    https://arstechnica.com/apple/2013/10/os-x-10-9-brings-fast-but-choppy-thunderbolt-networking/

    Presumably the same issues that Iljitsch identified with Tbolt 1 bridge networking apply equally, if not more so, to Tbolt 2 bridging.
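    (Since the article points at TSO, one experiment worth trying is to switch it off and re-run the bridge tests – a standard sysctl tweak that only lasts until reboot:)

    # disable TCP segmentation offload system-wide, then re-run the BMD / iPerf tests
    sudo sysctl -w net.inet.tcp.tso=0

    # put it back afterwards
    sudo sysctl -w net.inet.tcp.tso=1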

    I’ll repeat the testing between the two nMPs and see what I can measure.

    The other thing I was testing this evening was trying to set up a Compressor 4.1 distributed render farm using Thunderbolt bridging between a Tbolt 2 MBP and the two nMPs … didn’t have much success, but if we could get it to work it would be a useful way to edit offline on an MBP with proxies and then connect to an nMP and utilize all the available CPU cores for the online conform and deliverables.

    Anyone else tried Compressor 4.1 over Tbolt bridging yet?

    Neil

    Neil Smith
    CEO
    LumaForge LLC
    faster storage faster finishing
    323-850-3550
    http://www.lumaforge.com

Viewing 1 - 10 of 35 posts
