All fibre channel SAN users – read this if your SAN starts acting strange.
Hi
We’ve been running SanMP for about 17 months now, and this week we had our first relatively large problem. The problem worsened at the same time as I upgraded SanMP from V1.5 to V1.6 to get the sync working again, so I’m not quite sure whether this version had anything to do with it.
After upgrading the software, random machines would unmount SAN volumes. And you know that when you unmount a media drive while FCP is running, it crashes… The affected machine could simply be rebooted on its own. After this, one specific edit suite, Edit 3, would usually also drop a SAN volume. When we rebooted Edit 3, EVERY TIME the whole SAN would crash. Dead – all drives unmount. 4 FCP suites and 3 FCP suites. All clients storm out and act all upset. Boo hoo. Stress. Lots of it…
This gradually got worse. I uninstalled SanMP V1.6 and went back down to V1.5. The problem worsened, but seemed to be isolated to Edit 3. The ADTX drive array’s software did not show any issues – all volumes and drives were functioning normally.
Eventually I figured out that the problem was on Edit 3, which runs dual fibre links to the QLogic switch. The QLogic fibre switch’s performance monitor showed me that the two fibre channels were not running symmetrically. MMMmmmmm.
So I pulled out one of the fibre cables. Problem gone. I swapped cables, LC converters and ports on the Apple (LSI) fibre channel card, and positively isolated the problem to one port on the Apple fibre card.
So, if your SAN starts acting strange – watch the fibre ports’ data throughput carefully – it can show you where the problem is.
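For anyone who wants to turn "watch the throughput" into a quick check, here is a rough Python sketch. It assumes you have read the per-port throughput figures off your switch's performance monitor by hand and typed them in; it does not talk to the switch at all. The function names, the 0.3 threshold and the sample numbers are all made up for illustration, not taken from our SAN.

```python
def link_imbalance(rate_a, rate_b):
    """Return how lopsided two links are, from 0.0 (equal) to 1.0 (one link idle)."""
    total = rate_a + rate_b
    if total == 0:
        return 0.0  # no traffic at all, so nothing to compare
    return abs(rate_a - rate_b) / total


def flag_asymmetric_hosts(samples, threshold=0.3, min_total=1.0):
    """Return the hosts whose two fibre links carry clearly unequal traffic.

    samples maps a host name to (rate_a, rate_b), e.g. in MB/s, as read
    manually from the switch's performance monitor (hypothetical input).
    Hosts moving less than min_total in total are skipped as idle.
    """
    flagged = []
    for host, (rate_a, rate_b) in samples.items():
        if rate_a + rate_b < min_total:
            continue  # idle host: an imbalance here means nothing
        if link_imbalance(rate_a, rate_b) > threshold:
            flagged.append(host)
    return sorted(flagged)


# Illustrative numbers only, not measurements from our SAN.
samples = {
    "edit1": (95.0, 105.0),
    "edit2": (0.0, 0.0),
    "edit3": (180.0, 20.0),
}
suspects = flag_asymmetric_hosts(samples)
```

A flagged host only tells you where to start swapping cables, converters and ports; it does not tell you which part is faulty.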
Now I must just pluck up the courage to try SanMP V1.6 again…
Luckily we did not lose any data this time, or damage any volumes. We have in the past had some damaged SanMP volumes, which we could no longer mount with write access. Eventually we had to copy all the data off, re-format and copy all the data back to solve it. This has happened twice in the past 17 months, but it could have been Edit 3’s faulty fibre port that caused it…
Regards
Francois