Creative Communities of the World Forums

The peer to peer support community for media production professionals.


  • Worse preview performance after adding 2nd processor…

    Posted by Pinson Denis on August 7, 2012 at 1:13 am

    Hi everyone,

    my config:
    HP xw6600
    2 x 2.33 GHz quad-core
    10GB RAM
    Quadro FX 570
    SSD with Windows 7
    eSATA external drive in RAID 0 for video
    Vegas Pro 11

    I’ve just added a 2nd processor to my computer, and the preview in Vegas 11 is laggier than with 1 processor… I tried several tweaks: changing the number of threads, the quantity of memory for RAM preview, and even the memory reserved by Vegas (in the internal options tab)… but nothing helps… CPU usage is quite low (40–60%) and RAM usage is around 5GB… Fortunately, the rendering time is better!

    I haven’t added a CUDA graphics card yet, because on my other computer (the same xw6600 with only one processor) a GTX 570 didn’t add any real performance (it even made Vegas crash sometimes!)…

    HELP….


    Denis PINSON
    Director and DOP for TV programs & Documentaries
    http://www.archipelprod.com

  • 14 Replies
  • Steve Rhoden

    August 7, 2012 at 3:33 am

    If you mean preview playback, try
    unchecking “Adjust size and quality for optimal playback”
    (right-click on the preview screen for it).

    Steve Rhoden
    (Cow Leader)
    Film Editor & Compositor.
    Filmex Creative Media.
    1-876-832-4956

  • Nigel O’Neill

    August 7, 2012 at 9:26 am

    I am not sure whether the Vegas code is optimised for, or can take advantage of, dual-CPU systems.

    My system specs: Intel i7 970, 12GB RAM, ASUS P6T, Vegas Pro 10e (x32/x64), Windows 7 x64 Ultimate, Vegas Production Assistant 1.0, VASST Ultimate S Pro 4.1, Neat Video Pro 2.6

  • Pinson Denis

    August 7, 2012 at 6:14 pm

    I tried with this option both checked and unchecked… It doesn’t change anything…

    In fact, while playing the timeline, it seems a few frames at the end and at the beginning of each clip freeze (the FPS drops to 15 or 18 and goes back to 25 at the start of the next clip)… Inside the clip, playback is quite smooth at 25fps…
    Could it come from HDD access? The files are native DSLR H.264 with no effects on them…
    When playing XDCAM, this problem doesn’t happen.


    Denis PINSON
    Director and DOP for TV programs & Documentaries
    http://www.archipelprod.com

  • Steve Rhoden

    August 7, 2012 at 7:34 pm

    That’s a tough one!!!

    Steve Rhoden
    (Cow Leader)
    Film Editor & Compositor.
    Filmex Creative Media.
    1-876-832-4956

  • Pinson Denis

    August 7, 2012 at 8:38 pm

    lol Thanks !


    Denis PINSON
    Director and DOP for TV programs & Documentaries
    http://www.archipelprod.com

  • Pinson Denis

    August 7, 2012 at 9:51 pm

    Even in draft quarter resolution, it continues to stutter at the end of each clip…

    I’m wondering whether it could come from the fact that I updated my BIOS before adding the 2nd CPU…

    Playing the same H.264 clips in Premiere is fine…

    ARGHHHH…..


    Denis PINSON
    Director and DOP for TV programs & Documentaries
    http://www.archipelprod.com

  • Dave Haynie

    August 11, 2012 at 3:58 pm

    A couple of things… I take it this is an E5400 “Harpertown” Xeon, vintage 2007 or so. When you see a slowdown, look at what could be slower.

    When you have a single processor, all cores more or less coordinate, via shared L2 (and L3, in newer CPUs) and very wide internal buses, to optimize memory access. And of course, in more recent systems (pretty much all AMDs in recent memory, and Intels starting with the “i” series), each CPU has two or three dedicated memory channels.

    But back up a few years, and your second CPU is now totally sharing memory access with the first. This basically means that each core has half the bandwidth to main memory that it had, previously, when the system is busy and missing cache hits.

    Only, it’s probably actually less than half. Modern memory systems are like muscle cars — very fast in a straight line, but trouble when you have to take that corner. DDR2 memories can actually track a couple of simultaneous streaming memory accesses at the same time, but I think it’s only two. You now have eight potentially different chunks of memory being hit all at the same time.
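    The halving Dave describes is easy to put rough numbers on. A toy calculation, where the 10 GB/s total is an assumed round figure for an FB-DIMM DDR2 workstation, not a measured spec for the xw6600:

    ```python
    # Per-core share of memory bandwidth when all cores contend at once.
    # TOTAL_BW_GBPS is an illustrative assumption, not a measured value.
    TOTAL_BW_GBPS = 10.0

    for cores in (4, 8):  # one quad-core socket vs. two
        share = TOTAL_BW_GBPS / cores
        print(f"{cores} cores contending: {share:.2f} GB/s each")
    ```

    And as Dave notes, the real per-core figure lands below this even split, because eight interleaved access streams defeat the sequential-burst behaviour DDR2 is fast at.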

    That’s not a problem if CPU is really the bottleneck. Give them enough processing to do, and you’ll see 2x the performance you had with just the one CPU. But hit memory too much, and it’s absolutely going to drop.

    The only thing I’m confused about is why the HDSLR AVC (which is either AVCHD or Canon-style, which is higher bitrate but actually simpler to decode) would be worse than your performance on MXF/MPEG-2 at similar or higher bitrates, if memory performance is the real issue. But at the very least, it’s unlikely that Vegas would be optimized for this kind of system, since it’s both pretty high-end and old-fashioned (e.g., current Xeons use the i-series-style on-chip memory controllers and high-speed point-to-point links between each other). It could simply be that there’s more intermediate processing in large memory pools for AVC decode.

    Try some performance monitoring. During editing and playback, how many cores do you see doing work, and what level are they hitting (percentage of maximum)? Try telling Vegas to use fewer cores. Turn off hyperthreading, if this series hyperthreads — that’s yet another thing thrashing main memory, and also the on-chip cache.
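    Beyond watching Task Manager, the contention itself can be probed with a quick experiment. A rough Python sketch (buffer size and pass count are arbitrary assumptions) that measures aggregate throughput as more processes walk large buffers at once:

    ```python
    import multiprocessing as mp
    import time

    BUF_MB = 32  # per-worker buffer, chosen (arbitrarily) to exceed cache

    def stream(n_passes: int) -> int:
        """Touch every page of a large buffer repeatedly, forcing main-memory traffic."""
        buf = bytes(BUF_MB * 1024 * 1024)          # zero-filled block
        total = 0
        for _ in range(n_passes):
            total += sum(memoryview(buf)[::4096])  # one byte per 4 KB page
        return total

    def aggregate_mb_per_s(workers: int, n_passes: int = 4) -> float:
        """MB of buffer walked per second, summed over all workers."""
        start = time.perf_counter()
        with mp.Pool(workers) as pool:
            pool.map(stream, [n_passes] * workers)
        elapsed = time.perf_counter() - start
        return workers * n_passes * BUF_MB / elapsed

    if __name__ == "__main__":
        for w in (1, 2, 4):
            print(f"{w} worker(s): {aggregate_mb_per_s(w):.0f} MB/s walked")
    ```

    If the aggregate number stops scaling well before the cores are saturated, memory is the bottleneck — exactly the pattern described above.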

    -Dave

  • Nigel O’Neill

    August 12, 2012 at 2:48 am

    Dave, is it possible Vegas is not actually utilising the full potential of the 2nd CPU because the Vegas code is not optimised for it? We run Itanium Superdomes at work, and unless the code is optimised for the architecture, you don’t see the benefits.

  • Dave Haynie

    August 12, 2012 at 6:31 am

    [Nigel O'Neill] “Dave, is it possible Vegas is not actually utilising the full potential of the 2nd CPU as the Vegas code is not optimised for it?”

    Probably not in the way you’re thinking. Your machine is a pure SMP (symmetric multiprocessing) machine. Vegas only knows about the number of threads it’s allowed to use; it doesn’t really understand the machine architecture beyond that — that’s up to the Windows scheduler.

    The main problem here is that you doubled your memory load, but didn’t double your memory speed. This is the basic flaw with pure SMP systems, and why very large parallel systems go to NUMA (Non-Uniform Memory Architecture) or loosely-coupled/distributed computing (think of a bunch of CPU + memory modules interconnected with very high speed networks). In fact, that’s the basic architecture of the AMD processors going back quite a ways (to the original Opteron), and on Intel CPUs since the i-series was introduced.

    If you had a new dual-socket system, you’d have separate memory for each processor, which would eliminate the contention you can see on your system, depending on the work being done. Think of it this way: if every CPU core is running entirely out of cache, your system will go twice as fast as it did with the single CPU. If every core is hitting memory at the same time, it’ll go much slower than it otherwise would.

    So sure, it’s possible there are access patterns that would improve Vegas’ performance on your kind of system, and perhaps some that are better suited to more common media workstations. When you break a job up into multiple threads, there’s some degree of thinking about just how to break it up. For example, if you’re decoding video frames, do you pipeline things so each processor is always working on a full frame, or do you break each frame up into N chunks and feed one to each processor? In the former case you’ll spend more resources on memory; in the latter, more of the CPU overhead goes to communication between the chips. And of course, for AVC or MPEG, you could even split things by GOP, but that starts to get big. Well, big-ish… I guess around 712MB of buffer, if you wanted to feed an HD GOP to each CPU on an 8-core system.
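    That ~712MB ballpark checks out under one plausible set of assumptions (24-bit RGB decode buffers and a 15-frame GOP, both my guesses rather than anything stated in the thread):

    ```python
    # Back-of-envelope for the ~712MB GOP-per-core buffer estimate.
    width, height = 1920, 1080   # full HD frame
    bytes_per_pixel = 3          # 24-bit RGB working buffers (assumption)
    gop_frames = 15              # common H.264 GOP length (assumption)
    cores = 8

    frame_bytes = width * height * bytes_per_pixel
    total_bytes = frame_bytes * gop_frames * cores
    print(f"{total_bytes / 2**20:.0f} MiB of decode buffers")  # ≈ 712 MiB
    ```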

    And there’s another pretty obvious one I forgot to mention: memory. Not only do you have contention for the main memory bus, but in a fully multithreaded application you’ve added a big chunk of memory use per CPU. Your mileage may vary, of course, with the job being done. There’s a recommendation floating around here of 2GB per core, minimum… not sure if that comes from Sony or not. I’ve seen about 7.5–8GB of memory in use on my six-core processor running Vegas renders, so obviously not a hard and fast rule. But that’s another thing to double-check — look at your physical memory, and see if virtual memory is growing during playback or anything else that’s slowed down.

    Memory thrashing is a potential issue for every streaming application. In some kinds of work, there’s a small amount of data coming into the CPU and lots of computation. That’s fairly typical of a large number of server-type applications (certainly not all of them), and your system will do really well at it. On the other hand, think about video processing… you’re reading in fairly low-bandwidth data (AVC at 21–44Mb/s, typically) but expanding it into uncompressed memory buffers — that’s about 8GB per minute per channel for 1080p24 video. And of course, for you to see a video display, each decompressed chunk has to be composited into another buffer, and finally displayed. Sure, there’s some streamlining there, but the basic idea is that you’re creating lots of data in memory without doing a great deal of computation on it. So there’s going to be lots of main memory access. Most systems today have fewer cores and/or faster memory, so memory isn’t usually the bottleneck. But under the right circumstances, maybe. And that chance goes up as CPU performance increases or memory performance decreases.
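    The per-minute figure is easy to verify; assuming the decoded frames expand to 3 bytes per pixel (my assumption), 1080p24 lands right around that number:

    ```python
    # Uncompressed data rate for one channel of 1080p24 video.
    frame_bytes = 1920 * 1080 * 3        # 24-bit RGB frame (assumption)
    per_minute = frame_bytes * 24 * 60   # 24 fps for one minute
    print(f"{per_minute / 2**30:.1f} GiB per minute per channel")
    ```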

    [Nigel O'Neill]
    We run itanium Superdomes at work and unless the code is optimised for the architecture, you don’t see the benefits.”

    The Superdome is a “ccNUMA” architecture system. That’s a NUMA system with an efficient structure to allow very large numbers of CPUs: each Itanium has its own local memory, and it can access memory elsewhere in the system, but that access is far less efficient — it’s both lower speed and subject to contention, particularly if you scale all the way up to 128 cores. If the application and OS treat such a system as a pure SMP system, the performance will be much lower than if processes can basically be scheduled onto local CPUs and memory, using the shared cross-point and links primarily for communications.

    -Dave

  • Nigel O’Neill

    August 12, 2012 at 12:31 pm

    Wow!

    Your response seriously blew me away! I don’t even think my tech guys at work could have given me a more comprehensive response. Not even the guys at Atomic write like that. Double Wow!

    If you are a full-time video editor, I think you are in the wrong profession 🙂

    I am in IT, but moved into middle management after the millennium. I seriously need to get technical again ;-).

    Wow!

    Just had to say it again!

