Creative Communities of the World Forums

The peer to peer support community for media production professionals.

Vegas Pro 11 and OpenCL GPU-accelerated video processing

  • Steve Rhoden

    October 28, 2011 at 1:57 pm

    tsk…These GPU,CUDA,OpenCL @#x%*+ series,
    certainly twirls a non-Pro’s head…lol

    Steve Rhoden
    (Cow Leader)
    Film Maker
    Filmex Creative Media.
    1-876-832-4956
    https://filmex-creative-media.blogspot.com/

  • Dave Haynie

    October 29, 2011 at 6:40 pm

    It’s not all that difficult to understand… just a good bit to take in all at once. I know the whole story, which I now relate…

    Back in the early 1980s, a computer graphics display was basically just a chunk of visible memory. The CPU changed some bytes in memory, and you would see the changes on-screen.

    As graphics got more complex, folks started to notice that CPUs weren’t all that good at manipulating graphics in certain ways. First on expensive CAD workstations, then on the Amiga series of computers, and ultimately everywhere, graphics chips got some of their own processing. The first of these were 2D operations: line drawing in hardware, bit-blitting (manipulation of 2, 3, or 4 different graphics planes in various ways), etc. While no one yet called these “GPUs”, that’s what they would eventually be dubbed: Graphics Processing Units.

    Again first on workstations, such as those from Silicon Graphics Inc (SGI), designers started thinking about 3D graphics as well as 2D. In fact, it was in 1992 that SGI released a software function library called the Open Graphics Library (OpenGL). This allows a program to do 2D and 3D graphics operations without regard to the hardware. And it allows hardware to greatly accelerate these operations, making them many times faster than a CPU can do that same work.

    Curiously, most of the companies that did 2D well on the PC failed to make the transition to 3D. The first really successful 3D graphics chips in PCs were the Voodoo series, from a company called 3dfx. These were only for game play: they didn’t do really high quality graphics, and they didn’t have 2D features.

    There was a ton of competition for 3D, and the winners out of the 1990s are the same we know today: ATi (now part of AMD) and nVidia. In fact, it was nVidia that actually popularized the term “GPU”. And not coincidentally, the complexity of GPUs has grown to rival that of CPUs, which does tend to limit the number of companies that can be successful at maintaining both performance and cost.

    Originally, Graphics Processing Units had a pretty fixed internal architecture, which mirrored the graphics pipelines used in 3D graphics APIs, as defined by both OpenGL and Microsoft’s more recent Direct3D. And while there is still a good bit of hardware in a modern GPU that’s dedicated to graphics, some of these steps have been increasingly replaced by programmable “stream processors”, computational elements that can be programmed to do a variety of different computations.

    This has led to the idea of General Purpose GPU (GPGPU) computing. In short, if the GPU can be programmed to do 3D graphics 25x faster than my CPU, or decode MPEG-4 10x faster, why not use that power for other kinds of computing? Modern nVidia GPUs have up to 512 computing cores, and modern AMD/ATi GPUs have over 1500. This is fairly cheap processing power compared to the CPU, when it works well for the problem at hand.

    And so we have used these. Early attempts at this were actually coded directly “to the metal” on a GPU. The problem, of course, is that ATi and nVidia update their GPUs once a year or so, and you don’t want to have to rewrite your applications each time. Both companies realized this fairly early on, and so they devised libraries that would allow a program to use a GPU without knowing every detail of that GPU.

    So nVidia launched a library called Compute Unified Device Architecture, or CUDA. This isn’t really just a library; it’s actually a whole methodology. You write a CUDA program using a specially designed C compiler from nVidia. And as some here have seen, CUDA has evolved… there are two major revisions, 1.x and 2.x, which define not only some large-scale features of the GPUs but even things like how much of the C language you can use.

    ATi/AMD started out with a programming framework called Stream, which included their own GPGPU system, dubbed “Close to Metal”. Sometimes applications for this just said they supported “Streams”. But pretty early on, they adopted the Open Computing Language (OpenCL) as their interface. OpenCL is also supported by a custom C compiler, but it’s architecture-independent. OpenCL runs on AMD processors, it’s supported within nVidia’s CUDA interface, and it’s even supported on x86 processors via libraries from Intel, AMD, and IBM. OpenCL is managed by the not-for-profit Khronos Group.

    A third GPGPU interface is Microsoft’s DirectCompute API. This is included in any implementation of DirectX 11, but also runs on DirectX 10 era graphics processors. Microsoft doesn’t seem to have a great deal of support for this yet, but it’s another version of the same kind of thing: doing traditional CPU work much faster on a GPU.

    All of these aim to let programmers use the GPU for “general purpose” work, meaning the kind of stuff you do with a CPU rather than something specific to graphics. That’s a good idea, as I’ll illustrate below: GPUs are wicked hardcore performers.
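    For a rough feel of what “general purpose” work on a GPU looks like: CUDA, OpenCL, and DirectCompute all have you write a small function (a “kernel”) that handles one data element, and the hardware then runs it across many elements at once. The sketch below only emulates that pattern in plain Python on the CPU; no GPU is involved, and brighten_kernel, launch, and the pixel values are hypothetical names and data made up for illustration, not part of any of those APIs.

```python
# Pure-Python emulation of the data-parallel "kernel" pattern.
# Assumption: all names here are illustrative; nothing runs on a GPU.

def brighten_kernel(work_item_id, pixels_in, pixels_out, gain):
    """Kernel body: handles exactly one pixel, chosen by its work-item id."""
    value = pixels_in[work_item_id] * gain
    pixels_out[work_item_id] = min(255, int(value))  # clamp to the 8-bit range


def launch(kernel, global_size, *args):
    """Stand-in for a GPU launch: call the kernel once per work item.

    A real GPU would run many of these calls at the same time on its
    stream processors; this loop simply runs them one after another.
    """
    for work_item_id in range(global_size):
        kernel(work_item_id, *args)


frame = [10, 100, 200, 250]   # made-up 8-bit pixel values
result = [0] * len(frame)
launch(brighten_kernel, len(frame), frame, result, 1.5)
print(result)
```

    Because each call touches only its own pixel, the calls are independent of one another, which is what lets hardware with hundreds of stream processors work on them simultaneously.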

    A modern CPU is fast largely because it has multiple cores. My AMD 1090T has six CPU cores, and if you write your program to split a job up into six independent execution threads, it can actually go six times faster (competition for resources, like memory, may make it slower in practice, though with video rendering, we’re usually pretty close to maximum). An Intel i7 has up to six “hyperthreaded” cores… each core can appear as two separate CPUs, but it’s really one CPU with a double set of registers. With careful programming, the i7 can seem like 12 cores, but if things aren’t optimized, hyperthreading makes the picture even more complex. Looking at numbers, the peak performance of my AMD 1090T system is about 44 GFLOPS (billion floating point operations per second). The Core i7 980XE can deliver a peak of 109 GFLOPS.
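    The “split a job into six independent threads” idea can be sketched roughly as follows. This is only an illustration of the structure: split_range, render_chunk, and render_parallel are hypothetical names, the “rendering” is a placeholder calculation, and in CPython the threads shown here will not give a real six-fold speedup on CPU-bound work (that would take separate processes or native code).

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(total, parts):
    """Divide range(total) into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(total, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(range(start, start + size))
        start += size
    return chunks


def render_chunk(frames):
    """Hypothetical stand-in for rendering: one placeholder result per frame."""
    return [f * f for f in frames]


def render_parallel(total_frames, workers=6):
    """Hand one chunk to each worker and stitch the results back in order."""
    chunks = split_range(total_frames, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(render_chunk, chunks))  # map keeps chunk order
    return [r for part in parts for r in part]


print([len(c) for c in split_range(20, 6)])
print(render_parallel(20) == render_chunk(range(20)))
```

    The chunks never depend on each other, so no worker has to wait on another; that independence is what makes the “six cores, six times faster” ideal reachable at all.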

    The Sony PS3 is fast in part because it has not just one dual-threaded PowerPC core but also six 128-bit Synergistic Processing Elements (SPEs… actually eight on the chip, but one is disabled to increase yield and one is reserved for the system). Each SPE can crank out 25.6 GFLOPS, so that’s 153.6 GFLOPS… not bad for a $200 game console (of course, it has a GPU that can go even faster, for GPU-related work). But actually taking advantage of these SPEs is an even more complex problem. The programmer doesn’t have to worry too much about scheduling work to each core on a PC’s CPU. But they have to very carefully schedule work between the PS3’s SPEs to maximize performance.

    GPUs are crazy fast, at least in theory. I’m back to using the AMD HD6970 in my system. This GPU is capable of a peak computation of 2.7 TFLOPS (trillion floating point operations per second)… yowza! But it’s using 1536 stream processors to deliver this performance. If you could figure out how to use all of that, that’s almost 25x faster than the i7, over 17x the performance of a PS3, and a whopping 61x faster than my system’s CPU. The problem, of course, is tapping that performance.
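    The ratios above follow directly from the peak figures quoted in this post. A quick check, using only those quoted numbers (the dictionary layout and labels are just for illustration, and these are theoretical peaks, not measured results):

```python
# Peak figures as quoted in the post, in GFLOPS.
peak_gflops = {
    "AMD 1090T": 44.0,
    "Core i7 980XE": 109.0,
    "PS3 (6 SPEs)": 6 * 25.6,    # six SPEs at 25.6 GFLOPS each
    "AMD HD6970": 2700.0,        # 2.7 TFLOPS
}

gpu = peak_gflops["AMD HD6970"]

# How many times faster the GPU's peak is than each of the others.
ratios = {
    name: gpu / value
    for name, value in peak_gflops.items()
    if name != "AMD HD6970"
}

for name, ratio in ratios.items():
    print(f"HD6970 vs {name}: {ratio:.1f}x")
```

    This prints roughly 61.4x, 24.8x, and 17.6x, matching the “61x”, “almost 25x”, and “over 17x” figures.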

    -Dave

  • Nigel O’neill

    October 31, 2011 at 7:50 am

    Great article. I was going to get a GTX590, but I don’t think the rest of my system will keep up. I’ll settle for a GTX580, save some $$, and avoid a PSU upgrade. Yay!

    My system specs: Intel i7 970, 12GB RAM, ASUS P6T, Vegas Pro 10e (x32/x64), Windows 7 x64 Ultimate, Vegas Production Assistant 1.0, VASST Ultimate S Pro 4.1, Neat Video Pro 2.6
