How Much Video Processing Performance Boost Do the Latest PC Processors Deliver?
Now that Intel has launched Ivy Bridge-based CPUs that triple the core count of early Nehalem-based workstations from four to 12, can video editors expect significant performance gains? In this article we'll assess the performance gain from the 12-core HP Z800 to the 24-core HP Z820 with respect to both editing and streaming encoding.
Back in 2009, when HP shipped new workstations powered by the Nehalem line of CPUs, the performance boost was so significant that they instantly rendered obsolete workstations based upon previous architectures. Now that Intel has launched Ivy Bridge-based CPUs that triple the core count of early Nehalem-based workstations from four to 12, can we expect similar performance gains? That’s what I’ll explore in this article, which compares the performance of a 12-core (24 with HTT) HP Z800 against a 24/48 core HP Z820 in both editing and streaming encoding functions.
The Z800, which I reviewed in EventDV in 2010, incorporated two 3.33 GHz X5680 Xeon processors, with 24 GB of RAM running the 64-bit version of Windows 7. The graphics card was an NVIDIA Quadro FX 4800 with 1.5 GB of dedicated memory and access to 3.5 GB more system memory.
The Z820 (Figure 1, below) includes two 2.7 GHz E5-2697E CPUs, with 64 GB of RAM also running Windows 7. Graphics is supplied by an NVIDIA Quadro K5000 with 4 GB of video RAM. By virtue of its updated architecture, the Z820 also enjoys a faster system bus than the Z800 (8 GT/s compared to 6.4 GT/s), faster memory (1866 MHz compared to 1333 MHz), and one additional memory channel (four vs. three), all contributing to a greater maximum memory throughput of 59.7 GB/sec (gigabytes per second) compared to 32 GB/sec for the Z800.
Figure 1. The Z820 hasn’t changed much on the outside.
Though there are some minor hardware differences inside the box, HP did not retool the enclosure for its latest workstation generation. Put the Z800 next to the Z820 and obscure the product name, and only the most eagle-eyed observer will be able to tell them apart.
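The memory throughput figures quoted above for the two machines can be reproduced with a back-of-the-envelope calculation. The sketch below is only a sanity check, not HP's or Intel's published method: it assumes each memory channel is 64 bits (8 bytes) wide, treats the quoted memory speed as millions of transfers per second, and assumes the figures are per CPU. The function name is made up for illustration.

```python
# Assumption: each DDR3 memory channel moves 8 bytes (64 bits) per transfer.
BYTES_PER_TRANSFER = 8


def peak_memory_bandwidth_gb_s(transfers_mhz: float, channels: int) -> float:
    """Theoretical peak memory bandwidth in GB/sec for one CPU.

    transfers_mhz: memory speed in millions of transfers per second.
    channels: number of memory channels on the CPU.
    """
    bytes_per_second = transfers_mhz * 1e6 * BYTES_PER_TRANSFER * channels
    return bytes_per_second / 1e9


# Z820: 1866 MHz memory, four channels
print(round(peak_memory_bandwidth_gb_s(1866, 4), 1))  # 59.7
# Z800: 1333 MHz memory, three channels
print(round(peak_memory_bandwidth_gb_s(1333, 3), 1))  # 32.0
```

Under those assumptions, the faster memory and the extra channel together account for the entire gap between the two quoted numbers.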
Setting Expectations
All of my tests were rendering tests, and the key question was how much faster the Z820 would perform than the Z800, which I measured as the percentage reduction in rendering time. If the Z800 took 10 minutes to render a project, and the Z820 took five minutes, the Z820 cut rendering time by 50% ((10 min - 5 min) / 10 min). How much of a performance boost is reasonable to expect?
Let’s start with simple theory. Since the Z820 has twice as many cores as the Z800, performing the same work in half the time sounds reasonable, making the 50% number seem attainable. But the CPUs in the Z800 are clocked about 20% faster (3.33 GHz compared to 2.7 GHz), so each of its cores should run about 20% faster, cutting the 50% down to about 40%. On the other hand, on tasks that involve lots of data, like the RED and 4K projects, the higher memory bandwidth of the Z820 should also pay some performance dividends.
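For readers who want to check the arithmetic, here is a minimal Python sketch of the reasoning above. It assumes rendering throughput scales perfectly with both core count and clock speed, which real applications rarely achieve, and it ignores memory bandwidth entirely. The function names are invented for this illustration.

```python
def reduction_pct(baseline_time: float, new_time: float) -> float:
    """Percentage reduction in rendering time relative to the baseline."""
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return (baseline_time - new_time) / baseline_time * 100.0


def ideal_reduction_pct(old_cores: int, old_ghz: float,
                        new_cores: int, new_ghz: float) -> float:
    """Best-case reduction, assuming perfect scaling with cores and clock."""
    relative_throughput = (new_cores / old_cores) * (new_ghz / old_ghz)
    new_time = 1.0 / relative_throughput  # baseline time normalized to 1.0
    return reduction_pct(1.0, new_time)


# The 10-minute vs. five-minute example
print(round(reduction_pct(10, 5), 1))                      # 50.0
# Twice the cores at the same clock speed
print(round(ideal_reduction_pct(12, 3.33, 24, 3.33), 1))   # 50.0
# Twice the cores at the Z820's slower clock speed
print(round(ideal_reduction_pct(12, 3.33, 24, 2.7), 1))    # 38.3
```

The last figure lands close to the rough 40% estimate, with the small difference coming from rounding the clock speed gap to 20%.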
So, theory would suggest that the Z820 should perform between 40% and 60% faster, depending upon the task. And this is actually a pretty decent starting point. However, keep in mind that just because there are 48 cores doesn’t mean that all tasks are efficiently split over those 48 cores. As an example, Figure 2 (below) shows the Performance tab of the Windows Task Manager on the Z820 while encoding a single file using the VP6 codec in the Adobe Media Encoder. You can see this view on any Windows computer via the three-finger salute (Ctrl-Alt-Delete), choosing Windows Task Manager, and clicking the Performance tab.
Figure 2. This view of Windows Task Manager makes Intel engineers cry.
Why is CPU utilization so low? Because the VP6 codec is licensed from On2 (or what formerly was On2) and it’s always been highly inefficient from a multiprocessing perspective, meaning that it doesn’t make efficient use of additional CPU cores when they are available. That’s largely because VP6 was developed before multicore computers were widely available, and was put out to pasture before it made sense to update the code to take advantage of multiple cores.
None of my tests involved outputting to VP6. The high-level point is that multicore efficiency varies from program to program, and even task to task within a program. If a task is particularly inefficient from a multicore perspective, the faster clock speed on the Z800’s CPUs (3.33 GHz) would be a bigger advantage than the extra cores on the Z820 with the slower CPU speed (2.7 GHz). In addition, even when a program does effectively split operation over multiple cores, this involves some overhead and management, which poaches resources away from the rendering or other operation taking place.
For all these reasons, it’s not surprising when a particular program, or function within a program, doesn’t come close to harvesting the theoretical performance benefits the additional cores would seem to make available. This is particularly so with applications such as Premiere Pro, which uses a range of third-party codecs to work with DV, HDV, AVCHD, and the alphabet soup of other codecs presented by the various input formats. Since a program can never be faster than its slowest operation, if these codecs are inefficiently written, they can slow the entire operation.
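The trade-off between clock speed and core count described above can be sketched with Amdahl's law, a standard model that I am bringing in for illustration; it is not something measured in these tests. The model assumes a fixed fraction of each task can be spread across cores while the rest runs on a single core, and that speed scales linearly with clock frequency. The parallel fractions in the loop are hypothetical values, not measurements of any real codec or application.

```python
def amdahl_time(parallel_fraction: float, cores: int, ghz: float) -> float:
    """Relative run time under Amdahl's law, scaled by clock speed.

    parallel_fraction: share of the work that can be split across cores
    (0.0 means purely single-threaded, 1.0 means perfectly parallel).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return (serial_fraction + parallel_fraction / cores) / ghz


def z820_reduction_pct(parallel_fraction: float) -> float:
    """Modeled reduction in rendering time moving from the Z800 to the Z820.

    A negative result means the Z820 is slower than the Z800 in this model.
    """
    z800 = amdahl_time(parallel_fraction, cores=12, ghz=3.33)
    z820 = amdahl_time(parallel_fraction, cores=24, ghz=2.7)
    return (z800 - z820) / z800 * 100.0


# Hypothetical parallel fractions, from single-threaded to perfectly parallel
for p in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(f"parallel fraction {p:.2f}: {z820_reduction_pct(p):+.1f}%")
```

In this model a purely single-threaded task comes out slower on the Z820, since only clock speed matters, while a perfectly parallel task approaches the best-case reduction of roughly 38%. Real programs fall somewhere in between, and the model ignores the overhead of coordinating work across cores.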
OK, now that our expectations are set, let's move on to our tests.