The slowing of Moore’s Law has been known for some time in the CPU design community. Even before this happened, so-called “Dennard scaling” broke down and clock frequencies plateaued around 2007, and there is no near-term solution for this.
Moore’s Law itself is limited by the heat problem, which was discussed in a pivotal 2012 paper “Dark Silicon and the End of Multicore Scaling”: https://www.cc.gatech.edu/~hadi/doc/paper/2012-toppicks-dark_silicon.pdf
Essentially the heat problem is starting to affect how much of the chip can stay powered on. This can’t be fixed by a better chip cooler. Even though integration allows putting more systems on the chip, increasingly large portions must stay powered off (dark silicon).
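To see why a better cooler doesn't fix this, here is a rough first-order sketch. It is not the model used in the paper above. It assumes dynamic power per transistor goes as C * V^2 * f, a linear shrink factor of about 1.4 per process generation, and a fixed power budget per unit area. The function names are made up for illustration.

```python
def power_density_factor(s, generations, dennard=True, scale_frequency=True):
    """Relative power per unit area after `generations` shrinks by linear factor `s`.

    Toy model: power per transistor ~ C * V^2 * f, with all values
    normalized to 1.0 at the starting process node.
    """
    cap = 1.0 / s                        # capacitance per transistor shrinks with feature size
    volt = 1.0 / s if dennard else 1.0   # supply voltage only scales in the Dennard regime
    freq = s if scale_frequency else 1.0 # clock either rises with the shrink or is held flat
    density = s ** 2                     # transistors per unit area
    per_generation = density * cap * volt ** 2 * freq
    return per_generation ** generations


def max_active_fraction(s, generations, dennard=True, scale_frequency=True):
    """Fraction of the chip that can be powered at once under a fixed power-density budget."""
    factor = power_density_factor(s, generations, dennard, scale_frequency)
    return min(1.0, 1.0 / factor)


if __name__ == "__main__":
    for gen in range(4):
        ideal = max_active_fraction(1.4, gen, dennard=True)
        post = max_active_fraction(1.4, gen, dennard=False, scale_frequency=False)
        print(f"gen {gen}: Dennard regime {ideal:.2f}, post-Dennard (flat clock) {post:.2f}")
```

Under these assumptions power density stays flat while voltage keeps scaling. Once voltage stops scaling, it grows by about 1.4x per generation even with the clock held flat, so after three generations only roughly 36% of the chip can be active at once.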
Instructions Per Clock (IPC) improvements, which benefit single-thread performance, are also dwindling, since designers have already exploited all the straightforward methods. There are a few remaining tricks such as data speculation. Data speculation differs from control speculation, which is currently used to predict a branch: it predicts data values (e.g., the result of a load) so that dependent instructions can start before the real value is known. In theory data speculation could provide an additional 2x performance on single-threaded code, but it would require significant added complexity. See “Limits of Instruction Level Parallelism with Data Speculation”: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.9196&rep=rep1&type=pdf
VLIW (Very Long Instruction Word) methods could give another 2x or so but would require new software and compilers. Intel’s unsuccessful Itanium was an attempt at this, but some researchers are still investigating the technique: https://millcomputing.com/
Going forward, performance improvements will come from heterogeneous processing features such as Quick Sync, AVX vector instructions, etc., which software must specifically access. This can be highly advantageous where it’s applicable. E.g., FCPX can export to H.264 about 4x or 5x faster using Quick Sync than without it.
The GPU side is not quite as limited, and there are significant future gains available from an architecture and fabrication standpoint. However, that only helps if the highly parallel programming model can be leveraged. E.g., long-GOP encode/decode cannot be significantly accelerated via traditional GPU techniques. OTOH both nVidia and AMD have incorporated dedicated video encode/decode hardware, exposed through NVENC and VCE, but computers must have the specific card type and app developers must write to the manufacturer-specific API.
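The "only helps if it's applicable" caveat is essentially Amdahl's law: the overall gain is capped by whatever part of the job can't be handed to the accelerator. A minimal sketch, with a made-up function name and hypothetical numbers that are not measurements of any real encoder:

```python
def overall_speedup(accelerated_fraction, acceleration):
    """Amdahl's law: whole-job speedup when only part of the work runs faster.

    accelerated_fraction: share of the original run time that the GPU or
        fixed-function block can take over (0.0 to 1.0).
    acceleration: how many times faster that share runs once offloaded.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if acceleration <= 0:
        raise ValueError("acceleration must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / acceleration)


if __name__ == "__main__":
    # Hypothetical workloads: same 10x accelerator, different offloadable shares.
    for share in (0.5, 0.9, 0.99):
        print(f"{share:.0%} offloadable at 10x -> {overall_speedup(share, 10.0):.2f}x overall")
```

If only half the work can be offloaded, even an arbitrarily fast accelerator can't get past 2x overall, which is why a mostly serial task like long-GOP decode sees little benefit from general GPU compute.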
Intel’s latest Xeon E7-8890 v3 has 18 cores and can do nearly 3 teraflops. However, the heat problem limits the base frequency to 2.5 GHz and turbo to 3.3 GHz. In theory this could provide a single-socket 18-core Mac Pro, but the current price for this chip is $7,700, so Apple would need a big discount.
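As a rough check on teraflops figures like this, theoretical peak is just cores x clock x floating-point operations per cycle. The sketch below assumes 16 double-precision (32 single-precision) FLOPs per cycle per core, i.e. two 256-bit FMA units per core. That figure is my assumption about this CPU generation and is not stated above. The function name is made up for illustration.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle, sockets=1):
    """Theoretical peak in GFLOPS: every core issuing full-width FMAs on every cycle."""
    return sockets * cores * clock_ghz * flops_per_cycle


# Assumed per-core throughput (not from the post): 2 FMA units x 256-bit vectors x 2 ops per FMA.
DP_FLOPS_PER_CYCLE = 16   # 4 doubles per 256-bit vector
SP_FLOPS_PER_CYCLE = 32   # 8 floats per 256-bit vector

if __name__ == "__main__":
    cores, base_ghz = 18, 2.5  # core count and base clock quoted in the post
    print("1 socket, double precision:", peak_gflops(cores, base_ghz, DP_FLOPS_PER_CYCLE), "GFLOPS")
    print("1 socket, single precision:", peak_gflops(cores, base_ghz, SP_FLOPS_PER_CYCLE), "GFLOPS")
    print("4 sockets, double precision:", peak_gflops(cores, base_ghz, DP_FLOPS_PER_CYCLE, sockets=4), "GFLOPS")
```

Under these assumptions one socket peaks at about 0.72 teraflops double precision (1.44 single precision) at base clock, so a figure near 3 teraflops is closer to what four sockets would give. Sustained throughput on real code is lower still.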