Today NVIDIA is unveiling its flagship single-GPU graphics card, squarely aimed at the well-heeled enthusiast. Readers with a penchant for graphics will have seen the numerous leaks springing up over the past two weeks, with NVIDIA dutifully plugging as many as it could, but now is the time to set the record straight with an in-depth review of the Kepler-based GeForce GTX 680 2GB.
Back in 2010, NVIDIA's publicly known GPU roadmap, as divulged by CEO Jen-Hsun Huang, had the Fermi architecture succeeded by Kepler in late 2011, which in turn would be replaced by Maxwell in late 2013. The primary focus of each step was to increase performance per watt through a combination of die shrinks and general optimisations.
Fermi, if you recall, debuted in the consumer space in the form of the now-maligned GeForce GTX 480 in March 2010. Aiming to be all things to every type of user, NVIDIA's one-size-fits-all strategy for the consumer, workstation and professional markets meant that the big-die GPU, a hybrid of sorts, traded pure gaming performance for a forward-looking architecture. Worse still, the innate complexity of bringing up such an all-encompassing GPU left NVIDIA with a manufacturing and marketing headache.
Perhaps the first incarnation of consumer Fermi came too early; it seemed as if sales and marketing had won the battle against engineering, resulting in the release of a half-baked product. With time being the greatest teacher, six months later NVIDIA cleaned up Fermi and effectively re-released it as the GeForce GTX 580. Higher clocks and a complete architecture made it what Fermi should have been in the first place, and even today the GTX 580 continues to offer reasonable value at its readjusted £315 price point.
Crysis 2, 30.33 FPS: DirectX 11 Ultra Upgrade installed, high-resolution textures enabled, Extreme detail level.
Deus Ex: Human Revolution, 46.20 FPS: Highest possible settings, tessellation enabled, FXAA High enabled.
Just Cause 2, 46.60 FPS: Maximum settings, CUDA water enabled, 4xMSAA, 16xAF.
Left 4 Dead 2, 126.10 FPS: Maximum settings, 4xMSAA, 16xAF.
Mafia 2, 51.35 FPS: Maximum settings, PhysX Medium enabled, AA enabled, AF enabled.
Metro 2033, 40.72 FPS: Maximum settings, PhysX disabled, 4xMSAA, 16xAF.
Portal 2, 127.90 FPS: Maximum settings, 4xMSAA, 16xAF.
The Elder Scrolls V: Skyrim, 59.55 FPS: Ultra preset, Bethesda high-resolution texture pack, indoor cave scene.
How Much Boost?
Because GPU Boost operates in real time and the boost factor varies with exactly what's being rendered, it's hard to pin the performance gain down to a single number. To help clarify the typical gain, every Kepler GPU with GPU Boost lists two clock speeds on its specification sheet: the base clock and the boost clock. The base clock is equivalent to the graphics clock quoted for earlier NVIDIA GPUs; for Kepler, it is also the minimum speed at which the GPU cores will run in a 3D application. The boost clock is the typical speed at which the GPU will run in a 3D application.
For example, the GeForce GTX 680 has a base clock of 1006 MHz and a boost clock of 1058 MHz. This means that in 3D games the GPU will never run below 1006 MHz, but most of the time it will probably run at around 1058 MHz. It won't run at exactly this speed. Based on real-time monitoring and feedback, it may go higher or lower, but in most cases it will stay close to it.
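The monitoring-and-feedback behaviour described above can be pictured as a simple control loop. This is a minimal sketch, not NVIDIA's algorithm: it assumes the card reacts to board power alone, and the step size (`BIN_MHZ`), the ceiling (`MAX_CLOCK_MHZ`) and the power target (`POWER_TARGET_W`) are hypothetical values chosen for illustration. Only the 1006 MHz base clock comes from the GTX 680 specification.

```python
# Toy GPU Boost-style controller (hypothetical; not NVIDIA's algorithm).
# Assumptions: the controller reacts to board power alone, moves one
# fixed-size clock bin per sample, and never drops below the base clock.

BASE_CLOCK_MHZ = 1006    # GTX 680 base clock, from the specification
MAX_CLOCK_MHZ = 1110     # hypothetical upper limit for this illustration
BIN_MHZ = 13             # hypothetical clock step per adjustment
POWER_TARGET_W = 170.0   # hypothetical power target


def next_clock(clock_mhz: int, power_w: float,
               target_w: float = POWER_TARGET_W) -> int:
    """Pick the clock for the next interval from the measured power draw."""
    if power_w < target_w:
        clock_mhz += BIN_MHZ      # headroom available: step up one bin
    elif power_w > target_w:
        clock_mhz -= BIN_MHZ      # over budget: step down one bin
    # Clamp between the guaranteed base clock and the ceiling.
    return max(BASE_CLOCK_MHZ, min(MAX_CLOCK_MHZ, clock_mhz))


def simulate(power_samples: list[float],
             start_mhz: int = BASE_CLOCK_MHZ) -> list[int]:
    """Run the controller over a sequence of power readings (watts)."""
    clocks = []
    clock = start_mhz
    for power_w in power_samples:
        clock = next_clock(clock, power_w)
        clocks.append(clock)
    return clocks


if __name__ == "__main__":
    # Four light frames, two heavy frames, then a lighter one again.
    print(simulate([150, 150, 150, 150, 175, 175, 160]))
    # prints [1019, 1032, 1045, 1058, 1045, 1032, 1045]
```

With power headroom the clock climbs past the base clock towards the ceiling; once measured draw exceeds the target it steps back down, but never below 1006 MHz, which is why the boost clock is best read as a typical figure rather than a fixed one.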
GPU Boost doesn't take away from overclocking. In fact, with GPU Boost you now have more than one way to overclock your GPU. You can still increase the base clock just like before, and the boost clock will increase correspondingly. Alternatively, you can increase the power target. This is most useful in games that already consume close to 100% of the default power target, because those are the games in which GPU Boost has the least headroom to raise clocks on its own.
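To see why the two knobs suit different games, here is a deliberately simplified model. It assumes, hypothetically, that power draw scales linearly with clock speed and that the card boosts as far as the power target allows, up to a fixed ceiling (`headroom_mhz`, an invented 104 MHz above base). Neither assumption is NVIDIA's; in practice power rises faster than linearly once voltage goes up. The function name and the two example games are likewise illustrative.

```python
# Simplified model of the two overclocking knobs (hypothetical).
# Assumptions: power draw scales linearly with clock speed, and GPU Boost
# climbs until it hits either the power target or a fixed ceiling.


def sustained_clock_mhz(stock_base_mhz: float, power_at_base_pct: float,
                        power_target_pct: float = 100.0,
                        offset_mhz: float = 0.0,
                        headroom_mhz: float = 104.0) -> float:
    """Estimate the clock a game settles at under this toy model.

    power_at_base_pct: the game's power draw at the stock base clock,
    as a percentage of the default power target.
    """
    base = stock_base_mhz + offset_mhz        # knob 1 raises the floor...
    ceiling = base + headroom_mhz             # ...and the boost ceiling
    # Knob 2: the clock at which the linear power model reaches the
    # (possibly raised) power target.
    power_limited = stock_base_mhz * power_target_pct / power_at_base_pct
    return max(base, min(ceiling, power_limited))


if __name__ == "__main__":
    light, heavy = 80.0, 98.0   # hypothetical games, % of target at base clock
    print(sustained_clock_mhz(1006, light))                        # at ceiling
    print(sustained_clock_mhz(1006, light, offset_mhz=50))         # offset helps
    print(sustained_clock_mhz(1006, heavy))                        # power-limited
    print(sustained_clock_mhz(1006, heavy, power_target_pct=115))  # target helps
```

In the light game the card already reaches its ceiling, so only a base-clock offset adds speed; in the heavy game it barely boosts above 1006 MHz until the power target is raised.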
Kepler vs. Fermi - the differences
What you're looking at is a hugely simplified, high-level overview of the Kepler and Fermi GPU architectures. A cursory glance suggests that Kepler is a super-sized version of Fermi, which is a reasonably accurate way to describe it, but the devil is in the details: Kepler is super-sized in terms of core count, yet physically almost half the size.
Both NVIDIA architectures, as used on the GTX 680 and GTX 580, are built by combining mini-GPUs known as Graphics Processing Clusters (GPCs). Both GPUs have four of these GPC 'squares', flanked by the usual ROPs and memory controllers (more on those later). Each Kepler GPC is home to two beefed-up Streaming Multiprocessors (SMs), the thin rectangular sections, rather than the four in Fermi, and the overhaul is radical enough for NVIDIA to term them SMX (hey, 'X' just sounds cooler, right?) rather than plain ol' SM.
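The core counts follow directly from this layout, and multiplying it out makes the "super-sized" point concrete. In the sketch below, the 192 cores per SMX is NVIDIA's Kepler figure, while the 32 CUDA cores per Fermi SM is the GF110 figure and is not stated in this section; the `GpuLayout` type is purely illustrative.

```python
# Multiply out the GPC -> SM -> CUDA core hierarchy for both chips.
# The 32 cores per Fermi SM is the GF110 figure (not stated in this section);
# the 192 cores per SMX comes from NVIDIA's description of Kepler.

from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    name: str
    gpcs: int
    sms_per_gpc: int
    cores_per_sm: int

    def total_cores(self) -> int:
        return self.gpcs * self.sms_per_gpc * self.cores_per_sm


GTX_580 = GpuLayout("GTX 580 (Fermi)", gpcs=4, sms_per_gpc=4, cores_per_sm=32)
GTX_680 = GpuLayout("GTX 680 (Kepler)", gpcs=4, sms_per_gpc=2, cores_per_sm=192)

if __name__ == "__main__":
    for gpu in (GTX_580, GTX_680):
        print(f"{gpu.name}: {gpu.gpcs * gpu.sms_per_gpc} SMs, "
              f"{gpu.total_cores()} CUDA cores")
    # GTX 580 (Fermi): 16 SMs, 512 CUDA cores
    # GTX 680 (Kepler): 8 SMs, 1536 CUDA cores
```

Half as many SMs, each six times as wide, gives Kepler three times Fermi's CUDA core count.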
Going from top to bottom, Kepler's host interface to the motherboard has been jacked up from PCIe 2.0 to PCIe 3.0, potentially doubling the available bandwidth, which is useful as GPUs become increasingly powerful. Thread scheduling on both chips is controlled by what NVIDIA terms the master GigaThread Engine, which feeds each GPC. In this sense, nothing much has changed.
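The "potentially double" figure can be checked from the published link rates. PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, while PCIe 3.0 signals at 8 GT/s with the leaner 128b/130b encoding. The helper below (a hypothetical name) converts those into usable bandwidth per direction for an x16 slot, ignoring packet and protocol overheads.

```python
# Usable PCIe x16 bandwidth per direction, before packet/protocol overhead.
# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding (8 payload bits per 10 sent).
# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding.


def pcie_gbytes_per_s(gt_per_s: float, payload_bits: int, line_bits: int,
                      lanes: int = 16) -> float:
    """Convert a per-lane transfer rate into GB/s for the whole link."""
    payload_bits_per_s = gt_per_s * 1e9 * payload_bits / line_bits * lanes
    return payload_bits_per_s / 8 / 1e9   # bits -> bytes -> GB


if __name__ == "__main__":
    gen2 = pcie_gbytes_per_s(5, 8, 10)
    gen3 = pcie_gbytes_per_s(8, 128, 130)
    print(f"PCIe 2.0 x16: {gen2:.2f} GB/s")   # 8.00
    print(f"PCIe 3.0 x16: {gen3:.2f} GB/s")   # 15.75
    print(f"ratio: {gen3 / gen2:.2f}x")       # 1.97
```

The raw signalling rate rises by only 1.6x; the more efficient encoding supplies the rest, bringing the total to just under double.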
However, the main focus of Kepler rests with improving the oomph provided by each GPC, and each of these little green squares you see is a CUDA core. It doesn't take a genius to figure out that Kepler has more, lots more, than Fermi, so the next step is to provide an exploded view of an SMX, to see just what NVIDIA has been up to.
Why Is Power Efficiency Important?
When we first launched Fermi with the GeForce GTX 480, people told us how much they loved the performance, but they also told us they wished it consumed less power. Gamers want top performance, but they want it in a quiet, power-efficient form factor. The feedback we received on Fermi really drove this point home. With Kepler, one of our top priorities was building a flagship GPU that was also a pleasure to game with.
Kepler introduces two key changes that greatly improve the GPU's efficiency. First, we completely redesigned the streaming multiprocessor, the most important building block of our GPUs, for optimal performance per watt. Second, we added a feature called GPU Boost that dynamically increases the clock speed to improve performance within the card's power budget.
Kepler's new SM, called SMX, is a radical departure from past designs. SMX eliminates Fermi's "2x" processor clock and runs its cores at the same base clock as the rest of the GPU. To balance out this change, SMX uses an ultra-wide design with 192 CUDA cores. With a total of 1536 cores across the chip, the GeForce GTX 680 handily outperforms the GeForce GTX 580.
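A back-of-the-envelope calculation shows why a wider SM compensates for losing the 2x shader clock. It assumes each CUDA core retires one fused multiply-add (two floating-point operations) per shader clock, and uses the GTX 580's published 1544 MHz shader clock (twice its 772 MHz core clock), which is not given in this section. The results are theoretical single-precision peaks, not gaming performance.

```python
# Theoretical peak single-precision throughput, assuming each CUDA core
# retires one fused multiply-add (2 FLOPs) per shader clock.


def peak_gflops(cores: int, shader_clock_mhz: float) -> float:
    flops_per_core_per_clock = 2   # one FMA = multiply + add
    return cores * shader_clock_mhz * flops_per_core_per_clock / 1000


if __name__ == "__main__":
    # GTX 580: shader ("2x") clock of 1544 MHz, twice its 772 MHz core clock.
    gtx580 = peak_gflops(512, 2 * 772)
    # GTX 680: no separate shader clock; use the 1006 MHz base clock.
    gtx680 = peak_gflops(1536, 1006)
    print(f"GTX 580: {gtx580:.0f} GFLOPS")   # 1581
    print(f"GTX 680: {gtx680:.0f} GFLOPS")   # 3090
```

Three times the cores at roughly two-thirds of Fermi's shader frequency still roughly doubles the theoretical peak, before GPU Boost adds anything on top.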
But what has benefited the most is power efficiency. Compared to the original Fermi SM, SMX has twice the performance per watt. Put another way, given a watt of power, Kepler's SMX can do twice the amount of work of Fermi's SM, and this is measured apples-to-apples on the same manufacturing process. Imagine a conventional 50-watt light bulb that shines as brightly as a 100-watt bulb: that's what Kepler is like when gaming.