The future of graphics cards revealed
31st May 2009 | 11:00
Why the graphics market is set for an enormous shake up
A brief history of graphics cards
Are we addicted to graphics power? Are Nvidia and AMD, in reality, the pushers of the most addictive shader skag?
In any other walk of life you'd be dragged off to rehab for this intensely self-destructive cycle of substance abuse. But just like that heroin addict, who's currently trying to break into your home and steal your pride and joy gaming system, the PC industry is addicted to graphics.
It's the driving force behind the games we love and the reason we love the PC so much in the first place and why consoles will always be toys in comparison. Of course, we could turn around, throw our hands heavenwards and shout 'Enough is enough, this madness must end! Stop the development! My graphics card is good enough.' But where would that get us? We'd still be playing Tomb Raider on an original 3dfx Voodoo card.
The desperate truth is we need, we long for that hit of hardcore 3D acceleration. We need help, we need treatment, what we need is the latest graphics card. Sliding that long card into a tight PCI Express slot always feels so good.
For many years now we've been happy in this abusive relationship. Clinging to our ageing card, trying to scrape the last remnants of a decent frame rate together by installing hacked drivers and dropping the resolution, until we end up crawling back to our favourite green or red dealer for a fresh hit of delicious 3D.
But today that magic hit isn't just about graphics: from HD decoding and physics acceleration to GP-GPU features, that graphics card is offering a lot of technology. The next generation of cards is set to take this technology to a new level and with the advent of a new pusher on the scene – in the form of chip-giant Intel – the entire graphics market is set for an enormous shake up.
It'll be a combination of new competition, changing demands and the evolving technology that is bringing general processing and graphics processing closer and closer together. But what will happen when these two worlds collide? Let's find out…
Back in time
Not that we want to dwell on the past, but how did this addictive relationship start? The answer lies with what PCs were back in the early nineties and how 3D images are generated in the first place. So let's take you back, back, back in time to when the original Doom, Duke Nukem 3D and Wing Commander adorned our screens.
3D gaming was a simplistic affair, sometimes referred to as 'vector games'. 3D line objects were made of vectors: a mathematical construct and nothing more than a line in space defined by two points. Put three vectors together and you get a triangle, put enough triangles together and you can form anything.
Luckily for your average 7MHz, 16-bit processor, vectors can be manipulated using simple matrix functions, so they can be scaled and rotated in our imaginary space before being drawn to the screen. But lines aren't very exciting, unless they're white and Bolivian in origin.
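To make the matrix idea concrete, here's a minimal Python sketch of scaling and then rotating a line defined by two points. The helper names (mat_vec, scale_matrix, rotate_z_matrix) are our own for illustration and aren't taken from any particular engine.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (a list of rows) by a 3-component point."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def scale_matrix(s):
    """Uniform scale by factor s."""
    return [[s, 0, 0], [0, s, 0], [0, 0, s]]

def rotate_z_matrix(angle):
    """Rotation about the z axis by angle radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

# A 'vector' in the article's sense: a line between two points.
line = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]

scl = scale_matrix(2.0)
rot = rotate_z_matrix(math.pi / 2)  # quarter turn

# Scale first, then rotate, each step being one matrix multiply per point.
transformed = [mat_vec(rot, mat_vec(scl, p)) for p in line]
```

Each point costs only a handful of multiplies and adds, which is why even modest processors of the day could push wireframe models around.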
As a stepping stone to true 3D, Doom and its clones were based on 2D maps that had simple height information, and the actual 3D effect was a textured wall projection. Similarly, the monsters were flat bitmaps positioned on that same 2D map, scaled according to their distance from the player.
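The distance scaling can be sketched in a couple of lines of Python. This is plain perspective scaling under our own assumptions (the function name and the focal_length value are hypothetical), not a reconstruction of id Software's actual renderer.

```python
def projected_height(world_height, distance, focal_length=160.0):
    """On-screen height in pixels; it shrinks in proportion to distance.

    focal_length is an assumed projection constant for illustration.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return world_height * focal_length / distance

near = projected_height(64, 128)  # a monster close to the player
far = projected_height(64, 256)   # the same monster twice as far away
```

Doubling the distance halves the drawn size, so a flat bitmap stretched to that height looks convincingly placed in the world.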
NOT SO ELITE: Graphics have come a long way since the dark days of 1982
This, combined with pseudo lighting effects, enabled id Software to generate a basic, fully textured 3D world on a lowly 386 PC. Faster processors then enabled devs to combine the texture handling used in Doom with a true, full 3D vector engine to create the likes of Descent in 1995 and, in 1996, the seminal Quake.
But despite all the cleverness of these engines, incredibly basic abilities such as texture filtering were, and remain, simply too processor intensive for a standard CPU to even consider attempting in real time.
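To see why filtering is so costly, here's a rough Python sketch of bilinear filtering, one common form of texture filtering. We're assuming a greyscale texture stored as a list of rows and coordinates given in texels; the function name is our own.

```python
import math

def bilinear_sample(texture, u, v):
    """Blend the four texels surrounding (u, v), given in texel coordinates."""
    height, width = len(texture), len(texture[0])
    # Clamp the top-left texel to the texture bounds.
    x0 = max(0, min(int(math.floor(u)), width - 1))
    y0 = max(0, min(int(math.floor(v)), height - 1))
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)
    # Fractional position between the texels, clamped to 0..1.
    fx = max(0.0, min(u - x0, 1.0))
    fy = max(0.0, min(v - y0, 1.0))
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

checker = [[0.0, 1.0],
           [1.0, 0.0]]
middle = bilinear_sample(checker, 0.5, 0.5)
```

That's four texture reads and a stack of multiplies for a single pixel. Repeat it for every pixel of every frame and it's easy to see why dedicated hardware took the job.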
The first time our gelatinous eyeballs gazed upon the smooth textures and lighting effects in Quake or the explosive effects of Incoming, we were hooked. It was these types of effects and abilities that enabled a mid-nineties PC to pull off arcade-level graphics.
While not wanting to delve into degree-level subjects, to really understand why graphics cards exist as they do today, it's helpful to know what's required to create that eye-pleasing 3D display we so enjoy. As you'll see, graphics cards started out handling only a fraction of the total process, whereas today they embrace almost the entire task.
We've already mentioned vectors and how they can be used to build up models from triangular meshes. You start here with your models, which need to be transformed and scaled to fit into a virtual 'world space'. The application then applies a 'view space', which is how the player will view this world. It's a pyramid volume cut out of the world space, and it bounds the only area of interest to the renderer.
From this pyramid we get the clipping space, which is the visible square of our virtual viewport, and finally these are translated into the screen space, where the 2D x/y coordinates are calculated ready for the pixel rendering. These steps are important because originally this was all done on the CPU, but stages were slowly shifted to the GPU. So are you still with us?
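The chain of spaces described above can be sketched in Python as a series of 4x4 matrix multiplies. This is only an illustration: we've assumed an OpenGL-style projection with the camera looking down the negative z axis, and all function names are our own.

```python
import math

def mat_mul_vec(m, v):
    """Multiply a 4x4 matrix (a list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def perspective(fov_y, aspect, near, far):
    """The 'pyramid': a perspective projection (assumed OpenGL-style)."""
    f = 1.0 / math.tan(fov_y / 2)
    return [
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0, 0, -1, 0],
    ]

def to_screen(vertex, model, view, proj, width, height):
    world = mat_mul_vec(model, vertex + [1.0])   # model -> world space
    eye = mat_mul_vec(view, world)               # world -> view space
    clip = mat_mul_vec(proj, eye)                # view -> clipping space
    ndc = [clip[i] / clip[3] for i in range(3)]  # perspective divide
    x = (ndc[0] + 1) * 0.5 * width               # -> screen space x/y
    y = (1 - ndc[1]) * 0.5 * height
    return x, y

model = translation(0, 0, -5)  # place the model five units in front of the camera
view = translation(0, 0, 0)    # camera at the origin
proj = perspective(math.radians(90), 640 / 480, 0.1, 100.0)
centre = to_screen([0.0, 0.0, 0.0], model, view, proj, 640, 480)
```

A vertex at the model's origin lands in the middle of a 640x480 screen. Every vertex in the scene goes through the same chain, which is the work that later moved on to the card.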
That was the simple part. Each of those 'views' is required for different stages in rendering. For instance, to help optimise the rendering it makes sense to discard all the triangles that won't be drawn. Occlusion culling will remove obscured objects, trivial clipping removes objects outside the 'view space' and finally back-face culling determines which triangles are facing away from the viewer and so can be ignored.
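The facing-away test is the easiest of these to show. Here's a minimal Python sketch, assuming triangles are wound counter-clockwise when seen from the front; the helper names are ours.

```python
def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def is_back_facing(triangle, eye):
    """True if the triangle's front side points away from the eye position."""
    a, b, c = triangle
    normal = cross(sub(b, a), sub(c, a))  # surface normal from two edges
    to_eye = sub(eye, a)                  # direction from the triangle to the viewer
    return dot(normal, to_eye) <= 0

tri = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
seen_from_front = is_back_facing(tri, [0.0, 0.0, 5.0])
seen_from_behind = is_back_facing(tri, [0.0, 0.0, -5.0])
```

In a closed model roughly half the triangles fail this test at any moment, so one cross product and one dot product per triangle saves a great deal of drawing.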
The clipping space view is created from this remaining world space and any models that bisect the viewing boundary box need to be clipped off and retessellated, leaving only the visible triangles in the final scene.
Lighting, unified shaders and Larrabee
Light and bright
With an optimised view space created, lighting can be applied. It's important to understand this isn't the visual representation of light, it's calculating how 'bright' every surface is going to be. A scene can have a global light-source, along with point-sources and spotlight sources.
Every triangle surface will have material properties, such as ambient, diffuse, specular and emissive material colours. For every source and every triangle a calculation will be made to determine its total luminosity. As you can imagine the more sources there are, the larger the calculation expense.
It's important to remember that, at this stage, all we know is the luminance for each triangular surface. The actual rendering comes later.
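As a rough illustration of that per-source, per-triangle calculation, here's a Python sketch. We've assumed a Blinn-Phong style specular term and single-number (greyscale) material values to keep it short; real pipelines work with full colours, and the names here are our own.

```python
import math

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def surface_brightness(normal, to_viewer, lights, material):
    """Total brightness of one triangle, summed over every light source."""
    n = normalize(normal)
    v = normalize(to_viewer)
    total = material["emissive"] + material["ambient"]
    for light in lights:
        l = normalize(light["direction_to_light"])
        n_dot_l = max(0.0, dot(n, l))
        total += material["diffuse"] * light["intensity"] * n_dot_l
        if n_dot_l > 0:
            # Assumed Blinn-Phong highlight using the half-way vector.
            h = normalize([a + b for a, b in zip(l, v)])
            highlight = max(0.0, dot(n, h)) ** material["shininess"]
            total += material["specular"] * light["intensity"] * highlight
    return min(total, 1.0)

material = {"emissive": 0.0, "ambient": 0.1, "diffuse": 0.6,
            "specular": 0.2, "shininess": 8}
facing_light = [{"direction_to_light": [0.0, 0.0, 1.0], "intensity": 0.5}]
lit = surface_brightness([0.0, 0.0, 1.0], [0.0, 0.0, 1.0], facing_light, material)
```

The loop over lights is the expensive part: every extra source adds another pass over every triangle in view.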
For now let's just say each pixel can now be blended with its corresponding lighting values, textures and other effects, such as bump maps and light maps. On top of this each pixel will have filtering applied, fogging, shadow values and even antialiasing to produce the final image.
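A heavily simplified Python sketch of that per-pixel blend might look like this. We're assuming the texture colour is simply multiplied by the lighting value and then mixed with a fog colour; real cards chain many more stages, and the function name is hypothetical.

```python
def shade_pixel(texel, light, fog_color, fog_amount):
    """Modulate a texel by its lighting value, then blend towards the fog colour."""
    lit = [min(1.0, channel * light) for channel in texel]
    return [l * (1 - fog_amount) + f * fog_amount
            for l, f in zip(lit, fog_color)]

pixel = shade_pixel(texel=(1.0, 0.5, 0.0), light=0.5,
                    fog_color=(0.5, 0.5, 0.5), fog_amount=0.5)
```

It's trivial for one pixel, but a 1,600x1,200 frame has almost two million of them.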
If you're feeling a bit dazed and wondering what that's useful for, it's so you have an overview of what goes into creating a single 3D frame, which is on screen for mere milliseconds. As graphics cards have developed, more of that pipeline has been moved or added to the graphics card.
With the original 3D cards, only the end rasterisation and rendering stages were performed on-card, and that was by dumb, fixed-function units that could only perform a single render pass. Multitexturing and multi-pass rendering improved visual quality and when DirectX 7.0 was released in 1999, graphics cards got a little smarter because of Transform and Lighting (T&L).
T&L moved the lighting and vertex transformation stages on to the graphics card and was the first move away from CPU-based vertex handling. It wasn't until the introduction of DirectX 8 that things really got interesting, as the first shaders appeared.
Vertex shaders enable programmers to manipulate vertices directly on the card, while pixel shaders replaced the fixed multi-texture engines with programmable ones. These gave graphics cards their first smarts, even though they were limited: there couldn't be any branches in the code, there were limits on the number of commands and variables, and the total program length was very short.
So while technically these cards were running programs of a sort, the two types of shader units were different in design and very limited.
The smart stuff
It wasn't until DirectX 9.0c was released in 2004 with Shader Model 3.0 that cards started to look more like a collection of smart processors than dumb fixed logic. Dynamic branching, program lengths over 512 commands and access to hundreds of registers made graphics cards sound more like mini supercomputers.
The final evolution came with unified shaders, introduced in DirectX 10 and Shader Model 4.0. At this point there's no distinction between vertex and pixel shaders. Cards have 'unified' shaders, akin to having hundreds of tiny dedicated processing units, which are found on the GeForce 8 and Radeon HD 2000 series and later generations of cards.
This has enabled both AMD and Nvidia to start offering GP-GPU features and programming languages for current graphics cards, which allow them to process physics and other mathematically complex data alongside 3D rendering.
As testament to the idea that shaders are becoming processors in their own right, Intel is wading into the graphics arena and the ripples could permanently erode the market that once seemed so rock solid.
As we already know, the new GPU is codenamed Larrabee and its heart is based, in part, on the original x86 Pentium core. Intel is on record as saying it can, in theory, run OS kernel level code. The idea is to take a bunch of optimised, in-order x86 Pentium cores, add in a Vector Processing Unit and tie the whole thing together via each core's L2 cache using a high-speed ring bus.
Alongside the multi-core design there's a dedicated texture filtering unit, plus the usual extra gubbins for the memory controller, display and system interfaces. Intel is approaching the problem in the opposite direction to AMD and Nvidia. It's almost dumbing-down an x86 core to help fit as many as possible onto a GPU die.
All parties are selling these as more than just a graphics solution. Intel is partnering with DreamWorks, which will be using Larrabee as an accelerated computing platform for ray tracing frames within its animated features. Intel has measured a 1GHz, 24-core Larrabee GPU running ray tracing almost five times faster than an eight-core Xeon processor at 2.6GHz, which shows the huge acceleration potential GP-GPU solutions have in the real world.
Currently no one has any idea how well Larrabee will perform, if it performs at all. However, we managed to dig out some figures from a paper Intel published. It estimates the performance of a Larrabee processor running F.E.A.R., Gears of War and Half-Life 2: Episode 2. The most interesting section took the DirectX commands generated from a sequence of random frames from each of these games.
These commands were fed through a 'functional model' of Larrabee rendering at 1,600x1,200 with 4x AA. The test was to see how many 1GHz cores were required to keep a constant 60fps output for each game. The answer is between 10 and 24 cores, depending on the game.
Clearly this is nowhere near the performance of top-end cards. The frame rate would have to be nearer 180fps at that resolution, but even so, at 3GHz with 24 cores that would be achievable and still in the realms of reality.
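The 180fps figure follows from a simple scaling sum, sketched here in Python. It rests on our assumption that frame rate rises in direct proportion to clock speed, which real hardware rarely manages once memory bandwidth gets involved.

```python
def estimated_fps(base_fps, base_clock_ghz, target_clock_ghz):
    """Scale a frame rate linearly with clock speed (an optimistic assumption)."""
    return base_fps * target_clock_ghz / base_clock_ghz

# 24 cores at 1GHz hold 60fps in the most demanding case, so at 3GHz:
fps_at_3ghz = estimated_fps(60, 1.0, 3.0)
```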
When Larry comes
By the time Larrabee launches, it could be almost 2010 and both Nvidia and AMD will have had next-gen DirectX 11 devices well out of the stable. Intel's own figures show that its core scaling works well up to and over 48 cores with apparently only a two to ten per cent drop in performance.
It's impossible at this stage to know how much a Larrabee card will cost, but we can make several massive assumptions based on existing technology.
For example, a 24-core GPU would require 6MB of L2 cache, which is roughly 300 million transistors. Let's guesstimate that the modified x86 Pentium cores are twice their original size at 6 million transistors each. That's around 450 million transistors in total for a 24-core Larrabee GPU.
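Here's that back-of-an-envelope sum in Python. The six-transistors-per-bit figure is our own assumption (a standard 6T SRAM cell, ignoring tags and control logic), chosen because it reproduces the roughly 300 million quoted for the cache.

```python
BITS_PER_BYTE = 8
TRANSISTORS_PER_SRAM_BIT = 6  # assumption: 6T SRAM cell, no tag or control overhead

def larrabee_transistor_estimate(cores, l2_mb, transistors_per_core):
    """Return (cache, core logic, total) transistor counts for the guesstimate."""
    cache_bits = l2_mb * 1024 * 1024 * BITS_PER_BYTE
    cache = cache_bits * TRANSISTORS_PER_SRAM_BIT
    logic = cores * transistors_per_core
    return cache, logic, cache + logic

cache, logic, total = larrabee_transistor_estimate(24, 6, 6_000_000)
```

That gives roughly 302 million for the cache plus 144 million for the cores, or about 446 million in all, which is where the 'around 450 million' comes from.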
Now, if you accept those transistor counts and accept fab costs are closer to that of a full processor than a GPU, at roughly half the transistor count of a 3GHz Core i7, the consumer price could be up to £230. That's not including the 1GB of GDDR5, of course.
The issue is whether Intel can put out a GPU that's both affordable and a good performer when Larrabee launches. At least AMD and Nvidia will put us out of our misery soon enough, as they're both expected to field hardware supporting DirectX 11 in the second half of 2009.
It will be interesting to see which of the two has the most powerful GP-GPU solution, but regardless, Intel won't get an easy ride. The quality of Intel's drivers is going to be a key issue, and dual-GPU or SLI-style dual-card support may be a necessity if it wants to compete for the performance crown.
First published in PC Format Issue 226