6 best Crossfire and SLI graphics cards on test
25th Jul 2009 | 09:00
Multi-GPU set-ups from £160 to £860
Add an extra graphics card. Enjoy improved gaming performance. As concepts go, multi-GPU graphics is a bit of a no-brainer.
But is it the next logical step for 3D rendering, the graphics equivalent of multi-core CPUs? Or is it a gimmick that does little more than prove that some people are stupid enough to buy into anything?
Back at the original launch of Nvidia's SLI platform in 2004, it was actually hard to believe SLI was real. Surely a solution as complex and expensive as SLI could never become mainstream? Certainly, the fact that Nvidia's main rival – back then that was ATI before it was swallowed whole by AMD – quickly followed suit with a copycat technology known as CrossFire, wasn't enough to prove the idea had mainstream merit.
At the time, ATI suits admitted off the record they weren't convinced there was a real market for multi-GPU. Then something strange began to happen. Although the number of actual multi-GPU systems remained minuscule, PC enthusiasts began to buy SLI-capable mobos in their droves. Even if they didn't run two graphics cards in parallel, they did want to give themselves that option.
More than anything else, the idea of adding another, cheaper copy of your current graphics card when it begins to run out of puff is extremely seductive. Ironically, however, the ultimate proof that multi-GPU is here to stay comes from AMD, not Nvidia.
AMD has given up engineering really massive GPUs and has instead decided to use multi-GPU technology as the basis for all its future flagship graphics boards. But that doesn't make multi-GPU the best hammer for cracking every graphics nut.
Does a pair of mid-range boards, for instance, really deliver better performance than a single high-end card? What about the law of diminishing returns as you go beyond two GPUs? Moreover, have we reached the stage where either or both of CrossFire and SLI have become truly reliable?
For the answer to these questions and much, much more, you know what to do.
Entry level GPU
Nvidia's GeForce GTS 250 and the ATI Radeon HD 4770 from AMD share a common purpose.
For gaming junkies, you might even say it's a sacred calling. Both aim to deliver the maximum performance in return for the minimum pecunias. Other 3D cards might be faster, but none can match the bang-for-buck ratio achieved by these mass market pixel pumpers.
The funny thing is that the way they go about it couldn't be more different. Take the GTS 250. Contrary to what the branding insinuates, this is not a new mid-range derivative of Nvidia's mighty GT200 GPU, the graphics chip that forms the beating heart of both the GeForce GTX 285 and 295. It's yet another rehash of the trusty old G92 core that began life eons ago in the GeForce 8800 GT.
Since then, it has fought Nvidia's cause in a number of different guises and yet remarkably little has changed. Now as then, the latest version of G92 packs 128 stream processors, the small programmable execution cores responsible for calculating funky visual effects in the latest games.
Likewise it still has 64 texture filtering units, 16 pixel outputs and a 256-bit memory bus. In fact, the only significant tweak involves the manufacturing process used to produce G92 dies.
What began life as a 65nm chip has since been given a slight 55nm squeeze. Consequently, each one is physically smaller. And smaller dies make for cheaper chips. All of which means the GTS 250 adds up to a slightly dated enthusiast class GPU sold at a mainstream price. You can now bag a 512MB GTS 250 for well under £100. But what about AMD's Radeon HD 4770?
Well, it's new from the ground up and sports an architecture optimised to give the best possible performance where it really counts for mainstream customers. AMD has therefore decided to focus the chip's resources heavily on shader processing.
With no less than 640 stream processors and a core clockspeed of 750MHz, the 4770 has around 75 per cent of the raw computational power of AMD's fastest single GPU, the Radeon HD 4890. That's a card that typically sells for nearly £200 and is therefore two and a half times more expensive than the 4770.
A question of bandwidth
Indeed, the 4770 also matches the 4800 series for pixel output with 16 ROPs and comes close in the texture processing department with 32 units, just eight fewer than its bigger brother.
In fact, in terms of floating point processing power – an interesting if somewhat academic measure of a graphics chip's computational grunt – the 4770 even manages to get within about 10 per cent of Nvidia's mighty GeForce GTX 285, a graphics card that sells for around £300. So, how has AMD pulled this off at such a low price point? By reducing the size of the 4770's die, that's how.
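That "within about 10 per cent" claim can be sanity-checked with the standard peak-throughput sums. A minimal sketch, assuming the usual per-clock figures (AMD shaders issue a multiply-add, or 2 FLOPs, per core clock; Nvidia's issue a MAD plus a MUL, or 3 FLOPs, per shader clock) and the GTX 285's stock 1,476MHz shader clock, which isn't quoted above:

```python
# Back-of-envelope peak shader throughput. The 2 and 3 FLOPs-per-clock
# figures and the GTX 285's 1,476MHz shader clock are assumptions based
# on the cards' stock specifications, not figures from this test.
def peak_gflops(shaders, clock_mhz, flops_per_clock):
    return shaders * clock_mhz * flops_per_clock / 1000.0

hd4770 = peak_gflops(640, 750, 2)    # 640 SPs at 750MHz core clock
gtx285 = peak_gflops(240, 1476, 3)   # 240 SPs at 1,476MHz shader clock

print(f"HD 4770: {hd4770:.0f} GFLOPs")
print(f"GTX 285: {gtx285:.0f} GFLOPs")
print(f"Deficit: {(1 - hd4770 / gtx285) * 100:.0f} per cent")
```

That works out at roughly 960 GFLOPs against roughly 1,060 GFLOPs, which is indeed a gap of about 10 per cent.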
For starters, thanks to the use of 40nm chip production technology the 4770 has the tiniest transistors yet seen in any GPU. But AMD has also made one very significant compromise in architectural terms. The 4770 has a 128-bit memory bus.
That's half the width of the 250's memory bus and one quarter the size of a GeForce GTX 285's. The upside is lower manufacturing costs. The narrower bus requires fewer connections making both the chip packaging and graphics board design simpler and cheaper.
The penalty, of course, is less bandwidth into and out of the GPU. That sounds bad, but AMD knows that at lower resolutions bandwidth is less critical.
And given that the 4770 is a mainstream board, it's not likely to be paired with large, high resolution monitors - in single-card configurations, at least. Instead, the 4770 will typically be driving 20- or 22-inch monitors with 1,680 x 1,050 pixel grids.
You may be wondering what all this has to do with multi-GPU performance. Actually, it's highly relevant for reasons that ultimately involve memory bandwidth. For starters, any multi-GPU setup comes with increased expectations.
What with the multiple cards and the mobo needed to support them, you're looking at a fairly expensive rig. That in turn means you're more likely to be running at higher resolutions.
The mechanics of multi-GPU technology also count. To cut a long story short, the most common multi-GPU rendering method is alternate-frame rendering (AFR) which, as the name suggests, involves the GPUs taking turns drawing full frames.
That requires both cards having a complete copy of the graphics data, which further compounds the problem of data bandwidth. This is precisely where the differences between the Radeon HD 4770 and GeForce GTS 250 are most telling.
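The scheduling behind AFR can be sketched in a few lines. This is purely illustrative - the function name is hypothetical, not a real driver API - but it shows why both cards need the full set of graphics data:

```python
# Illustrative sketch of alternate-frame rendering (AFR): frames are
# dealt out round-robin, so with two GPUs each draws every other frame.
# The function name is hypothetical, not part of any real driver API.
def afr_schedule(num_frames, num_gpus):
    """Return which GPU renders each frame under round-robin AFR."""
    return [frame % num_gpus for frame in range(num_frames)]

# Two GPUs over six frames: GPU 0 takes the even frames, GPU 1 the odd
# ones. Because either GPU may be handed any frame, each needs its own
# complete copy of the scene's textures and geometry - memory isn't
# pooled, which is why bandwidth pressure doubles rather than halves.
print(afr_schedule(6, 2))  # [0, 1, 0, 1, 0, 1]
```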
As our benchmark results show, a pair of Radeon HD 4770s in dual-GPU CrossFireX configuration have a nasty habit of losing the plot at higher resolutions. Far Cry 2 is the best example, with performance plummeting horribly above 1,680 x 1,050. Yup, it's the 4770's poxy 128-bit memory bus doing the damage.
Making matters worse, early examples of the 4770, including the HIS boards tested here, are limited to 512MB. At really high resolutions and detail settings, that can force the cards to use main system memory to store graphics data which further reduces performance.
By contrast, the GTS 250's enthusiast class origins and 256-bit memory bus make a much better platform for multi-GPU antics. As the resolutions ramp up, it maintains its composure and performs in a much more linear fashion.
The fact that the Gigabyte and Zotac GTS 250s used for this test have 1GB frame buffers also helps. That's particularly true at the epic 2,560 x 1,600 resolution where data swapping over the PCI-e bus can become a major handicap for 512MB cards.
As individual graphics boards go, these are the purist's choice: the very finest single GPUs from the two masters of 3D graphics hardware. They also make for an intriguing comparison. The GTX 285 is nothing less than a brutal graphics bludgeon for your games.
With 1.4 billion transistors, it's the biggest single processor die currently available for the PC – and that includes both graphics processors and the more traditional CPU.
By way of comparison, Intel's latest quad-core Core i7 processor gets by with a measly 731 million transistors, roughly half that of the GTX 285. Indeed, by just about every measurement the GTX 285 is a bona fide heavyweight.
It packs no less than 32 render output units and a massive 80 texture units along with 240 of Nvidia's unified and symmetrical stream processors (it's always worth remembering that AMD and Nvidia's shaders are not directly comparable - although for a rough guide, divide AMD's numbers by five for comparisons). The GTX 285, then, can pump out a huge number of pixels.
But in the context of multi-GPU performance, it's the GTX 285's memory subsystem that really makes the difference. With AMD throwing in the towel on uber-GPUs, this chip is uniquely equipped with a 512-bit memory bus. Combined with GDDR3 memory running at an effective rate of over 2.4GHz, the result is 159GB/s of raw bandwidth.
When you're attempting to shuffle around the immense quantities of graphics data that come with running the latest games at stupendously high resolutions, such as 2,560 x 1,600, that much meaty bandwidth comes in extremely handy.
The Radeon HD 4890 is a very different beast. It has a significantly lower transistor count at just 956 million. It is, in short, a much less complex chip.
But in many ways, it's also a much cleverer one. AMD has really gone to town on the chip's shader array, cramming in 800 stream processors. Consequently, it actually delivers more theoretical computational throughput than the much bigger GTX 285. For the record, we're talking 1.36TFLOPs from the AMD camp compared to 1.06TFLOPs from Nvidia.
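Those TFLOPs figures drop straight out of the same peak-throughput arithmetic. A quick check, assuming the 4890's stock 850MHz core clock and the GTX 285's stock 1,476MHz shader clock (neither is quoted above), with AMD's shaders counted at 2 FLOPs per clock and Nvidia's at 3:

```python
# Peak shader throughput from the shader counts quoted in the text.
# The 850MHz and 1,476MHz clocks and the FLOPs-per-clock figures are
# assumptions based on stock specifications, not measurements.
def peak_tflops(shaders, clock_mhz, flops_per_clock):
    return shaders * clock_mhz * flops_per_clock / 1e6

print(f"Radeon HD 4890:  {peak_tflops(800, 850, 2):.2f} TFLOPs")   # 1.36
print(f"GeForce GTX 285: {peak_tflops(240, 1476, 3):.2f} TFLOPs")  # 1.06
```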
Of course, to achieve that floating point performance in a smaller, cheaper but arguably more elegant chip something has to give somewhere else. The 4890 has literally half the render output and texture units, just 16 and 40 respectively, compared to the GTX 285. Its 256-bit memory bus is likewise 50 per cent down.
AMD has offset that to some extent by using the latest and greatest GDDR5 memory interface running at an effective clock speed of 3.9GHz. But the total available bandwidth still falls significantly short at 125GB/s. In single-GPU configuration, the design choices AMD has made make an awful lot of sense. After all, the 4890 is not a full enthusiast class GPU.
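Both bandwidth figures follow from the simple rule that bandwidth equals bus width in bytes times the effective data rate. A minimal sketch, assuming an effective GDDR3 rate of 2.484GHz on the GTX 285 (the text only says "over 2.4GHz"):

```python
# Memory bandwidth = (bus width in bytes) x (effective data rate).
# The 2.484GHz GDDR3 rate is an assumption consistent with the
# "over 2.4GHz" figure quoted for the GTX 285.
def bandwidth_gbps(bus_bits, effective_ghz):
    return bus_bits / 8 * effective_ghz

print(f"GTX 285: {bandwidth_gbps(512, 2.484):.0f} GB/s")  # ~159
print(f"HD 4890: {bandwidth_gbps(256, 3.9):.0f} GB/s")    # ~125
```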
The vast majority of people who buy it will never run it at resolutions higher than 1,920 x 1,200. So why waste engineering resources and push up the cost of the chip to optimise it for the likes of 2,560 x 1,600?
We're happy to leave Nvidia to chase that tiny market, seems to be the current message from AMD. However, when it comes to running these cards in multi-GPU trim, those higher resolutions and image quality settings suddenly become a much more important issue. The more that you crank up the pixel count or add lashings of anti-aliasing and anisotropic filtering, the more any bandwidth shortcomings, both inside and outside the GPU, will drag performance down.
On paper, therefore, what we have is a contest between a pair of cards cleverly designed to deliver maximum performance within a relatively limited remit (that'll be the Radeons) and a pair engineered with no expense spared (yup, that's the GeForces). But is this reflected in the performance results?
For the most part we'd say yes. When the going gets really tough, it's the two GTX 285s that give the best results. However, the benchmark numbers are slightly distorted by the fact that the overhead of running two cards tends to put a cap on average frame rate results. That's why the average frame rate figures at 1,680 x 1,050 and 1,920 x 1,200 look pretty similar. Hence, in many ways it's the minimum frame rates that count in this part of the market.
In other words, what matters is whether frame rates remain high enough for smooth rendering in the most demanding scenarios. Here, the GeForce boards have a distinct advantage. Even at 2,560 x 1,600 with the rendering engine set to full reheat, a liquid smooth 50 frames per second is as low as the GTX 285 setup will go in Far Cry 2.
The Radeons, by contrast, trough at 37 frames per second. It's a similar story in Crysis Warhead, with the GTX 285s coming the closest to actually running this most excruciatingly demanding game engine smoothly. Also working in Nvidia's favour is the sense that in this part of the market, value for money is much less of a factor.
In single-card configuration, a £310 GTX 285 looks like poor value next to a £195 Radeon HD 4890. But if you can afford £400 for a pair of 4890s, you can probably stretch to £600 for the GeForce boards. And if you have a 30-inch LCD monitor and want the best dual-card performance, that's exactly what we would recommend you do.
Can you have too much of a good thing? That's the first question that leaps to mind in the context of these ridiculous dual-card, quad-GPU graphics solutions. Well, that and the one about your sanity for even considering such wanton decadence.
For starters, the raw specifications of both the ATI Radeon 4870 X2 CrossFireX and Nvidia's competing GeForce GTX 295 SLI are almost too much to comprehend.
In no particular order, the edited highlights include a grand total of nearly 8GB of graphics memory shared between all four cards, getting on for a terabyte per second of memory bandwidth and 2,560 stream shader cores. Madness.
Then there's more madness: the cost. You won't get much change out of £700 for a pair of Radeon HD 4870 X2 boards. Hard to believe, but the GeForce GTX 295 duo is even worse at slightly under £900. Unless you're a Westminster MP bagging goodies on the taxpayer's ticket, that's pretty hard to swallow.
These cards are monstrous physical specimens, too, immensely long and enormously heavy. Ultimately, the very concept of cost has to be excluded from the equation if the 4870 X2 CrossFireX and GeForce GTX 295 SLI are to make any sense at all.
For that matter, given how much power these beastly boards consume you'd better not give much thought to the livelihood of birds, bees and lovely old trees, either.
With four high-end GPUs in your PC, you'll be looking at a system that guzzles around 700 watts - and that doesn't even include a monitor or speakers. With current concerns regarding the environment in mind, that's probably downright immoral.
And yet at the same time the mad scientist in us can't quite resist the lure of the most powerful graphics solutions available to humanity. Who wouldn't cackle with delight and cry out "It's alive!" as four monster GPUs spool up? Okay, maybe it's just us…
The problem is, the laughter quickly turns to tears when you inspect the benchmark results. But before we come to the detailed performance analysis, let's remind ourselves what makes these beasties tick. The ATI Radeon HD 4870 X2 has been around for some time and is based on a pair of AMD's enormously successful RV770 GPUs.
Key specs include 1,600 stream shaders per card, a core clockspeed of 750MHz and 1GB of GDDR5 memory running at an effective rate of 3.6GHz.
In total, each board packs nearly two billion GPU transistors and a theoretical maximum computational capability of 2.6 TFLOPs.
Needless to say, you can double most of those figures for a pair of 4870 X2s running in quad-GPU CrossFireX mode. But perhaps the best thing about the 4870 X2 is that it brings with it absolutely no compromises compared with the single-GPU card upon which it is based. It has the same clockspeeds and memory buffer as the Radeon HD 4870.
That means that when CrossFire mode doesn't work, you can at least be confident of getting the best single-GPU performance that AMD can offer. Or at least that used to be the case until AMD released the slightly upgraded Radeon HD 4890. So it goes.
As for the GeForce GTX 295, it follows a slightly more typical path for a multi-GPU graphics card in that some compromises have been made. It's based on Nvidia's epic, 1.4 billion-transistor GT200 GPU. But such is the heinous power hungriness of that chip, Nvidia had to give it a bit of a chop here and there. Mercifully, that doesn't include the shader array.
All 240 units are present and enabled in both GPUs which translates into 480 per card and 960 in quad-GPU SLI configuration.
Likewise, all 80 texture units make an appearance. Take a peek at the clockspeeds, however, and you begin to see where Nvidia has cut corners. Both the core clockspeed of 576MHz and the shader clock of 1,242MHz are well down on those of the fastest single-GPU GT200 board, the GeForce GTX 285.
What's more, Nvidia has taken the knife to GT200's render outputs, reducing the number from 32 to 28 per GPU core. That in turn has a knock-on effect on the memory bus and frame buffer. The former shrinks from 512-bit to 448-bit, while the latter drops from 1GB to 896MB, again per GPU.
All of which means that in the unfortunate event that multi-GPU scaling fails to work, systems based on both the Radeon HD 4870 X2 CrossFireX and the GeForce GTX 295 SLI will fail to match the performance of their closest single-GPU relatives.
That's a sobering thought when you think about how much a pair of these cards cost. The good news is that in our benchmarks, there's no evidence of either CrossFireX or SLI failing to provide at least some multi-GPU scaling.
Both are significantly faster across the board than the best single-GPU cards currently on the market. However, when you factor in the competition from the fastest dual-card solutions – a pair of Radeon HD 4890s from AMD or two Nvidia GeForce GTX 285s – the wheels begin to come off.
In the case of the Nvidia comparison, the quad-GPU setup delivers very little extra performance in World in Conflict and is actually significantly slower in Far Cry 2, even if it does take an easy victory in Crysis Warhead at the enthusiast-centric 2,560 x 1,600 resolution.
In the AMD camp, forking out for quad-GPU kit will buy you moderate performance gains in the region of 20 to 25 per cent in Far Cry 2 and World in Conflict, but you'll have to swallow a performance penalty of a similar magnitude in Crysis Warhead.
Is quad-GPU therefore a step too far? We think so.
Benchmarks, technical analysis and verdict
BENCHMARKS: As you can see from the performance analysis, multi-GPU still has a long way to go (click here for full res image)
SPECIFICATIONS: As you can see the specs of each card are quite similar but offer different levels of performance (click here for full res image)
There's nothing quite like a Supertest for weeding out the truth. This time around, the most unmistakable observation must be that there's a limit to how far you should currently go with multi-GPU technology. Not to put too fine a point on it, but both AMD's and Nvidia's quad-GPU installations kind of sucked.
The other major negative we observed was the fact that multi-GPU reliability remains slightly below the level we would like to see. Admittedly, we had relatively few troubles during setup. In fact that part of the process went smoothly save for the GeForce GTS 250 pairing, which took an awful lot of fettling. But in that case, we were using boards from mixed vendors with slightly different clockspeeds, which is never an ideal scenario.
Let it serve as a reminder that if you decide to start mixing and matching, there's no guarantee you'll end up with a working system. Elsewhere, the Radeon HD 4890s refused to run Crysis Warhead in DX10 mode, forcing us to record results in DX9. That was a particularly odd problem given the other AMD pairings ran the DX10 path with no complaints.
Overall, that may not sound like a lot. But the fact is, when we test single cards we usually have no stability or compatibility issues, period. That's the kind of flawless reliability we demand from multi-GPU technology, before we will give it a totally unreserved thumbs up. But don't go thinking it was all bad. There's plenty of good news.
We're extremely impressed by both the outright performance and the consistency of the GeForce GTX 285s. Money no object, these are the cards we would choose to run in a gaming PC hooked up to a big screen. We wouldn't kick a pair of 4890s out of bed, either. Back in the real world, of course, money is very much an object and for that reason you won't be surprised to hear that our overall winner comes from the most affordable third of our test.
But it's not the Radeon HD 4770s, though they certainly deserve honourable mention. A pair can be had for around £160, usefully cheaper than a single Radeon HD 4890. More importantly, so long as you are running at moderate resolutions, the two 4770s will be the quicker solution. But first prize must go to the trusty ol' GeForce GTS 250s. In some ways, that's hardly a surprising result.
After all, they're based on a pretty ancient graphics chipset. But in 1GB trim, that chipset remains uncommonly well optimised for multi-GPU implementations. If you have a 24-inch monitor or perhaps one of the latest full HD 22-inch panels, a pair of 1GB GTS 250s will give you significantly better performance than any single card solution. And, yes, that does indeed include the GeForce GTX 285, and all for £260. 'Nuff said.
First published in PC Format Issue 228