Benchmarks revisited
...

By: David K. Every
©Copyright 1999


My first article for MacWEEK was about iBench <http://macweek.zdnet.com/1999/11/07/igeek.html>.

A quick summary: benchmarks are often misleading unless you are very careful about presenting exactly what the benchmark measures. Benchmarks are useful for engineers to figure out what to improve and to find bottlenecks (code or resources that are overused, or that are slowing down performance in very specific areas) -- but the more you try to use benchmarks to make broad generalizations, the more inaccurate those conclusions are likely to be.
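To make that concrete, here is a rough sketch (mine, with made-up function names and a stand-in workload) of the kind of narrow, engineer-focused measurement I mean -- profiling to find the one routine that dominates, rather than producing a single headline number:

```python
import cProfile

def parse_page(html):
    return html.split("<")                    # stand-in for real parsing work

def fetch_page():
    return "<html>" + "<p>x</p>" * 50_000     # stand-in for network/disk I/O

def render():
    for _ in range(20):
        parse_page(fetch_page())

# Sort by cumulative time: the one routine that dominates is the bottleneck an
# engineer would act on -- a much narrower claim than "machine A is faster".
cProfile.run("render()", sort="cumulative")
```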

I said in the original article that if the PC was running much faster than the Mac for web browsing, then it was likely some problem with the configuration or with the benchmark itself. I use both machines regularly, and most of the time I find that the Mac is more useful, and often faster at the operations that I care about.

Home Benchmarks

One ambitious user (George McGee) had a friend who had just bought an iBook for himself, and was lamenting the purchase because of the iBench report -- iBench implied that the iBook was much slower than comparable PCs, and he felt he had been gypped. So they decided to run a whole suite of benchmarks on their own, to prove that iBench was iBull.

They took a 333 MHz Gateway Solo (that had cost $1000 more than the iBook), and compared it to the 300 MHz iBook during their lunch hour -- just to see how close they would run in the "real world". It wasn't completely fair, since the iBook was a bit newer -- but according to iBench the results were so dramatic that it should have been a slaughter in the Gateway's favor. Rather than using a special high-speed line (like a T1), they used a normal dial-in ISP -- and they timed things as they went.

The iBook booted twice as fast, and when they shut the two machines down, the iBook was roughly four times faster -- but those weren't things measured by iBench, even though they are common operations for portable computer users. The iBook was roughly 6 times faster to connect to their ISP -- that also wasn't measured by iBench. Time to launch their respective browsers -- the iBook was almost three times faster. The Gateway got a faster modem connection speed -- but they didn't worry about that little handicap to the iBook. Time to load a page (the first time, with the cache flushed) -- the PC was almost twice as fast for some things (the things iBench measured), but in some cases it went in the iBook's favor. Then they ran some QuickTime streaming, and the iBook was usually ahead there. In the end, the iBook was faster at almost everything else they needed to do -- except a few things that iBench had specifically measured. More importantly, the iBook was a lot faster on the common operations that they cared about.

They were going to evaluate sleep time, but pointed out that the Gateway's power management had never worked right for the owner. Then they observed that the iBook was lighter, had cost less, had a far longer battery life (twice as long), and that the Gateway would get so hot that it would cook their genitals during extended use (the iBook having no such problem). All of that showed the iBook to be a real bargain, and that was before getting into the superiority of the Mac interface. The previously disheartened iBook owner bounded away happy with his purchasing decision after running the only benchmarks that really mattered -- his own. They went back and did a second series of benchmarks, just to verify the first, and they came up with similar results. Technically the iBook had lost in some areas (like browsing updates) -- but a few seconds' difference between page loads didn't really bother the owner. However, the difference in battery life, cost, usability, sleep, and 18 seconds versus 1:10 just to connect to an ISP really did matter to him -- though if you are connected all the time, then the dial-in time might not matter at all to you. Again, the point is that benchmarks have very narrow value, and you have to be careful about how the data is interpreted, and what it means to you.
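If you want to see how that plays out for your own use, here's a rough sketch (mine -- the ISP-connect times come from the story above; the other numbers and the per-day counts are made-up placeholders) of weighting each operation by how often you actually do it:

```python
operations = [
    # (operation, iBook seconds, Gateway seconds, times per day -- substitute your own)
    ("connect to ISP", 18.0, 70.0, 4),
    ("load a page",     6.0,  4.0, 80),   # placeholder: the PC a bit faster here
    ("boot",           45.0, 90.0, 1),    # placeholder magnitudes, 2x ratio as reported
]

def daily_seconds(column):
    """Total seconds per day spent waiting, for the machine in that column."""
    return sum(op[column] * op[3] for op in operations)

ibook, gateway = daily_seconds(1), daily_seconds(2)
print(f"iBook: {ibook / 60:.1f} min/day waiting, Gateway: {gateway / 60:.1f} min/day waiting")
# With these illustrative numbers the PC's page-load edge is swamped by the
# operations it loses badly -- which is exactly the point of running your own tests.
```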

Tuning

The difference in browsing speed (some page loads) between platforms is suspicious. The Mac is a superior machine and architecture, and it shouldn't be slower -- let alone by that dramatic a difference. A well-known Mac geek (Bo3b Johnson) did some exploration and found (and documented) ways to dramatically speed up the Mac browsing experience, and he posted some interesting configuration solutions on MacInTouch <http://www.macintouch.com/browserperf.html>.

The gist of it is that with simple configuration changes alone, he was able to make IE4.5 go up to three times faster than the default configuration, and with some different tweaks could make Navigator go up to twice as fast as before. These tweaks alone should bring the Mac right on par with, or past, the performance of the PC browsers in page loads.

So while I still think the browsing speed difference probably isn't that important to most users -- it is nice to know that even that issue can be addressed with some tweaking, and will be. Of course the next versions of Explorer and Navigator are likely to have those tweaks and others built in to make them go faster on Macs. Somehow I doubt that PC Magazine will expend any effort explaining that their first set of tests (and conclusions) was just based on poor Mac configuration, and that future versions of Mac browsers will have the better configuration as a default. I expect to hear silence on this issue from PC Magazine -- just like when they made their early complaints about the iMac.

PC Magazine's first bash, er, I mean first look at the iMac spent most of its time comparing video performance and scroll rates instead of usability. The issues that weren't just configuration differences (which also misled some people) were quickly addressed: Apple updated the video performance, increased the memory, upgraded the OS and made other improvements that would have dramatically altered the results of that benchmark. PC Magazine has never felt it necessary to correct or revisit any of the information in their "first look", other than to defend their bad configurations or bad arguments.

This all reflects the old truism that there are "lies, damn lies, and statistics". Benchmarks are just statistics, and if used improperly they are a way to deceive people into thinking something is true that isn't. There are reasons that PC Magazine wants to make PCs look better than Macs, and reasons why they won't compare all the things that are important to users but would make the Mac look superior.

Even if the Mac were to start trampling the PCs in iBench, I still don't think the iBench benchmark is something that users should take too seriously -- it is just a cute metric that has allowed programmers and tweakers to figure out that Microsoft and Netscape have done a poor job of performance tuning for the Mac -- which can and will be easily resolved. It doesn't reflect any of the important issues about which platform is better for getting work (or play) done across the Internet.

Apple and benchmarks

I don't think Apple is immune to this either. I got a wave of letters crying foul about Apple's (and Steve Jobs') use of ByteMarks to compare performance across platforms. They didn't like the broad-brush "twice as fast as PCs", and I pretty much agree with them -- the Mac is not as much faster than PCs as ByteMarks alone leads people to believe. Other people hated the "Super-computer" claim of the G4 / AltiVec / Velocity Engine stuff -- the truth is the G4s aren't as fast as today's super-computers, even if they are faster than those of just a few (5-10) years ago, and the G4 does still blow away the Pentium III or Athlon at vector operations. The Mac is far faster per MHz than PCs, so the whole MHz thing is misleading in favor of PCs -- and the Mac is quite a bit faster (relatively) than the just-as-misleading SpecMarks, iBench or the many other PC metrics lead people to believe. So benchmarks are used by both sides to mislead people -- and I hardly ever hear people crying foul about the benchmarks that mislead in their favor.

So Apple is trying to defend against the mistruths of benchmarks used against them by using ones that are at least as valid (and invalid) against others. It is true that the Mac doesn't have quite the advantage that Apple marketing is likely to imply -- but that is why they are called marketing and not the ministry of truth. Personally, I think Apple shouldn't waste too much of its marketing time on the MHz/MIPS/performance war -- and should stick to what is important (usability, reliability, and valuable time-saving features). And of course the Mac is far superior (in my opinion), based on things that aren't measured by most benchmarks.

Gaming

In my previous article I also babbled about gaming benchmarks. I mentioned that most of the specs don't matter -- or don't matter the way people think they do -- and I talked about performance thresholds (performance beyond the threshold of human perception matters very little). Unfortunately I wasn't completely clear -- I discussed some fps (frames per second) rates on games (a commonly used metric that means very little), and the whole issue got muddled.

Gamers are running around measuring things like fps (frames per second) and keep thinking higher and higher numbers matter. They do and they don't. If you have a sustained 30 fps (with motion blur) and a well-designed game (synchronous frame rates), then it will play smoothly and probably be a better experience than most current games with double that frame rate (or more). Current games don't work like that and aren't designed for sustained or worst-case performance -- and most measurements aren't measuring worst-case or sustained performance either. The benchmarks people use are measuring peak rates or average rates -- so a game with an 80 fps average might get jittery and slow down to 20 fps (or less) in times of peak activity. So measuring frame rates on some demo or flyby is not the same as playability under worst-case conditions. Gamers care about getting 120 fps or more precisely because it isn't really a sustained 120 fps -- they want enough headroom that the worst-case dips stay playable.
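Here's a quick sketch (mine, with invented sample numbers) of the difference between an average and a worst-case measurement, and why a demo flyby can report 80 fps while the game still feels like 20:

```python
def fps_stats(frame_times_ms):
    """Return (average fps, fps over the slowest 1% of frames)."""
    avg_ms = sum(frame_times_ms) / len(frame_times_ms)
    slowest = sorted(frame_times_ms, reverse=True)
    worst = slowest[: max(1, len(slowest) // 100)]
    worst_ms = sum(worst) / len(worst)
    return 1000.0 / avg_ms, 1000.0 / worst_ms

# 990 smooth frames at ~12 ms plus 10 spikes at 50 ms during "peak activity"
# (made-up numbers, chosen to mirror the 80-fps-average / 20-fps-dip example).
frames = [12.0] * 990 + [50.0] * 10
avg_fps, worst_fps = fps_stats(frames)
print(f"average: {avg_fps:.0f} fps, slowest 1%: {worst_fps:.0f} fps")
# Prints roughly "average: 81 fps, slowest 1%: 20 fps" -- a demo flyby reports
# the first number, but the second is what you feel in a firefight.
```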

Some gamers talk about how a faster fps allows for better positioning or faster spinning and slew (spin) rates. This is also hype and bad metrics; what they are ignoring is that if the game were designed better (and everything was synchronous), that wouldn't matter either. Most of it is hype and misleading, or deals with specific configurations for specific machines and specific games. When you start comparing cross-platform and cross-genre, it becomes nonsense.

Gamers that care about frame rates really care about the wrong metrics. Comparing frame rates on two different games doesn't make any sense, because playability is the sum of all factors -- and that goes way beyond any one factor (like frame rate). The whole issue of playability goes way, way beyond frame rates -- I personally prefer Unreal to Quake, despite the latter being the "faster" moving game, and I know nothing about their relative frame rates. The point is that the benchmarks are deceptive and aren't really addressing the issues people should care about. They do show specific bottlenecks and issues for geeks and tweakers -- if you are VERY specific about what you think the results mean. If you try to apply the benchmarks broadly, then they are likely to be misleading and often distract from the real issues.

Conclusion

The whole point of all this is to reiterate the futility of other people's benchmarks. They have very limited value -- and only that, as long as you don't take them too seriously. They are great for diagnostics, and for helping engineers and support/configuration personnel find issues and tune things. But 95% of these benchmarks are just misleading in the hands of users.

If you do care about benchmarks as a user, then personally, I prefer higher-level and more real-world application benchmarks, like an Adobe Photoshop comparison (assuming that you do most of your work in Photoshop) or benchmarks using the tools that I use -- and usually those types of comparisons come out in the Mac's favor. I've been doing Java and WebObjects development (and used to do lots of C++ and cross-platform development) and found that development was usually faster on the Mac, as were the programs that I created (when comparing them cross-platform). That is the stuff that mattered most to me, because it was the exact stuff I used -- and not just some statistics that someone else created to achieve their own ends.
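The simplest version of that kind of benchmark is something you can throw together yourself. Here's a rough sketch (mine -- the build_project() body is just a stand-in; substitute whatever your real work is) of timing the exact task you actually wait on, on each machine:

```python
import statistics
import time

def build_project():
    # Stand-in workload: replace with your real task (a Photoshop batch action,
    # a compile, an export -- whatever you actually spend your day waiting on).
    sum(i * i for i in range(1_000_000))

def time_task(task, runs=5):
    """Run the task several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

print(f"median run: {time_task(build_project):.2f}s")
# Run the same script (or the same real task) on both machines; the one that
# wins on *your* workload is the only result worth caring about.
```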

As an engineer or geek, I use benchmarks to help me diagnose issues. Sometimes I'll even quote them to counter other people's misleading benchmarks (or to shut someone up). Yet I still tell average users to ignore most of them, since that performance is probably not what is important to them -- and unless they are prepared to spend lots of time and energy figuring out the truths and flaws of benchmarks, they are only likely to be misled by them. If you want to know the truth, then you need to sit the exact competing platforms that you want to use side by side, do the work that you need to do on both machines, and see which machine you really prefer. And don't forget to add in the subjective usability preferences. When I've done this, and when I try to get real work done, I usually find that my Mac far outperforms my PC in terms of real work -- whether benchmarks reflect this or not. More importantly, the features and usability of the Mac are usually better, and going to result in a lot less frustration and a lot higher productivity -- which is far more important overall.


Created: 11/07/99
Updated: 11/09/02

