smallpt

[Updated 31 Aug 2010]
[Updated again 6 Sep 2010]

Just ran smallpt against a few machines here:

CPU OS Compiler Cores / Processors Execution Time(s) – 100spp – in seconds
AMD Athlon64 3800+ Linux amd64 G++ 4.4.1 1 365.181
387.036
Intel Xeon 2.4GHz Linux i386 G++ 4.4.3 2 x 2-way HT 358.000
363.824
Intel Itanium2 900Mhz (McKinley) Linux ia64 G++ 4.3.2 1 1366.38
1366.28
Sun UltraSparc 3i @ 1Ghz Solaris 10, 64-bit Sparc G++ 3.4.3 1 3384.46
Intel Core2Duo E6850 (3.0Ghz) Linux amd64 G++ 4.2.4 1 x Dual-core 177.46
180.05
Intel Core2Duo P8700 (2.53GHz) OS X 10.6.4 G++ 4.2.1 1 x Dual-core 138.36
139.68
Intel Core2Duo E5200 (2.5GHz) Linux amd64 G++ 4.4.3 1 x Dual-core 142.50
145.98
Intel Core2Duo E8400 (3.0GHz) Linux amd64 G++ 4.4.3 (static link) 1 x Dual-core 117.96
118.42

These figures are in no way scientific and should be considered ballpark figures only.  No efforts were made to reduce system load in order to run these tests, but systems used for these tests weren’t particularly loaded to begin with.

Linux builds were compiled with whatever the latest version of G++ installed was, using -O2 (except for the ia64 run which was built with -O3 by accident)

OS X refused to build a binary with OpenMP support that didn’t die very rapidly from a bus error. As a result, the test couldn’t utilise both CPU cores.  Please adjust expectations accordingly.  Build was with -O2 -ffast-math.

[Edits below]

The OSX figures have been updated to use OpenMP thanks to Brian’s advice.  Built using -O2.

The rather noticeable difference in speed between the E6850 and the P8700 is probably due to the different memory systems or the lower core/bus contention on the P8700 (although if it was the latter, I’d expect the margin to be smaller – the difference is only 9 vs 9.5) – it’s hard to say without doing more digging to see where this is slowing down.

The E6850 box is using an XFX branded nVidia nForce 680i motherboard which only provides a DDR2 memory interface – and the system in question is decked out with 4GBs of Corsair low-latency DDR2-800.

The P8700 is an Apple Macbook Pro 13″ 2.53Ghz (Mid-2009) which uses the stock 4GBs of DDR3-1066.

I’ve just added my work E5200 to the mix, and it too is getting scores comparable to the Penryn. I’ll have to re-run on the E6850 to verify the times.

[Updated again]

After a bit of research, I’ve managed to isolate the cause of the speed discrepency to be most likely the result of the upgrades to the design from the Conroe to the Penryn/Wolfdale family. I am surprised that the result is so pronounced.

[Updated again again]

I found an E8400 (Wolfdale 3.0Ghz, 1333MHz FSB) system to run smallpt on, and sure enough, it scores proportionally to the Penryn and E5200.

2 thoughts on “smallpt”

  1. I discovered smallpt yesterday and I have been experimenting with it a little. It took me a little while to get it running on my mac.

    Try: “export GOMP_STACKSIZE=32000” before executing in OSX. This will raise the default stack size for the worker threads.

  2. Hello,
    I am a big fan of this code too.
    Very neat and understandable.
    I passed the last month trying to port the code to the PS3 to run on the SPEs.
    I had to modify the radiance code to not be recursive anymore, instead, it use a stack data structure to store the “hitted” object.
    I am sucessfully so far, but, I’m having a little problem on the calculation of the emission component of the object. I think its a single precision issue, since SPE’s works faster with single precision.
    I runned my code in the PPE only and the scene renders beautifully, but when I ran in the SPEs the scene its a little bit darkier.
    Any help will be welcome.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.