
A Return to Strange Adventures in Infinite Space

Strange Adventures in Infinite Space is a game I remember fondly from the early 2000s – a simple and small game that could be played in the space of 20 minutes and was a lot of fun.

Screenshot of Strange Adventures in Infinite Space GPL, from the Linux port.

In 2005, the authors released the source code under the GPL, and I threw together a quick Linux-only port of the demo before promptly getting distracted by other things. Another individual later grew that port into a more fully featured one.

A few months ago, I picked up a clean copy of the source and decided to tackle porting it properly. I also got in touch with the original authors and secured permission to redistribute the game data (since they had started releasing it for free) and was given their blessing to do so under the CC-BY-NC license.

The result is Strange Adventures in Infinite Space GPL, which now has its first set of public binaries available. These include both a build of the game for modern 64-bit systems and the game data in a single, easy-to-deploy bundle.

It’s been an interesting run – I’ve dug a lot deeper into the Infinite Space source code than I did last time. Having been the steward of XSquawkBox for the past 5 years, I also know more about multi-platform porting, and especially about C++, than I did 15 years ago.

About the New Port

The new port started from pristine sources, so nothing from my original effort was kept.

The big goals were:

  • To eliminate local directory writing where possible.
    • When SAIS came out originally, it was perfectly acceptable for a game or application to write into its installation directory for save state or computed data. Windows has since tightened up its default security, as has macOS, so this is no longer a good solution and needed to be addressed.
    • This is likely the main source of the “You need to run it as Administrator” myth that exists around many game titles on Windows. Running the game as Administrator avoids the permissions issues, but really, the game shouldn’t be writing into its installation directory anymore.
  • To modernise the codebase so it could run as a 64-bit native executable on 64-bit systems without issue.
    • This was needed as macOS dropped 32-bit binary support in its most recent release (10.15), meaning 32-bit builds can’t be produced anymore, let alone run. Also, with the rise of ARM and AArch64, native 64-bit support will make it easier to support AArch64 in the future.
  • To try to untangle the codebase along the way.
    • In the original Read Me file, the original developer (Iikka Keranen) comments “The code is hideous. Sorry. This was my second Win32 app.” He wasn’t kidding – some bits have taken quite a bit of effort to work out what they’re actually doing so I could plot a path forward with modernisation.
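As a rough illustration of the first goal, here is a minimal Ruby sketch of choosing a per-user data directory instead of the installation directory. It follows common platform conventions: the XDG Base Directory rules on Linux, `%APPDATA%` on Windows and `~/Library/Application Support` on macOS. The `user_data_dir` helper and the `"SAIS"` folder name are hypothetical. This is not necessarily how the GPL port decides where to write.

```ruby
# Hypothetical helper: return a per-user directory for save games and other
# written data, so nothing needs to go into the installation directory.
# platform is :windows, :macos, or anything else (treated as Linux/XDG).
def user_data_dir(app, platform:, env: ENV, home: nil)
  home ||= Dir.home
  case platform
  when :windows
    # Roaming per-user application data, e.g. C:\Users\me\AppData\Roaming
    File.join(env.fetch("APPDATA"), app)
  when :macos
    File.join(home, "Library", "Application Support", app)
  else
    # XDG: use $XDG_DATA_HOME if set and non-empty, else ~/.local/share
    base = env["XDG_DATA_HOME"]
    base = File.join(home, ".local", "share") if base.nil? || base.empty?
    File.join(base, app)
  end
end
```

Before writing the first save, create the directory with `FileUtils.mkdir_p` (after `require "fileutils"`).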

Right now, the 1.6.0 release ports deal with points 1 & 2. They build on modern systems and have some adjustments to make life more pleasant on said modern systems.

The binaries are all native 64-bit and should run on the latest releases of their respective operating systems without significant issues.

Untangling the codebase is harder and is still underway when I get a bit of time and inspiration to come back to it.

Will there be a Web Version?

Whilst it is a bit of a cool party trick to build things using emscripten so they can run in a browser, it’s actually quite a large amount of work to do that to SAIS right now.

It’s not off the cards, but it needs a significant internal rework to address one big issue: SAIS’s code base has multiple main loops, one for each scene.

This is a big no-no in the emscripten interface, because the main loop has to yield so the browser can continue to run and process events. Ideally, you need one super main loop which calls event handlers in the scene code. Refactoring the game’s scene tree and navigation to support this flow is not a trivial piece of work, and there are other bits and pieces I want to sort out before then.

If emscripten had resolved the fundamental requirement by having a yield function instead of a main callback, this would have been a lot easier to do.

(Ed: I’ve just noticed ASYNCIFY on the emscripten page, which seems to be new since I first assessed the work required. It provides a yield-like function, so maybe it’ll happen sooner after all.)
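To make the "one super main loop" idea concrete, here is a minimal Ruby sketch (the port itself is C++). Each scene is a set of handlers that returns the next scene, and a single `tick` does one frame's work and then returns. Under emscripten, a C++ equivalent of `tick` is the kind of function you would register with `emscripten_set_main_loop`. `Scene`, `TitleScene`, `StarmapScene` and `Game` are hypothetical names, not classes from the SAIS source.

```ruby
# Hypothetical scene classes: each scene handles events and draws frames,
# but never runs a loop of its own.
class Scene
  # Return the scene that should be active next: self to stay,
  # another scene to navigate, or nil to quit.
  def handle_event(_event)
    self
  end

  # Draw one frame. Here "drawing" just records the scene's name.
  def render(frame_log)
    frame_log << self.class.name
  end
end

class StarmapScene < Scene
  def handle_event(event)
    event == :quit ? nil : self
  end
end

class TitleScene < Scene
  def handle_event(event)
    event == :start ? StarmapScene.new : self
  end
end

# The single "super main loop", split into one call per frame.
class Game
  attr_reader :scene, :frame_log

  def initialize
    @scene = TitleScene.new
    @frame_log = []
  end

  # Process this frame's events, draw, then return control to the host
  # (a desktop while-loop, or the browser's per-frame callback).
  # Returns false once the game has quit.
  def tick(events)
    events.each do |event|
      break if @scene.nil?
      @scene = @scene.handle_event(event)
    end
    @scene&.render(@frame_log)
    !@scene.nil?
  end
end

GAME = Game.new
GAME.tick([])                       # draws the title scene
GAME.tick([:start])                 # navigates to the star map and draws it
STILL_RUNNING = GAME.tick([:quit])  # quits; nothing is drawn
```

Because no scene ever blocks, the same `Game` can be driven by a plain `while game.tick(poll_events)` loop on the desktop or by a browser frame callback. Here `poll_events` is also a hypothetical name.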

Future Work?

I’m a lot busier right now than when I started this. I mostly did it as a space filler between tasks, and to get a quick personal victory in the game-porting space. That helps with my tendency towards imposter syndrome as I try to change my professional focus. I doubt I’ll be able to seriously return to this for a few months, so the big stretch goals won’t be happening any time soon, but I should be able to find the time for general maintenance.


Raytracing – Floating Point Performance in 2020

So, I started playing with raytracers again – writing my own, following the guidance of Raytracing from the Ground Up. In doing so, I’ve had an opportunity to throw the code at various modern multi-core machines and get an interesting look at FP performance on modern systems.

The raytracer in question is up on GitHub. It’s a simple raytracer that is FP-intensive but not particularly memory-intensive. That said, if it’s compiled against debug libraries, it performs extremely poorly due to the Qt (Ed: and MSVC) debugging hooks. It doesn’t try to be too clever, since most modern CPUs are fast enough not to need it.

The last time I did this was with smallpt, a decade ago, and the story was very different.

This time, rough times have been gathered for an 800×800 raytrace at 100 samples per pixel, with DoF simulation, intersecting with about 30 objects. That is 800 × 800 × 100 = 64 million primary rays, so a minimum of 64 million ray intersections. A sample output is below.

CPU | OS | STT | C/T | MTT | Compiler
i7-3770k | Linux | 23.1s | 4/8 | 5.3s | g++ 7.5.0
i7-4770 | Windows | 24.0s | 4/8 | 6.5s | MSVC 2019
Ryzen 7 3700X | Windows | 23.2s | 8/16 | 2.7s | MSVC 2019
i9-9900k | Windows | 16.4s | 8/16 | 2.3s | MSVC 2019
Rough Execution Times – by CPU

In the above table, STT is single-thread time (in seconds), C/T is the Cores vs Threads count. MTT is the multi-threaded time using all available hardware threads. Compiler is the compiler used to produce the binary.
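For a rough feel of the numbers, the Ruby sketch below redoes the arithmetic. It computes the primary-ray count for the test scene (800 × 800 pixels at 100 samples each) and the multi-threaded speedup (STT ÷ MTT) for each CPU. It uses only the figures in the table. Given how rough the timings are, treat the results as ballpark figures too.

```ruby
# Primary rays for an 800x800 image at 100 samples per pixel; each one is
# then tested against the scene's roughly 30 objects.
PRIMARY_RAYS = 800 * 800 * 100

# Rough timings from the table: single-thread (STT) and
# multi-threaded (MTT) times in seconds.
TIMINGS = {
  "i7-3770k"      => { stt: 23.1, mtt: 5.3 },
  "i7-4770"       => { stt: 24.0, mtt: 6.5 },
  "Ryzen 7 3700X" => { stt: 23.2, mtt: 2.7 },
  "i9-9900k"      => { stt: 16.4, mtt: 2.3 },
}.freeze

# Speedup from using every hardware thread, rounded to one decimal place.
SPEEDUPS = TIMINGS.transform_values { |t| (t[:stt] / t[:mtt]).round(1) }

# Primary rays per second on a single thread, in millions.
SINGLE_THREAD_MRAYS = TIMINGS.transform_values { |t| (PRIMARY_RAYS / t[:stt] / 1e6).round(2) }

SPEEDUPS.each { |cpu, s| puts format("%-14s %4.1fx", cpu, s) }
```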

This is a very unscientific test. Testing has been done without load shedding, and each time is roughly the worst non-outlier of 3-4 runs, rounded to the nearest 100ms. As my raytracer has a Qt UI, there is measurable UI overhead on Windows, which is partially reflected in the difference between the 3770k and 4770 times.

[Edit 2020-06-30: I’ve written a basic CLI front-end to the raytracer and that’s also revealed there’s compiler optimisation differences too – I’ll post an update with new times when I’ve finished collecting them]

There is no explicit vectorisation in any build. Autovectorisation is enabled for AVX, but not AVX-512.

Windows builds were performed using MSVC 2019 without tuning biased towards Intel or AMD, and the same set of binaries was used on all systems tested.

The surprising result is just how poorly the Ryzen does against the older-generation i7s, given that the 3rd- and 4th-generation i7s are 7-8 years old now. Its only saving grace is that it packs double the cores into the chip.

It’s also important to note that this is purely a from-cache FP intensive task – it barely hits the memory interface, so the much faster memory interfaces in the later CPUs will not make a difference here.


Restoring an InfoTower 1000

Just a few notes from this one.

I rescued some old gear from ACMS with the intent of getting at least some of it operational whilst I kept hold of it.

One of the items was a DEC InfoTower system with 7 CD-ROM drives.

The InfoTower was a DEC InfoServer 1000 in a steel cabinet (labelled BA56A), with a single power supply powering it and all the attached drives. InfoServers could also drive tape and conventional disk units, serving them to VMS and other systems over an Ethernet LAN.

InfoTowers were available with 4 or 7 CD-ROMs, and whilst a lot of photos out there have these with the older caddied RRD42s and similar, the one I picked up has the RRD43 tray drives.

When I got around to inspecting it, it clearly had a blown power-supply – you could smell the electrolyte from the capacitors the moment the rear service cover was removed.

Interestingly enough, the InfoTowers use an old AT-style power supply, which plugs into a backplane that distributes power to the connector at each drive bay.

You can replace the power supply fairly trivially with a modern ATX one if you get an ATX-to-AT power supply loom (I picked one up from eBay). The power supply you install must be no bigger than a standard-profile power supply, so very large units like the old Corsair HX1000 are completely out of the question. It also needs at least 3 Molex plugs. The original was a 230W supply; I used a cheap Aywun 500W power supply since I couldn’t get anything smaller with enough Molex plugs.

Unfortunately, most of the eBay looms have the switch soldered on rather than connected by spades (as the original AT supplies used), which is wasteful and needless since AC-safe spade connectors are quite cheap. I cut the spade connectors off the old power supply’s switch leads and spliced them onto the switch cable from the loom. That said, uncrimped spade shoes and insulators would have been a better choice than reusing the old connectors, had I had some on hand.

One trap to watch, however, is that the pin-out of the P8/P9 connector on the backplane is the reverse of the PC convention. Whilst the two connectors sit side by side, you must connect them with the ground wires on the outside, not in the middle. (Double-check this before you swap the supply over, though I doubt there’s much variance between units.) Fortunately the PCB is labelled, so a bit of investigative work should let you verify this before you smoke your new power supply.

AUI to 10Base-T (twisted-pair) adapters are also fairly easy to get hold of these days – I was able to pick up a few more, so I’ll have enough to hook up my MicroVAXen as well as the InfoServer.


Detecting Transaction Failures in Rails (with PostgreSQL)

So, Rails 4 added support for setting the isolation level on transactions, something Rails has sorely needed for a long time.

Unfortunately, nowhere is it documented how to correctly detect that a transaction has failed during your transaction block (as opposed to any other kind of error, such as constraint failures).

The right way seems to be:

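RetryLimit = 5 # set appropriately...

txn_retry_count = 0
begin
  Model.transaction(isolation: :serializable) do
    # do txn stuff here.
  end
rescue ActiveRecord::StatementInvalid => err
  # ActiveRecord wraps the driver error; only retry serialisation/deadlock
  # failures (the PG::TransactionRollback family), re-raise anything else.
  raise unless err.original_exception.is_a?(PG::TransactionRollback)
  txn_retry_count += 1
  retry if txn_retry_count < RetryLimit
  raise # out of retries: give up
end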

The transaction concurrency errors are all part of a specific family (PG::TransactionRollback and its subclasses), which the current stable pg gem correctly reproduces in its exception hierarchy. However, ActiveRecord captures the exception and re-raises it as a statement error, forcing you to unwrap it one layer in your code.
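The snippet above needs a Rails app and a PostgreSQL server to try out, so here's a self-contained Ruby sketch of the same unwrap-and-retry logic. `TransactionRollback` and `StatementInvalid` are stand-ins for `PG::TransactionRollback` and `ActiveRecord::StatementInvalid`. `flaky_statement` simulates ActiveRecord wrapping the driver error, and `with_txn_retry` is a hypothetical helper. The sketch inspects Ruby's built-in `Exception#cause`, which Ruby sets automatically when an exception is raised inside a `rescue` clause. It's worth checking whether your Rails version exposes the wrapped error as `original_exception`, `cause`, or both.

```ruby
# Stand-ins so the sketch runs without Rails or the pg gem; in a real app
# these are PG::TransactionRollback and ActiveRecord::StatementInvalid.
class TransactionRollback < StandardError; end
class StatementInvalid < StandardError; end

# Hypothetical helper: run the block (the whole transaction), retrying only
# when the wrapped driver error is a transaction rollback such as a
# serialisation failure. Anything else, or running out of attempts,
# re-raises the original StatementInvalid.
def with_txn_retry(limit: 5)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StatementInvalid => err
    raise unless err.cause.is_a?(TransactionRollback) && attempts < limit
    retry
  end
end

# Simulates ActiveRecord wrapping a driver error: raising inside the rescue
# clause makes Ruby record the TransactionRollback as the new error's #cause.
def flaky_statement(failures, attempt)
  return :committed if attempt > failures
  begin
    raise TransactionRollback, "could not serialize access"
  rescue TransactionRollback
    raise StatementInvalid, "wrapped serialisation failure"
  end
end

# Fails twice with a serialisation error, then commits on the third attempt.
RESULT = with_txn_retry { |attempt| flaky_statement(2, attempt) }
```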


On Python and Pickles

Currently foremost in my mind has been my annoyances with Python.

My current gripes have been with pickle.

Rather than taking a conventional approach and devising a fixed protocol/markup for describing the objects and their state, they invented a small stack-based machine. The serialiser writes bytecode that drives this machine in order to restore the object state.

If this sounds like overengineering, that’s because it is. It’s also overengineering that’s introduced potential security problems which are difficult to protect against.

Worse than this, rather than throwing out this mess and starting again when it was obvious that it wasn’t meeting their requirements, they just continued to extend it, introducing more opcodes.

Never mind that, compared with simpler serialisation approaches such as state marshalling via JSON, it’s inevitably slower, and significantly more dangerous.

And then people like the Celery project guys go off and make pickle the default marshalling format for their tools rather than defaulting to JSON (which they also support).

Last week, I got asked to assist with interpreting pickle data so we could peek into job data that had been queued with Celery. From Ruby.  The result was about 4 hours of swearing and a bit of Ruby coding to produce unpickle. I’ve since tidied it up a bit, written some more documentation, and published it (with permission from my manager of course).

For anybody else who ever has to face this ordeal, there’s enough documentation inside the Python source tree (see Lib/ and Lib/) that you can build the pickle stack machine without having to read too much of the original source. It also helps if you are familiar with PostScript, as the pickle machine’s dictionary, tuple and list constructors work very similarly to PostScript’s array and dictionary constructs (right down to the use of a stack mark during construction).
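To illustrate the mark-based construction described above, here's a toy Ruby stack machine. The opcode names echo pickle's (MARK, LIST, TUPLE, DICT, STOP), but the programs here are arrays of symbols rather than pickle's actual byte format. The opcodes that make real pickle dangerous are deliberately left out: GLOBAL and REDUCE look up and call arbitrary callables. `run_toy_pickle` is a hypothetical name.

```ruby
# Sentinel pushed by :mark; compared by identity so it can't collide with data.
MARK = Object.new.freeze

# Toy stack machine loosely modelled on pickle's: each instruction is
# [opcode] or [opcode, argument]. Not pickle's real byte format.
def run_toy_pickle(program)
  stack = []
  program.each do |op, arg|
    case op
    when :push
      stack.push(arg)
    when :mark
      stack.push(MARK)
    when :list, :tuple, :dict
      # Find the most recent mark; everything above it becomes the new object.
      idx = stack.rindex { |item| item.equal?(MARK) } or raise "no mark on stack"
      items = stack.slice!(idx, stack.length - idx)
      items.shift # drop the mark itself
      stack.push(
        case op
        when :list  then items
        when :tuple then items.freeze  # a frozen Array stands in for a tuple
        when :dict  then Hash[*items]  # alternating key, value pairs
        end
      )
    when :stop
      return stack.pop
    else
      raise ArgumentError, "unknown opcode #{op.inspect}"
    end
  end
  raise "program ended without :stop"
end

# Builds {"a" => [1, 2], "b" => [3]}, much like PostScript's
# << /a [1 2] /b [3] >> with its mark-based constructors.
DECODED = run_toy_pickle([
  [:mark],
  [:push, "a"], [:mark], [:push, 1], [:push, 2], [:list],
  [:push, "b"], [:mark], [:push, 3], [:tuple],
  [:dict],
  [:stop],
])
```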



[Updated 31 Aug 2010]
[Updated again 6 Sep 2010]

Just ran smallpt against a few machines here:

CPU | OS | Compiler | Cores / Processors | Execution time (s), 100 spp
AMD Athlon64 3800+ | Linux amd64 | G++ 4.4.1 | 1 | 365.181
Intel Xeon 2.4GHz | Linux i386 | G++ 4.4.3 | 2 x 2-way HT | 358.000
Intel Itanium2 900MHz (McKinley) | Linux ia64 | G++ 4.3.2 | 1 | 1366.38
Sun UltraSparc 3i @ 1GHz | Solaris 10, 64-bit Sparc | G++ 3.4.3 | 1 | 3384.46
Intel Core2Duo E6850 (3.0GHz) | Linux amd64 | G++ 4.2.4 | 1 x Dual-core | 177.46
Intel Core2Duo P8700 (2.53GHz) | OS X 10.6.4 | G++ 4.2.1 | 1 x Dual-core | 138.36
Intel Core2Duo E5200 (2.5GHz) | Linux amd64 | G++ 4.4.3 | 1 x Dual-core | 142.50
Intel Core2Duo E8400 (3.0GHz) | Linux amd64 | G++ 4.4.3 (static link) | 1 x Dual-core | 117.96

These figures are in no way scientific and should be considered ballpark figures only. No effort was made to reduce system load for these tests, but the systems used weren’t particularly loaded to begin with.

Linux builds were compiled with whatever version of G++ was installed, using -O2 (except for the ia64 run, which was built with -O3 by accident).

On OS X, every binary I built with OpenMP support died very rapidly from a bus error, so the test couldn’t utilise both CPU cores. Please adjust expectations accordingly. The build used -O2 -ffast-math.

[Edits below]

The OSX figures have been updated to use OpenMP thanks to Brian’s advice.  Built using -O2.

The rather noticeable difference in speed between the E6850 and the P8700 is probably due to the different memory systems or the lower core/bus contention on the P8700. If it were the latter, though, I’d expect the margin to be smaller, as the clock multipliers are only 9 vs 9.5. It’s hard to say without more digging to see where this is slowing down.

The E6850 box uses an XFX-branded nVidia nForce 680i motherboard, which only provides a DDR2 memory interface, and the system in question is decked out with 4GB of Corsair low-latency DDR2-800.

The P8700 is an Apple MacBook Pro 13″ 2.53GHz (Mid-2009), which uses the stock 4GB of DDR3-1066.

I’ve just added my work E5200 to the mix, and it too is getting scores comparable to the Penryn. I’ll have to re-run on the E6850 to verify the times.

[Updated again]

After a bit of research, I’ve isolated the cause of the speed discrepancy as most likely the design improvements from the Conroe to the Penryn/Wolfdale family. I am surprised that the result is so pronounced.

[Updated again again]

I found an E8400 (Wolfdale 3.0GHz, 1333MHz FSB) system to run smallpt on, and sure enough, it scores proportionally to the Penryn and E5200.
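As a back-of-the-envelope check on that conclusion, multiply each Core2 time by its clock speed. The result is a figure proportional to the clock cycles spent, which takes clock speed out of the comparison. This Ruby snippet reuses only the table's figures and the nominal clock speeds. It assumes every run used both cores, which holds for the OS X figure only after the OpenMP update.

```ruby
# smallpt results from the table: [execution time in seconds, nominal GHz]
CORE2_RUNS = {
  "E6850" => [177.46, 3.0],
  "P8700" => [138.36, 2.53],
  "E5200" => [142.50, 2.5],
  "E8400" => [117.96, 3.0],
}.freeze

# seconds x GHz is proportional to the number of clock cycles spent, so a
# lower figure means more work done per clock (arbitrary units).
CYCLE_COST = CORE2_RUNS.transform_values { |secs, ghz| (secs * ghz).round }

# How many times more cycles the E6850 needed than the E8400.
E6850_PENALTY = (CYCLE_COST["E6850"].to_f / CYCLE_COST["E8400"]).round(2)

CYCLE_COST.each { |cpu, cost| puts "#{cpu}: #{cost}" }
```

The P8700, E5200 and E8400 land within a few percent of each other, while the E6850 needs roughly 1.5 times as many cycles for the same render.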


Adventures in 64bit cleanup

I’ve been doing a bit of clean-up in Linux/FOSS code for 64-bit systems, and it’s starting to scare me just how much crap filters into Linux distributions every now and then without anybody noticing.

nss-mdns was today’s violator – the Multicast DNS NSSwitch module (Multicast DNS is sometimes better known as Bonjour or Avahi).

What’s particularly disturbing is that reading through the code reveals that the author suffered from the fatal “all the world is 32-bit” mindset when he wrote it. I’m surprised nobody else picked up on the unaligned access warnings flying up their console; then again, very few people use Itaniums or other strict-alignment 64-bit systems as desktops these days.
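I haven't reproduced the exact nss-mdns code here, but the general shape of this class of bug is easy to show. NSS modules fill in a `hostent` using a caller-supplied buffer. C code that packs pointers into such a buffer must round offsets up to the pointer's alignment. Hard-coding 4 works on i386 but leaves 8-byte pointers misaligned on 64-bit systems. Strict-alignment CPUs such as the Itanium trap on that access, and the kernel's fix-up handler then logs the console warnings mentioned above. The Ruby sketch below only does the offset arithmetic. `align_up` and the 10-byte name are illustrative.

```ruby
# Round +offset+ up to the next multiple of +align+ (which must be a power
# of two), as C code does before storing a pointer into a char buffer.
def align_up(offset, align)
  (offset + align - 1) & ~(align - 1)
end

# Suppose a hostname plus its NUL terminator occupies the first 10 bytes of
# the buffer, and an array of pointers is to be placed after it.
NAME_BYTES = 10

# "All the world is 32-bit": pointers assumed to need 4-byte alignment.
OFFSET_ASSUMING_32BIT = align_up(NAME_BYTES, 4)
# What an 8-byte pointer on a 64-bit system actually requires.
OFFSET_FOR_64BIT = align_up(NAME_BYTES, 8)

# The 32-bit offset is not a multiple of 8, so 64-bit pointer loads and
# stores there are unaligned.
MISALIGNED_ON_64BIT = (OFFSET_ASSUMING_32BIT % 8) != 0

# The native pointer size can be queried rather than hard-coded.
NATIVE_POINTER_SIZE = [0].pack("J").bytesize
```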

A small amount of hackery and fidgeting later, the error has gone away (yay!), and the bugfix was submitted.

The other fun fix was suppressing the unaligned access fix-up handler in Parrot’s configuration tests so they could actually work out the correct pointer alignment size. This little piece of magic is done using prctl(). That fix has also been submitted.


ia64: Plan9, Compilers and ABIs

So, I have my second-hand HP zx2000 (a single-CPU Itanium2 workstation) running in my room. (OK, this itself is a mistake – it’ll be moved into the home office once I get sick of the added heat in my room.)

For some bizarre reason, I seem to have convinced myself that trying to port Plan9 to it would be a good idea.

I’ve started studying the architecture and standard ABI documentation and I’m still trying to get my head around little details, but the whole thing seems pretty doable if I beat kencc into shape first.

The standard ABI register usage suggests a mixture of caller-save/callee-save conventions (some of the global registers are available as caller-save scratch) – this should only require minimal changes to kencc as it’s a case of teaching kencc to work out how many extra registers it thinks it needs for any given proc for optimal results, and allocating them dynamically via the appropriate mechanism, and then ignoring their save/restore on call/return.  That itself shouldn’t hurt kencc much (unlike on sparc32, etc, where you need to work almost exclusively in the callee-save model to get best results if you want to use register windows, and that’s fairly contrary to how kencc thinks and allocates registers), but will make context switching and debugging a bit more complicated.

Alternatively, we could just ignore register spill-fill and try to cram ourselves into the scratch registers only.  This would probably sit well with most plan9 developers.

The last (and equally insane) option is to meet the minimum requirements for spill/fill (so EFI calls that allocate registers won’t kill us), but allocate all the registers and treat them as caller-save globals.

This will make context saves even more expensive (saving 128 64-bit registers WILL suck), but is simple.

Anyway, this isn’t the really hard bit – as far as I can tell, the hard bit is fixing the 9 assembler/loader to produce good ia64 machine code and pick sensible optimisations.


At least I know I’m not imagining it…

I’ve recently had the displeasure of updating the copy of WANPIPE that we ship with our product at work from the old stable 2.2 family to the beta 3.3 family. The update was needed to support Sangoma’s new synchronous serial adapters (the A14x family, which is replacing the old S514x family). We use these cards to support Frame Relay communications. Frame Relay is still reasonably popular in Australia for private point-to-point communications. To be frank, our product with a supported Sangoma card and annual support probably still costs substantially less than the cheapest Cisco with 100Mbit Ethernet, sync serial for Frame Relay support, and equivalent support.

So far I’m not impressed with the A142 kit or drivers.

The kit itself is pretty shoddy. The card is a nice, small dual-layer card that fits into a low-profile PCI slot (and even comes with the half-height edge bracket), but the cabling that comes with it is atrocious. The card has a mini-centronics connector with screw terminals, to which the Y cable attaches (the A142 is a two-port card, and the smallest they offer now), and that itself is OK. The dodgy part is the V.35 cable kit, which attaches a V.35-to-DB25 cable to the DB25 Y cable that you screw into the card. The main problem is that both the DB25M on the V.35 cable and the DB25F on the Y cable have screw terminals, so you can’t secure the two to each other. That makes this join the weakest in your V.35 cable run, and usually the one under the most tension, because the Y cable is short and usually dangles off the back of the system.

And then there were the drivers…  After spending a while chasing my own tail because my old 2.4.30 build tree had been damaged (but was still churning out valid modules, just without valid ksyms), I finally was able to get a build of the 3.3 modules that worked.

Then I had to slave away accommodating the new user-space tools and the changed configuration file layout for wanconfig (the WANPIPE stack configuration tool). All that was only to discover that they’ve managed to break DLCI state indication in two places. First, you used to be able to check IFF_RUNNING to find out whether the DLCI was active; not anymore. Second, the LIP layer (Sangoma’s own WAN stack) reports card and DLCI state transitions via dmesg. I did my usual FRAD power-down test to make sure it was even tracing DLCI failure correctly. LIP didn’t notice the DLCI had gone silent/failed until I turned the FRAD back on and it was in its resynchronising state.
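For reference, the old check amounted to testing a bit in the interface flags. The Ruby sketch below shows just that bit test. The flag values are those in Linux's `<linux/if.h>` (`IFF_UP` = 0x1, `IFF_RUNNING` = 0x40). `dlci_active?` is a hypothetical helper, and requiring `IFF_UP` as well as `IFF_RUNNING` is this sketch's own choice. Fetching the flags themselves is left out; in C that means the `SIOCGIFFLAGS` ioctl, or you can read the RUNNING flag that `ifconfig` prints.

```ruby
# Interface flag bits as defined in Linux's <linux/if.h>.
IFF_UP      = 0x1   # administratively up
IFF_RUNNING = 0x40  # driver reports the link as operational

# Hypothetical helper: with the old WANPIPE drivers, a RUNNING DLCI
# interface meant the DLCI was active; this sketch also requires IFF_UP.
def dlci_active?(flags)
  (flags & IFF_UP) != 0 && (flags & IFF_RUNNING) != 0
end
```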

At least I got an email back from Sangoma technical support confirming that they had indeed broken the support unintentionally, and it’ll be fixed next release.  Shame they didn’t mention when that would be.