Category Archives: Tech

A special kind of Hell…

So, I thought that ActiveRecord transaction support was finally reasonably mature and usable in Rails 4.

How wrong I was.

One of my biggest complaints with ActiveRecord has been that there was no way to set transaction isolation for a given transaction. This got addressed in Rails 4. Unfortunately nothing else appears to have been fixed.

The Problem

The problems remaining in ActiveRecord surrounding transactions:

  • No connector agnostic handling of transaction collisions/forced rollbacks.
  • No visibility of transaction isolation level inside a transaction
  • No automatic restart facilities for transactions.

No connector agnostic handling of transaction collisions/forced rollbacks.

My last post surmised why this is a problem pretty well – you need to understand your specific connector well enough to understand precisely how it’s going to tell ActiveRecord that the transaction failed. This is incredibly non-portable as a result – ActiveRecord is supposed to hide database details, not expose them!

No visibility of transaction isolation level inside a transaction

Transaction isolation is nice and fine, but the way it interacts with transaction nesting is a pain in the ass.

If a new deeper transaction is requested in a nested set with equal or lower transaction isolation to the open transaction, it should just work as normal without changing the isolation level. Whilst I’m sure there’s plenty of arguments to do with performance as to why this is suboptimal, but if you’re lowering isolation levels for ‘performance’, you probably shouldn’t be running those statements inside of a higher isolation transaction block.

I can’t even wrap this in since the isolation level information is discarded when the transaction is created – ActiveRecord makes no effort to remember what isolation level we’re running at.

This one is pretty important too since you can’t always see the implementation details of model methods at a glance and as such can’t tell if it’s usually a short transaction to ensure that it’s results are consistent.

No automatic restart facilities for transactions.

Not everybody realises that a transaction can fail because the database can’t guaranty it’s consistency, leading to the incorrect application of transactions by beginners.

Having an optional automatic restart facility would alleviate some of these problems as it would make it clear that it may be required in some circumstances to handle a transaction restart, and also allow users to get that behaviour with minimal effort.

It also facilities some improvements I have thought of…

A Proposal

I think we need to make the following changes to the transaction code:

  • Create an IsolationLevel class or comparison method so it’s possible to actually compare isolation levels against each other easily. As there is a clear heirachy of isolation with the standards based isolation levels, we should be able to compare them naturally in our code, and it should be possible to add any non-standard levels a driver offers.
  • Add an isolation_level attribute to the Connection.current_transaction objects so you can discover what isolation level is currently in effect
  • Add an optional restartable attribute to the transaction which defaults to false which indicates if the transaction can be auto-restarted. Add a optional max_retries attribute to specify how hard it should try.
  • Treat ALL transactions uniformly – at the moment the code differentiates a transaction started with an explicit isolation level parameter vs one without. You should be able to join transactions at the same isolation level that you desire at the very least, and there’s little harm in allowing transactions of lower isolation levels to join higher isolation level transactions.
  • Consider using the auto-restart facility, if enabled, to kick the isolation level on a transaction to a higher isolation level if required by a sub-transaction. This can be expensive if it’s happening a lot, but it’s better than some of the hoops required without it. In a lot of circumstances, however, the cost should be negligible as the query caches should help reduce the second execution costs.

Detecting Transaction Failures in Rails (with PostgreSQL)

So, Rails4 added support for setting the transaction isolation level on transactions. Something Rails has needed sorely for a long time.

Unfortunately nowhere is it documented how to correctly detect if a Transaction has failed during your Transaction block (vs any other kind of error, such as constraints failures).

The right way seems to be:

RetryLimit = 5 # set appropriately...

txn_retry_count = 0
  Model.transaction(isolation: :serializable) do
    # do txn stuff here.
rescue ActiveRecord::StatementInvalid => err
  if err.original_exception.is_a?(PG::TransactionRollback)
    txn_retry_count += 1
    if txn_retry_count < RetryLimit 

The transaction concurrency errors are all part of a specific family, which the current stable pg gem correctly reproduces in it’s exception heirachy. However, ActiveRecord captures the exception and raises it as a statement error, forcing you to unwrap it one layer in your code.

On Python and Pickles

Currently foremost in my mind has been my annoyances with Python.

My current gripes have been with pickle.

Rather than taking a conventional approach and devising a fixed protocol/markup for describing the objects and their state, they invented a small stack based machine which the serialisation library writes bytecode to drive in order to restore the object state.

If this sounds like overengineering, that’s because it is. It’s also overengineering that’s introduced potential security problems which are difficult to protect against.

Worse than this, rather than throwing out this mess and starting again when it was obvious that it wasn’t meeting their requirements, they just continued to extend it, introducing more opcodes.

Nevermind that when faced up against simpler serialisation approaches, such as state marshalling via JSON, it’s inevitably slower, and significantly more dangerous.

And then people like the celery project guys go off and make pickle the default marshalling format for their tools rather than defaulting to JSON (which they also support).

Last week, I got asked to assist with interpreting pickle data so we could peek into job data that had been queued with Celery. From Ruby.  The result was about 4 hours of swearing and a bit of Ruby coding to produce unpickle. I’ve since tidied it up a bit, written some more documentation, and published it (with permission from my manager of course).

For anybody else who ever has to face off against this ordeal, there’s enough documentation inside the python source tree (see Lib/ and Lib/ that you can build the pickle stack machine without having to read too much of the original source.  It also helps if you are familiar with Postscript as the pickle machine’s dictionary, tuple and list constructors work very similarly to Postscript’s array and dictionary constructs (right down to the use of a stack mark during construction).


[Updated 31 Aug 2010]
[Updated again 6 Sep 2010]

Just ran smallpt against a few machines here:

CPU OS Compiler Cores / Processors Execution Time(s) – 100spp – in seconds
AMD Athlon64 3800+ Linux amd64 G++ 4.4.1 1 365.181
Intel Xeon 2.4GHz Linux i386 G++ 4.4.3 2 x 2-way HT 358.000
Intel Itanium2 900Mhz (McKinley) Linux ia64 G++ 4.3.2 1 1366.38
Sun UltraSparc 3i @ 1Ghz Solaris 10, 64-bit Sparc G++ 3.4.3 1 3384.46
Intel Core2Duo E6850 (3.0Ghz) Linux amd64 G++ 4.2.4 1 x Dual-core 177.46
Intel Core2Duo P8700 (2.53GHz) OS X 10.6.4 G++ 4.2.1 1 x Dual-core 138.36
Intel Core2Duo E5200 (2.5GHz) Linux amd64 G++ 4.4.3 1 x Dual-core 142.50
Intel Core2Duo E8400 (3.0GHz) Linux amd64 G++ 4.4.3 (static link) 1 x Dual-core 117.96

These figures are in no way scientific and should be considered ballpark figures only.  No efforts were made to reduce system load in order to run these tests, but systems used for these tests weren’t particularly loaded to begin with.

Linux builds were compiled with whatever the latest version of G++ installed was, using -O2 (except for the ia64 run which was built with -O3 by accident)

OS X refused to build a binary with OpenMP support that didn’t die very rapidly from a bus error. As a result, the test couldn’t utilise both CPU cores.  Please adjust expectations accordingly.  Build was with -O2 -ffast-math.

[Edits below]

The OSX figures have been updated to use OpenMP thanks to Brian’s advice.  Built using -O2.

The rather noticeable difference in speed between the E6850 and the P8700 is probably due to the different memory systems or the lower core/bus contention on the P8700 (although if it was the latter, I’d expect the margin to be smaller – the difference is only 9 vs 9.5) – it’s hard to say without doing more digging to see where this is slowing down.

The E6850 box is using an XFX branded nVidia nForce 680i motherboard which only provides a DDR2 memory interface – and the system in question is decked out with 4GBs of Corsair low-latency DDR2-800.

The P8700 is an Apple Macbook Pro 13″ 2.53Ghz (Mid-2009) which uses the stock 4GBs of DDR3-1066.

I’ve just added my work E5200 to the mix, and it too is getting scores comparable to the Penryn. I’ll have to re-run on the E6850 to verify the times.

[Updated again]

After a bit of research, I’ve managed to isolate the cause of the speed discrepency to be most likely the result of the upgrades to the design from the Conroe to the Penryn/Wolfdale family. I am surprised that the result is so pronounced.

[Updated again again]

I found an E8400 (Wolfdale 3.0Ghz, 1333MHz FSB) system to run smallpt on, and sure enough, it scores proportionally to the Penryn and E5200.

Still Alive…

Still alive, just not talking much.

gosqlite3 is currently on hiatus whilst I do non-Go related stuff.  Thanks to Tokuhirom and yyyc514 for their contributions.  So far I mostly like Go – I just wish the implementation was a bit more… general.

I have some EVE Online related web tools in the pipe, related to my recent pushes back into Industry and Wormholing.  More on those as I work on them.  Expect them to show up on github sometime in the future.  These I’m doing in RoR.  I recently discovered that the excellent Eve Metrics 2 site was built with RoR and seems to be related to the author of recache.  Rather cool.

In Vista’s Defence…

[Ed:  I actually wrote this back in November, so a few things have changed, but my opinion generally hasn’t]

OK, Usually I wouldn’t be caught dead saying stuff like this, but I’m getting sick of the public FUD and smear campaign on what’s possibly one of the best Microsoft Windows releases to date.

Yes, I know how odd that sounds coming from myself – I’m a long-time Linux and OSX user and have preferred staying away from Windows for anything non-gaming related, but now we’ve got people resisting what’s a fundamentally decent change if they’re changing hardware anyway.

From my point of view, there are a few key points here:

Vista made it from RTM to SP1 in about a year, and the Service Pack 1 release improved performance for most things up to XP level.  Nobody would dream of using XP without Service Pack 2 these days, but would they still be as fond of the RTM release of XP when compared against Vista SP1?

Vista’s UAC (User Account Control), whilst infuriatingly obtrusive, is a step in the right direction for most common users and it’s continued support should see application developers start fixing their applications to operate correctly along-side it without relying on priveledge escalation to get stuff done.  We in Unix-land have been able to do our day-to-day work without superuser privileges, Windows users should be able to as well.

Vista’s video driver system is substantially more robust than the XP video driver system.  Under XP, I’ve had machines with shader-capable graphics cards bring my system down on a regular basis due to GPU crashes (mostly my old Radeon 9700Pro overheating) – and whilst the drivers have been able to intervene to prevent a complete system crash, they have forced me to reboot the system shortly after.  Under Vista, GPU crashes have been met with a transparent restart of the GPU and things have kept on going – mid game with no more than a 20second pause.  And this was with the RTM release – I haven’t seen a GPU crash in quite some time now.  These changes have cost a few features in certain graphics card drivers, but in general have improved the experience of using Windows dramatically.

Now, I’m also hearing complaints about changes in the shell – User Experience changes per se.  I think people have forgotten the transitions from Windows 3.11 to Windows 95 – that was significantly more major than the XP to Vista transition, and at least the Vista Aero themes are vaugely plesant, unlike the old XP Luna theme which I’ve had to religously turn off after booting my XP systems for the first time due to it’s massive performance drag (XP’s GDI system just wasn’t up to it) and it’s gaudy appearance.  On all but my poor Fujitsu laptop, Aero performance has been good enough to leave the full, Translucent, Aero theme enabled without a noticable performance hit – certainly, disabling Aero completely to use the classic theme doesn’t yield the performance increase that disabling Luna did on XP, and Vista, unfortunately, relies on some of the Aero widget sizes and layout to look reasonable.

Theme differences aside, the new start menu makes sense – it dynamic adjusts the main options (which you can manually pin should you dislike that behaviour) to the applications you use most frequently and provides a very fast name-search to find options both in the start menu or files on your PC using the search function now integrated into the start menu.

As for hardware compatibility, I don’t blame Microsoft or Vista, but rather, the hardware vendors.  The only hardware I had any troubles with were all Creative Labs soundcards, and Creative is infamous for holding back on those to force users to upgrade.

Adventures in 64bit cleanup

I’ve been doing a bit of clean-up in linux/FOSS code for 64bit systems and it’s starting to scare me just how much crap filters into Linux distributions every now and then without anybody noticing it.

nss-mdns was today’s violator – the Multicast DNS NSSwitch module (Multicast DNS is sometimes better known as Bonjour or Avahi).

What’s particularly disturbing is that reading through the code reveals that the author suffered from the fatal “all the world is 32-bit” mindset when he wrote it.  I’m surprised nobody else picked up the unaligned access warnings flying up their console, then again, very few people use Itaniums or other 64-bit systems with strict alignment as a desktop system these days.

A small amount of hackery and fidgeting later, the error has gone away (yay!), and the bugfix was submitted.

The other fun fix was surpressing the unaligned access fix-up handler in parrot configuration tests so it could actually work out the correct pointer alignment size.  This little piece of magic is done by using prctl(). The fix was submitted here.

ia64: Plan9, Compilers and ABIs

So, I have my second-hand HP vx2000 (Single-CPU Itanium2 workstation) running in my room.  (OK, this itself is a mistake – it’ll be moved into the home office once I get sick of the added head in my room).

For some bizare reason, I seem to have come up with the idea that trying to port Plan9 to it would be a good idea.

I’ve started studying the architecture and standard ABI documentation and I’m still trying to get my head around little details, but the whole thing seems pretty doable if I beat kencc into shape first.

The standard ABI register usage suggests a mixture of caller-save/callee-save conventions (some of the global registers are available as caller-save scratch) – this should only require minimal changes to kencc as it’s a case of teaching kencc to work out how many extra registers it thinks it needs for any given proc for optimal results, and allocating them dynamically via the appropriate mechanism, and then ignoring their save/restore on call/return.  That itself shouldn’t hurt kencc much (unlike on sparc32, etc, where you need to work almost exclusively in the callee-save model to get best results if you want to use register windows, and that’s fairly contrary to how kencc thinks and allocates registers), but will make context switching and debugging a bit more complicated.

Alternatively, we could just ignore register spill-fill and try to cram ourselves into the scratch registers only.  This would probably sit well with most plan9 developers.

Last (and equally insane option) is to meet minimum requirements for spill/fill (so EFI calls that allocate registers won’t kill us), but allocate all the registers and treat them as caller-save globals

This will make context saves even more expensive (saving 128 64-bit registers WILL suck), but is simple.

Anyway, this isn’t the really hard bit – as far as I can tell, the hard bit is fixing the 9 assembler/loader to produce good ia64 machine code and pick sensible optimisations.