Category Archives: Software Development

A special kind of Hell…

So, I thought that ActiveRecord transaction support was finally reasonably mature and usable in Rails 4.

How wrong I was.

One of my biggest complaints with ActiveRecord has been that there was no way to set transaction isolation for a given transaction. This got addressed in Rails 4. Unfortunately nothing else appears to have been fixed.

The Problem

The problems remaining in ActiveRecord surrounding transactions:

  • No connector-agnostic handling of transaction collisions/forced rollbacks
  • No visibility of the transaction isolation level inside a transaction
  • No automatic restart facility for transactions

No connector-agnostic handling of transaction collisions/forced rollbacks

My last post summarised why this is a problem pretty well – you need to understand your specific connector well enough to know precisely how it’s going to tell ActiveRecord that the transaction failed. As a result this is incredibly non-portable – ActiveRecord is supposed to hide database details, not expose them!

No visibility of the transaction isolation level inside a transaction

Transaction isolation is all well and good, but the way it interacts with transaction nesting is a pain in the ass.

If a new, deeper transaction is requested in a nested set with isolation equal to or lower than the open transaction’s, it should just work as normal without changing the isolation level. I’m sure there are plenty of performance arguments as to why this is suboptimal, but if you’re lowering isolation levels for ‘performance’, you probably shouldn’t be running those statements inside a higher-isolation transaction block.

I can’t even work around this myself, since the isolation level information is discarded when the transaction is created – ActiveRecord makes no effort to remember what isolation level we’re running at.

This one is pretty important too, since you can’t always see the implementation details of model methods at a glance, and as such can’t tell whether one internally uses a short transaction to ensure that its results are consistent.
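
To make the nesting problem concrete, here’s roughly how it bites (Account is just a stand-in model):

Account.transaction(isolation: :serializable) do
  # ...somewhere inside a model method whose internals you can't see:
  Account.transaction(isolation: :serializable) do
    # As of Rails 4 this raises ActiveRecord::TransactionIsolationError,
    # even though the requested level matches the one already in effect.
  end
end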

No automatic restart facility for transactions

Not everybody realises that a transaction can fail because the database can’t guarantee its consistency, and this leads to beginners applying transactions incorrectly.

Having an optional automatic restart facility would alleviate some of these problems: it would make it clear that handling a transaction restart may be required in some circumstances, and it would also let users get that behaviour with minimal effort.

It also facilitates some improvements I have thought of…

A Proposal

I think we need to make the following changes to the transaction code:

  • Create an IsolationLevel class or comparison method so it’s possible to actually compare isolation levels against each other easily (see the sketch after this list). As there is a clear hierarchy among the standards-based isolation levels, we should be able to compare them naturally in our code, and it should be possible to add any non-standard levels a driver offers.
  • Add an isolation_level attribute to the Connection.current_transaction objects so you can discover what isolation level is currently in effect.
  • Add an optional restartable attribute to the transaction, defaulting to false, which indicates whether the transaction can be auto-restarted. Add an optional max_retries attribute to specify how hard it should try.
  • Treat ALL transactions uniformly – at the moment the code differentiates a transaction started with an explicit isolation level parameter from one without. At the very least you should be able to join transactions at the isolation level you desire, and there’s little harm in allowing transactions of lower isolation levels to join higher-isolation transactions.
  • Consider using the auto-restart facility, if enabled, to kick a transaction up to a higher isolation level when a sub-transaction requires it. This can be expensive if it happens a lot, but it’s better than some of the hoops required without it. In a lot of circumstances, however, the cost should be negligible, as the query caches should help reduce the cost of the second execution.
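
As a sketch of the IsolationLevel idea from the first bullet – the class name, rank table and API here are mine, not anything ActiveRecord currently offers:

class IsolationLevel
  include Comparable

  # The standards-based levels form a clear hierarchy; a driver could
  # register any non-standard levels it offers with ranks of its own.
  RANKS = {
    read_uncommitted: 0,
    read_committed:   1,
    repeatable_read:  2,
    serializable:     3,
  }.freeze

  attr_reader :name, :rank

  def initialize(name, rank = RANKS.fetch(name))
    @name = name
    @rank = rank
  end

  def <=>(other)
    rank <=> other.rank
  end
end

# A lower-isolation transaction can safely join a higher-isolation one:
IsolationLevel.new(:read_committed) <= IsolationLevel.new(:serializable) # => true

Combined with the restartable and max_retries attributes, the retry dance from the next post could then collapse to something like Model.transaction(isolation: :serializable, restartable: true, max_retries: 5) { … }.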

Detecting Transaction Failures in Rails (with PostgreSQL)

So, Rails 4 added support for setting the isolation level on transactions – something Rails has sorely needed for a long time.

Unfortunately, nowhere is it documented how to correctly detect whether a transaction has failed during your transaction block (as opposed to any other kind of error, such as a constraint failure).

The right way seems to be:

RetryLimit = 5 # set appropriately...

txn_retry_count = 0
begin
  Model.transaction(isolation: :serializable) do
    # do txn stuff here.
  end
rescue ActiveRecord::StatementInvalid => err
  # ActiveRecord wraps the driver's exception, so unwrap it one layer to
  # see whether this was a forced rollback or some other statement error.
  if err.original_exception.is_a?(PG::TransactionRollback)
    txn_retry_count += 1
    if txn_retry_count < RetryLimit
      retry # rerun the whole transaction block from the top
    else
      raise
    end
  else
    raise
  end
end

The transaction concurrency errors are all part of a specific family, which the current stable pg gem correctly reproduces in its exception hierarchy. However, ActiveRecord captures the exception and re-raises it as a statement error, forcing you to unwrap it one layer in your code.
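
For reference, the two members of that family you’re most likely to hit are serialisation failures and deadlocks. Assuming the SQLSTATE-derived class names from the current pg gem, the hierarchy looks like this:

require 'pg'

# Both map to SQLSTATE class 40 ("transaction rollback"):
PG::TRSerializationFailure.ancestors.include?(PG::TransactionRollback) # => true
PG::TRDeadlockDetected.ancestors.include?(PG::TransactionRollback)     # => true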

On Python and Pickles

Foremost in my mind at the moment are my annoyances with Python.

My current gripes are with pickle.

Rather than taking a conventional approach and devising a fixed protocol/markup for describing objects and their state, they invented a small stack-based machine, and the serialisation library writes bytecode that drives it in order to restore the object state.

If this sounds like overengineering, that’s because it is. It’s also overengineering that’s introduced potential security problems which are difficult to protect against.

Worse than this, rather than throwing out this mess and starting again when it was obvious that it wasn’t meeting their requirements, they just continued to extend it, introducing more opcodes.

Never mind that, when set against simpler serialisation approaches such as state marshalling via JSON, it’s inevitably slower and significantly more dangerous.

And then people like the celery project guys go off and make pickle the default marshalling format for their tools rather than defaulting to JSON (which they also support).

Last week, I got asked to assist with interpreting pickle data so we could peek into job data that had been queued with Celery. From Ruby.  The result was about 4 hours of swearing and a bit of Ruby coding to produce unpickle. I’ve since tidied it up a bit, written some more documentation, and published it (with permission from my manager of course).

For anybody else who ever has to face off against this ordeal, there’s enough documentation inside the python source tree (see Lib/pickletools.py and Lib/pickle.py) that you can build the pickle stack machine without having to read too much of the original source.  It also helps if you are familiar with Postscript as the pickle machine’s dictionary, tuple and list constructors work very similarly to Postscript’s array and dictionary constructs (right down to the use of a stack mark during construction).
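
To give a feel for how small that machine is, here’s a minimal Ruby sketch – an illustration of the protocol-0 opcodes, not the unpickle gem’s actual internals – sufficient to decode a pickled list of integers:

require 'stringio'

MARK = Object.new # sentinel pushed by the '(' opcode

def unpickle_v0(data)
  stack, memo, io = [], {}, StringIO.new(data)
  loop do
    case op = io.read(1)
    when '(' then stack << MARK                        # MARK: remember a stack position
    when 'l'                                           # LIST: gather items back to the mark
      items = []
      items.unshift(stack.pop) until stack.last.equal?(MARK)
      stack.pop                                        # drop the mark itself
      stack << items
    when 'I' then stack << Integer(io.readline.chomp)  # INT: decimal text up to newline
    when 'a' then v = stack.pop; stack.last << v       # APPEND: pop value onto the list below
    when 'p' then memo[io.readline.chomp] = stack.last # PUT: memoise the top of the stack
    when 'g' then stack << memo[io.readline.chomp]     # GET: push a memoised object
    when '.' then return stack.pop                     # STOP: the result is the top of stack
    else raise "unhandled opcode #{op.inspect}"
    end
  end
end

unpickle_v0("(lp0\nI1\naI2\na.") # => [1, 2], i.e. pickle.dumps([1, 2], protocol=0)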

Still Alive…

Still alive, just not talking much.

gosqlite3 is currently on hiatus whilst I do non-Go related stuff.  Thanks to Tokuhirom and yyyc514 for their contributions.  So far I mostly like Go – I just wish the implementation was a bit more… general.

I have some EVE Online related web tools in the pipe, related to my recent pushes back into Industry and Wormholing.  More on those as I work on them.  Expect them to show up on github sometime in the future.  These I’m doing in RoR.  I recently discovered that the excellent Eve Metrics 2 site was built with RoR and seems to be related to the author of recache.  Rather cool.

Adventures in 64bit cleanup

I’ve been doing a bit of clean-up of Linux/FOSS code for 64-bit systems, and it’s starting to scare me just how much crap filters into Linux distributions without anybody noticing it.

nss-mdns was today’s violator – the Multicast DNS NSSwitch module (Multicast DNS is perhaps better known through implementations like Bonjour and Avahi).

What’s particularly disturbing is that reading through the code reveals the author suffered from the fatal “all the world is 32-bit” mindset when he wrote it. I’m surprised nobody else picked up on the unaligned access warnings flying up their console; then again, very few people use Itaniums or other strict-alignment 64-bit systems as desktop machines these days.

A small amount of hackery and fiddling later, the error had gone away (yay!), and the bugfix was submitted.

The other fun fix was suppressing the unaligned-access fix-up handler in parrot’s configuration tests so they could actually work out the correct pointer alignment. This little piece of magic is done using prctl(). That fix was submitted upstream as well.

ia64: Plan9, Compilers and ABIs

So, I have my second-hand HP vx2000 (single-CPU Itanium2 workstation) running in my room. (OK, this itself is a mistake – it’ll be moved into the home office once I get sick of the added heat in my room.)

For some bizarre reason, I’ve decided that trying to port Plan9 to it would be a good idea.

I’ve started studying the architecture and standard ABI documentation and I’m still trying to get my head around little details, but the whole thing seems pretty doable if I beat kencc into shape first.

The standard ABI register usage suggests a mixture of caller-save/callee-save conventions (some of the global registers are available as caller-save scratch). This should only require minimal changes to kencc: teach it to work out how many extra registers it needs for any given proc for optimal results, allocate them dynamically via the appropriate mechanism, and then ignore their save/restore on call/return. That itself shouldn’t hurt kencc much (unlike on sparc32 and the like, where you need to work almost exclusively in the callee-save model to get the best results out of register windows, which is fairly contrary to how kencc thinks and allocates registers), but it will make context switching and debugging a bit more complicated.

Alternatively, we could just ignore register spill-fill and try to cram ourselves into the scratch registers only.  This would probably sit well with most plan9 developers.

The last (and equally insane) option is to meet the minimum requirements for spill/fill (so EFI calls that allocate registers won’t kill us), but allocate all the registers ourselves and treat them as caller-save globals.

This will make context saves even more expensive (saving 128 64-bit registers WILL suck), but it is simple.

Anyway, this isn’t the really hard bit – as far as I can tell, the hard bit is fixing the Plan 9 assembler/loader to produce good ia64 machine code and pick sensible optimisations.

C: The Preprocessor

Sometimes you just have to step that extra mile and do something completely dodgy.

The termios way of setting baudrates uses macros to represent the baudrate. When you’re accepting a baudrate from the user, what’s the most sensible way to handle it in C?

  1. Get the macro value directly from the user.
  2. Get the baudrate as a string, and then string compare to determine which baudrate you want, and set the flags appropriately.
  3. Get the baudrate as an integer, and then integer compare to determine which baudrate you want, and set the flags appropriately.

People answering 1 or 2 can hand in their C licenses.

So you’re doing integer comparisons, right?  Hey, we can use a switch/case statement for that, right?

So we soon end up with:

switch (baudrate) {
case 50:
    baudflag = B50;
    break;
case 75:
    baudflag = B75;
    break;

/* … etc … */

Surely there must be a better way?

Thanks to the C preprocessor, we have a slightly better option…

#define SPEEDMAP(x) case x: baudflag = B ## x; break;
switch (baudrate) {
SPEEDMAP(50);
SPEEDMAP(75);
SPEEDMAP(110);

/* …etc… */

I wouldn’t get too carried away with solving stuff this way though…