How to let NHibernate retry deadlocked transactions when using session per request

deadlock livelock nhibernate rollback

What pattern/architecture do you use in a 3-tiered application using NHibernate that needs to support retries on transaction failures when you are using the Session-Per-Request pattern? (An ISession becomes invalid after an exception, even if it is a deadlock, timeout or livelock exception.)

Best Answer

Note 2: Nowadays I would never put write transactions inside the web project. Instead I would use messaging and queues, with a background worker handling the messages that cause transactional work to be done.

I would, however, still use transactions for reading from web projects, together with MVCC/snapshot isolation, to get consistent data. In that case you'll find that session-per-request-per-transaction is perfectly fine.

Note 1: The ideas of this post have been placed in the Castle Transactions framework and my new NHibernate Facility.

OK, here's the general idea. Suppose you want to create a non-finalized order for a customer. You have some sort of GUI, e.g. a browser/MVC app, that creates a new data structure with the relevant information (or you get this data structure from the network):

[Serializable]
class CreateOrder /*: IMessage*/
{
    // immutable
    private readonly string _CustomerName;
    private readonly decimal _Total;
    private readonly Guid _CustomerId;

    public CreateOrder(string customerName, decimal total, Guid customerId)
    {
        _CustomerName = customerName;
        _Total = total;
        _CustomerId = customerId;
    }

    // put ProtoBuf attribute
    public string CustomerName
    {
        get { return _CustomerName; }
    }

    // put ProtoBuf attribute
    public decimal Total
    {
        get { return _Total; }
    }

    // put ProtoBuf attribute
    public Guid CustomerId
    {
        get { return _CustomerId; }
    }
}

You need something to handle it. Probably this would be a command handler in a service bus of some sort. The term 'command handler' is one of many; you might as well call it a 'service', 'domain service' or 'message handler'. If you were doing functional programming, it would be your message box implementation, and if you were using Erlang or Akka, it would be an Actor.

class CreateOrderHandler : IHandle<CreateOrder>
{
    public void Handle(CreateOrder command)
    {
        With.Policy(IoC.Resolve<ISession>, s => s.BeginTransaction(), s =>
        {
            var potentialCustomer = s.Get<PotentialCustomer>(command.CustomerId);
            potentialCustomer.CreateOrder(command.Total);
            return potentialCustomer;
        }, RetryPolicies.ExponentialBackOff.RetryOnLivelockAndDeadlock(3));
    }
}

interface IHandle<T> /* where T : IMessage */
{
    void Handle(T command);
}

The above shows an API usage you might choose for this given problem domain (application state/transaction handling).

The implementation of With:

static class With
{
    internal static void Policy(Func<ISession> getSession,
                                       Func<ISession, ITransaction> getTransaction,
                                       Func<ISession, EntityBase /* abstract 'entity' base class */> executeAction,
                                       IRetryPolicy policy)
    {
        //http://fabiomaulo.blogspot.com/2009/06/improving-ado-exception-management-in.html

        while (true)
        {
            using (var session = getSession())
            using (var t = getTransaction(session))
            {
                EntityBase entity = null;
                try
                {
                    // run the action inside the try block: reads can be picked as deadlock victims too
                    entity = executeAction(session);

                    // we might not always want to update; have another level of indirection if you wish
                    session.Update(entity);
                    t.Commit();
                    break; // we're done, stop looping
                }
                catch (ADOException e)
                {
                    // evict the entity from the failed session, or we'll get an
                    // 'entity associated with another ISession' exception on the next attempt;
                    // the session is now broken in all other regards and will throw exceptions
                    // if you prod it in any other way
                    if (entity != null) session.Evict(entity);

                    if (!t.WasRolledBack) t.Rollback(); // roll back our transaction

                    // this would need to be through another level of indirection if you support more databases
                    var dbException = ADOExceptionHelper.ExtractDbException(e) as SqlException;

                    // the cast yields null for non-SQL Server errors; the policies reject a null argument
                    if (dbException != null && policy.PerformRetry(dbException)) continue;
                    throw; // otherwise, we stop by throwing the exception back up the layers
                }
            }
        }
    }
}

As you can see, we need a new unit of work (a new ISession) every time something goes wrong. That's why the loop is on the outside of the using statements/blocks. Passing functions is equivalent to passing factory instances, except that we invoke the delegate directly rather than calling a method on a factory object. It makes for a nicer caller API, imho.
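The "fresh unit of work per attempt" shape can be tried out without NHibernate. The following is a minimal sketch, not the code from this answer: IAnyRetryPolicy, MaxRetriesPolicy, FakeUnitOfWork and RetryLoop are all hypothetical names introduced for illustration, and the policy takes a plain Exception instead of a SqlException so that it can run without a database.

```csharp
using System;

// Hypothetical stand-in for IRetryPolicy that accepts any exception type.
public interface IAnyRetryPolicy
{
    bool PerformRetry(Exception ex);
}

// Hypothetical policy: allows at most maxRetries retries, whatever the error.
public sealed class MaxRetriesPolicy : IAnyRetryPolicy
{
    private readonly int _maxRetries;
    private int _retries;

    public MaxRetriesPolicy(int maxRetries) { _maxRetries = maxRetries; }

    public bool PerformRetry(Exception ex) { return ++_retries <= _maxRetries; }
}

// Hypothetical unit of work that only counts how often it is created and disposed.
public sealed class FakeUnitOfWork : IDisposable
{
    public static int Created;
    public static int Disposed;

    public FakeUnitOfWork() { Created++; }
    public void Dispose() { Disposed++; }
}

public static class RetryLoop
{
    // The loop sits outside the using block, so a failed unit of work is
    // disposed before the next attempt creates a new one.
    public static TResult Run<TUnitOfWork, TResult>(
        Func<TUnitOfWork> createUnitOfWork,
        Func<TUnitOfWork, TResult> work,
        IAnyRetryPolicy policy)
        where TUnitOfWork : IDisposable
    {
        while (true)
        {
            using (var unitOfWork = createUnitOfWork())
            {
                try
                {
                    return work(unitOfWork);
                }
                catch (Exception e)
                {
                    // rethrow when the policy says stop; otherwise fall through and loop again
                    if (!policy.PerformRetry(e)) throw;
                }
            }
        }
    }
}
```

Calling RetryLoop.Run with a work function that fails twice before succeeding creates and disposes three FakeUnitOfWork instances, one per attempt, which mirrors why the ISession is created inside the loop.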

We want fairly smooth handling of how we perform retries, so we have an interface, called IRetryPolicy, that can be implemented by different handlers. It should be possible to chain these for every aspect (yes, it's very close to AOP) of the control flow you want to enforce. Similar to how AOP works, the return value is used to steer the control flow, but only in a true/false fashion, which is all we require.

interface IRetryPolicy
{
    bool PerformRetry(SqlException ex);
}

The AggregateRoot, PotentialCustomer, is an entity with a lifetime. It's what you would be mapping with your *.hbm.xml files/FluentNHibernate.

It has a method that corresponds 1:1 with the sent command. This makes the command handlers completely obvious to read.

Furthermore, with a dynamic language with duck typing, it would allow you to map commands' type names to methods, similar to how Ruby/Smalltalk does it.

If you were doing event sourcing, the transaction handling would be similar, except that the transaction wouldn't go through NHibernate as such. The corollary is that you would save the events created by invoking CreateOrder(decimal), and provide your entity with a mechanism for re-reading saved events from the store.

A final point to notice is that I'm overriding three methods that I declared in my entity base class. This is a requirement from NHibernate's side, as it needs a way of knowing when one entity is equal to another, should they be in sets/bags. More about my implementation here. In any case, this is sample code and I don't care about my customer right now, so I'm not implementing them:

sealed class PotentialCustomer : EntityBase
{
    public void CreateOrder(decimal total)
    {
        // validate total
        // run business rules

        // create event, save into event sourced queue as transient event
        // update private state
    }

    public override bool IsTransient() { throw new NotImplementedException(); }
    protected override int GetTransientHashCode() { throw new NotImplementedException(); }
    protected override int GetNonTransientHashCode() { throw new NotImplementedException(); }
}

We need a method for creating retry policies. Of course we could do this in many ways. Here I'm combining a fluent interface with a static member that hands out an instance of the very class that declares it. I implement the interface explicitly so that no other methods are visible in the fluent interface. This interface only uses my 'example' implementations below.

internal class RetryPolicies : INonConfiguredPolicy
{
    private readonly IRetryPolicy _Policy;

    private RetryPolicies(IRetryPolicy policy)
    {
        if (policy == null) throw new ArgumentNullException("policy");
        _Policy = policy;
    }

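    // a property rather than a shared static field: ExponentialBackOffPolicy keeps
    // mutable wait state, so every caller needs its own instance
    public static INonConfiguredPolicy ExponentialBackOff
    {
        get { return new RetryPolicies(new ExponentialBackOffPolicy(TimeSpan.FromMilliseconds(200))); }
    }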

    IRetryPolicy INonConfiguredPolicy.RetryOnLivelockAndDeadlock(int retries)
    {
        return new ChainingPolicy(new[] {new SqlServerRetryPolicy(retries), _Policy});
    }
}

We need an interface for the partially completed invocation of the fluent interface. This gives us type safety. We hence need two member accesses (two dots) away from our static type before the policy is fully configured.

internal interface INonConfiguredPolicy
{
    IRetryPolicy RetryOnLivelockAndDeadlock(int retries);
}

The chaining policy could be resolved. Its implementation checks that all of its children vote to continue, and in checking that it also runs the logic inside each of them.

internal class ChainingPolicy : IRetryPolicy
{
    private readonly IEnumerable<IRetryPolicy> _Policies;

    public ChainingPolicy(IEnumerable<IRetryPolicy> policies)
    {
        if (policies == null) throw new ArgumentNullException("policies");
        _Policies = policies;
    }

    public bool PerformRetry(SqlException ex)
    {
        return _Policies.Aggregate(true, (val, policy) => val && policy.PerformRetry(ex));
    }
}
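One property of the Aggregate call is worth seeing in isolation: because of the `&&`, a child policy is only invoked while every policy before it has voted to continue. A minimal sketch, assuming hypothetical names (IStubPolicy, RecordingPolicy, ChainDemo) and a plain Exception in place of SqlException so it runs without a database:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IRetryPolicy that accepts any exception type.
public interface IStubPolicy
{
    bool PerformRetry(Exception ex);
}

// Hypothetical policy that always gives the same answer and counts its invocations.
public sealed class RecordingPolicy : IStubPolicy
{
    private readonly bool _answer;
    public int Calls;

    public RecordingPolicy(bool answer) { _answer = answer; }

    public bool PerformRetry(Exception ex)
    {
        Calls++;
        return _answer;
    }
}

public static class ChainDemo
{
    // Same expression as ChainingPolicy.PerformRetry: once val is false,
    // the && short-circuits and the remaining policies are never called.
    public static bool All(IEnumerable<IStubPolicy> policies, Exception ex)
    {
        return policies.Aggregate(true, (val, policy) => val && policy.PerformRetry(ex));
    }
}
```

This is why the order in the chain matters: with the deadlock check placed before the back-off policy, an error that is not a deadlock never triggers the back-off sleep.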

This policy lets the current thread sleep for some amount of time; sometimes the database is overloaded, and having multiple readers/writers continuously retrying would be a de facto denial-of-service attack on the database (see what happened a few months ago when Facebook crashed because their cache servers all queried their databases at the same time).

internal class ExponentialBackOffPolicy : IRetryPolicy
{
    private readonly TimeSpan _MaxWait;
    private TimeSpan _CurrentWait = TimeSpan.Zero; // initially, don't wait

    public ExponentialBackOffPolicy(TimeSpan maxWait)
    {
        _MaxWait = maxWait;
    }

    public bool PerformRetry(SqlException ex)
    {
        Thread.Sleep(_CurrentWait);
        _CurrentWait = _CurrentWait == TimeSpan.Zero ? TimeSpan.FromMilliseconds(20) : _CurrentWait + _CurrentWait;
        return _CurrentWait <= _MaxWait;
    }
}
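To see which waits this produces, the same arithmetic can be run without sleeping. This is only a sketch: BackOffSchedule and its Waits method are hypothetical helpers introduced for illustration, and they mirror the update rule of PerformRetry rather than calling it.

```csharp
using System;
using System.Collections.Generic;

public static class BackOffSchedule
{
    // Lists, in milliseconds, the waits a policy with this update rule would
    // sleep through, up to and including the call that returns false.
    public static List<double> Waits(TimeSpan firstWait, TimeSpan maxWait)
    {
        if (firstWait <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException("firstWait"); // doubling zero would never terminate

        var waits = new List<double>();
        var current = TimeSpan.Zero; // initially, don't wait

        while (true)
        {
            waits.Add(current.TotalMilliseconds); // where the policy calls Thread.Sleep
            current = current == TimeSpan.Zero ? firstWait : current + current;
            if (current > maxWait) break; // where PerformRetry would return false
        }

        return waits;
    }
}
```

With the 20 ms first wait and the 200 ms ceiling used above, the list is 0, 20, 40, 80, 160. Note that the fifth call still sleeps 160 ms before it reports that no further retry should happen.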

Similarly, in any good SQL-based system we need to handle deadlocks. We can't really plan for these in depth, especially when using NHibernate, other than by keeping a strict transaction policy: no implicit transactions, and care with Open-Session-In-View. There is also the cartesian product problem/N+1 selects problem to keep in mind if you are fetching a lot of data. For that you might use Multi-Query or HQL's 'fetch' keyword.

internal class SqlServerRetryPolicy : IRetryPolicy
{
    private int _Tries;
    private readonly int _CutOffPoint;

    public SqlServerRetryPolicy(int cutOffPoint)
    {
        if (cutOffPoint < 1) throw new ArgumentOutOfRangeException("cutOffPoint");
        _CutOffPoint = cutOffPoint;
    }

    public bool PerformRetry(SqlException ex)
    {
        if (ex == null) throw new ArgumentNullException("ex");
        // retry only for deadlocks, and only while we are under the cut-off point
        return SqlServerExceptions.IsThisADeadlock(ex) && ++_Tries < _CutOffPoint;
    }
}

A helper class to make the code read better.

internal static class SqlServerExceptions
{
    public static bool IsThisADeadlock(SqlException realException)
    {
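        // Number carries the SQL Server error number (1205 = chosen as deadlock victim);
        // ErrorCode is the HRESULT inherited from ExternalException
        return realException.Number == 1205;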
    }
}
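The question also mentions timeouts, which carry their own error numbers. A minimal sketch of a wider check follows, written against a plain int (the value SqlException.Number would supply) so that it runs without a database. SqlServerErrorNumbers and IsTransient are hypothetical names, and treating all three errors as retryable is an assumption to validate against your own workload.

```csharp
using System;

public static class SqlServerErrorNumbers
{
    public const int DeadlockVictim = 1205;     // transaction was chosen as the deadlock victim
    public const int LockRequestTimeout = 1222; // lock request time-out period exceeded
    public const int ClientTimeout = -2;        // SqlClient command timeout expired

    // Assumption: these three are worth retrying; adjust the list for your system.
    public static bool IsTransient(int number)
    {
        switch (number)
        {
            case DeadlockVictim:
            case LockRequestTimeout:
            case ClientTimeout:
                return true;
            default:
                return false;
        }
    }
}
```

A policy could then call SqlServerErrorNumbers.IsTransient(ex.Number) in place of the deadlock-only check.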

Don't forget to handle network failures in the IConnectionFactory as well (by delegating perhaps through implementing IConnection).


PS: Session-per-request is a broken pattern if you are doing more than reading, especially if you read with the same ISession that you write with and do not order the reads so that they always come before the writes.
