Finite State Machine – How to Store and Validate Data Between State Transitions

Tags: architecture, finite-state machine, state, transition

I'm developing a fairly large application that will have custom finite state machines. That is, the admin users of the application will be able to create their own state machines, built from small pre-built pieces of code; let's call those tasks.

That part is fine, and we are reaching a point where we think it's going to work.

The big question is that, depending on the tasks involved, we will need to force the user to enter certain data.

For example, there is an FSM that merges parcels. For that to happen, we need at least two input parcels and some data on the new resulting parcel.

How can we:

  1. Store the data we need, which will differ depending on the FSM and the tasks it performs? I was thinking of a big JSON blob.

  2. Validate it, without going insane.

Background: Django, Marionette.js, Django REST Framework and PostgreSQL.

Async tasks and state changes are handled with Celery.

I'm asking this because I don't want to have many fixed models, each tied to a particular validation. That would eventually lead to a large number of tables that are hard to maintain. (That's why I suggested a JSON field: store all the data for each state, validate it, and if it's all good, transition to the next state.)

PS: Sorry for bad formatting. Using my phone without proper keyboard.

Best Answer

I'd go one of two ways depending on the exact implementation details.

(Apologies, I'm not familiar with Django, so my answer speaks generically about the architecture and doesn't have specific suggestions for the framework you're using.)

1. JSON fields for data using Single Table Inheritance

As you suggested, I'd put the data in JSON(B) fields in Postgres. I'd validate by subclassing (or similar) the Tasks model using Single Table Inheritance (STI).

A quick Google search suggests that Django might not do STI natively, but you could get a simple enough version working by adding, say, a type column to your tasks table and using it to dynamically load the class that contains the validations for that type of task.
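To make the type-column idea concrete, here is a minimal, framework-agnostic sketch in plain Python. Everything in it is an assumption for illustration: the `VALIDATORS` registry, the `register` decorator, the `merge_parcels` type name, and the `input_parcels` / `new_parcel` keys are hypothetical and not part of Django or of the original question.

```python
# Hypothetical registry: maps the value of the "type" column to a validator class.
VALIDATORS = {}


def register(task_type):
    """Class decorator that files a validator under a task type name."""
    def decorator(cls):
        VALIDATORS[task_type] = cls
        return cls
    return decorator


class TaskValidator:
    """Base class: validate() returns a list of error strings (empty means valid)."""

    def validate(self, data):
        return []


@register("merge_parcels")  # assumed type name, taken from the parcel example
class MergeParcelsValidator(TaskValidator):
    def validate(self, data):
        errors = []
        parcels = data.get("input_parcels")
        if not isinstance(parcels, list) or len(parcels) < 2:
            errors.append("at least two input parcels are required")
        if not isinstance(data.get("new_parcel"), dict):
            errors.append("data for the resulting parcel is required")
        return errors


def validate_task(task_type, data):
    """Pick the validator named by the row's type column and run it on the JSON data."""
    try:
        validator_cls = VALIDATORS[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}")
    return validator_cls().validate(data)
```

Here `task_type` and `data` stand for the type column and the decoded JSON column of one task row; a transition would only be allowed when `validate_task` returns an empty list.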

Pros

  • Expressive - your validation rules are as flexible as the language
  • Simple - Easy to get up and running with

Cons

  • Awkward to scale - If you have lots of different types of tasks, this could get out of hand for maintenance. (This can be mitigated by grouping similar tasks to use the same validation class)
  • Developer dependent - any changes to validation or adding new tasks requires a developer

2. Store validation also in JSON

Have a second table that stores your task types and uses JSON Schema to describe the attributes and validation rules of each type of task.
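As a rough sketch of what that could look like, the snippet below keeps the rules as JSON data and checks a task's payload against them. To stay dependency-free it uses a tiny hand-written checker that understands only a handful of JSON Schema keywords (`type`, `required`, `properties`, `minItems`, `items`); in a real project you would most likely hand the same schema to an existing JSON Schema library instead. The schema contents and the `check` function are assumptions made up for the parcel example.

```python
import json

# Hypothetical content of the schema column for a "merge parcels" task type.
MERGE_PARCELS_SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["input_parcels", "new_parcel"],
  "properties": {
    "input_parcels": {"type": "array", "minItems": 2, "items": {"type": "integer"}},
    "new_parcel": {"type": "object", "required": ["name"]}
  }
}
""")

# Only the JSON types this sketch needs.
_TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def check(schema, value, path="$"):
    """Return a list of error strings for value; supports a small keyword subset."""
    expected = schema.get("type")
    if expected is not None:
        wrong_type = not isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it for "integer".
        if wrong_type or (expected == "integer" and isinstance(value, bool)):
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required property '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(check(sub_schema, value[key], f"{path}.{key}"))
    if isinstance(value, list):
        if len(value) < schema.get("minItems", 0):
            errors.append(f"{path}: expected at least {schema['minItems']} items")
        if "items" in schema:
            for index, item in enumerate(value):
                errors.extend(check(schema["items"], item, f"{path}[{index}]"))
    return errors
```

Adding a new kind of task then means inserting a new row with its schema, with no new validation code, which is the scaling advantage this option is after.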

Pros

  • Standard Tools - If you use JSON schema there'd be standard libraries you could work with to validate the tasks
  • Easier to Scale - Just add a new kind of task to your task type table

Cons

  • Slower to build - A bit more involved to develop initially
  • Less Expressive - JSON schema is not a programming language, so it's not as expressive for defining rules (but in practice it would be good enough for most cases)