Python Workflow Design Pattern

python workflow

I'm working on a piece of software design, and I'm stuck between not having any idea what I'm doing, and feeling like I'm reinventing the wheel.

My situation is the following: I am designing a scientific utility with an interactive UI. User input should trigger visual feedback (duh), some of it directly, i.e. editing a domain geometry, and some of it as soon as possible, without blocking user interaction, say, solving some PDE over said domain.

If I draw out a diagram of all operations I need to perform, I get this rather awesomely dense graph, exposing all kinds of opportunities for parallelism and caching/reuse of partial results. So what I want is primarily to exploit this parallelism in a transparent way (selected subtasks executing in separate processes, results automatically being 'joined' by downstream tasks waiting for all their inputs to be ready), plus only needing to recompute those branches that actually have their input changed.
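As a sketch of the "only recompute what changed" half of this wish (it does not cover the parallelism), one could imagine nodes that cache their result and mark everything downstream dirty when an input changes. The `Node` and `Input` classes and the `runs` counter are hypothetical names made up for illustration, not part of any existing package:

```python
class Node:
    """Caches its result; recomputes only after an upstream change."""

    def __init__(self, func, *upstream):
        self.func = func
        self.upstream = upstream
        self.downstream = []
        self.dirty = True
        self.cache = None
        self.runs = 0  # counts recomputations, for illustration only
        for up in upstream:
            up.downstream.append(self)

    def invalidate(self):
        # Mark this node and everything that depends on it as stale.
        self.dirty = True
        for down in self.downstream:
            down.invalidate()

    def value(self):
        if self.dirty:
            self.cache = self.func(*(up.value() for up in self.upstream))
            self.dirty = False
            self.runs += 1
        return self.cache


class Input(Node):
    """A source node whose value is set from outside (e.g. by the UI)."""

    def __init__(self, value):
        super().__init__(lambda: self._value)
        self._value = value

    def set(self, value):
        self._value = value
        self.invalidate()


a, b = Input(1), Input(2)
double_a = Node(lambda x: 2 * x, a)
square_b = Node(lambda x: x * x, b)
total = Node(lambda p, q: p + q, double_a, square_b)

first = total.value()   # computes every node once
b.set(3)                # dirties square_b and total, but not double_a
second = total.value()  # double_a is served from its cache
```

Here changing `b` leaves the `double_a` branch untouched, which is the reuse of partial results described above.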

pyutilib.workflow seems to come closest to being what I'm looking for, except of course that it isn't (it doesn't seem to do any subprocessing to begin with). That seems rather disappointing; while I'm not a software engineer, I'd say I'm not asking for anything crazy here.

Another complicating factor is the tight user-interface integration I desire, which other scientific-workflow solutions do not seem designed to handle. For instance, I would like to pass a drag-and-drop event through a transformation node for further processing. The transformation node has two inputs: an affine transform state input port, and a pointset class that knows what to do with it. If the affine transform input port is 'dirty' (waiting for its dependencies to update), the event should be held up until it becomes available. But once the event has passed the node, the event input port should be marked as handled, so it does not refire when the affine transform changes due to further user input. That's just one example of the many issues that come up that I don't see being addressed anywhere. Another is what to do when a long-running forking-joining branch receives new input while it is in the middle of crunching a previous input.
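To make the hold-then-consume behaviour concrete, here is a minimal sketch of one possible reading of it. The `TransformNode` class and its method names are hypothetical, and the "transform" is just a callable standing in for the affine transform state:

```python
class TransformNode:
    """Holds events while the transform port is dirty; fires each event once."""

    def __init__(self):
        self._transform = None  # None stands for a dirty transform port
        self._pending = []      # events waiting for the transform
        self.handled = []       # results passed downstream

    def invalidate(self):
        self._transform = None

    def set_transform(self, transform):
        self._transform = transform
        # Release the held events; the pending list is emptied first,
        # so a later transform change cannot refire them.
        pending, self._pending = self._pending, []
        for event in pending:
            self._process(event)

    def send_event(self, event):
        if self._transform is None:
            self._pending.append(event)
        else:
            self._process(event)

    def _process(self, event):
        self.handled.append(self._transform(event))


node = TransformNode()
node.send_event(2)                      # port is dirty, event is held
held = list(node.handled)
node.set_transform(lambda p: p * 10)    # event is released and consumed
node.set_transform(lambda p: p + 1)     # no refire of the old event
```

The point is that the event port behaves like a consumable message, while the transform port behaves like state; the two kinds of port need different update rules.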

So my question: Do you happen to know of some good books/articles on workflow design patterns that I should read? Or am I trying to fit a square peg into a round hole, and you know of a completely different design pattern that I should know about? Or a python package that does what I want it to, regardless of the buzzwords it comes dressed up in?

I've rolled my own solution on top of enthought.traits, but I'm not perfectly happy with that either, as it feels like a rough and shoddy reinvention of the wheel. Except that I can't seem to find any wheels anywhere on the internet.

NOTE: I'm not looking for webframeworks, graphical workflow designers, or any special-purpose tools. Just something conceptually like pyutilib.workflow, but including documentation and a featureset that I can work with.

EDIT: this is where I'm at after more reading and reflection on the issue:

The requirements one can tack onto a 'workflow architecture' are too diverse for there to be a single shoe that fits all. Do you want tight integration with disk storage, tight integration with web frameworks, asynchronicity, mix in custom finite state machine logic for task dispatch? They are all valid requirements, and they are largely incompatible, or make for senseless mixes.

However, not all is lost. Looking for a generic workflow system to solve an arbitrary problem is like looking for a generic iterator to solve your custom iteration problem. Iterators are not primarily about reusability; you can't reuse your red-black-tree iterator to iterate over your tensor. Their strength lies in a clean separation of concerns and the definition of a uniform interface.

What I'm looking for (and have started writing myself; it's going to be pretty cool) will look like this: at its base is a general, implementation-agnostic workflow-declaration mini-language, based on decorators and some meta-magic, that transforms a statement like the one below into a workflow declaration containing all required information:

@composite_task(inputs(x=Int), outputs(z=Float))
class mycompositetask:
    @task(inputs(x=Int), outputs(y=Float))
    def mytask1(x):
        return outputs( y = x*2 )
    @task(inputs(x=Int, y=Float), outputs(z=Float))
    def mytask2(x, y):
        return outputs( z = x+y )
    mytask1.y = mytask2.y   #redundant, but for illustration; inputs/outputs matching in name and metadata autoconnect

What the decorators return is a task/compositetask/workflow declaration class. Instead of just type constraints, other metadata required for the workflow-type at hand is easily added to the syntax.
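A minimal sketch of how such decorators might record this metadata, under the assumption that a declaration is just a plain record and that `Int`/`Float` are stand-ins for the built-in types. `TaskDeclaration` and the bodies of `inputs`, `outputs` and `task` are guesses for illustration, not the actual implementation:

```python
Int, Float = int, float  # stand-ins for the type markers (assumption)


class TaskDeclaration:
    """Plain record of what a task needs and produces; it runs nothing itself."""

    def __init__(self, func, in_ports, out_ports):
        self.name = func.__name__
        self.func = func
        self.inputs = in_ports    # port name -> type
        self.outputs = out_ports  # port name -> type


def inputs(**ports):
    return dict(ports)


def outputs(**ports):
    # Used both to declare output ports and to package return values.
    return dict(ports)


def task(in_ports, out_ports):
    def decorate(func):
        # The decorated name now refers to a declaration, not a function.
        return TaskDeclaration(func, in_ports, out_ports)
    return decorate


@task(inputs(x=Int), outputs(y=Float))
def mytask1(x):
    return outputs(y=x * 2)
```

Further metadata (units, caching hints, and so on) could be attached the same way, as extra keyword arguments stored on the record.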

Now this concise and pythonic declaration can be fed into a workflow instance factory that returns the actual workflow instance. This declaration language is fairly general and probably need not change much between different design requirements, but such a workflow instantiation factory is entirely up to your design requirements/imagination, aside from a common interface for delivering/retrieving input/output.

In its simplest incarnation, we'd have something like:

wf   = workflow_factory(mycompositetask)
wf.z = lambda result: print(result)   #register callback on z-output socket
wf.x = 1    #feed data into x input-socket

where wf is a trivial workflow instance that does nothing but chain all contained function bodies together on the same thread once all inputs are bound. It is a quite verbose way to chain two functions, but it illustrates the idea, and it already achieves the goal of keeping the definition of the flow of information in one central place rather than spread throughout classes that would rather have nothing to do with it.
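For illustration, a same-thread instance of this kind could be sketched as follows. The `SerialWorkflow` class, its explicit `on_output`/`set_value` methods (used here instead of attribute assignment) and the tuple format for tasks are all assumptions made to keep the example short; it also runs each task only once:

```python
class SerialWorkflow:
    """Runs each task on the calling thread as soon as all its inputs are bound."""

    def __init__(self, tasks):
        # tasks: list of (function, input_names, output_names); hypothetical format
        self._tasks = tasks
        self._values = {}     # port name -> bound value
        self._callbacks = {}  # port name -> list of callbacks
        self._done = set()    # indices of tasks that have already run

    def on_output(self, name, callback):
        self._callbacks.setdefault(name, []).append(callback)

    def set_value(self, name, value):
        self._values[name] = value
        for callback in self._callbacks.get(name, []):
            callback(value)
        self._run_ready()

    def _run_ready(self):
        for i, (func, in_names, out_names) in enumerate(self._tasks):
            if i in self._done or not all(n in self._values for n in in_names):
                continue
            self._done.add(i)
            result = func(*(self._values[n] for n in in_names))
            for out in out_names:
                self.set_value(out, result[out])


def mytask1(x):
    return {"y": x * 2.0}


def mytask2(x, y):
    return {"z": x + y}


wf = SerialWorkflow([(mytask1, ["x"], ["y"]), (mytask2, ["x", "y"], ["z"])])
seen = []
wf.on_output("z", seen.append)  # register callback on the z output
wf.set_value("x", 1)            # feed data into the x input
```

Binding `x` triggers `mytask1`, whose `y` output in turn makes `mytask2` ready, so the callback on `z` fires before `set_value` returns.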

That's more or less the functionality I've got implemented so far, but it means I can go on working on my project, and in due time I'll add support for fancier workflow instance factories. For instance, I'm thinking of analyzing the graph of dependencies to identify forks and joins, and tracking the activity generated by each input supplied on the workflow-instance level, for elegant load balancing and cancellation of the effects of specific inputs that have lost their relevance but are still hogging resources.
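As a rough sketch of the graph-analysis step, forks and joins can be read off a dependency mapping by counting downstream and upstream neighbours, and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) gives a valid execution order. The task names in `deps` are invented for the example:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each task maps to the set of tasks it waits on.
deps = {
    "mesh": {"geometry"},
    "assemble": {"mesh"},
    "boundary": {"mesh"},
    "solve": {"assemble", "boundary"},
}

# A join is a task with more than one upstream dependency.
joins = {task for task, ups in deps.items() if len(ups) > 1}

# A fork is a task that feeds more than one downstream task.
fan_out = {}
for task, ups in deps.items():
    for up in ups:
        fan_out.setdefault(up, set()).add(task)
forks = {task for task, downs in fan_out.items() if len(downs) > 1}

# One valid serial order; tasks between a fork and its join could run in parallel.
order = list(TopologicalSorter(deps).static_order())
```

In this graph `mesh` is the fork and `solve` is the join, so `assemble` and `boundary` are the candidates for separate processes.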

Either way, I think the project of separating workflow declaration, interface definition, and implementation of instantiation is a worthwhile effort. Once I have a few nontrivial types of workflow instances working well (I need at least two for the project I'm working on, I've realized*), I hope to find the time to publish this as a public project, because despite the diversity of design requirements in workflow systems, having this groundwork covered makes implementing your own specific requirements a lot simpler. And instead of a single bloated workflow framework, a swiss army knife of easily switched-out custom solutions could grow around such a core.

*Realizing that I need to split my code over two different workflow instance types, rather than trying to bash all my design requirements into one solution, turned the square peg and round hole I had in my mind into two perfectly complementary holes and pegs.

Best Answer

I believe you are both right and wrong to worry about reinventing the wheel. Thinking about the problem at different levels may give you a hint here.

How to eat an elephant?

Level A: software design

At that level, you would want to stick to the best practice that no long operations are done in the UI (and the UI thread). You need a UI layer that focuses only on gathering input (including cancellation) and drawing (including in-progress visualization such as a progress bar or hourglass). This layer should be kept as separate from everything else as dusk from dawn. Any call outside of this layer must be fast if you want intuitiveness and responsiveness.

In tasks as complex as yours, the calls outside of the UI layer are typically:

  1. Schedule some work - the command should be queued to the smart layer for it to pick up whenever it gets to it.
  2. Read results - the results should be queued in the smart layer so they can just be "popped out" and rendered.
  3. Cancel/stop/exit - just raise a flag. The smart layer should check this flag now and then.
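The three calls above can be sketched with nothing but the standard library's `queue` and `threading` modules. The names `commands`, `results`, `cancel` and `worker`, and the toy summation job, are placeholders for illustration only:

```python
import queue
import threading

commands = queue.Queue()    # UI -> smart layer: work to schedule
results = queue.Queue()     # smart layer -> UI: finished results
cancel = threading.Event()  # the UI raises this flag; the worker polls it


def worker():
    while True:
        job = commands.get()
        if job is None:          # sentinel value: shut the worker down
            break
        total = 0
        for i in range(job):
            if cancel.is_set():  # check the flag now and then
                break
            total += i
        else:
            # Only publish a result if the loop was not cancelled.
            results.put((job, total))


thread = threading.Thread(target=worker, daemon=True)
thread.start()

commands.put(5)     # 1. schedule some work; this returns immediately
commands.put(None)  # ask the worker to exit after the queued job
thread.join()       # a real UI would not block here

out = results.get_nowait()  # 2. read results without blocking
```

In an actual UI, `results.get_nowait()` would typically be called from a timer or idle callback, catching `queue.Empty` when nothing is ready yet, and `cancel.set()` would be wired to the cancel button.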

Don't worry too much if some user operations get a reaction too slowly - if you have a solid design core, you can adjust the priorities of user input later on. Or add a short-term hourglass or similar. Or even cancel all long operations that become obsolete after a specific user input.

Level B: the heavy-lifting smart layer

There is no "best" framework for "any" kind of hard work.

So I'd suggest designing the way the UI feeds this layer to be as simple as possible, with no frameworks involved.

Internally, you can implement it using frameworks, but you keep the ability to redesign the hard-working elements later as needed. For example, in the future you could:

  • hand some of the math to the GPU
  • distribute tasks to server farms
  • involve cloud computing

For complex tasks, picking a framework at the top level of the design might prove to be an obstacle in the future. Specifically, it may limit your freedom to apply other technologies.

It's hard to tell for sure, but it seems to me that there is no silver-bullet framework for your task. So you should find strong tools (e.g. threads and queues) to implement good design practices (e.g. decoupling).

EDIT as a response to your edit

Your latest edit perfectly illustrates the hard challenges a software designer meets. In your case, it is accepting that there is no silver bullet. I'd suggest you accept it sooner rather than later...

The hint lies in the fact that you defined the most generic task in terms of Ints and Floats. This could make you happy today, but it will fail tomorrow, exactly like locking in to a super-abstract framework.

The path is right - to have a heavy-lifting "task" base in your design. But it should not define Int or Float. Focus on the above-mentioned "start", "read" and "stop" instead. If you don't see the size of the elephant you are eating, you might fail to eat it and end up starving :)

From the Level A (design) perspective, you could define a task to contain something like this:

class AnySuperPowerfulTask:
    def run():
        scheduleForAThreadToRunMe()
    def cancel():
        doTheSmartCancellationSoNobodyWouldCrash()

This gives you the basis - neutral, yet clean and decoupled from the Level A (design) perspective.

However, you would need some way of setting up the task and getting the real result, right? Sure, that would fall into Level B of thinking. It would be specific to a task (or to a group of tasks implemented as an intermediate base). The final task could be something along these lines:

class CalculatePossibilitiesToSaveAllThePandas(AnySuperPowerfulTask):
    def __init__(someInt, someFloat, anotherURL, configPath):
        anythingSpecificToThisKindOfTasks()
    def getResults():
        return partiallyCalculated4DimensionalEvolutionGraphOfPandasInOptimisticEnvironment()

(The samples are intentionally not valid Python, in order to focus on the design rather than the syntax.)
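For readers who want something executable, a runnable sketch with the same shape might look like the following. It assumes plain `threading` for the background work; the `Task` and `SumSquares` classes, the `wait` helper and the toy computation are hypothetical stand-ins for the placeholder names used in the samples:

```python
import threading


class Task:
    """Hypothetical base: knows only how to start and how to cancel."""

    def __init__(self):
        self._cancelled = threading.Event()
        self._thread = None

    def run(self):
        # Returns immediately; the work happens on a background thread.
        self._thread = threading.Thread(target=self._work, daemon=True)
        self._thread.start()

    def cancel(self):
        # Only raises a flag; the worker decides when it is safe to stop.
        self._cancelled.set()

    def wait(self, timeout=None):
        if self._thread is not None:
            self._thread.join(timeout)

    def _work(self):
        raise NotImplementedError


class SumSquares(Task):
    """A concrete task: its setup and its result type are its own business."""

    def __init__(self, n):
        super().__init__()
        self._n = n
        self._lock = threading.Lock()
        self._partial = 0

    def _work(self):
        for i in range(self._n):
            if self._cancelled.is_set():
                return
            with self._lock:
                self._partial += i * i

    def get_results(self):
        # Safe to call at any time; returns whatever has been computed so far.
        with self._lock:
            return self._partial


job = SumSquares(4)
job.run()
job.wait()
```

The base class stays free of any data types; only the concrete task knows what it takes as input and what `get_results` returns.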

Level C - abstraction-nirvana

It looks like this level should be mentioned in this post.

Yes, there is such a pitfall, as many good designers can confirm: the state where you could endlessly (and without any results) search for a "generic solution" (i.e. the silver bullet). I suggest you take a peek into it and then get out fast before it's too late ;) Falling into this pitfall is no shame - it's a normal development stage of the greatest designers. At least I'm trying to believe so :)

EDIT 2

You said: "I'm working on a piece of software design, and I'm stuck between not having any idea what I'm doing, and feeling like I'm reinventing the wheel."

Any software designer can get stuck. Maybe the next level of thinking might help you out. Here it comes:

Level D - I'm stuck

Suggestion: leave the building. Walk into the cafe around the corner, order the best coffee and sit down. Ask yourself the question "What do I need?". Note that this is different from the question "What do I want?". Keep asking it until you have eliminated the wrong answers and start noticing the correct ones:

Wrong answers:

  1. I need a framework that would do X, Y and Z.
  2. I need a screwdriver that could run 200mph and harvest forest in a farm nearby.
  3. I need an amazing internal structure that my user will actually never see.

Right answers (forgive me if I misunderstood your problem):

  1. I need the user to be able to give input to the software.
  2. I need the user to see that the calculations are in progress.
  3. I need the user to visually see the result of the calculations.