Forking – Is Recursive Forking an Anti-Pattern?


I'm starting to tinker with process forking, just to get a feel for it. I thought I'd write an example where work is distributed among child processes and the results are handed back to the parent, so the job can run in parallel on several processors.

In my first attempt, I'm forking in a loop, which means that each time through, every existing process (the original, the children, the grandchildren, etc.) forks again.
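For concreteness, here's a minimal C sketch of the pattern you describe (the iteration count and the printf are illustrative, not your actual code): because neither the parent nor the child leaves the loop after fork(), every process keeps forking, and the population doubles each time through.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    for (int i = 0; i < 4; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        /* Neither branch leaves the loop, so both the parent and the
         * new child fork again on the next iteration. */
        printf("iteration %d, pid %d\n", i, (int)getpid());
    }
    /* By this point 2^4 = 16 processes exist; each reaps its own children. */
    while (wait(NULL) > 0)
        ;
    return 0;
}
```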

This is causing me problems (memory consumption, explosive growth in the number of processes, keeping track of the pipes needed to hand data back up the chain, etc.), and I'm wondering if I've stumbled on an anti-pattern. I could simply fork N children directly from the parent instead.
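Forking the workers flat from a single parent, by contrast, keeps the process count bounded. A sketch under the same assumptions (the worker count and the work itself are placeholders): each child handles its slice and exits without ever returning to the loop.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 4  /* illustrative; pick N to match your processors */

int main(void) {
    for (int i = 0; i < NWORKERS; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            /* Child: handle one slice of the work, then exit without
             * looping, so only the parent ever forks. */
            printf("worker %d (pid %d)\n", i, (int)getpid());
            _exit(0);
        }
    }
    /* Parent: reap all NWORKERS children. */
    while (wait(NULL) > 0)
        ;
    return 0;
}
```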

I realize that it's common to fork and exec multiple times; for instance, a terminal forks a shell, and the shell forks an editor. But is it considered bad practice to fork "recursively" when the children are all of the same type?

Best Answer

Uncontrolled forking is definitely an anti-pattern. As you have discovered, it is relatively easy to consume all available resources, and then continue to consume them to the point where everything grinds to a complete standstill.

Just ask the Sorcerer's Apprentice.

As with any problem involving recursion, you have to clearly understand your terminating condition. If your algorithm can potentially run forever, recursion is completely unsuitable, since you have no control over how deep the recursion can go.

Of course, some tasks may "discover" other tasks as they run, so it is not always possible to launch everything directly from the parent process.

One strategy often used to simplify a solution is to create some kind of task manager process, and have it consume a queue of pending tasks. The master task, or any children, can add tasks to the queue as they find them. The task manager then forks as many processes as it needs to in order to process the tasks.

In this way, the task manager can keep a count of the total number of running processes, and only fork a new process if it thinks it necessary and the hardware can cope with it.
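As a rough sketch of that idea (the MAX_WORKERS cap, the fake task counter, and run_task are all invented for the example; a real manager would pull from an actual queue, and error handling is trimmed for brevity): the manager simply refuses to fork while it is at the cap, waiting for a worker to exit first.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 4   /* illustrative cap; tune to the hardware */
#define NUM_TASKS   10  /* stand-in for a real pending-task queue */

static void run_task(int task) {
    /* Placeholder for real work. */
    printf("pid %d running task %d\n", (int)getpid(), task);
    sleep(1);
}

int main(void) {
    int running = 0;
    for (int task = 0; task < NUM_TASKS; task++) {
        if (running == MAX_WORKERS) {
            /* At the cap: block until any worker finishes. */
            wait(NULL);
            running--;
        }
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            run_task(task);
            _exit(0);
        }
        running++;
    }
    while (wait(NULL) > 0)  /* reap the remaining workers */
        ;
    return 0;
}
```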

There is a risk of deadlock with this, of course: some task needs the result of another task before it can continue, yet that other task sits in the queue and will never run, because all the currently executing tasks are blocked waiting on tasks that haven't been started yet.

You can fix this by having separate tasks whose job is to consume results, and managing these through a separate task manager. That way, the calculations can never become completely blocked: something is always going to be making progress.
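A minimal way to sketch that separation in C (the producer/consumer split and the squared-number "results" are invented for illustration; fork error checks are omitted for brevity) is to give the consumer its own process and have workers write fixed-size results into a pipe. The consumer drains the pipe independently, so no producer ever waits on another producer for its result to be collected.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 4

int main(void) {
    int fds[2];
    if (pipe(fds) < 0) {
        perror("pipe");
        return 1;
    }

    /* Dedicated consumer: drains results until every writer has
     * closed its end of the pipe. */
    if (fork() == 0) {
        close(fds[1]);
        int result;
        while (read(fds[0], &result, sizeof result) == sizeof result)
            printf("consumed result %d\n", result);
        _exit(0);
    }

    /* Producers: each computes one result and writes it to the pipe. */
    for (int i = 0; i < NWORKERS; i++) {
        if (fork() == 0) {
            close(fds[0]);
            int result = i * i;  /* placeholder computation */
            write(fds[1], &result, sizeof result);
            close(fds[1]);
            _exit(0);
        }
    }

    /* Parent closes both ends so the consumer sees EOF once the
     * producers are done, then reaps everyone. */
    close(fds[0]);
    close(fds[1]);
    while (wait(NULL) > 0)
        ;
    return 0;
}
```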
