OOP – How to Represent an Algorithm as a Class

language-agnosticobject-oriented

I am trying to understand how to design classes which take an input, do some processing, and return a result. More specifically, should the object store the intermediate results between function calls as state, or should all methods be static, and pass params around.

Example: I am currently working on a class which detects, and removes headers & footer from pages in a document. The algorithm steps are outlined below:

HeaderFooterFilter

removeHeaderFooterText(pages) ->
  candidates = getBoundaryBlocks(pages);
  toRemove = detectRepeatingBlocks(candidates);
  removeBlocks(toRemove);

So my question is, should pages, candidates, toRemove be members of the class, or should the methods be static and pass them around?

Is there a design-pattern which deals with algorithm classes?

Best Answer

Before discussion, notice that

When given a business use case, object-oriented programmers can usually arrive at some consensus about how the surface design (the partitioning of responsibilities between classes, and the publicly accessible methods and properties of these classes) should look like.

On the other hand, when given an algorithm,

  • Object-oriented programmers can still arrive at some consensus about the surface design, based on the inputs, steps (process), and outputs.
  • There is much less consensus about how the algorithm's internals shall be partitioned, because those belongs to implementation detail which the algorithm's user would not be concerned about.
  • Object-oriented design does not prescribe ways to design algorithms. The algorithm's basic framework should be decided first, and then an object-oriented design (encapsulation) can be produced.
  • If the algorithm's basic framework needs to be changed, for example, to exploit a faster algorithm that has a requirement for a different framework, most of that object-oriented design would have to be re-done.
  • Object-oriented algorithm design is an entirely different topic, with an emphasis on fundamental (functional) building blocks that are reusable across many different types of algorithms. A consequence is that there is higher code reusability. The drawback is that it would have required the designers to adopt a different approach to algorithms, which is not typically taught in schools.

As far as object-oriented design is concerned, the breakdown of an algorithm into steps depends on whether the algorithm itself is better seen, from its user's point of view, as being executed in one single step, or as multiple steps.

Keep in mind that the algorithm's user doesn't need to know how the steps are decomposed, unless the user has a need to intervene between the steps, such as:

  • Interrupt the algorithm in between two steps,
  • Inspect the algorithm's data in between two steps,
  • Alter the algorithm's settings in between two steps,
  • etc.

If the algorithm is better seen as a single step, the object-oriented implementation can choose to hide all of the implementation details, leaving only the "inputs, process, outputs" visible to the user. Furthermore, one can provide a factory method or a utility function that takes the input and computes the output. This factory or utility method can make use of the object-oriented implementation.


In general it is better to store the algorithm states (mutable data) in the object itself, and make them private. There are exceptions:

  • Not if doing so increases confusion. For example, if pages means one thing in some parts of the algorithm, and then pages is rewritten with something different in other parts of the algorithm. If this is the case,
    • They should be given different and descriptive names to avoid confusion.
    • If the pages has its own state transitions (like a state machine), that logic should be extracted into its own class.

Class design for algorithms that are parallelized

If it is being used by its users concurrently (being called from multiple threads), the algorithm needs to either freeze (its data becomes immutable after the main computation has finished), or it needs lock-based protection.

If the algorithm itself is parallelized,

  • Mutable states should be partitioned, and each worker should have exclusive access to its piece of mutable state.
  • The algorithm's execution should be partitioned into phases. Once a phase has completed, the phase's output data becomes frozen.
Related Topic