I am trying to understand how to design classes which take an input, do some processing, and return a result. More specifically, should the object store the intermediate results between function calls as state, or should all methods be static, and pass params around.
Example: I am currently working on a class which detects, and removes headers & footer from pages in a document. The algorithm steps are outlined below:
HeaderFooterFilter
removeHeaderFooterText(pages) ->
candidates = getBoundaryBlocks(pages);
toRemove = detectRepeatingBlocks(candidates);
removeBlocks(toRemove);
So my question is, should pages
, candidates
, toRemove
be members of the class, or should the methods be static and pass them around?
Is there a design-pattern which deals with algorithm classes?
Best Answer
Before discussion, notice that
When given a business use case, object-oriented programmers can usually arrive at some consensus about how the surface design (the partitioning of responsibilities between classes, and the publicly accessible methods and properties of these classes) should look like.
On the other hand, when given an algorithm,
As far as object-oriented design is concerned, the breakdown of an algorithm into steps depends on whether the algorithm itself is better seen, from its user's point of view, as being executed in one single step, or as multiple steps.
Keep in mind that the algorithm's user doesn't need to know how the steps are decomposed, unless the user has a need to intervene between the steps, such as:
If the algorithm is better seen as a single step, the object-oriented implementation can choose to hide all of the implementation details, leaving only the "inputs, process, outputs" visible to the user. Furthermore, one can provide a factory method or a utility function that takes the input and computes the output. This factory or utility method can make use of the object-oriented implementation.
In general it is better to store the algorithm states (mutable data) in the object itself, and make them private. There are exceptions:
pages
means one thing in some parts of the algorithm, and thenpages
is rewritten with something different in other parts of the algorithm. If this is the case,pages
has its own state transitions (like a state machine), that logic should be extracted into its own class.Class design for algorithms that are parallelized
If it is being used by its users concurrently (being called from multiple threads), the algorithm needs to either freeze (its data becomes immutable after the main computation has finished), or it needs lock-based protection.
If the algorithm itself is parallelized,