Method extraction vs underlying assumptions

Tags: functions, methods

When I split big methods (or procedures, or functions — this question is not specific to OOP, but since I work in OOP languages 99% of the time, it's the terminology that I'm most comfortable with) into a lot of small ones, I often find myself displeased with the results. It becomes harder to reason about these small methods than when they were just blocks of code in the big one, because when I extract them, I lose a lot of underlying assumptions that come from the context of the caller.

Later, when I look at this code and see the individual methods, I don't immediately know where they are called from, and I think of them as ordinary private methods that could be called from anywhere in the file. For example, imagine an initialisation method (constructor or otherwise) split into a series of small ones: in the context of the method itself, you clearly know that the object's state is still invalid, but in an ordinary private method you would probably start from the assumption that the object is already initialised and in a valid state.

The only solution I've seen for this is the where clause in Haskell, which allows you to define small functions that are used only in the "parent" function. Basically, it looks like this:

len x y = sqrt $ (sq x) + (sq y)
    where sq a = a * a

But other languages I use don't have anything like this — the closest thing is defining a lambda in a local scope, which is probably even more confusing.
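
For comparison, here is a rough Java sketch of that "local lambda" approach, mirroring the Haskell snippet above (the Geometry class and names are invented for illustration). The helper is scoped to the enclosing method, like a where binding, but the functional-interface type annotation makes it noticeably noisier:

    import java.util.function.DoubleUnaryOperator;

    class Geometry {
        // The closest Java analogue to Haskell's "where": a helper
        // bound in local scope, visible only inside len.
        static double len(double x, double y) {
            DoubleUnaryOperator sq = a -> a * a; // cannot be called elsewhere
            return Math.sqrt(sq.applyAsDouble(x) + sq.applyAsDouble(y));
        }
    }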

So, my question is: do you encounter this, and do you even see it as a problem? If you do, how do you typically solve it, particularly in "mainstream" OOP languages like Java/C#/C++?

Edit about duplicates: As others have noticed, there are already questions discussing splitting methods and small methods that are one-liners. I read them, and they don't discuss the issue of underlying assumptions that can be derived from the caller's context (in the example above, the object being initialised). That is the point of my question, and that is why it is different.

Update: If you followed this question and discussion underneath, you might enjoy this article by John Carmack on the matter, in particular:

Besides awareness of the actual code being executed, inlining functions also has the benefit of not making it possible to call the function from other places. That sounds ridiculous, but there is a point to it. As a codebase grows over years of use, there will be lots of opportunities to take a shortcut and just call a function that does only the work you think needs to be done. There might be a FullUpdate() function that calls PartialUpdateA(), and PartialUpdateB(), but in some particular case you may realize (or think) that you only need to do PartialUpdateB(), and you are being efficient by avoiding the other work. Lots and lots of bugs stem from this. Most bugs are a result of the execution state not being exactly what you think it is.
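
To make the failure mode concrete, here is a hypothetical Java sketch of the pattern Carmack describes (all names invented for illustration; an assert is one way to surface the hidden dependency):

    class Simulation {
        private boolean positionsValid = false;

        void fullUpdate() {
            partialUpdateA(); // recomputes positions
            partialUpdateB(); // relies on positions being current
        }

        private void partialUpdateA() {
            // ... recompute positions ...
            positionsValid = true;
        }

        private void partialUpdateB() {
            // A "shortcut" caller that skips partialUpdateA() silently
            // violates this assumption; asserting it makes the hidden
            // dependency visible instead of producing a subtle bug.
            assert positionsValid : "partialUpdateA() must run first";
            // ... the partial update itself ...
        }
    }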

Best Answer

For example, imagine an initialisation method split into a series of small ones: in the context of the method itself, you clearly know that the object's state is still invalid, but in an ordinary private method you would probably start from the assumption that the object is already initialised and in a valid state. The only solution I've seen for this is...

Your concern is well-founded. There is another solution.

Take a step back. What fundamentally is the purpose of a method? Methods only do one of two things:

  • Produce a value
  • Cause an effect

Or, unfortunately, both. I try to avoid methods that do both, but plenty do. Let's say that the effect produced or the value produced is the "result" of the method.

You note that methods are called in a "context". What is that context?

  • The values of the arguments
  • The state of the program outside of the method

Essentially what you are pointing out is: the correctness of the result of the method depends on the context in which it is called.

The conditions that must hold before a method body begins executing, in order for the method to produce a correct result, are called its preconditions; the conditions that are guaranteed to hold after the method body returns are called its postconditions.

So essentially what you are pointing out is: when I extract a code block into its own method, I am losing contextual information about the preconditions and postconditions.

The solution to this problem is to make the preconditions and postconditions explicit in the program. In C#, for instance, you could use Debug.Assert or Code Contracts to express preconditions and postconditions.
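
As a minimal sketch of the same idea in Java, using the built-in assert statement (enabled at runtime with java -ea) in place of C#'s Debug.Assert; the Account example is invented:

    class Account {
        private long balanceCents;

        void withdraw(long amountCents) {
            // Preconditions: the caller's obligations, stated explicitly.
            assert amountCents > 0 : "amount must be positive";
            assert amountCents <= balanceCents : "insufficient funds";

            long before = balanceCents;
            balanceCents -= amountCents;

            // Postcondition: the guarantee this method provides.
            assert balanceCents == before - amountCents;
        }
    }

A reader of withdraw now knows exactly what context it requires, without tracing every caller.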

For example: I used to work on a compiler which moved through several "stages" of compilation. First the code would be lexed, then parsed, then types would be resolved, then inheritance hierarchies would be checked for cycles, and so on. Every bit of the code was very sensitive to its context; it would be disastrous, for instance, to ask "is this type convertible to that type?" if the graph of base types was not yet known to be acyclic! Therefore every bit of code clearly documented its preconditions. In the method that checked for type convertibility, we would assert that we had already passed the "base types acyclic" check, and it then became clear to the reader where the method could be called and where it could not.
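
A hypothetical Java sketch of such a stage check, modeled loosely on that description (the Stage names, Type placeholder, and Compiler class are all invented):

    class Type { /* hypothetical placeholder */ }

    enum Stage { LEXED, PARSED, TYPES_RESOLVED, BASE_TYPES_ACYCLIC }

    class Compiler {
        private final java.util.EnumSet<Stage> completed =
                java.util.EnumSet.noneOf(Stage.class);

        void markCompleted(Stage stage) {
            completed.add(stage);
        }

        boolean isConvertible(Type source, Type target) {
            // Precondition: convertibility walks the base-type graph,
            // which is only safe once the cycle check has passed.
            assert completed.contains(Stage.BASE_TYPES_ACYCLIC)
                    : "cycle check must run before convertibility queries";
            // ... actual convertibility logic would go here ...
            return false;
        }
    }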

Of course there are lots of ways in which good method design mitigates the problem you've identified:

  • make methods that are useful for their effects or their value but not both
  • make methods that are as "pure" as possible; a "pure" method produces a value that depends only on its arguments and produces no effect. These are the easiest methods to reason about because the "context" they need is very localized (see the sketch after this list).
  • minimize the amount of mutation that happens in program state; mutations are points where code gets harder to reason about
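
As a small, invented Java illustration of the first two points (not from the original answer):

    class PriceCalculator {
        private long runningTotalCents;

        // "Pure": the result depends only on the arguments and there is
        // no effect, so no hidden context is needed to reason about it.
        static long totalCents(long unitCents, int quantity) {
            return unitCents * quantity;
        }

        // Effect-only: mutates state and returns nothing, so callers
        // cannot mistake it for a value-producing query.
        void addLineItem(long unitCents, int quantity) {
            runningTotalCents += totalCents(unitCents, quantity);
        }
    }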