How much abstraction is too much

abstractionlanguage-agnosticoop

In an object-oriented program: How much abstraction is too much? How much is just right?

I have always been a nuts and bolts kind of guy. I understood the concept behind high levels of encapsulation and abstraction, but always felt instinctively that adding too much would just confuse the program.

I always tried to shoot for an amount of abstraction that left no empty classes or layers. And where in doubt, instead of adding a new layer to the hierarchy, I would try and fit something into the existing layers.

However, recently I've been encountering more highly abstracted systems. Systems where everything that could require a representation later in the hierarchy gets one up front. This leads to a lot of empty layers, which at first seems like bad design. However, on second thought I've come to realize that leaving those empty layers gives you more places to hook into in the future without much refactoring. It leaves you greater ability to add new functionality on top of the old without doing nearly as much work to adjust the old.

The two risks of this seem to be that you could get the layers you need wrong. In this case one would wind up still needing to do substantial refactoring to extend the code and would still have a ton of never used layers. But depending on how much time you spend coming up with the initial abstractions, the chance of screwing it up, and the time that could be saved later if you get it right – it may still be worth it to try.

The other risk I can think of is the risk of over doing it and never needing all the extra layers. But is that really so bad? Are extra class layers really so expensive that it is much of a loss if they are never used? The biggest expense and loss here would be time that is lost up front coming up with the layers. But much of that time still might be saved later when one can work with the abstracted code rather than more low-level code.

So when is it too much? At what point do the empty layers and extra "might need" abstractions become overkill? How little is too little? Where's the sweet spot?

Are there any dependable rules of thumb you've found in the course of your career that help you judge the amount of abstraction needed?

Best Answer

The point of abstractions is to factor out common properties from the specific ones, like in the mathematical operation:

ab + ac => a(b + c)

Now you do the same thing with two operations instead of three. This factoring made our expression simpler.

A typical example of an abstraction is the file system. For example, you want your program to be able to write to many kinds of storage devices: pen drives, SD cards, hard drives, etc...

If we didn't have a file system, we would need to implement the direct disk writing logic, the pen drive writing logic and the SD card writing logic. But all of these logics have something in common: they create files and directories, so this common things can be abstracted away, by creating an abstraction layer, and providing an interface to the hardware vendor to do the specific stuff.

The more the things share a common property. The more beneficial an abstraction can be:

ab + ac + ad + ae + af

to:

a(b + c + d + e + f)

This would reduce the 9 operations to 5.

Basically each good abstraction roughly halves the complexity of a system.

You always need at least two things sharing a common property to make an abstraction useful. Of course you tear a single thing apart so it looks like an abstraction, but it does not mean it's useful:

10 => 5 * 2

You cannot define the word "common" if you have only one entity.

So to answer your question. You have enough abstractions if they making your system as simple as possible.

(In my examples, addition connects the parts of the system, while multiplication defines an abstract-concrete relationship.)

Related Solutions

Unix – the fastest way to get the value of π

The Monte Carlo method, as mentioned, applies some great concepts but it is, clearly, not the fastest, not by a long shot, not by any reasonable measure. Also, it all depends on what kind of accuracy you are looking for. The fastest π I know of is the one with the digits hard coded. Looking at Pi and Pi[PDF], there are a lot of formulae.

Here is a method that converges quickly — about 14 digits per iteration. PiFast, the current fastest application, uses this formula with the FFT. I'll just write the formula, since the code is straightforward. This formula was almost found by Ramanujan and discovered by Chudnovsky. It is actually how he calculated several billion digits of the number — so it isn't a method to disregard. The formula will overflow quickly and, since we are dividing factorials, it would be advantageous then to delay such calculations to remove terms.

enter image description here

where,

enter image description here

Below is the Brent–Salamin algorithm. Wikipedia mentions that when a and b are "close enough" then (a + b)² / 4t will be an approximation of π. I'm not sure what "close enough" means, but from my tests, one iteration got 2 digits, two got 7, and three had 15, of course this is with doubles, so it might have an error based on its representation and the true calculation could be more accurate.

let pi_2 iters =
    let rec loop_ a b t p i =
        if i = 0 then a,b,t,p
        else
            let a_n = (a +. b) /. 2.0 
            and b_n = sqrt (a*.b)
            and p_n = 2.0 *. p in
            let t_n = t -. (p *. (a -. a_n) *. (a -. a_n)) in
            loop_ a_n b_n t_n p_n (i - 1)
    in 
    let a,b,t,p = loop_ (1.0) (1.0 /. (sqrt 2.0)) (1.0/.4.0) (1.0) iters in
    (a +. b) *. (a +. b) /. (4.0 *. t)

Lastly, how about some pi golf (800 digits)? 160 characters!

int a=10000,b,c=2800,d,e,f[2801],g;main(){for(;b-c;)f[b++]=a/5;for(;d=0,g=c*2;c-=14,printf("%.4d",e+d/a),e=d%a)for(b=c;d+=f[b]*a,f[b]=d%--g,d/=g--,--b;d*=b);}

What are bitwise shift (bit-shift) operators and how do they work

The bit shifting operators do exactly what their name implies. They shift bits. Here's a brief (or not-so-brief) introduction to the different shift operators.

The Operators

>> is the arithmetic (or signed) right shift operator.
>>> is the logical (or unsigned) right shift operator.
<< is the left shift operator, and meets the needs of both logical and arithmetic shifts.

All of these operators can be applied to integer values (int, long, possibly short and byte or char). In some languages, applying the shift operators to any datatype smaller than int automatically resizes the operand to be an int.

Note that <<< is not an operator, because it would be redundant.

Also note that C and C++ do not distinguish between the right shift operators. They provide only the >> operator, and the right-shifting behavior is implementation defined for signed types. The rest of the answer uses the C# / Java operators.

(In all mainstream C and C++ implementations including GCC and Clang/LLVM, >> on signed types is arithmetic. Some code assumes this, but it isn't something the standard guarantees. It's not undefined, though; the standard requires implementations to define it one way or another. However, left shifts of negative signed numbers is undefined behaviour (signed integer overflow). So unless you need arithmetic right shift, it's usually a good idea to do your bit-shifting with unsigned types.)

Left shift (<<)

Integers are stored, in memory, as a series of bits. For example, the number 6 stored as a 32-bit int would be:

00000000 00000000 00000000 00000110

Shifting this bit pattern to the left one position (6 << 1) would result in the number 12:

00000000 00000000 00000000 00001100

As you can see, the digits have shifted to the left by one position, and the last digit on the right is filled with a zero. You might also note that shifting left is equivalent to multiplication by powers of 2. So 6 << 1 is equivalent to 6 * 2, and 6 << 3 is equivalent to 6 * 8. A good optimizing compiler will replace multiplications with shifts when possible.

Non-circular shifting

Please note that these are not circular shifts. Shifting this value to the left by one position (3,758,096,384 << 1):

11100000 00000000 00000000 00000000

results in 3,221,225,472:

11000000 00000000 00000000 00000000

The digit that gets shifted "off the end" is lost. It does not wrap around.

Logical right shift (>>>)

A logical right shift is the converse to the left shift. Rather than moving bits to the left, they simply move to the right. For example, shifting the number 12:

00000000 00000000 00000000 00001100

to the right by one position (12 >>> 1) will get back our original 6:

00000000 00000000 00000000 00000110

So we see that shifting to the right is equivalent to division by powers of 2.

Lost bits are gone

However, a shift cannot reclaim "lost" bits. For example, if we shift this pattern:

00111000 00000000 00000000 00000110

to the left 4 positions (939,524,102 << 4), we get 2,147,483,744:

10000000 00000000 00000000 01100000

and then shifting back ((939,524,102 << 4) >>> 4) we get 134,217,734:

00001000 00000000 00000000 00000110

We cannot get back our original value once we have lost bits.

Arithmetic right shift (>>)

The arithmetic right shift is exactly like the logical right shift, except instead of padding with zero, it pads with the most significant bit. This is because the most significant bit is the sign bit, or the bit that distinguishes positive and negative numbers. By padding with the most significant bit, the arithmetic right shift is sign-preserving.

For example, if we interpret this bit pattern as a negative number:

10000000 00000000 00000000 01100000

we have the number -2,147,483,552. Shifting this to the right 4 positions with the arithmetic shift (-2,147,483,552 >> 4) would give us:

11111000 00000000 00000000 00000110

or the number -134,217,722.

So we see that we have preserved the sign of our negative numbers by using the arithmetic right shift, rather than the logical right shift. And once again, we see that we are performing division by powers of 2.