Object-Oriented – Impact of Methodology Shift on System Performance

dependency-injection, object-oriented, performance, refactoring

TL;DR:

There was some confusion as to what I was asking, so here is the driving idea behind the question:

I always intended the question to be what it is; I may not have articulated it well originally. But the intent has always been: is "modular, separated, loosely coupled, decoupled, refactored" code markedly slower, by its own nature, than "monolithic, single-unit, do-everything-in-one-place, one-file, tightly coupled" code? The rest is just details and various manifestations of this that I came across then, or now, or will later. It is slower for sure on some scale. Like a non-defragmented disk, you have to pick up the pieces from everywhere. It's slower. For sure. But should I care?

And the question is not about…

not about micro-optimization or premature optimization. It is not about "optimizing this or that part to death".

What is it then?

It is about the overall methodology and techniques and ways of thinking about writing code that emerged over time:

  • "inject this code into your class as a dependency"
  • "write one file per class"
  • "separate your view from your database, controller, domain".
  • don't write a spaghetti-like, homogeneous, single code block; write many separate modular components that work together

It is about the way and the style of code that is currently – within this decade – seen in most frameworks, advocated at conventions, and passed on via the community. It is a shift in thinking from 'monolithic blocks' to 'microservices'. And with that shift comes a price in machine-level performance and overhead, as well as some programmer-level overhead.
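To make the shift concrete, here is a minimal PHP sketch of the same small rendering task written both ways (the class names are made up for illustration; this is not code from any particular framework):

    <?php
    // Tightly coupled: the class does everything in one place.
    class CoupledReport
    {
        public function render(array $rows)
        {
            $html = '';
            foreach ($rows as $row) {
                $html .= '<li>' . htmlspecialchars($row) . '</li>'; // formatting baked in
            }
            return '<ul>' . $html . '</ul>';
        }
    }

    // Decoupled: the formatter is injected through an interface (a "seam"),
    // at the cost of one extra class, one extra file, and one extra call per row.
    interface RowFormatter
    {
        public function format($row);
    }

    class HtmlRowFormatter implements RowFormatter
    {
        public function format($row)
        {
            return '<li>' . htmlspecialchars($row) . '</li>';
        }
    }

    class DecoupledReport
    {
        private $formatter;

        public function __construct(RowFormatter $formatter)
        {
            $this->formatter = $formatter;
        }

        public function render(array $rows)
        {
            $html = '';
            foreach ($rows as $row) {
                $html .= $this->formatter->format($row);
            }
            return '<ul>' . $html . '</ul>';
        }
    }

    $report = new DecoupledReport(new HtmlRowFormatter());
    echo $report->render(array('one', 'two', 'three'));

Both produce identical output; the second version just pays for the indirection in exchange for being able to swap the formatter out in tests.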

Original Question follows:

In the field of computer science, I have noticed a notable shift in thinking when it comes to programming. I come across advice quite often that goes like this:

  • write smaller functions (more testable and maintainable this way)
  • refactor existing code into smaller and smaller chunks until most of your methods/functions are just a few lines long and their purpose is clear (which creates more functions, compared to a larger monolithic block)
  • write functions that only do one thing – separation of concerns, etc. (which usually creates more functions and more frames on the stack)
  • create more files (one class per file, more classes for decomposition purposes, for layering purposes such as MVC, domain architecture, design patterns, OO, etc., which creates more file system calls)

This is a change compared to the "old" or "outdated" or "spaghetti" coding practices where you have methods spanning 2,500 lines, big classes, and god objects doing everything.
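To be clear about what kind of refactoring I mean, here is a toy PHP sketch (made-up function names and made-up business rules) of one block of work before and after being split into single-purpose functions:

    <?php
    // Before: one block does everything inline.
    function orderTotalMonolithic(array $items)
    {
        $total = 0;
        foreach ($items as $item) {
            $line = $item['price'] * $item['qty'];
            if ($item['qty'] >= 10) {
                $line = $line * 0.9;        // bulk discount
            }
            $total += $line * 1.2;          // add 20% tax
        }
        return $total;
    }

    // After: the same work split into tiny single-purpose functions,
    // which costs two or three extra calls (and parameter passes) per item.
    function lineSubtotal(array $item)
    {
        return $item['price'] * $item['qty'];
    }

    function applyBulkDiscount($subtotal, $qty)
    {
        return $qty >= 10 ? $subtotal * 0.9 : $subtotal;
    }

    function addTax($amount)
    {
        return $amount * 1.2;
    }

    function orderTotalRefactored(array $items)
    {
        $total = 0;
        foreach ($items as $item) {
            $total += addTax(applyBulkDiscount(lineSubtotal($item), $item['qty']));
        }
        return $total;
    }

Both return the same number; the second version is easier to test piece by piece but clearly makes more calls.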

My question is this:

when it all comes down to machine code, to 1s and 0s, to assembly instructions, to HDD platters, should I be at all concerned that my perfectly class-separated OO code, with its variety of refactored small-to-tiny functions and methods, generates too much extra overhead?

Details

While I am not intimately familiar with how OO code and its method calls are handled in assembly in the end, or how DB calls and compiler calls translate to moving the actuator arm on an HDD platter, I do have some idea. I assume that each extra function call, object call, or "#include" call (in some languages) generates an extra set of instructions, thereby increasing the code's volume and adding various "code wiring" overhead without adding actual "useful" code. I also imagine that good optimizations can be applied to the assembly before it actually runs on the hardware, but that optimization can only do so much.
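I have not profiled this rigorously, but a crude PHP sketch of the kind of per-call cost I am worried about looks like this (absolute numbers will vary wildly with the machine and PHP version; it only shows that the call itself is not free):

    <?php
    // Crude, unscientific timing of "work done inline" vs "work done via a call".
    $n = 1000000;

    $start = microtime(true);
    $sum = 0;
    for ($i = 0; $i < $n; $i++) {
        $sum += $i * 2;                // inline
    }
    $inline = microtime(true) - $start;

    function double_it($x)
    {
        return $x * 2;
    }

    $start = microtime(true);
    $sum = 0;
    for ($i = 0; $i < $n; $i++) {
        $sum += double_it($i);         // one extra call frame per iteration
    }
    $viaCall = microtime(true) - $start;

    printf("inline: %.4fs   via function: %.4fs\n", $inline, $viaCall);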

Hence, my question: how much overhead (in space and speed) does well-separated code (code that is split up across hundreds of files, classes, design patterns, etc.) actually introduce, compared to having "one big method that contains everything in one monolithic file"?

UPDATE for clarity:

I am assuming that taking the same code and splitting it, refactoring it, and decoupling it into more and more functions, objects, methods, and classes will result in more and more parameter passing between the smaller code pieces. Because, for sure, refactored code has to keep the thread of execution going, and that requires parameter passing. More methods, more classes, or more Factory Method design patterns result in more overhead from passing various bits of information around than is the case in a single monolithic class or method.

It was said somewhere (quote TBD) that up to 70% of all code is made up of ASM's MOV instruction – loading CPU registers with the proper variables, not doing the actual computation. In my case, you load up the CPU's time with PUSH/POP instructions that provide linkage and parameter passing between the various pieces of code. The smaller you make your pieces of code, the more "linkage" overhead is required. I am concerned that this linkage adds to software bloat and slow-down, and I am wondering whether I should be concerned about it, and how much, if at all, because current and future generations of programmers who are building software for the next century will have to live with and consume software built using these practices.

UPDATE: Multiple files

I am writing new code now that is slowly replacing old code. In particular, I've noted that one of the old classes was a ~3000-line file (as mentioned earlier). Now it is becoming a set of 15-20 files located across various directories, including test files and not including the PHP framework I am using to bind some things together. More files are coming as well. When it comes to disk I/O, loading multiple files is slower than loading one large file. Of course not all files are loaded – they are loaded as needed, and disk caching and memory caching options exist – and yet I still believe that loading multiple files takes more processing than loading a single file into memory. I am adding that to my concern.
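A crude way to see the cost I am worried about (the file names are hypothetical, and the result depends heavily on the OS page cache and on whether an opcode cache such as APC is in play):

    <?php
    // Compare including one combined file vs. twenty small class files.
    // Assumes combined.php and classes/Class0.php ... classes/Class19.php
    // exist and contain the same total amount of code.
    $start = microtime(true);
    require 'combined.php';
    $oneFile = microtime(true) - $start;

    $start = microtime(true);
    for ($i = 0; $i < 20; $i++) {
        require 'classes/Class' . $i . '.php';
    }
    $manyFiles = microtime(true) - $start;

    printf("one file: %.5fs   twenty files: %.5fs\n", $oneFile, $manyFiles);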

UPDATE: Dependency Inject everything

Coming back to this after a while… I think my question was misunderstood. Or maybe I chose to misunderstand some answers. I am not talking about micro-optimizing, as some answers have singled out (at least I think calling what I am talking about micro-optimization is a misnomer), but about the movement of "refactor code to loosen tight coupling" as a whole, at every level of the code. I came back from Zend Con just recently, where this style of code was one of the core points and centerpieces of the convention. Decouple logic from view, view from model, model from database, and if you can, decouple data from the database. Dependency-inject everything, which sometimes means just adding wiring code (functions, classes, boilerplate) that does nothing but serve as a seam/hook point, easily doubling code size in most cases.
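By "wiring code that does nothing", I mean boilerplate along these lines (a simplified sketch with made-up class names; the stubs stand in for the real layers):

    <?php
    // Minimal stubs standing in for the real layers.
    class LedgerMapper       { public function __construct($db) {} }
    class LedgerService      { public function __construct(LedgerMapper $m) {} }
    class LedgerHtmlRenderer { }
    class LedgerController
    {
        public function __construct(LedgerService $s, LedgerHtmlRenderer $r) {}
    }

    // Pure wiring: none of this computes anything. It only constructs
    // objects and hands them their dependencies, acting as the seam/hook point.
    class LedgerFactory
    {
        public static function create($db)
        {
            $mapper   = new LedgerMapper($db);       // data-access layer
            $service  = new LedgerService($mapper);  // domain layer
            $renderer = new LedgerHtmlRenderer();    // view layer
            return new LedgerController($service, $renderer);
        }
    }

    // Somewhere in the bootstrap:
    $controller = LedgerFactory::create(null);       // a real DB connection in practice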

UPDATE 2: Does "separating code into more files" significantly affect performance (at all levels of computing)?

How does the philosophy of compartmentalizing your code into multiple files impact today's computing (performance, disk utilization, memory management, CPU processing tasks)?

Here is what I am talking about.

Before…

In a hypothetical, yet quite real, not-so-distant past, you could easily write one mono-block of a file that has the model, view, and controller – spaghetti-coded or not – but that runs everything once it is loaded. Doing some benchmarks in the past using C code, I found that it is MUCH faster to load a single 900 MB file into memory and process it in large chunks than it is to load a bunch of smaller files and process them in smaller, piecemeal chunks doing the same work in the end.
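My original test was in C; a rough PHP equivalent of the same idea would be something like the following (file names are made up, and the result is very sensitive to OS caching – run it on a cold cache to see the difference):

    <?php
    // Read one big file in large chunks...
    $start = microtime(true);
    $fh = fopen('big.dat', 'rb');               // assumes big.dat exists
    while (!feof($fh)) {
        $chunk = fread($fh, 8 * 1024 * 1024);   // 8 MB chunks
    }
    fclose($fh);
    $bigFile = microtime(true) - $start;

    // ...versus many small files holding the same total amount of data.
    $start = microtime(true);
    foreach (glob('parts/*.dat') as $file) {    // assumes parts/ holds the split data
        $chunk = file_get_contents($file);
    }
    $smallFiles = microtime(true) - $start;

    printf("one big file: %.3fs   many small files: %.3fs\n", $bigFile, $smallFiles);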

… And Now

Today I find myself looking at code that shows a ledger, with features like: if an item is an "order", show the order HTML block. If a line item can be copied, print an HTML block that displays an icon and the HTML parameters behind it allowing you to make the copy. If the item can be moved up or down, display the appropriate HTML arrows. Etc. Through Zend Framework I can create partial() calls, which essentially means "call a function that takes your parameters and inserts them into a separate HTML file that it also loads". Depending on how detailed I want to get, I can create separate HTML partials for the tiniest parts of the ledger: one for the arrow up, one for the arrow down, one for "can I copy this item", and so on – easily creating several files just to display a small part of the webpage. Counting my code and the behind-the-scenes Zend Framework code, the system/stack probably calls close to 20-30 different files.
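To show what I mean by slicing the view up, here is roughly what that looks like inside a view script (partial() is the standard Zend_View helper; the template names and item methods are made up for illustration):

    <?php // ledger-row.phtml (illustrative)
    // Each tiny piece of markup lives in its own template file,
    // pulled in through a partial() call with its own parameters.
    if ($this->item->isOrder()) {
        echo $this->partial('ledger/order-block.phtml', array('order' => $this->item));
    }
    if ($this->item->isCopyable()) {
        echo $this->partial('ledger/copy-icon.phtml', array('id' => $this->item->getId()));
    }
    if ($this->item->canMoveUp()) {
        echo $this->partial('ledger/arrow-up.phtml', array('id' => $this->item->getId()));
    }
    if ($this->item->canMoveDown()) {
        echo $this->partial('ledger/arrow-down.phtml', array('id' => $this->item->getId()));
    }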

What?

I am interested in the machine-level aspects – the wear and tear on the machine that is created by compartmentalizing code into many smaller, separate files.

For example, loading more files means having them located in various places in the file system, and in various places on the physical HDD, which means more HDD seek and read time.

For the CPU it probably means more context switching and loading of various registers.

In this sub-block (update #2) I am interested, more strictly, in how using multiple files to do the same tasks that could be done in a single file affects system performance.

Using Zend Form API vs simple HTML

I used the Zend Form API, with the latest and greatest modern OO practices, to build an HTML form with validation, transforming POST data into domain objects.

It took me 35 files to make it.

35 files = 
    = 10 fieldsets x {programmatic fieldset + fieldset manager + view template} 
    + a few supporting files

All of which could be replaced with a few simple HTML + PHP + JS + CSS files, perhaps a total of 4 lightweight files.

Is it better? Is it worth it? … Imagine loading 35 files plus the numerous Zend Framework library files that make them work, versus 4 simple files.
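For contrast, the "plain" version of a single field of that form – markup, validation, and re-display on error – is not much more than this (a stripped-down sketch that skips CSRF protection and everything else a real form needs):

    <?php
    // plain-form.php – one small file instead of a fieldset, a fieldset
    // manager, and a view template per group of fields.
    $errors = array();
    $email  = isset($_POST['email']) ? trim($_POST['email']) : '';

    if ($_SERVER['REQUEST_METHOD'] === 'POST') {
        if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
            $errors['email'] = 'Please enter a valid e-mail address.';
        }
        if (!$errors) {
            // hand the validated value over to the domain layer here
        }
    }
    ?>
    <form method="post" action="">
      <label>E-mail:
        <input type="text" name="email" value="<?php echo htmlspecialchars($email); ?>">
      </label>
      <?php if (isset($errors['email'])): ?>
        <span class="error"><?php echo htmlspecialchars($errors['email']); ?></span>
      <?php endif; ?>
      <input type="submit" value="Save">
    </form>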

Best Answer

My question is this: when it all comes down to machine code, to 1s and 0s, to assembly instructions, should I be at all concerned that my class-separated code with its variety of small-to-tiny functions generates too much extra overhead?

My answer is yes, you should. Not because you have lots of little functions (once upon a time the overhead of calling functions was reasonably significant, and you could slow your program down by making a million little calls in loops, but today compilers will inline them for you, and what's left is taken care of by the CPU's fancy prediction algorithms, so don't worry about that), but because you will introduce too much layering into your programs when the pieces of functionality become too small to sensibly grok in your head. If you have larger components, you can be reasonably sure they are not performing the same work over and over, but you can make your program so minutely granular that you may find yourself unable to really understand the call paths, and end up with something that barely works (and is barely maintainable).

For example, I worked at a place that showed me a reference project for a web service with 1 method. The project comprised 32 .cs files – for a single web service! I figured this was way too much complexity; even though each part was tiny and easily understood by itself, when it came to describing the overall system I quickly found myself having to trace through calls just to see what the hell it was doing (there were also too many abstractions involved, as you'd expect). My replacement web service was 4 .cs files.

I didn't measure performance, as I figure it would have been roughly the same all in all, but I can guarantee mine was significantly cheaper to maintain. When everyone talks of programmer time being more important than CPU time, and then creates complex monsters that cost lots of programmer time in both development and maintenance, you have to wonder whether they are making excuses for bad behaviour.

It was said somewhere (quote TBD) that up to 70% of all code is made up of ASM's MOV instruction - loading CPU registers with proper variables, not the actual computation being done.

That is what CPUs do, though: they move bits from memory to registers, add or subtract them, and then put them back into memory. All computing boils down to pretty much that. Mind you, I once had a very multi-threaded program that spent more of its time context switching (i.e. saving and restoring the register state of threads) than it did working on the thread code. A simple lock in the wrong place truly screwed performance there, and it was such an innocuous bit of code too.

So my advice is: find a sensible middle ground between the two extremes that makes your code look good to other humans, and test the system to see if it performs well. Use the OS features to make sure it's running as you'd expect in terms of CPU, memory, disk, and network I/O.
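In PHP terms, even a crude harness around the entry point will tell you whether any of this is worth worrying about before you reach for the OS-level tools (a rough sketch; wrap it around whatever dispatches the request):

    <?php
    // Crude request-level measurement: wall time, peak memory, CPU time.
    $wallStart = microtime(true);

    // ... dispatch the request / run the code under test here ...

    $wall  = microtime(true) - $wallStart;
    $usage = getrusage();                        // not available on older Windows builds
    printf(
        "wall: %.4fs   peak mem: %.1f MB   user CPU: %.3fs\n",
        $wall,
        memory_get_peak_usage(true) / 1048576,
        $usage['ru_utime.tv_sec'] + $usage['ru_utime.tv_usec'] / 1000000
    );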