Hardware – Understanding the Levels of Computing

abstraction, hardware, operating systems

Sorry for my confused question. I'm looking for some pointers.

Up to now I have been working mostly with Java and Python on the application layer, and I have only a vague understanding of operating systems and hardware. I want to understand much more about the lower levels of computing, but it gets really overwhelming somehow. At university I took a class about microprogramming, i.e. how processors are hard-wired to implement assembly instructions. Up to now I have always thought I wouldn't get more done if I learned more about the "low level".

One question I have is: how is it even possible that hardware gets hidden almost completely from the developer? Is it accurate to say that the operating system is a software layer over the hardware? One small example: in programming I have never come across the need to understand what the L2 or L3 cache is. For the typical business application environment one almost never needs to understand assembler and the lower levels of computing, because nowadays there is a technology stack for almost anything. I guess the whole point of these lower levels is to provide an interface to the higher levels. On the other hand, I wonder how much influence the lower levels can have, for example in this whole graphics computing area.

So, on the other hand, there is this theoretical computer science branch, which works on abstract models of computation. However, I have also rarely encountered situations where I found it helpful to think in terms of complexity models, proof verification, etc. I sort of know that there is a complexity class called NP, and that NP-complete problems become practically unsolvable once the input gets large. What I'm missing is a framework for thinking about these things. It seems to me that there are all kinds of different camps which rarely interact.

The last few weeks I have been reading about security issues. Here, somehow, many of the different layers come together. Attacks and exploits almost always happen at the lower levels, so in this case it is necessary to learn about the details of the OSI layers, the inner workings of an OS, and so on.

Best Answer

The keyword for thinking about these things is abstraction.

Abstraction just means deliberately ignoring the details of a system so that you can think about it as a single, indivisible component when assembling a larger system out of many subsystems. It is unimaginably powerful - writing a modern application program while keeping track of memory allocation, register spilling and transistor switching times would be possible in some idealized sense, but it is incomparably easier not to think about them and just use high-level operations instead. The modern computing paradigm relies crucially on multiple levels of abstraction: solid-state electronics, microprogramming, machine instructions, high-level programming languages, OS and Web programming APIs, user-programmable frameworks and applications. Virtually no one can comprehend the entire system nowadays, and there isn't even a conceivable path by which we could ever get back to that state of affairs.
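To make that concrete, here is a minimal, purely illustrative C++ sketch (the program is invented for this answer, not taken from any particular system) of how much lower-level machinery hides behind a single high-level operation:

    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> lines;

        // One "simple" high-level operation...
        lines.push_back("hello, abstraction");

        // ...quietly relies on several layers that never appear in the source:
        //  - the C++ runtime deciding when to grow the vector's heap buffer,
        //  - the OS allocator handing out pages of virtual memory,
        //  - the MMU and caches mapping and holding those bytes,
        //  - the CPU's microarchitecture scheduling the individual instructions.
        std::cout << lines.front() << '\n';
        return 0;
    }

The point is not that you could not reason about all of those layers, but that the source code lets you get away with not doing so.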

The flip side of abstraction is loss of power. By leaving decisions about details to the lower levels, we often accept that they may be made with suboptimal efficiency, since the lower levels do not have the 'Big Picture' and can optimize their workings only with local knowledge, and they are not as (potentially) intelligent as a human being. (Usually. For instance, compiling a high-level language to machine code is nowadays often done better by the machine than by even the most knowledgeable human, since processor architecture has become so complicated.)
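As a small, hedged illustration - assuming a typical optimizing compiler such as g++ or clang++ at -O2 or -O3, and with dot being just an invented example function - here is a plain loop whose register allocation, unrolling and possible vectorization are left entirely to the lower level:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // A plain inner product: no hints about registers, unrolling or SIMD width.
    // With optimizations enabled, the compiler usually makes those decisions
    // better than a hand-written assembly version would.
    double dot(const std::vector<double>& a, const std::vector<double>& b) {
        double sum = 0.0;
        for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
            sum += a[i] * b[i];
        return sum;
    }

    int main() {
        std::vector<double> a(1000, 1.5), b(1000, 2.0);
        std::cout << dot(a, b) << '\n';   // prints 3000
        return 0;
    }

How well this gets optimized depends on the compiler and flags; the essential thing is that none of those decisions are visible in the source.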

The issue of security is an interesting one, because flaws and 'leaks' in an abstraction can often be exploited to violate the integrity of a system. When an API postulates that you may call methods A, B and C, but only if condition X holds, it is easy to forget the condition and be unprepared for the fallout when it is violated. For instance, the classical buffer overflow exploits the fact that writing to a memory cell is undefined behaviour unless that cell lies within a block of memory you have allocated yourself. The API makes no promise about what happens then; in practice the result is determined by the details of the system at the next lower level - which we have deliberately forgotten about! As long as we fulfill the condition, this is of no importance, but if we do not, an attacker who understands both levels intimately can usually direct the behaviour of the entire system as desired and make bad things happen.
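Here is a deliberately simplified C++ sketch of such a leaky precondition; copy_name and its eight-byte buffer are invented for illustration and do not come from any real API:

    #include <cstring>
    #include <iostream>

    // Precondition (only documented, never enforced): strlen(input) must be < 8.
    void copy_name(const char* input) {
        char buf[8];
        std::strcpy(buf, input);   // no bounds check - the contract lives in a comment
        std::cout << buf << '\n';
    }

    int main() {
        copy_name("ok");   // precondition holds: well-defined, prints "ok"

        // copy_name("definitely-longer-than-eight");
        // Precondition violated: the language says only "undefined behaviour".
        // What actually happens is decided one level down, by the stack layout -
        // the overflow may overwrite saved registers or the return address,
        // which is precisely the detail a classical exploit takes control of.
        return 0;
    }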

The case of memory allocation bugs is particularly bad, because it has turned out to be really, really hard to manage memory manually without a single error in a large system. This could be seen as a failed case of abstraction: although it is possible to do everything you need with the C malloc API, it is simply too easy to misuse. Parts of the programming community now think that this was the wrong place at which to introduce a level boundary into the system, and instead promote languages with automatic memory management and garbage collection, which gives up some power but provides protection against memory corruption and undefined behaviour. In fact, a major reason for still using C++ nowadays is precisely that it lets you control exactly which resources are acquired and released, and when. In this way, the major schism between managed and unmanaged languages today can be seen as a disagreement about where exactly to draw a layer of abstraction.
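As a sketch of the C++ side of that trade-off (the function and file name are hypothetical), RAII ties resource lifetimes to scopes, so the points of acquisition and release are visible in the source rather than left to a collector:

    #include <fstream>
    #include <mutex>
    #include <string>

    std::mutex log_mutex;

    // RAII: the lock and the file handle are owned by local objects, so both are
    // released at the closing brace - deterministically, even if an exception is thrown.
    void append_line(const std::string& path, const std::string& line) {
        std::lock_guard<std::mutex> guard(log_mutex);   // mutex acquired here
        std::ofstream out(path, std::ios::app);         // file opened here
        out << line << '\n';
    }   // file closed and mutex released exactly here - no garbage collector involved

    int main() {
        append_line("app.log", "resources released at a known point");
        return 0;
    }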

The same can be said for many other major alternative paradigms in computing - the issue crops up wherever large systems have to be constructed, because we are simply unable to engineer solutions from scratch for the complex requirements common today. (A common viewpoint in AI these days is that the human brain does not work with such clean layers at all - its behaviour arises through feedback loops, massively interconnected networks etc. rather than separate modules and layers with simple, abstracted interfaces between them - and that this is why we have had so little success in simulating our own intelligence.)
