Which takes longer time? Switching between the user & kernel modes or switching between two processes

kernelprocess

Which takes longer time?

Switching between the user & kernel modes (or) switching between two processes?

Please explain the reason too.

EDIT : I do know that whenever there is a context switch, it takes some time for the dispatcher to save the status of the previous process in its PCB, and then reload the next process from its corresponding PCB. And for switching between the user and the kernel modes, I know that the mode bit has to be changed. Isn't it all, or is there more to it?

Best Answer

Switching between processes (given you actually switch, not run them in parallel) by an order of oh-my-god.

Trapping from userspace to kernelspace used to be done with a processor interrupt earlier. Around 2005 (don't remember the kernel version), and after a discussion on the mailing list where someone found that trapping was slower (in absolute measures!) on a high-end xeon processor than on an earlier Pentium II or III (again, my memory), they implemented it with a new cpu instruction sysenter (which had actually existed since Pentium Pro I think). This is done in the Virtual Dynamic Shared Object (vdso) page in each process (cat /proc/pid/maps to find it) IIRC.

So, nowadays, a kernel trap is basically just a couple of cpu instructions, hence rather few cycles, compared to tenths or hundreds of thousands when using an interrupt (which is really slow on modern CPU's).

A context switch between processes is heavy. It means storing all processor state (registers, etc) to RAM (at a magic memory location in the user process space actually, guess where!), in practice dirtying all cached memory in the cpu, and reading back the process state for the new process. It will (likely) have nothing still in the cpu cache from last time it ran, so each memory read will be a cache miss, and needed to be read from RAM. This is rather slow. When I was at the university, I "invented" (well, I did come up with the idea, knowing that there is plenty of dye in a CPU, but not enough cool if it's constantly powered) a cache that was infinite size although unpowered when unused (only used on context switches i.e.) in the CPU, and implemented this in Simics. Implemented support for this magic cache I called CARD (Context-switch Active, Run-time Drowsy) in Linux, and benchmarked rather heavily. I found that it could speed-up a Linux machine with lots of heavy processes sharing the same core with about 5%. This was at relatively short (low-latency) process time slices, though.

Anyway. A context switch is still pretty heavy, while a kernel trap is basically free.

Answer to at which memory location in user-space, for each process:

At address zero. Yep, the null pointer! You can't read from this entire page from user-space anyway :) This was back in 2005, but it's probably the same now unless the CPU state information has grown larger than a page size, in which case they might have changed the implementation.

Related Solutions

R – Time Stamp counter (TSC) when switching between Kernel & User mode

As you can read on Wikipedia

With the advent of multi-core/hyperthreaded CPUs, systems with multiple CPUs, and "hibernating" operating systems, the TSC cannot be relied on to provide accurate results. The issue has two components: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU. Even then, the CPU speed may change due to power-saving measures taken by the OS or BIOS, or the system may be hibernated and later resumed (resetting the time stamp counter). Reliance on the time stamp counter also reduces portability, as other processors may not have a similar feature. Recent Intel processors include a constant rate TSC (identified by the constant_tsc flag in Linux's /proc/cpuinfo). With these processors the TSC reads at the processor's maximum rate regardless of the actual CPU running rate. While this makes time keeping more consistent, it can skew benchmarks, where a certain amount of spin-up time is spent at a lower clock rate before the OS switches the processor to the higher rate. This has the effect of making things seem like they require more processor cycles than they normally would.

Android – How to pause / sleep thread or process in Android

One solution to this problem is to use the Handler.postDelayed() method. Some Google training materials suggest the same solution.

@Override
public void onClick(View v) {
    my_button.setBackgroundResource(R.drawable.icon);

    Handler handler = new Handler(); 
    handler.postDelayed(new Runnable() {
         @Override 
         public void run() { 
              my_button.setBackgroundResource(R.drawable.defaultcard); 
         } 
    }, 2000); 
}

However, some have pointed out that the solution above causes a memory leak because it uses a non-static inner and anonymous class which implicitly holds a reference to its outer class, the activity. This is a problem when the activity context is garbage collected.

A more complex solution that avoids the memory leak subclasses the Handler and Runnable with static inner classes inside the activity since static inner classes do not hold an implicit reference to their outer class:

private static class MyHandler extends Handler {}
private final MyHandler mHandler = new MyHandler();

public static class MyRunnable implements Runnable {
    private final WeakReference<Activity> mActivity;

    public MyRunnable(Activity activity) {
        mActivity = new WeakReference<>(activity);
    }

    @Override
    public void run() {
        Activity activity = mActivity.get();
        if (activity != null) {
            Button btn = (Button) activity.findViewById(R.id.button);
            btn.setBackgroundResource(R.drawable.defaultcard);
        }
    }
}

private MyRunnable mRunnable = new MyRunnable(this);

public void onClick(View view) {
    my_button.setBackgroundResource(R.drawable.icon);

    // Execute the Runnable in 2 seconds
    mHandler.postDelayed(mRunnable, 2000);
}

Note that the Runnable uses a WeakReference to the Activity, which is necessary in a static class that needs access to the UI.

Best Answer

Related Solutions

R – Time Stamp counter (TSC) when switching between Kernel & User mode

Android – How to pause / sleep thread or process in Android

Related Topic