Please understand that with Keil on 8051 cores, the compiler allocates RAM for a subroutine's local variables and for any entry arguments beyond what fits into registers. The linker then has the complicated job of analyzing the full calling tree of your program and optimizing the pool of memory allocated for these purposes, so as to minimize the amount of RAM required. This minimization relies on mutual exclusion between calling sequences: parts of the memory pool can be shared between code paths that never overlap in the execution flow.
When you inject an ISR into the calling tree, the linker cannot determine at what point the interrupt may come into play. It could fire at any time, during any one of the mutually exclusive paths the linker found in the normal mainline code. This means the linker has to pull the subroutine out of the shared usage pool and allocate data space specifically for its local variables, and perhaps even some of its entry arguments. Without this, the shared memory pools belonging to other mutually exclusive execution paths would get polluted and the program would crash in a blaze of glory.
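To make that concrete, here is a minimal sketch (all names are made up) of the pattern being described, using Keil C51's `interrupt` keyword to attach a handler to the timer-0 vector. With `scale()` called from both contexts, the BL51/LX51 linker will, in my experience, flag it with a MULTIPLE CALL TO SEGMENT warning for exactly this reason:

```c
#include <reg51.h>   /* Keil C51 SFR definitions */

volatile unsigned char adc_value, timer_count;
volatile unsigned char result, isr_result;

/* scale()'s argument and local get static overlay addresses
   assigned by the linker, not stack slots */
static unsigned char scale(unsigned char raw, unsigned char gain)
{
    unsigned int tmp = (unsigned int)raw * gain;  /* overlaid local */
    return (unsigned char)(tmp >> 8);
}

void timer0_isr(void) interrupt 1   /* timer-0 vector */
{
    timer_count++;
    isr_result = scale(timer_count, 5);  /* caller #2: interrupt context */
}

void main(void)
{
    while (1) {
        result = scale(adc_value, 3);    /* caller #1: mainline context */
    }
}
```

Keil does offer a `reentrant` function attribute that moves such a function's locals onto a simulated stack, but that stack is slow and defeats the overlay optimization described above, so treat it as a last resort rather than a fix.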
The challenge when coding on an 8051 core is to steer away from designs that use lots of local variables, take lots of entry arguments, or share calling contexts between mainline code and interrupts. You really have to limit the freedoms you get on other architectures, where a stack is used for subroutine argument passing and local variables. Keil chose not to implement everything on the stack, and thus produced a tool set whose performance and optimization are first class in the industry for the 8051 architecture. For that reason it is important to adopt a coding style that plays into the Keil scheme.
In your case I really recommend that you reconsider your determination that this subroutine has to be called from both the mainline context and the interrupt context. Try to find a design technique that avoids this and your life will be a lot simpler.
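One common technique, sketched below with hypothetical names and under the assumption that the ISR only needs to capture its data: let the interrupt record the event and defer the real work to the mainline loop, so the shared subroutine ends up with a single calling context.

```c
#include <reg51.h>

volatile bit sample_ready = 0;         /* Keil C51 single-bit type */
volatile unsigned char latest_sample;

static void process(unsigned char s)
{
    /* ... whatever work previously lived in the shared subroutine ... */
}

void timer0_isr(void) interrupt 1
{
    latest_sample = TL0;   /* capture the raw data only */
    sample_ready = 1;      /* defer the processing to mainline */
}

void main(void)
{
    while (1) {
        if (sample_ready) {
            sample_ready = 0;
            process(latest_sample);    /* now called from one context only */
        }
    }
}
```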
You are mixing two independent (orthogonal) ideas from digital circuit theory: asynchronous circuits and multi-core processors.
Asynchronous circuits: circuits that have more than one clock, where the clocks are asynchronous (i.e. have a non-constant, unpredictable phase relationship).
Some circuits may use two clocks where, for example, one is just a divide-by-2 of the other. These circuits are not asynchronous, because there is a known phase relationship between the two clocks even though their frequencies differ.
You may have a single-core CPU with a few mutually asynchronous clocks, and a multi-core CPU with all its cores running on the same clock (the latter is an imaginary CPU - all real multi-core CPUs have many clocks, forming several mutually asynchronous clock sets).
Asynchronous circuits are a major topic in digital design; the explanation above is just the basics.
Multi-core CPUs: several microprocessors (cores) connected in parallel, employing sophisticated hardware and software in order to achieve high performance.
The usual practice is to make the cores as independent as possible in terms of clocks, power, execution, etc. This allows dynamic (run-time) adjustment of each core's activity, and hence its power consumption, to the actual needs of the system.
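You can actually observe this per-core independence from software. A rough sketch, assuming a Linux host with cpufreq support (the core count is hard-coded just for the demo); run it under load and the cores typically report different frequencies:

```c
#include <stdio.h>

int main(void)
{
    char path[80];
    int cpu;

    for (cpu = 0; cpu < 4; cpu++) {   /* assume 4 cores for the demo */
        FILE *f;
        long khz;

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq",
                 cpu);
        f = fopen(path, "r");
        if (f && fscanf(f, "%ld", &khz) == 1)
            printf("cpu%d: %ld kHz\n", cpu, khz);  /* per-core clock */
        if (f)
            fclose(f);
    }
    return 0;
}
```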
My impression is that what you're looking for is an explanation about multi-core CPUs, not asynchronous circuits.
This topic is much, much bigger than anything one can put in an answer.
The answers to your questions, though:
- The clocks used by different cores have (to my best knowledge) the same sources (there can be more than one: crystal, VCO, ...). Each core (usually) has a few mutually asynchronous clock sets, plus dedicated clock gating and throttling logic that allows the clock to be turned off or slowed down independently for each core. Again, if you're interested only in the algorithmic aspect of the cores' parallelism, forget about clocks for now.
- You have just indicated the main aspect of the cores' parallelism: how do you run multiple cores in parallel efficiently? This topic is huge and involves both HW and SW solutions. From the HW perspective, cores both modify a common memory and exchange control and status signals, with sequencing logic and between themselves. The picture is complicated considerably by the existence of caches. I'd suggest that you start by reading on caches, then on cache coherency, and only then on caches in multi-core systems (a small demonstration of why they matter follows this list).
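For a feel of why caches dominate the picture, here is a rough C demonstration of false sharing, assuming POSIX threads and 64-byte cache lines: two threads increment completely independent counters, yet when the counters happen to share a cache line, the coherency protocol forces that line to ping-pong between the cores, and the run is typically several times slower than with padded counters.

```c
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

/* Two counters packed into the same cache line (false sharing)... */
struct { volatile unsigned long a, b; } shared;
/* ...and two counters forced onto separate lines by padding. */
struct { volatile unsigned long a; char pad[64]; volatile unsigned long b; } separate;

static void *bump(void *p)            /* increment one counter ITERS times */
{
    volatile unsigned long *c = p;
    unsigned long i;
    for (i = 0; i < ITERS; i++)
        (*c)++;
    return NULL;
}

/* Run two threads on the given pair of counters and time them */
static double run_pair(volatile unsigned long *x, volatile unsigned long *y)
{
    pthread_t t1, t2;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    pthread_create(&t1, NULL, bump, (void *)x);
    pthread_create(&t2, NULL, bump, (void *)y);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec)
         + (end.tv_nsec - start.tv_nsec) / 1e9;
}

int main(void)
{
    printf("same cache line:      %.2f s\n", run_pair(&shared.a, &shared.b));
    printf("separate cache lines: %.2f s\n", run_pair(&separate.a, &separate.b));
    return 0;
}
```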
Hope this helps.
The answer is that the inbound interrupt does not connect directly to any core in a multi-core architecture (your question asks about Intel and ARM).
For the Intel CPU architecture models (I don't work on ARM), when first powered up there is no mapping configured, so all interrupts (and indeed boot code) run on processor zero. Once virtualization is initialized, the rules change drastically.
Interrupts arrive at an I/O processing unit and are 'mapped' in hardware to the required processor. That mapping could route all interrupts to a single core (which could then hand an interrupt off to a virtualized ISR on another core), or route particular interrupts directly to particular cores.
A good overall example is Intel VT-d on the i7, where interrupt remapping is handled by the Northbridge implementation.
A good document to start with is this one, which walks through the mapping of both interrupts and DMA for the i7.
Depending on what software is running (an RTOS vs. a virtualization kernel), the mapping of interrupts will vary.
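As a concrete software-side example, assuming a Linux kernel: the hardware mapping is exposed through /proc/irq/<N>/smp_affinity, and a process with root rights can re-steer a given interrupt to chosen cores (the IRQ number below is made up; check /proc/interrupts for real ones):

```c
#include <stdio.h>

int main(void)
{
    /* IRQ 45 is hypothetical - pick a real line from /proc/interrupts */
    FILE *f = fopen("/proc/irq/45/smp_affinity", "w");

    if (!f) {
        perror("smp_affinity");   /* needs root, and a valid IRQ number */
        return 1;
    }
    fprintf(f, "3\n");   /* hex CPU bitmask: bit0 = cpu0, bit1 = cpu1 */
    fclose(f);
    return 0;
}
```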
An excellent paper on an RTOS implementation in Linux is here.