You basically have two options here:
- A current-generation 486-class CPU
  - Upsides:
    - Everything is likely integrated into the main CPU: no southbridge, I/O controllers, etc.
    - You can actually buy the ICs off the shelf.
  - Downsides:
    - Devices may be BGA.
    - Few exposed buses.
    - May need to run at a few hundred MHz.
- By "currently produced", I mean something like an AMD Geode, or a similar device from VIA, etc. There is a pretty big market for small, low-power, low-speed x86 CPUs in embedded applications.
- An old-stock or old-design 386/486 CPU
  - Upside:
    - Probably more educational.
  - Downsides:
    - Requires a LOT of support devices (e.g. a southbridge, a UART, etc.), all of which are additional ICs.
    - Note: the external devices needed will vary depending on which 386/486 you choose. The early ones had few integrated peripherals; later on, many of the peripherals were integrated into the CPU itself.
There are some midpoint devices, like the 386EX, which is a 386 intended for embedded applications. Released in 1994, it's old enough to be available in a TQFP-144 package, yet it includes most of the necessary peripherals on-die.
Datasheet
Some resources, off the top of my head:
A FOSS BIOS alternative.
Interesting forum thread about building an IBM XT-compatible computer.
Other stuff:
Dieter's Homepage
Some nutjob who built a discrete-transistor CPU!
He also has a bunch of other homemade CPU projects.
Really, if I were you, I would go with an ARM device. You can get big ARM CPUs that have MMUs and will run Linux fine.
Alternatively, an 8088 or 8086 may be significantly more approachable. There is plenty of information out there about people homebrewing 8088 computers.
Of course, to properly look at this we must know what it means to "natively" execute anything. On the surface this seems like an easy question, but it isn't. Let me elaborate.
But first, let me say that I am massively simplifying this description! There is no way I can explain this in a reasonable number of words without some over-arching generalizations and simplifications. Deal with it.
Let's start with a bit-slice processor (BSP) design. These are the easiest processors to design, the hardest to program for, the smallest in terms of logic size, and the worst in terms of code density. Essentially, an instruction word in a bit-slice processor never goes through an instruction-decode step; the instruction word is effectively pre-decoded. The individual bits of the instruction go directly to latches, muxes, ALUs, etc. inside the processor. Consequently the instruction word can be very large; instructions larger than 256 bits are not uncommon! Normal BSPs are purpose-built for a single task and are not general-purpose CPUs. While BSPs sound somewhat exotic, they are used all over the place, but are so deeply embedded that you probably don't notice them.
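To make the "no decode step" idea concrete, here's a minimal C sketch of how the bits of a wide microinstruction word map directly onto control lines. The field names and widths are invented for illustration; a real bit-slice control word is far wider.

```c
#include <stdint.h>

/* Hypothetical slice of a (much wider) bit-slice microinstruction word.
   There is no decode step: each field IS a set of control lines. */
#define ALU_OP_SHIFT  0            /* bits 0-3: ALU function select */
#define ALU_OP_MASK   0xFu
#define SRC_MUX_SHIFT 4            /* bits 4-5: operand mux select  */
#define SRC_MUX_MASK  0x3u
#define LATCH_EN_BIT  (1u << 6)    /* bit 6: result latch enable    */

/* "Decoding" is just routing bits to the hardware they control. */
static unsigned alu_op(uint32_t u)   { return (u >> ALU_OP_SHIFT) & ALU_OP_MASK; }
static unsigned src_mux(uint32_t u)  { return (u >> SRC_MUX_SHIFT) & SRC_MUX_MASK; }
static unsigned latch_en(uint32_t u) { return (u & LATCH_EN_BIT) != 0; }
```

A real design would have hundreds of such fields, one per mux, latch, and ALU, which is why the instruction words get so huge.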
One step up from a BSP is a RISC CPU. The overall data flow is changed to be more general purpose, and an instruction-decode stage is added to the pipeline. Inside the RISC CPU there is still a giant instruction word, like the BSP's, except that instruction decode is used to convert the 32-bit instruction into that giant instruction word. Fundamentally, the instruction decode is like a giant lookup table that converts the 32-bit instruction into the giant internal word used in the BSP. It is not literally a giant lookup table, but that is what it effectively is. This instruction decode limits what the instructions can do, but it greatly simplifies programming and is what turns this thing into a general-purpose CPU.
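The "lookup table" view can be sketched directly: decode is modeled as indexing a ROM with the instruction's opcode field. The control-word bits and opcode assignments here are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical control-word bits (a real core's word is far wider). */
enum { CTL_ALU_ADD = 0x01, CTL_ALU_SUB = 0x02, CTL_MEM_RD = 0x10, CTL_REG_WR = 0x20 };

/* Decode "ROM": the top 4 bits of the instruction select a control word. */
static const uint32_t decode_rom[16] = {
    [0x0] = CTL_ALU_ADD | CTL_REG_WR,   /* ADD  */
    [0x1] = CTL_ALU_SUB | CTL_REG_WR,   /* SUB  */
    [0x2] = CTL_MEM_RD  | CTL_REG_WR,   /* LOAD */
};

static uint32_t decode(uint32_t insn) {
    return decode_rom[insn >> 28];      /* decode is just a table lookup */
}
```

Real decoders are built from gates rather than an actual ROM, but the input-to-output mapping is exactly this kind of table.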
Next step up we get to a CISC CPU. The main difference is that the instruction decode becomes more complex. Instead of the decode being just a huge lookup table, it converts the 32-bit instruction into a series of BSP-like micro-instructions. You can really think of each 32-bit instruction as being a small subroutine call inside a BSP.
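That "subroutine call" view can be sketched like this: each CISC opcode indexes a short sequence of BSP-like micro-ops rather than a single control word. The micro-op names and opcode numbers are invented for illustration.

```c
#include <stddef.h>

typedef enum { UOP_LOAD_A, UOP_LOAD_B, UOP_MUL, UOP_ADD, UOP_SAT, UOP_STORE, UOP_END } uop_t;

/* Hypothetical microcode ROM: each opcode's "subroutine" of micro-ops. */
static const uop_t mc_add[]     = { UOP_LOAD_A, UOP_LOAD_B, UOP_ADD, UOP_STORE, UOP_END };
static const uop_t mc_mac_sat[] = { UOP_LOAD_A, UOP_LOAD_B, UOP_MUL, UOP_ADD, UOP_SAT,
                                    UOP_STORE, UOP_END };

static const uop_t *const microcode[] = { mc_add, mc_mac_sat };

/* Executing one CISC instruction = running its micro-op sequence to the end. */
static size_t micro_op_count(unsigned opcode) {
    const uop_t *u = microcode[opcode];
    size_t n = 0;
    while (u[n] != UOP_END) n++;
    return n;
}
```

A simple opcode runs a short sequence; a complex one (like a multiply-accumulate with saturation) runs a longer one, which is why CISC instructions take varying numbers of cycles.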
Next, you have assembly language. This is the ASCII text that you write, which gets converted into those 32-bit instructions by the assembler and linker. While this is the lowest level of programming that a human might do, there is not always a one-to-one relationship between what the human writes and what the CPU executes. Even here the assembler is doing some amount of interpretation and manipulation of the final instructions. For example, MIPS assemblers will rearrange or add instructions to deal with pipeline hazards. I'm sure other assemblers do something similar.
Then you have a fully interpreted language. In this language, the interpreter has to parse the ASCII of each line or command every time that line is executed. This is what most scripting languages do.
There are also fully compiled languages, like C/C++, in which a compiler takes the ASCII source code and converts it into assembly language (or sometimes directly into the normal 32-bit opcodes).
Between interpreted and compiled languages there are "tokenized languages". These are most like interpreted languages, but the ASCII source code is parsed only once. The net effect is that the execution speed is much quicker than a fully interpreted language's, while you still have the flexibility of an interpreted language without the compile time of a compiled language. The term "tokenized" is used because the code is pre-parsed, or tokenized, into something that is easier to deal with than straight ASCII. Java is a good example of a tokenized language.
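The difference can be sketched with a toy two-command language (`ADD n` / `SUB n`, invented for illustration): the ASCII is parsed exactly once into compact tokens, and the execution loop never touches text again.

```c
#include <stdio.h>
#include <string.h>

/* Toy tokenized language: each line is "ADD <n>" or "SUB <n>". */
typedef struct { char op; long arg; } token_t;

/* Parse the ASCII exactly once, up front. */
static int tokenize(const char *const *lines, int n, token_t *out) {
    for (int i = 0; i < n; i++) {
        char name[8]; long v;
        if (sscanf(lines[i], "%7s %ld", name, &v) != 2) return -1;
        out[i].op  = (strcmp(name, "ADD") == 0) ? '+' : '-';
        out[i].arg = v;
    }
    return 0;
}

/* Execution works only on the token stream -- no re-parsing per pass. */
static long run(const token_t *t, int n) {
    long acc = 0;
    for (int i = 0; i < n; i++)
        acc += (t[i].op == '+') ? t[i].arg : -t[i].arg;
    return acc;
}
```

A fully interpreted implementation would re-run something like `tokenize` on every line each time it executed; the tokenized one pays that parsing cost once, no matter how many times `run` is called.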
There have also been "BASIC CPUs": essentially, these are CPUs that have a BASIC interpreter built into them. They are a normal MCU whose flash EPROM contains a BASIC interpreter as well as the pre-tokenized BASIC program.
So, back to the question: what does it mean to natively execute a program? Does the program have to be down at the BSP level to be native? If so, then almost nothing is native. What about the 32-bit instruction level? OK, that's what most would call native, since that is what the "CPU block" is given to execute. Normally anything ASCII is not "native", since some level of interpretation needs to be done before it can be executed. How about those BASIC MCUs? Do they natively execute BASIC? Probably not.
But let's look more at those BASIC MCUs. The BASIC interpreter is stored in the flash EPROM and is made up of that MCU's standard opcodes. But what if the interpreter were actually part of a CISC CPU's instruction decode? Instead of the instruction decode running some subroutine for a "Multiply and Add with Saturation" instruction, it would run a subroutine for "LET X = 5 + Y". Would that CPU then be said to execute BASIC natively? I would say so!
But let's look at the C language specifically, and let's assume some crazy CISC processor that would interpret ASCII C source code directly. As you look at the tasks of managing files, parsing ASCII, and managing variables, you notice one of two things happens: either the BSP at the core of our C-CPU becomes absolutely huge and unmanageable, or the BSP starts to look like what any other modern CPU has. But if the BSP looks similar to other CPUs', then the instruction decode must do all the hard work, which it is not well suited for either.
What you end up with, if you follow this to its natural conclusion, is something that looks like a normal RISC or CISC CPU with a C interpreter already programmed into its flash EPROM. Exactly like those BASIC MCUs I mentioned before!
The net result is that a CPU that runs C "natively" is not useful, even as an educational project. I could go on and on, but I'm almost late for a meeting now. Enjoy!
The problem is that in order to understand how the computer gets to 1 + 2 = 3, you have to understand about two levels deeper than you've gone.
Roughly, a computer is organized (in terms of fields of study) like this, from the highest level of abstraction down to physical reality:
To properly understand why the computer can produce 1 + 2 = 3, you must first decide what you are willing to accept "on faith" and what you will not believe until you internalize it. What you accept on faith will sit about two levels below the thing you want to understand. So if you want to understand an adder circuit at the logic level, you will need to understand the basics of "digital" transistors (specifically CMOS).
Using your earlier site as an example, consider this resource. It discusses the "full adder": the minimal, completely general-purpose circuit capable of addition/subtraction, including carry-in and carry-out.
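For a concrete version of that circuit, here is the full adder's logic written as C bit operations, plus four of them chained into a ripple-carry adder. The function names are mine; the gate equations are the standard full-adder ones.

```c
/* One-bit full adder, using the same gates as the hardware version:
   sum  = a XOR b XOR cin
   cout = (a AND b) OR (cin AND (a XOR b)) */
static unsigned full_adder(unsigned a, unsigned b, unsigned cin, unsigned *cout) {
    *cout = (a & b) | (cin & (a ^ b));
    return a ^ b ^ cin;
}

/* Chain four full adders (a ripple-carry adder) to add 4-bit words;
   each stage's carry-out feeds the next stage's carry-in. */
static unsigned ripple_add4(unsigned a, unsigned b) {
    unsigned carry = 0, sum = 0;
    for (unsigned i = 0; i < 4; i++)
        sum |= full_adder((a >> i) & 1, (b >> i) & 1, carry, &carry) << i;
    return sum & 0xFu;
}
```

This is exactly the 1 + 2 = 3 computation: `ripple_add4(1, 2)` produces 3 one carry bit at a time, just as the hardware does.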
You will also need to understand how numbers are represented in two's complement (the number system modern computers use for integer arithmetic).
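A short sketch of why two's complement matters here: negating a number is "invert all the bits, then add 1", which lets the very same adder circuit perform subtraction. The 8-bit width and function names are just for illustration.

```c
#include <stdint.h>

/* Two's-complement negation: invert all bits, add 1. */
static uint8_t twos_neg(uint8_t x) { return (uint8_t)(~x + 1u); }

/* Subtraction reuses the adder: a - b == a + (~b) + 1 (mod 2^8). */
static uint8_t sub_via_add(uint8_t a, uint8_t b) { return (uint8_t)(a + twos_neg(b)); }
```

This is why hardware needs only an adder plus inverters to subtract; there is no separate "subtractor" circuit.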
If you really want a world-class introductory course, I cannot recommend Professor Scott Wills at Georgia Tech highly enough. He passed away last year of cancer, but his course lives on. The Georgia Tech ECE2030 (Introduction to Computer Engineering) class has its textbook and exercises all online.
Good luck!