You can build them completely from basic logic gates, and the result will be a nice piece of art :-).
The 74xx logic series also contains the 74LS181, a 4-bit bit-slice ALU, which simplifies things drastically. Bit-slice ALUs were used to build more complex ALUs (read: longer word lengths), but newer technologies have made this kind of IC obsolete.
Note: TTL (74xx) is just one technology used for logic gates, and it's rarely used anymore. It was followed by Low-Power Schottky (74LSxx), strictly speaking also a form of TTL. Nowadays there are dozens of logic families, mostly based on high-speed CMOS (74HCxx, 74HCTxx, 74ACxx, ...).
These days the proper way to create an ALU would be to do it in a CPLD or an FPGA. This gives you a lot of gates, and the HDL (Hardware Description Language) you use to design the ALU is a lot easier and less error-prone than trying to figure out how to make the connections yourself with the logic gates. VHDL and Verilog are the HDLs of the day.
An alternative method to create an ALU (not using logic gates, though), would be a single parallel EEPROM/Flash. You use inputs A and B and the operation as input (address) and get the result of the operation as output (data). All you have to do is compile the ROM's content, meaning that you have to write at every address what the result of the operation will be for the corresponding inputs A, B and operation. Word size will be limited by the largest ROM size you can find.
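As a sketch of that idea: here is how the ROM contents could be compiled for a 4-bit lookup-table ALU. The address layout and operation encodings are made up for illustration; a real part would just be programmed with this image.

```python
# Build a lookup-table "ALU" for 4-bit operands A and B.
# Assumed address layout: op[2 bits] | A[4 bits] | B[4 bits]
# -> 10 address bits = 1024 entries; each byte holds the 4-bit result.

OPS = {
    0b00: lambda a, b: (a + b) & 0xF,   # ADD (wraps; carry-out discarded)
    0b01: lambda a, b: (a - b) & 0xF,   # SUB
    0b10: lambda a, b: a & b,           # AND
    0b11: lambda a, b: a | b,           # OR
}

def compile_rom():
    """Write, at every address, the result for that (op, A, B) combination."""
    rom = bytearray(1 << 10)
    for op, fn in OPS.items():
        for a in range(16):
            for b in range(16):
                rom[(op << 8) | (a << 4) | b] = fn(a, b)
    return rom

def alu(rom, op, a, b):
    # "Running" the ALU is just a memory read.
    return rom[(op << 8) | (a << 4) | b]

rom = compile_rom()
print(alu(rom, 0b00, 7, 5))   # ADD: prints 12
```

In hardware the `compile_rom` step happens once, offline; afterwards the EEPROM answers in a single read cycle, regardless of how complicated the operation table is.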
Most microprocessors (along with countless other devices) undergo a process called binning: all similar products are manufactured at once and, depending on their measured performance, are sorted into "bins" (groups) of similarly performing parts, then packaged and sold accordingly.
In the case of Intel processors (AMD is similar), processors within the same line are generally manufactured together and are binned according to their stable clock frequency. You can tell when a processor is part of the same "line" by looking at the core codename or, if that is not specific enough, the features of the core (as mentioned by embedded.kyle, the i5 doesn't have hyperthreading but the i7 does, even though both in question are "Sandy Bridge").
Sometimes a higher-end processor that fails can still be sold as another. An example I know first-hand: the M0 steppings of the old Northwood-core Pentium 4s (130 nm process) were actually failed Gallatin-core processors (Gallatin was the core for the P4 "Extreme Edition"). Similarly, a lot of people had/have luck unlocking extra cores, caches, and shader units on various CPUs and GPUs. For example, it is quite common to be able to buy a mid-to-high-range video card (take, for example, the AMD Radeon 6850) and flash it with the BIOS of the higher-level card (the Radeon 6870, in this case) to gain the extra features that card has (some extra shader cores). This, too, comes down to binning during the manufacturing process.
This sort of thing drives overclockers to take good note of the stepping, place of origin, and batch number of their processors. When word gets out that certain batches of processors are overclocking better than their not-as-potent brethren (same CPU, mind you, just made at a different time or place), they become more in demand.
If you're interested more, definitely search "CPU Binning," or read up at some forums. I'm a member at www.overclockers.com, and the forum there is quite welcoming and has a wealth of past and current knowledge (along with an abundance of fantastic members).
At the very lowest level, consider something like microcode. That's what Wouter was talking about when he mentioned Very Long Instruction Word architectures.
A CPU is a collection of busses, registers, memory, and an arithmetic logic unit (ALU). Each of these does simple and finite things. Any one higher-level instruction is a coordinated set of actions between all the blocks. For example, the low-level operations to add a memory location value into the accumulator could be:

1. Drive the operand address onto the memory address bus.
2. Assert the memory read line.
3. Route the returned data to one ALU input (the accumulator drives the other).
4. Instruct the ALU to add, and latch the result back into the accumulator.
When you break it down into the basic hardware operations, you note that orchestrating an instruction is mostly routing data to the right places in the right sequence. If this were implemented with discrete logic, #1 would be asserting a single output-enable line of a tri-state buffer that drives the memory address bus. #2 and #3 likewise require asserting a single line. #4 is a little different in that the ALU is usually a canned logic block itself and often has a set of lines that code the operation. For example, 000 might pass input 1 to the output, 001 add both inputs to the output, 010 logically AND both inputs to the output, etc.
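The behavior of such a canned ALU block with operation-select lines can be sketched like this; the 3-bit codes follow the examples in the text, and everything else (names, width) is invented for illustration:

```python
def alu_block(op, in1, in2, width=8):
    """Model of an ALU block whose 3-bit select lines code the operation:
    000 pass input 1, 001 add, 010 logical AND (codes from the text)."""
    mask = (1 << width) - 1
    if op == 0b000:
        return in1 & mask
    if op == 0b001:
        return (in1 + in2) & mask   # carry-out beyond the word width is dropped
    if op == 0b010:
        return in1 & in2 & mask
    raise ValueError("operation code not implemented in this sketch")

print(alu_block(0b001, 200, 100))   # prints 44 (300 wraps to 8 bits)
```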
The point is that at the level described above, each instruction is just asserting a certain set of control lines in sequence, possibly with minimum wait times between some actions. A stripped-down CPU could simply tie each bit in the instruction word to one of these control lines. That would be simple and highly flexible, but one drawback is that the instruction word would need to be quite wide to contain all the necessary control lines and an operand address field; the instruction memory would be used very inefficiently.
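A control word with one bit per control line might be sketched like this. The line names, bit positions, and field layout are invented for illustration; they mirror steps #1 through #4 above:

```python
# Each bit of the instruction word directly drives one control line.
MEM_ADDR_OE  = 1 << 0   # output-enable of the tri-state buffer on the address bus
MEM_READ     = 1 << 1   # assert the memory read line
ALU_IN_LATCH = 1 << 2   # latch the data bus into an ALU input register
ACC_WRITE    = 1 << 3   # latch the ALU output into the accumulator
ALU_OP_SHIFT = 4        # position of a 3-bit ALU operation-code field
ALU_OP_ADD   = 0b001

# "Add memory to accumulator" as a sequence of such control words:
ADD_MEM_TO_ACC = [
    MEM_ADDR_OE,                                  # 1: drive the address
    MEM_ADDR_OE | MEM_READ,                       # 2: assert read
    MEM_ADDR_OE | MEM_READ | ALU_IN_LATCH,        # 3: capture the data
    (ALU_OP_ADD << ALU_OP_SHIFT) | ACC_WRITE,     # 4: add and store
]

for word in ADD_MEM_TO_ACC:
    print(f"{word:08b}")
```

Even this toy version shows the width problem: four control bits plus a 3-bit ALU field already takes 7 bits, before any operand address field is added.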
A bunch of years ago, there were machines with microcoded architectures. They worked as I described at the low level, but these microinstructions weren't what you wrote when you programmed the machine. Each user-level instruction essentially kicked off a small microcode routine. The user instructions could then be more compact, with less redundant information in them. This was good because there could be many, many of them and memory was limited. The actual low-level control of the hardware, though, was done from microcode. This was stored in a special wide, fast, and therefore expensive memory, but it didn't need to be very big because there were only a few microcode instructions for each user-level opcode.
Nowadays, relatively simple machines like microcontrollers don't really have microcode. The instruction set has been made a little simpler and more regular so that it can be decoded directly by hardware, although that hardware may have a sort of sequencer or state machine that isn't exactly a microcode engine but sort of does that job with combinatorial logic, with pipeline stages where things get held up waiting on clock edges. This is one reason, for example, that smaller PICs have a CPU clock that is 4x the instruction clock. That allows 4 clock edges per instruction, which is useful for managing propagation delays through the logic. Some of the PIC documentation even tells you at which Q edges which operations are performed.
So if you want to get something very basic up and running, try implementing just a microcode machine. You may need a 24-bit or wider instruction word, but for demonstration and learning that is fine. Each bit controls a single flip-flop clock, enable line, or whatever. You will also need some way of sequencing through the instructions and doing conditional branching. A brute-force way is to put a new address, for possible use depending on the result of some conditional operation, right into the microcode word.