"However when a device attempts to use the bus the line will be pulled to 0V."
That's it exactly: the bus is high only if all devices set it high. Just like in an AND gate the output is high only if all inputs are high.
Tri-state logic is not the way to control the bus; if one device sets it high, and another sets it low, you have a short circuit. Usually there's a passive pullup, which keeps the bus at its high level. Each device controls an open drain FET to pull it low.
Like you said, in this setup there's no way to determine how many devices are pulling it low simultaneously.
This is indeed active high logic: for the AND function the bus is active (high) if all inputs are active (high). If a device pulls the bus low it's just putting a low level on it.
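The wired-AND behaviour can be modelled in a few lines. This is only a minimal sketch, not a circuit simulation: the function name `bus_level` and the convention that `True` means "this device's open-drain FET is conducting" are assumptions made for illustration.

```python
def bus_level(pulling_low):
    """Return the logic level (0 or 1) of an open-drain bus with a passive pull-up.

    pulling_low: list of booleans, one per device; True means that device's
    FET is conducting and pulls the line to 0 V.
    """
    # The pull-up keeps the line high unless at least one device pulls it low.
    return 0 if any(pulling_low) else 1


print(bus_level([False, False, False]))  # every device releases the line
print(bus_level([False, True, False]))   # one device pulls low
print(bus_level([True, True, True]))     # several pull low: same level as one
```

The last two calls give the same level, which is why the bus itself cannot tell you how many devices are pulling it low at once.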
The external data-bus width doesn't always agree with the processor's internal structure. A well-known example is the old Intel 8088 processor, which was identical to the 16-bit 8086 internally, but had an 8-bit external bus.
Data-bus width is not a real indicator of the processor's power, though a narrower bus may limit data throughput. The actual power of a processor is determined by the CPU's ALU (Arithmetic and Logic Unit). 8-bit microcontrollers have 8-bit ALUs, which can process data in the range 0..255. That's enough for text processing: the ASCII character table only needs 7 bits. The ALU can do some basic arithmetic, but for larger numbers you'll need software help. If you want to add 100500 + 120760, an 8-bit ALU can't do that directly, and not even a 16-bit ALU can. So the compiler will split the numbers, do separate calculations on the parts, and recombine the results afterwards.
Suppose you have a decimal ALU, which can process numbers of up to 3 decimal digits. The compiler will split 100500 into 100 and 500, and 120760 into 120 and 760. The CPU can calculate 500 + 760 = 260, plus an overflow (carry) of 1. It takes the carry and adds it to 100 + 120, so that the sum is 221. It then recombines the two parts so that you get the final result 221260. This way you can do anything. The three-digit limit was no obstacle to processing 6-digit numbers, and you can write algorithms for processing 10-digit numbers or more. Of course the calculation will take longer than with an ALU that can do 10-digit calculations natively, but it can be done.
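The three-digit example can be written out as a short sketch. The helper names `split`, `add_limbs` and `join` are made up for illustration, and Python's own integers are used only to split the inputs and to display the result. The addition itself never handles a value larger than 999 + 999 + 1.

```python
BASE = 1000  # the hypothetical ALU handles three decimal digits at a time

def split(n):
    """Split a non-negative integer into 3-digit parts, least significant first."""
    limbs = []
    while True:
        n, limb = divmod(n, BASE)
        limbs.append(limb)
        if n == 0:
            return limbs

def add_limbs(x, y):
    """Add two lists of parts using only part-sized additions plus a carry."""
    out, carry = [], 0
    for i in range(max(len(x), len(y))):
        a = x[i] if i < len(x) else 0
        b = y[i] if i < len(y) else 0
        carry, digit = divmod(a + b + carry, BASE)  # 500 + 760 -> carry 1, part 260
        out.append(digit)
    if carry:
        out.append(carry)
    return out

def join(limbs):
    """Recombine the parts into one integer (for display only)."""
    return sum(limb * BASE**i for i, limb in enumerate(limbs))

parts = add_limbs(split(100500), split(120760))
print(parts, join(parts))  # [260, 221] 221260
```

The same loop works unchanged for numbers with any number of parts, which is the point of the argument: the part size limits speed, not what can be computed.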
Any computer can simulate any other computer.
The humble 8-bit processor can do exactly what a supercomputer can, given the necessary resources, and the time. Lots of time :-).
A concrete example is arbitrary-precision calculators. Most (software) calculators have something like 15 decimal digits of precision; if numbers have more significant digits, they round them and possibly switch to mantissa + exponent form to store and process them. But arbitrary-precision calculators expand on the example calculation I gave earlier, and they allow you to multiply
\$ 44402958666307977706468954613 \times 595247981199845571008922762709 \$
for example, two numbers (they're both prime) which would need a wider databus than my PC's 64 bits. Extreme example: Mathematica gives you \$\pi\$ to 100000 digits in 1/10th of a second. Calculating \$e^{\pi \sqrt{163}}\$ \$^{(1)}\$ to 100000 digits takes about half a second. So, while you would expect working with data wider than the databus to be taxing, it's often not really a problem. For a PC running at 3 GHz this may not be surprising, but microcontrollers are getting faster as well: an ARM Cortex-M3 may run at speeds greater than 100 MHz, and for the same money you get a 32-bit bus too.
\$^{(1)}\$ About 262537412640768743.99999999999925007259, and it's not a coincidence that it's nearly an integer!
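To make the arbitrary-precision idea concrete, here is a rough sketch of schoolbook multiplication on 32-bit pieces, applied to the two numbers above. The 32-bit piece size and the names `to_limbs`, `from_limbs` and `mul_limbs` are assumptions for illustration; real libraries use more refined algorithms. Python's built-in integers, which are themselves arbitrary precision, serve only to split the inputs and to cross-check the result.

```python
BASE = 2**32  # assumed piece ("limb") size: one 32-bit machine word

def to_limbs(n):
    """Split a non-negative integer into 32-bit limbs, least significant first."""
    limbs = []
    while True:
        n, limb = divmod(n, BASE)
        limbs.append(limb)
        if n == 0:
            return limbs

def from_limbs(limbs):
    """Recombine limbs into one integer (used only to check the result)."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))

def mul_limbs(x, y):
    """Schoolbook multiplication: every limb of x times every limb of y."""
    out = [0] * (len(x) + len(y))
    for i, a in enumerate(x):
        carry = 0
        for j, b in enumerate(y):
            # a*b plus two 32-bit terms always fits in 64 bits.
            carry, out[i + j] = divmod(a * b + out[i + j] + carry, BASE)
        out[i + len(y)] = carry
    return out

p = 44402958666307977706468954613
q = 595247981199845571008922762709
product = from_limbs(mul_limbs(to_limbs(p), to_limbs(q)))
print(product == p * q)  # cross-check against Python's built-in big integers
```

No single step needs more than a 32 x 32 -> 64-bit multiply, which is the kind of operation a 32-bit microcontroller can typically provide.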
PC and XT
The original IBM PC simply extended the Intel chipset bus to connectors using buffer drivers. The clock rate on the card bus was exactly the same as the clock rate used for a CPU cycle. So with approximately \$4.77\:\text{MHz}\$ (derived by dividing a \$14.31818\:\text{MHz}\pm 5\:\text{ppm}\$ crystal rate by 3) on the PC's CPU, this meant that a typical 6-cycle, 8-bit I/O bus transaction would take about \$1.26\:\mu\text{s}\$. This was consistent with the technology at the time, so boards could decode and latch addresses using middle-of-the-road (in terms of speed) and reasonably priced devices. IBM would eventually publish a fairly complete set of documentation on the IBM PC, XT, and PC/AT that included detailed schematics that were well laid out and understandable, as well as a complete listing of their BIOS source code (in assembly).
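The timing figures follow directly from the crystal frequency. A quick check of the arithmetic, using only the numbers quoted above (the variable names are just for illustration):

```python
CRYSTAL_HZ = 14_318_180          # 14.31818 MHz crystal
cpu_hz = CRYSTAL_HZ / 3          # divided by 3 to get the CPU (and bus) clock
cycle_s = 1 / cpu_hz             # duration of one bus clock cycle
io_transaction_s = 6 * cycle_s   # typical 6-cycle, 8-bit I/O transaction

print(f"CPU clock: {cpu_hz / 1e6:.2f} MHz")                 # CPU clock: 4.77 MHz
print(f"I/O transaction: {io_transaction_s * 1e6:.2f} us")  # I/O transaction: 1.26 us
```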
The PC and XT simply used the bus design that reflected Intel's chip design, without extension features (that I'm aware of.) If you tried to increase the clock rate of the CPU, then the clock rate of the bus would also increase and this put pressure on the boards. But I don't recall many attempting to do this, so it wasn't an issue.
AT
With the advent of the PC/AT and the 80286, a new 16-bit I/O transaction and 16-bit memory transaction became available. Intel also changed over to the new 82284 clock gen chip and the 82288 bus control chip. Additional DMA channels and interrupt signal lines were added by IBM and an arbitration transaction was added so that add-on cards could replace the platform CPU as the bus owner. (A little more on that, later.)
The new standard limit for the CPU was now \$6\:\text{MHz}\$. The bus rate was similarly increased and newer boards needed to keep up. IBM also introduced a number of new cards for the system.
The 80286 had four more address lines (going from 20 to 24) and could now enter a new protected mode of operation to gain access to these new lines. While Intel was able to allow the transition from real mode operation to protected mode using appropriate software instructions, they were rushed to get the chip out to the marketplace and did not manage to successfully field the new CPU with the ability to switch back to real mode. As a result, the only way back from protected mode to real mode was through a processor reset. IBM handled this problem through the keyboard interface, using the keyboard (and memory in the calendar IC they used) to force a hardware reset when instructed to do so. The BIOS supported transitions back and forth between modes and was able to hide the fact that the keyboard needed to reset the CPU each time a request was made to get back to real mode operation.
Wider bus transfers on the PC/AT bus now also supported faster bus cycle rates; a "byte swapper" was used to port around low order and high order bytes on the bus; and the new refresh cycle logic used discrete circuits.
The rush for more CPU speed
People quickly discovered that they could increase the clock rate of their expensive IBM PC/AT to about \$8\:\text{MHz}\$ by simply replacing the clock crystal. I did this and found that I could successfully push the system and the boards I used to about \$8.5\:\text{MHz}\$ before things started to get iffy. (I couldn't reach a consistent \$9\:\text{MHz}\$ on my system, so I settled in at \$8\:\text{MHz}\$ and left it there.)
The level of skill needed (and tools required) to design a motherboard was relatively low at the time. Almost anyone could find inexpensive parts and do decent layout that would work fine at these frequencies. And a lot of "mom and pop" motherboard makers soon began to enter the scene. (IBM's price point was very high for most people.)
Perhaps the first truly successful (able to emulate the IBM hardware with 99% compatibility) PC replacement was Kaypro's 286i product. Before this, there were usually too many "issues" to make the products sufficiently acceptable to the business market (though hobbyists were often okay.) Kaypro's entry was about US$2k cheaper than IBM's, so it very quickly rolled out.
As more and more competitors solved the compatibility issues and began to compete, Intel started to roll out faster spec'd 80286 CPUs, too. Board makers would incorporate these newer CPUs, include faster logic chips so the bus could run faster, and we began to see \$8\:\text{MHz}\$, \$10\:\text{MHz}\$, and even \$12\:\text{MHz}\$ offerings. But this almost immediately put pressure on the add-on cards. Older cards simply couldn't be used and newer ones were too few, too far between, and consumers faced buying a faster system that greatly reduced the number of add-on cards they could buy and successfully use.
While a few companies attempted to isolate the add-on card bus rate from the internal Intel bus rate with discrete chips (with some success), the sheer number of "mom and pop" motherboard makers and the need to separate the clock rate of the CPU from the cycle time of the bus opened the door for a new company, Chips and Technologies (aka C&T), to produce an ASIC that got this job done. Very soon afterwards, new motherboards entered the market that allowed the ISA bus cycle time to be kept (relatively) independent of the Intel CPU clock rate. Since Intel was meanwhile continuing to increase the maximum CPU frequency, this was a godsend to the many competitors, who didn't have the internal horsepower or financing to develop ASICs but who could certainly use them in new products.
As a result, the "frequency wars" started in earnest and there was hardly a month going by where there weren't new motherboard offerings with increasing CPU clock rates. The decoupling of CPU frequency from bus frequency was a huge win for C&T, too, who did quite well in the process.
Just as a note, I believe the decoupled ISA bus operated asynchronously to the platform CPU with one exception: the RESET line to the platform CPU.
I/O and Memory and DMA
The I/O and memory bus transactions are distinct, but in most ways quite similar to each other. It's just that different boards would respond. The original 8-bit I/O transaction, for example, was 6 bus cycles long. But with the PC/AT's newer, wider bus, a 3-cycle I/O transaction was included.
It was the job of each add-on board to latch and decode the address and associated signals they were interested in (IOR or IOW, for example, for cards responding to I/O bus cycles.) They then had a certain number of clocks to respond for a standard transaction. An I/O card could, however, assert IOCHRDY if it wanted added bus cycles to complete its transaction.
With the advent now of both 8-bit and 16-bit transactions with the PC/AT ISA bus, a few issues arose. For example, a 16-bit I/O slave add-on could not force the bus master (which may or may not be the platform CPU) to execute a 16-bit access when the owner only wants an 8-bit. Similarly, a bus owner intending on a 16-bit access cannot order an 8-bit slave add-on to perform a 16-bit access. So there are added signal lines to aid in these circumstances.
The DMA access cycles were a little different from the other two in this sense: DMA would use simultaneous activation of I/O and memory command signal lines to allow data to be placed onto and retrieved from the bus during the same cycle. Here, for example, the address placed onto the bus is for memory and NOT for the I/O card (which should not use it.) (The AEN is activated to indicate to the I/O card not to use the address.)
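The decode rule an I/O card follows, including the AEN check, can be sketched as a small predicate. This is an illustration only: the function name `card_selected`, the base address `0x300` and the 8-port window are hypothetical, and the booleans mean "signal is active" rather than modelling the electrical polarity of the real bus lines.

```python
def card_selected(address, ior, iow, aen, base=0x300, size=8):
    """Decide whether a hypothetical I/O card should respond to a bus cycle.

    ior, iow, aen: True when the corresponding signal is active.
    base, size:    illustrative I/O window claimed by this card.
    """
    if aen:
        # DMA cycle: the address on the bus is a memory address, so ignore it.
        return False
    in_range = base <= address < base + size
    return in_range and (ior or iow)


print(card_selected(0x302, ior=True, iow=False, aen=False))  # ordinary I/O read
print(card_selected(0x302, ior=True, iow=False, aen=True))   # DMA cycle, same address
print(card_selected(0x3F8, ior=True, iow=False, aen=False))  # another card's port
```

Only the first call selects the card. The second shows why AEN matters: without it, a memory address that happens to match the card's port range during a DMA transfer would trigger a false response.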
Add-on boards responding to I/O addresses were set at unique locations to avoid conflict. IBM provided guidance about this for important cards (video display, serial port, parallel port, interrupt controller, and so on), but many add-on board makers would also include means to adjust the I/O address so that if you used two or more of their boards, they would work okay together. In general, the system worked pretty well and there were few problems. (Most issues related to the graphics memory required for various types of display controller boards.)
Arbitration
Technically, there actually IS an arbitration cycle. It's just not what you are asking about. Instead, it is a means by which another bus master (presumably residing on an add-on card) can claim ownership of the bus as the master. This cycle actually starts out looking like a DMA transfer cycle and it is the DMA controller which first responds. The would-be bus master then has a fixed time in which to assert MASTER and obtain ownership. The DMA controller then tri-states its own address, command, and data signals. (I worked with a team on a MIPS R2000 add-on card for the IBM PC/AT, circa 1986.)