At the very lowest level, consider something like microcode. That's what Wouter was talking about when he mentioned Very Long Instruction Word architectures.
A CPU is a collection of busses, registers, memory, and an arithmetic logic unit (ALU). Each of these does simple, limited things. Any single higher-level instruction is a coordinated set of actions among all these blocks. For example, the low-level operations to add the value of a memory location into the accumulator could be:
1. Enable the operand address onto the memory address bus. Assume memory is always in read mode when not explicitly writing.
2. Enable the accumulator onto ALU input 1.
3. Enable the memory data bus onto ALU input 2.
4. Set the ALU operation to addition.
5. Wait the minimum number of clock ticks needed for the ALU output to settle. Here that includes the memory read time, the ALU propagation time, and any intermediate data path propagation times.
6. Latch the ALU output into the accumulator.
When you break it down into the basic hardware operations, you see that orchestrating an instruction is mostly a matter of routing data to the right places in the right sequence. If this were implemented with discrete logic, #1 would be asserting a single output enable line of a tri-state buffer that drives the memory address bus. #2 and #3 likewise each require asserting a single line. #4 is a little different in that the ALU is usually a canned logic block itself and often has a set of lines that encode the operation. For example, 000 might pass input 1 to the output, 001 add both inputs, 010 logically AND both inputs, etc.
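As a rough illustration, the ALU's operation-select lines can be modeled as a small code that picks one function. The encodings follow the hypothetical example above (000 pass, 001 add, 010 AND); the 8-bit data width and the `alu` name are assumptions for this sketch, not any particular chip.

```python
# Hypothetical 3-bit ALU operation codes, following the example in the text.
ALU_PASS_A = 0b000  # pass input 1 to the output
ALU_ADD = 0b001     # add both inputs
ALU_AND = 0b010     # bitwise AND of both inputs

WIDTH_MASK = 0xFF   # assume an 8-bit data path


def alu(op: int, a: int, b: int) -> int:
    """Return the ALU output for the given operation-select code."""
    if op == ALU_PASS_A:
        result = a
    elif op == ALU_ADD:
        result = a + b
    elif op == ALU_AND:
        result = a & b
    else:
        raise ValueError(f"unassigned ALU op code {op:03b}")
    return result & WIDTH_MASK  # any carry out of bit 7 is simply dropped here


print(alu(ALU_ADD, 200, 100))  # 44, because 300 wraps around in 8 bits
```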
The point is that at the level described above, each instruction is just asserting a certain set of control lines in sequence, possibly with minimum wait times between some actions. A stripped-down CPU could simply tie each bit in the instruction word to one of these control lines. That would be simple and highly flexible, but one drawback is that the instruction word would need to be quite wide to contain all the necessary control lines and an operand address field, so instruction memory is used very inefficiently.
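Here is a minimal sketch of such a "one bit per control line" word for the six-step add sequence. The bit positions, the 8-bit address field and the function name are all hypothetical; the settle-time wait is represented simply by holding the same routing for one word before the latch bit is added.

```python
# Hypothetical control-line assignments: each bit drives one line directly.
ADDR_BUS_OE = 1 << 0   # step 1: operand address -> memory address bus
ACC_TO_ALU_A = 1 << 1  # step 2: accumulator -> ALU input 1
MEM_TO_ALU_B = 1 << 2  # step 3: memory data bus -> ALU input 2
ALU_OP_SHIFT = 3       # step 4: bits 3..5 carry the 3-bit ALU operation code
ACC_LATCH = 1 << 6     # step 6: clock the ALU output into the accumulator
ADDR_SHIFT = 8         # bits 8..15 carry the operand address

ALU_ADD = 0b001


def add_from_memory(address: int) -> list[int]:
    """Return the control words for 'add memory location into accumulator'."""
    setup = (ADDR_BUS_OE | ACC_TO_ALU_A | MEM_TO_ALU_B
             | (ALU_ADD << ALU_OP_SHIFT) | (address << ADDR_SHIFT))
    latch = setup | ACC_LATCH  # same routing, after settling, plus the latch
    return [setup, latch]


for word in add_from_memory(0x2A):
    print(f"{word:016b}")
```

Even this toy spends half of a 16-bit word on raw control lines, and it needs two words for one user-visible operation, which is the inefficiency described above.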
Years ago, there were machines with a microcoded architecture. They worked as I described at the low level, but these microinstructions weren't what you wrote when you programmed the machine. Each user-level instruction essentially kicked off a small microcode routine. The user instructions could then be more compact, with less redundant information in them. This was good because there could be many of them and memory was limited. The actual low-level control of the hardware, however, was done from microcode. This was stored in a special wide, fast, and therefore expensive memory, but it didn't need to be very big because there were only a few microcode instructions for each user-level opcode.
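A sketch of the dispatch idea, under the assumption of a tiny two-instruction machine: each compact user opcode only selects where its routine starts in the microcode store. The opcode values, control-line names and table layout are hypothetical.

```python
# Hypothetical microcode ROM: each entry is one wide control word, shown
# symbolically as the set of control lines it asserts.
MICROCODE_ROM = [
    # ADD routine: route operands, let them settle, then latch
    {"ADDR_OE", "ACC_TO_ALU_A", "MEM_TO_ALU_B", "ALU_ADD"},
    {"ADDR_OE", "ACC_TO_ALU_A", "MEM_TO_ALU_B", "ALU_ADD", "ACC_LATCH"},
    # STORE routine: drive the accumulator onto the data bus, then write
    {"ADDR_OE", "ACC_TO_DATA_BUS"},
    {"ADDR_OE", "ACC_TO_DATA_BUS", "MEM_WRITE"},
]

# Compact user opcode -> (first microword, number of microwords)
DISPATCH = {
    0x1: (0, 2),  # ADD
    0x2: (2, 2),  # STORE
}


def run_user_instruction(opcode: int) -> list[set[str]]:
    """Return the sequence of microwords that a user opcode kicks off."""
    start, length = DISPATCH[opcode]
    return MICROCODE_ROM[start:start + length]


for step in run_user_instruction(0x1):
    print(sorted(step))
```

The user program stores only the small opcode; the wide words live once in the microcode store and are shared by every occurrence of that instruction.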
Nowadays, relatively simple machines like microcontrollers don't really have microcode. The instruction set has been made a little simpler and more regular so that it can be decoded directly by hardware. That hardware may still contain a sequencer or state machine that isn't exactly a microcode engine but does a similar job with combinational logic and pipeline stages, where things are held up waiting on clock edges. This is one reason, for example, that the smaller PICs have a CPU clock that is 4x the instruction clock. That allows 4 clock edges per instruction, which is useful for managing propagation delays through the logic. Some of the PIC documentation even tells you which operations are performed on which Q edges.
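The idea of spreading one instruction over four clock edges can be sketched as a small state machine. The phase names Q1 to Q4 borrow the PIC documentation's terminology, but the work assigned to each phase here (decode, read, operate, write back) is a simplified assumption; the sketch does not model instruction fetch or pipelining, and the real assignments are in the relevant datasheet.

```python
# Simplified four-phase sequencer: one instruction spread over four clock
# edges. The work assigned to each phase is an illustrative assumption.
PHASES = ("Q1", "Q2", "Q3", "Q4")
OPS = {"ADD": lambda a, b: a + b, "AND": lambda a, b: a & b}  # hypothetical ops


def run(program: list[tuple[str, int]], acc: int = 0) -> tuple[int, list[str]]:
    """Execute (mnemonic, literal) pairs, four clock edges per instruction."""
    trace: list[str] = []
    clock = 0
    for mnemonic, literal in program:
        op = operand = result = None
        for phase in PHASES:
            if phase == "Q1":
                op = OPS[mnemonic]          # decode: select the ALU function
            elif phase == "Q2":
                operand = literal           # read the operand
            elif phase == "Q3":
                result = op(acc, operand)   # ALU output settles in this phase
            else:
                acc = result & 0xFF         # Q4: latch the result (8-bit register)
            trace.append(f"clk {clock}: {phase} {mnemonic}")
            clock += 1
    return acc, trace


acc, trace = run([("ADD", 5), ("AND", 0x0C)], acc=3)
print(acc, len(trace))  # 8 8
```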
So if you want to get something very basic up and running, try implementing just a microcode machine. You may need an instruction word 24 bits wide or wider, but for demonstration and learning that is fine. Each bit controls a single flip-flop clock, enable line, or whatever. You will also need some way of sequencing through the instructions and doing conditional branching. A brute-force way is to put a branch target address right into the microcode word, to be used or not depending on the result of some conditional operation.
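A minimal sketch of that brute-force sequencer, assuming each microword carries both a fall-through address and a branch address plus a condition-select bit. The field names, the single register R and the two control bits are hypothetical; the example program counts R down to zero and halts (it assumes R starts at 1 or more).

```python
from dataclasses import dataclass


@dataclass
class MicroWord:
    decrement: bool       # control bit: R <- R - 1 on this step
    halt: bool            # control bit: stop the sequencer
    next_addr: int        # address used when the branch is not taken
    branch_on_zero: bool  # condition select: test the zero flag?
    branch_addr: int      # address used when the condition is true


def run(rom: list[MicroWord], r: int) -> list[int]:
    """Step through the microcode, returning the addresses visited."""
    pc, visited = 0, []
    while True:
        visited.append(pc)
        word = rom[pc]
        if word.halt:
            return visited
        if word.decrement:
            r -= 1
        zero = r == 0
        pc = word.branch_addr if (word.branch_on_zero and zero) else word.next_addr


# Count R down to zero, then halt. Assumes R starts at 1 or more.
ROM = [
    MicroWord(decrement=True, halt=False, next_addr=0, branch_on_zero=True, branch_addr=1),
    MicroWord(decrement=False, halt=True, next_addr=1, branch_on_zero=False, branch_addr=1),
]
print(run(ROM, r=3))  # [0, 0, 0, 1]
```

Storing both addresses in every word wastes bits, but it means the "next address" logic is just a multiplexer driven by the condition, which is easy to build.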
All options are wrong. The maximum number of unique opcodes a processor can execute is not limited by bus width.
One might think that a CPU with a 12-bit data bus would be designed to fit each instruction into a single data word so that it can read instructions in one go, because 2^12 = 4096 opcodes is more than enough for most purposes.
But alongside opcodes, instructions may also contain arguments that often require an entire data word, so they wouldn't fit anyway. At that point it's not always useful to give the opcode a word of its own: some instructions may pack a 6-bit opcode and a 6-bit argument into one word, while others have a 12-bit opcode word plus several words of data. But then the CPU cannot have 2^12 opcodes, because word values used by the short format cannot also be full-word opcodes; otherwise the CPU could not tell the two instruction types apart.
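A quick count shows the cost. Assume, hypothetically, that a short instruction is identified by its top 6 bits and carries a 6-bit argument in the bottom 6, so each short opcode consumes 2^6 = 64 of the 4096 word values; every remaining value can serve as a full-word opcode. With s short opcodes the total is s + (4096 - 64s) = 4096 - 63s.

```python
WORD_BITS = 12
ARG_BITS = 6  # argument field of a short-format instruction (assumed)


def distinct_opcodes(short_opcodes: int) -> int:
    """Count opcodes when `short_opcodes` short-format instructions exist.

    Each short opcode owns 2**ARG_BITS word values (one per argument value);
    every remaining 12-bit value can be a separate full-word opcode.
    """
    total_values = 2 ** WORD_BITS
    used_by_short = short_opcodes * 2 ** ARG_BITS
    if used_by_short > total_values:
        raise ValueError("not enough encoding space")
    return short_opcodes + (total_values - used_by_short)


for s in (0, 1, 16, 64):
    print(s, distinct_opcodes(s))
```

Only with zero short-format instructions do you reach the full 4096.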
On the other hand, as pointed out in the comments, some say that x86 has more than 6000 opcodes (although not all of them have a unique function or are useful).
Yet another point: for a 4-bit CPU, 2^4 = 16 instructions are very often not enough, so it has to have some way to fit more than that.
My point is that opcode count is limited by CPU instruction format, and not by bus width.
There can be multiple ways and reasons a CPU may incorporate more opcodes than what fits into the data bus, including:
Word-spanning instructions
A processor does not need to read an instruction in a single data cycle; it can use multiple consecutive cycles. In fact, most CPUs don't read every instruction in one cycle, although multi-cycle reads are more commonly used for instruction arguments than to expand the opcode space.
Example: the Intel 4004 has only 4 multiplexed data/address lines and a 4-bit data word, but more than 40 opcodes encoded in 8-bit instructions.
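A minimal sketch of assembling an 8-bit instruction from two 4-bit bus transfers, assuming (as on the 4004) that the upper half, called OPR, arrives before the lower half, OPA. The function name is hypothetical and bus timing is ignored.

```python
BUS_WIDTH = 4
INSTRUCTION_BITS = 8


def fetch_instruction(first: int, second: int) -> int:
    """Combine two 4-bit bus transfers (OPR, then OPA) into one 8-bit opcode."""
    opr = first & 0xF   # first bus cycle: upper 4 bits
    opa = second & 0xF  # second bus cycle: lower 4 bits
    return (opr << 4) | opa


print(2 ** BUS_WIDTH, 2 ** INSTRUCTION_BITS)  # 16 256
print(hex(fetch_instruction(0xD, 0x5)))       # 0xd5
```

The bus alone could only distinguish 16 values per transfer, but two transfers give 256 possible instruction bytes.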
Prefixes and suffixes
A (CISC) processor may have as many instruction prefixes and suffixes as it needs.
They are attached to an actual instruction to change what it does, either a little or completely.
It depends on your definition of "unique opcode". If one considers any part of an instruction that is not data to be part of the opcode, the total would include all possible variations. However, some consider these affixes to be distinct parts of the instruction.
Example: Intel x86 CPUs do not actually have 4M opcodes. However, if you count all prefixes as part of the opcode, modern CPUs allow instructions up to 15 bytes long, which gives a LOT of possible opcodes. Many of them will just do the same thing, though, so the count depends on what you accept as "unique".
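A simplified sketch of peeling legacy prefixes off an x86 byte sequence. It illustrates the definitional question: byte 0x90 alone is NOP, while F3 90 is PAUSE, so whether F3 90 counts as a separate opcode depends on whether the prefix is part of it. The sketch ignores REX/VEX/EVEX prefixes and does not decode the rest of the instruction.

```python
# Legacy x86 prefix bytes: LOCK/REP, segment overrides, operand/address size.
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}
MAX_INSTRUCTION_LENGTH = 15  # bytes


def split_prefixes(code: bytes) -> tuple[bytes, bytes]:
    """Split leading legacy prefixes from the rest of one instruction.

    Simplified: no REX/VEX/EVEX handling, and the remainder (opcode,
    ModRM, SIB, displacement, immediate) is left undecoded.
    """
    if len(code) > MAX_INSTRUCTION_LENGTH:
        raise ValueError("x86 instructions are at most 15 bytes long")
    i = 0
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        i += 1
    return code[:i], code[i:]


print(split_prefixes(bytes([0x90])))        # NOP: no prefix
print(split_prefixes(bytes([0xF3, 0x90])))  # PAUSE: same opcode byte, F3 prefix
```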
Modes
A processor may have multiple modes of operation in which it may have a completely different set of opcodes.
Examples: x86-64 CPUs have legacy modes (real, virtual-8086, protected) and a 64-bit long mode, and some opcodes decode differently between them. ARM CPUs can have ARM (32-bit) and Thumb (16-bit) instruction states.
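One concrete case: in 32-bit mode the bytes 0x40 to 0x4F are one-byte INC/DEC register instructions, while in 64-bit mode the same bytes are REX prefixes. The sketch below models only that row of the opcode map; the function name and mode strings are hypothetical.

```python
REGS32 = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def decode_4x(byte: int, mode: str) -> str:
    """Describe a byte in 0x40..0x4F under 32-bit or 64-bit mode."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("only the 0x40-0x4F row is modeled here")
    if mode == "64":
        # REX prefix layout: 0100 W R X B
        w, r, x, b = (byte >> 3) & 1, (byte >> 2) & 1, (byte >> 1) & 1, byte & 1
        return f"REX prefix (W={w} R={r} X={x} B={b})"
    if mode == "32":
        op = "inc" if byte < 0x48 else "dec"
        return f"{op} {REGS32[byte & 7]}"
    raise ValueError("mode must be '32' or '64'")


print(decode_4x(0x48, "32"))  # dec eax
print(decode_4x(0x48, "64"))  # REX prefix (W=1 R=0 X=0 B=0)
```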
Bus bit multiplexing
The question states "data lines" and "address lines"; however, both the internal data bus and the internal address bus may be wider than the number of actual bus lines.
The multiplexed bus data is sent sequentially, i.e. first half, then second half. The CPU stores it into full-sized internal registers and operates on those.
This is often done to reduce cost and/or the chip's physical footprint.
Examples include the Intel 4004, anything on an LPC bus, and the NEC VR4300, the Nintendo 64's 64-bit CPU, which had only a 32-line data bus.
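A minimal sketch of moving a wide value over a narrower multiplexed bus in several beats, then rebuilding it in a full-width internal register. Sending the most significant half first is an arbitrary choice here; real buses define their own ordering and handshaking.

```python
def to_beats(value: int, value_bits: int, bus_bits: int) -> list[int]:
    """Split a wide value into bus-width chunks, most significant chunk first."""
    if value_bits % bus_bits:
        raise ValueError("value width must be a multiple of the bus width")
    mask = (1 << bus_bits) - 1
    count = value_bits // bus_bits
    return [(value >> (bus_bits * (count - 1 - i))) & mask for i in range(count)]


def from_beats(beats: list[int], bus_bits: int) -> int:
    """Reassemble the value in a full-width internal register."""
    value = 0
    for beat in beats:
        value = (value << bus_bits) | beat
    return value


word = 0x0123_4567_89AB_CDEF          # a 64-bit value
beats = to_beats(word, 64, 32)        # two transfers over 32 lines
print([hex(b) for b in beats])        # ['0x1234567', '0x89abcdef']
print(from_beats(beats, 32) == word)  # True
```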
No parallel bus
As a continuation of the previous point, a CPU does not even need to expose a parallel bus at all.
A CPU may just as well expose only a serial bus such as I2C, SPI, etc.
It's probably not very cost-effective to produce such a dedicated CPU, but a lot of low-pin-count microcontrollers (which include both CPU and memory) are made that way to save those precious pins for something more useful. For example, the Atmel ATtiny4/5/9/10 chips have only 6 pins in total: two for power, one for reset, and three general-purpose. Their program is loaded serially through a proprietary 3-wire interface.
Depending on your definition, a microcontroller can be considered a microprocessor, or it can be programmed to act as one (i.e. simulate a dedicated CPU with one or more serial buses).
This question clearly states that some kind of data bus IS exposed, but not that it is a parallel bus. In theory, the 12-line data bus could consist of a single serial data line and 11 auxiliary/ground/status lines, although that probably wouldn't be a very sane idea.
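A sketch of the serial extreme: a 12-bit word crossing a single data line one bit per clock into a receiving shift register, MSB first. This is a generic SPI-like model, not the protocol of any particular chip's programming interface; the class and function names are hypothetical.

```python
class ShiftRegister:
    """Receive side of a simple clocked serial link, MSB first."""

    def __init__(self, width: int):
        self.width = width
        self.value = 0

    def clock_in(self, data_bit: int) -> None:
        # On each clock edge, shift left and insert the new bit at the bottom.
        self.value = ((self.value << 1) | (data_bit & 1)) & ((1 << self.width) - 1)


def send_serial(word: int, width: int) -> list[int]:
    """Bits placed on the single data line, one per clock, MSB first."""
    return [(word >> i) & 1 for i in reversed(range(width))]


rx = ShiftRegister(12)
bits = send_serial(0xABC, 12)
for bit in bits:
    rx.clock_in(bit)
print(len(bits), hex(rx.value))  # 12 0xabc
```

The word width is set by the shift register, not by the number of pins, so one line can carry instructions of any width at the cost of more clocks.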
Dedicated instruction bus
Actually, a processor does not even need to accept instructions on the same bus lines as data.
This was easy to do when ALUs were discrete chips rather than part of a microprocessor, but most of the time it is not economically viable now.
But nothing prevents you from implementing a CPU with dedicated lines just for instructions. Such a CPU may be useful when a single operation must be done on an array of data (SIMD).
Since the instruction bus width is completely arbitrary, so is the maximum possible opcode count.
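A minimal sketch of that separation, assuming a hypothetical machine whose data bus is 12 bits wide but whose dedicated instruction bus is 20 bits, split into an 8-bit opcode and a 12-bit operand. The field split and the example encodings are invented for illustration.

```python
DATA_BITS = 12                              # data bus width (from the question)
OPCODE_BITS = 8                             # hypothetical opcode field width
INSTRUCTION_BITS = OPCODE_BITS + DATA_BITS  # 20-bit dedicated instruction bus


def decode(instruction: int) -> tuple[int, int]:
    """Split an instruction word into (opcode, 12-bit operand)."""
    if not 0 <= instruction < 2 ** INSTRUCTION_BITS:
        raise ValueError("instruction does not fit the instruction bus")
    opcode = instruction >> DATA_BITS
    operand = instruction & ((1 << DATA_BITS) - 1)
    return opcode, operand


# Program memory sits on its own bus, so its width is independent of data memory.
program = [0x01_005, 0x02_3FF]  # hypothetical encodings

print(2 ** OPCODE_BITS)               # 256 possible opcodes
print([decode(w) for w in program])   # [(1, 5), (2, 1023)]
```

Widening `OPCODE_BITS` raises the opcode count without touching the 12-line data bus.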
With something like an 8085 processor, the result is probably "undefined behavior". Those 1970s devices had limited logic available for instruction decoding, so the opcodes were designed to require minimal decoding effort. For example, maybe every opcode with a '1' in the 4th bit would result in an update to the accumulator.
These devices wouldn't inform the programmer of anything, because they couldn't spare the resources to detect an invalid opcode. Nor would such an opcode necessarily behave as a NOP, because its fields might actually pass through the decode logic and produce some behavior that changed the state of the processor.
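To see why an unassigned opcode might do something rather than nothing, consider a decoder that, to save logic, tests only single bits. The bit assignments and the set of "documented" opcodes below are invented for illustration and are not the 8085's actual decoding.

```python
def partial_decode(opcode: int) -> list[str]:
    """Hypothetical minimal decoder: each action is enabled by one bit test."""
    actions = []
    if opcode & 0b0000_1000:  # 4th bit (bit 3) set -> write the accumulator
        actions.append("write accumulator")
    if opcode & 0b0100_0000:  # bit 6 set -> read a memory operand
        actions.append("read memory")
    if opcode & 0b1000_0000:  # bit 7 set -> update the flags
        actions.append("update flags")
    return actions


DEFINED = {0x08, 0x48, 0xC8}  # opcodes the designers documented (hypothetical)

for op in (0x48, 0x4C):       # 0x4C was never assigned
    status = "defined" if op in DEFINED else "UNDEFINED"
    print(f"{op:#04x} {status}: {partial_decode(op)}")
```

Nothing in the decoder looks at bit 2, so the undocumented 0x4C silently behaves like 0x48 instead of trapping or acting as a NOP.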
It would be the job of the programmer or the compiler to not generate invalid opcodes, which isn't really very difficult when you think about it. A compiler is not going to produce any opcode that it hasn't been programmed to produce.
It seems likely that newer processors, with vastly more resources, can spare some to detect invalid opcodes and produce some defined behavior, but I'm not familiar enough with them to comment on that.