It's not really clear what you are asking, but you can't fetch an opcode and its operand in the same cycle regardless of memory architecture, because you don't know what operand to fetch until after the instruction is decoded. Doing both together breaks basic cause and effect.
There was a similar question to this recently. See single-cycle design using shared memory for both data and instructions
To understand the behaviour of the CPU you need to know how an instruction is made: there is a fixed part that represents the operation (let's say that 0001 stands for INC), which is repeated every time you write an INC instruction, and there is a part that depends on the instruction, which is why CISC instructions are not all the same length. The second part contains, in our example, the address of the memory location that we want to increment (0010 0100 1000, assuming that 12 bits are used to address the memory).
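The split above can be sketched in a few lines. This is a toy encoding, not any real processor's format: the 4-bit opcode 0001 for INC comes from the example, while the 16-bit word size and the packing are illustrative assumptions.

```python
# Hypothetical 16-bit format: 4-bit opcode in the top bits, 12-bit
# address in the low bits. Only the INC opcode (0b0001) comes from the
# answer's example; everything else is an assumed layout.

def encode(opcode: int, address: int) -> int:
    """Pack a 4-bit opcode and a 12-bit address into one 16-bit word."""
    assert 0 <= opcode < 16 and 0 <= address < 4096
    return (opcode << 12) | address

INC = 0b0001
word = encode(INC, 0b0010_0100_1000)   # INC [0x248]
print(f"{word:016b}")                  # -> 0001001001001000
```

The fixed opcode field is what the decoder looks at first; the variable part is whatever that opcode says should follow it.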
Every instruction set is built this way. One key difference between a CISC and a RISC instruction set is that the latter uses a fixed instruction width.
The datasheet of a microprocessor shows you exactly how every instruction is encoded. If you want to see an example, I advise you to start with the datasheets of old processors, because they are less complex and easier to understand. Take a look at the Z80 datasheet, from page 101.
Coming to the question: essentially, no. A CISC instruction does not contain the individual micro-operations, only an opcode that identifies the operation we want to do plus the necessary data (value to load, address of the memory location...) to execute it. The CPU has a decoder that, when it reads the opcode, knows what to do for that specific instruction and, if it's a microcoded instruction, how to decompose it. As you can see, the number of micro-operations into which an instruction is divided does not depend on the length of the instruction; it is something "hidden" inside the CPU itself, not accessible from the outside. The same goes for the opcodes of the micro-operations.
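A minimal sketch of that idea, assuming the toy 16-bit format from the example: the micro-op sequences live in a table inside the CPU (a microcode ROM), keyed by opcode. The micro-op names and the second opcode here are invented for illustration.

```python
# The instruction carries only the opcode and operand; the micro-op
# sequence is looked up inside the CPU. All names below are invented.

MICROCODE_ROM = {
    0b0001: ["read_mem_to_temp", "alu_add_one", "write_temp_to_mem"],  # INC addr
    0b0010: ["read_mem_to_acc"],                                       # hypothetical LD addr
}

def decode(instruction: int):
    opcode = instruction >> 12           # top 4 bits select the operation
    address = instruction & 0x0FFF       # low 12 bits are the operand
    return MICROCODE_ROM[opcode], address

micro_ops, addr = decode(0b0001_0010_0100_1000)   # the INC example
print(micro_ops, hex(addr))
```

Note that a one-micro-op instruction and a three-micro-op instruction can be exactly the same length: the table, not the instruction, holds the decomposition.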
This is a simplified answer, but it tells you essentially how things work. You can immediately see why the decoder of a CISC processor, particularly one that uses microcode like the x86 CPUs, is much more complex than its RISC counterpart.
Your question is making some assumptions it shouldn't. However, you have answered this yourself in large part.
In general, a memory can do one access at a time. Since fetching the instruction and fetching the operand are two separate accesses at two separate addresses, they must be done sequentially with traditional memories. Even if that were not so, the instruction has to be fetched first before you know whether the operation requires a data fetch at all, and at what address. The process is inherently sequential.
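The dependency can be made concrete with a toy single-memory machine. The 4-bit-opcode/12-bit-address encoding and the INC mnemonic are illustrative assumptions, not a real ISA.

```python
# One shared memory holds both instructions and data. The data access
# cannot start until the instruction fetch and decode have produced the
# operand address, so the accesses are necessarily sequential.

memory = [0] * 4096
memory[0] = 0b0001_0010_0100_1000   # hypothetical INC [0x248]
memory[0x248] = 41

def step(pc: int) -> int:
    instruction = memory[pc]             # access 1: instruction fetch
    opcode = instruction >> 12
    address = instruction & 0x0FFF       # only now is the address known
    if opcode == 0b0001:                 # INC
        memory[address] += 1             # accesses 2 and 3: data read/write
    return pc + 1

step(0)
print(memory[0x248])                     # -> 42
```

On real hardware each of those memory accesses would occupy its own bus cycle; no arrangement of the memory can start access 2 before access 1 has finished.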
That said, there are various speedup techniques beyond the conceptually simple scheme of a traditional processor. It would be too long to get into them here, but three that immediately come to mind are caching, pipelining, and separate instruction and data memories. All these are used to various extents in current mainstream products. Real modern processors are no longer as simple as what you are assuming in your question.
For example, most small microcontrollers use a Harvard architecture, which means separate instruction and data memories. They can do simultaneous accesses because the memories are separate. However, there is still the issue of having to fetch and decode the instruction before knowing what, if anything, needs to be read from or written to data memory. This is usually dealt with, to various extents, with pipelining, pre-fetching, and other techniques.
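A toy Harvard machine makes the difference visible: the two memories are distinct objects, so an instruction fetch and a data access touch different ports. The encoding and mnemonics are the same illustrative assumptions as before, not a real microcontroller's.

```python
# Separate instruction and data memories (Harvard). Real hardware could
# fetch the NEXT instruction while this one's data access completes;
# here we just show that the two accesses hit different memories.

instruction_memory = [0b0001_0010_0100_1000] * 2   # two INC [0x248]
data_memory = [0] * 4096
data_memory[0x248] = 10

def cycle(pc: int) -> int:
    instruction = instruction_memory[pc]       # instruction-memory port
    opcode, address = instruction >> 12, instruction & 0x0FFF
    if opcode == 0b0001:                       # INC
        data_memory[address] += 1              # data-memory port
    return pc + 1

pc = 0
while pc < len(instruction_memory):
    pc = cycle(pc)
print(data_memory[0x248])                      # -> 12
```

Even here, the data access of instruction N depends on decoding instruction N; what the split memories buy you is overlapping it with the fetch of instruction N+1, which is exactly what pipelining exploits.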
I can't reproduce a whole college level course on computer architecture here, but hopefully I've given you enough keywords so that you can find lots more information on your own.