Add a pair of resistors at the far end of each line, one up to VCC and the other to ground. This will allow you to terminate the line at the end in addition to (or instead of) source termination.
The problem with source termination is that it works best with only one endpoint, and you have many. The intermediate points will see a step on the edges, and the level of that step depends on the ratio of the line impedance to the source resistance.
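To make that step concrete, here is a rough sketch of the usual first-order model: the driver's output resistance plus the series resistor form a divider with the line impedance, and a load part-way along the line sits at that divided voltage until the reflection from the unterminated far end comes back. The function names, the 17 ohm driver resistance, and the 15 cm/ns propagation speed are illustrative assumptions, not values from your design.

```python
def launch_voltage(v_swing, r_series, r_driver, z0):
    """Voltage step launched into the line by a series-terminated driver.

    r_driver is the driver's own output resistance (an assumed value here;
    check the datasheet or IBIS model for the real one).
    """
    return v_swing * z0 / (r_series + r_driver + z0)


def plateau_duration_ns(distance_to_far_end_cm, velocity_cm_per_ns=15.0):
    """Time an intermediate load sits at the launch voltage.

    The edge has to travel on to the open far end and reflect back,
    so the plateau lasts for twice the remaining flight time.
    velocity_cm_per_ns=15.0 is a rule-of-thumb assumption for FR4.
    """
    return 2.0 * distance_to_far_end_cm / velocity_cm_per_ns


if __name__ == "__main__":
    step = launch_voltage(v_swing=3.3, r_series=33.0, r_driver=17.0, z0=50.0)
    hold = plateau_duration_ns(distance_to_far_end_cm=7.5)
    print(f"intermediate load sees {step:.2f} V for about {hold:.2f} ns")
```

With a well-matched source the plateau lands near half the swing, which is right around the input threshold of many logic families. That is why the loads in the middle of the line are the ones at risk.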
Parallel termination reduces the reflections at the end of the line, which works better with lots of loads. The problem is that you need the pair of resistors because the driver may not be able to drive the desired resistance directly to ground or VCC. The pair of resistors can therefore be chosen to provide the desired impedance pulled to some intermediate voltage that the driver can handle.
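As a rough sketch of how the pair might be sized: the two resistors in parallel should equal the line impedance, and their divider ratio sets the intermediate voltage. The function names and the 50 ohm / 3.3 V / mid-rail numbers below are assumptions for illustration only.

```python
def thevenin_termination(z0, vcc, v_bias):
    """Return (r_up, r_down) so that r_up || r_down == z0 and the
    unloaded divider voltage equals v_bias (0 < v_bias < vcc)."""
    r_up = z0 * vcc / v_bias            # resistor from line to VCC
    r_down = z0 * vcc / (vcc - v_bias)  # resistor from line to ground
    return r_up, r_down


def worst_case_driver_current(vcc, r_up, r_down):
    """DC current an ideal rail-to-rail driver must handle.

    Driving low, it sinks the current through r_up;
    driving high, it sources the current through r_down.
    """
    i_sink = vcc / r_up
    i_source = vcc / r_down
    return i_sink, i_source


if __name__ == "__main__":
    r_up, r_down = thevenin_termination(z0=50.0, vcc=3.3, v_bias=1.65)
    i_sink, i_source = worst_case_driver_current(3.3, r_up, r_down)
    print(f"r_up={r_up:.0f} ohm, r_down={r_down:.0f} ohm")
    print(f"sink={i_sink * 1e3:.0f} mA, source={i_source * 1e3:.0f} mA")
```

Compare the resulting currents against the driver's rated output current. If they are too high, a larger parallel resistance (a deliberately imperfect match) is the usual compromise.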
I think you'd be good for 100BaseT (50MHz RMII signals), although for other reasons I think this is still a risky design. I don't have the time to go through a thorough timing & impedance analysis, but I can offer the following off-the-cuff comments:
a) Whilst I have no idea where you're located or whether you have access to a credit-card, 4-layer PCBs are very affordable from many PCB fabricators. OSHpark.com comes to mind. By dealing with this limitation, your (b) problem (next point) is avoided too.
b) Connecting to "NC" pads is risky and pretty much a no-no in a professional setting. Maybe they're really "NC", or maybe they're "reserved" for some future use on an updated piece of silicon that goes not only into a new, closely related IC but also into future production of this IC. Obviously there'll be lead-frame in there, but maybe also bonding wire to silicon. You just don't know, not today, and not in the future. This is why the mfg says "No Connect"! That "well documented" (says who?) NC today could become connected to some silicon tomorrow. But maybe this doesn't matter in your situation for a one-off.
c) Signal speed through copper on FR4 is about 6"/15cm per ns. Judging from the KSZ8091 datasheet (7.0 Timing Diagrams), I think you'd want your timings to be accurate to within 1ns. So you've got plenty of space (length) to work with here, way more than your currently 'cramped' layout; from a timing perspective you don't need to be that close to the MCU. Personally I wouldn't get too caught up on timing & length-matching in this situation, I don't think it'll matter. Having said that, it's good practise for these fast signals to be the same length, because this does matter in faster designs. Good thing you have the space to pull the PHY chip further away from the MCU to give you space for length-matching.
d) Signal Integrity & impedance: With your bottom-side ground being 0.6mm away, it doesn't get you much coupling or impedance control. This is why 4-layer PCBs exist :-). If I were you I'd use that extra space (distance between PHY & MCU) available (from a timing perspective) to also add some 0402 resistors in series with these 50MHz signals (placed closest to source), so that you've got the option to slow them down and bring the R component of your impedance up, in case ringing (reflections) is a problem. If you do stick with a 2-layer, then I'd also use that available space between PHY & MCU to add some Ground-connected copper pour on the top-side between these high-speed signals.
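To put a rough number on point (d), the widely used IPC-2141-style closed-form approximation for surface microstrip can estimate the trace impedance over a ground plane 0.6 mm away. This is only a sketch: the relative permittivity of 4.5, the 35 µm copper thickness, and the example trace widths are assumptions, and the formula itself is only reasonable for width-to-height ratios of roughly 0.1 to 2.

```python
import math


def microstrip_z0(width_mm, height_mm, thickness_mm=0.035, er=4.5):
    """Approximate characteristic impedance (ohms) of a surface microstrip.

    Closed-form approximation, reasonable for 0.1 < width/height < 2.
    thickness_mm=0.035 (1 oz copper) and er=4.5 (FR4) are assumed defaults.
    """
    return (87.0 / math.sqrt(er + 1.41)) * math.log(
        5.98 * height_mm / (0.8 * width_mm + thickness_mm)
    )


if __name__ == "__main__":
    for width in (0.2, 0.5, 1.0):
        z0 = microstrip_z0(width_mm=width, height_mm=0.6)
        print(f"{width:.1f} mm trace over 0.6 mm dielectric: about {z0:.0f} ohm")
```

Under these assumptions a narrow signal trace comes out well above 50 ohm, and getting close to 50 ohm needs a trace on the order of 1 mm wide. That is the practical reason series resistors and a thinner dielectric (4-layer) help.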
Interestingly, I saw something curious in Netgear's cheap GS305 (right) and even cheaper GS105 (left) 5-port Gigabit Ethernet switches. Being Gigabit, these carry 125 Mbaud multi-level signals on each of the four pairs out to the magnetics, where one would think impedance control would be more important. Then again, I suspect their magnetics are only rated for 10/100BaseT, not 1000, but they seem to be getting away with that, too!
The even cheaper GS105 is only 2 layers.
Best Answer
Let me start by saying that 7 cm is not a long distance to go for 40 MHz signals. I've frequently run double that without breaking a sweat. Below is a list of issues that you need to consider when doing this:
Trace length: As I just said, 7 cm is not far. But look at your timing budget. If your budget is tight you might have to do something called "matched trace lengths", where every signal of your bus has the same length. Odds are that at 40 MHz you don't need this, but it is worth looking into.
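A quick way to check the trace-length budget is to convert lengths into time. The sketch below assumes a propagation speed of roughly 15 cm/ns on FR4 (a rule-of-thumb figure) and uses made-up trace lengths; the function names are illustrative.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def flight_time_ns(length_cm, velocity_cm_per_ns=15.0):
    """One-way propagation delay of a trace (velocity is an assumption)."""
    return length_cm / velocity_cm_per_ns


def bus_skew_ns(lengths_cm, velocity_cm_per_ns=15.0):
    """Arrival-time spread between the longest and shortest trace of a bus."""
    return (max(lengths_cm) - min(lengths_cm)) / velocity_cm_per_ns


if __name__ == "__main__":
    lengths = [7.0, 5.5, 6.25]  # hypothetical bus trace lengths in cm
    period = clock_period_ns(40.0)
    skew = bus_skew_ns(lengths)
    print(f"period {period:.1f} ns, longest flight {flight_time_ns(max(lengths)):.2f} ns")
    print(f"skew {skew:.2f} ns = {100.0 * skew / period:.1f}% of the period")
```

If the skew is a small fraction of the setup and hold margins in the datasheet, length matching is unlikely to buy you anything at this frequency.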
Parallel traces: Try to keep some space between signals on this bus. This is super important for clock and control signals, and less important for data. A normal PCB might have 0.008 inches (0.2 mm) between traces, and you might consider doubling or tripling that. It is OK for them to be close for short distances, but the longer the traces, the farther apart they should be.
Power/Ground Planes: Yes, have them. This is important. Run your traces on a layer that is adjacent to the power or ground plane. I am skipping over a LOT of details here that pertain to high-speed digital design. This is an area for you to learn more of in the future. If you can't have a power/gnd plane then your problem has gotten 10 times harder. Run a ground trace next to each signal trace and hope for the best!
Termination: Yes, use it! If the signals are going from one chip to another (and not connecting to more chips), then source termination is the easiest. Normally that would be a 33 to 50 ohm resistor in series, located at the driver of the signal.
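As a sketch of where a value in that range comes from: the series resistor is usually chosen so that it plus the driver's own output resistance roughly equals the trace impedance. The 17 ohm driver resistance and 50 ohm trace below are assumed example numbers, and the helper names are illustrative.

```python
# Standard E12 values for one decade (10 to 82 ohm).
E12 = (10, 12, 15, 18, 22, 27, 33, 39, 47, 56, 68, 82)


def series_termination(z0, r_driver):
    """Ideal series resistor so that r_driver + resistor == z0.

    Returns 0 if the driver is already at or above the line impedance.
    """
    return max(z0 - r_driver, 0.0)


def nearest_e12(value_ohms):
    """Closest E12 value in the 10 to 82 ohm decade."""
    return min(E12, key=lambda r: abs(r - value_ohms))


if __name__ == "__main__":
    ideal = series_termination(z0=50.0, r_driver=17.0)
    print(f"ideal {ideal:.0f} ohm, nearest E12 value {nearest_e12(ideal)} ohm")
```

Driver output resistance is rarely specified tightly, so treat the result as a starting value and tune it on the bench while watching the edge at the receiver.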
Decoupling caps: Make sure that all chips are properly decoupled. Then add decoupling caps near places where signals move through a via (no more than 1 cap every 3 square cm). Again, I am skipping details but at 40 MHz you don't need to worry that much.