Reading the question and the comments, there may be a conceptual misunderstanding: the attenuator WILL attenuate any noise presented at its input (even noise from just a 50 ohm source impedance), to the same extent as it attenuates the signal.
However, it also generates noise of its own, which may be represented as the noise from a perfect resistor equal to its own output impedance, and this is added at the output to the (attenuated) input signal and noise. So if input and output Z are both 50 ohms, the net result is an attenuated signal with the noise back at roughly its original level, so the SNR drops by the attenuation (i.e. NF = attenuation).
But if its output impedance is lower, the added noise is also lower, thus improving the noise voltage as Andy states.
So represent the attenuator as a perfect attenuator (which attenuates the noise too) in series with the Johnson noise voltage source of a resistor equal to the output impedance. The rest is just applying the formulae.
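To put numbers on that model, here is a minimal sketch. It is not the calculation from the article; it uses the standard textbook result for a matched resistive attenuator at physical temperature T, where the available output noise is the attenuated source noise plus kTB(1 - 1/L). The function names and the 10 dB / 1 Hz example values are my own assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def attenuator_output_noise(loss_db, t_source=290.0, t_atten=290.0, bandwidth_hz=1.0):
    """Available noise power (W) at the output of a matched attenuator.

    Assumes source and attenuator are both matched to the same impedance.
    """
    loss = 10 ** (loss_db / 10)  # power loss as a linear ratio (>= 1)
    # Source noise, attenuated exactly like the signal.
    attenuated_source_noise = K_BOLTZMANN * t_source * bandwidth_hz / loss
    # Noise contributed by the attenuator's own resistance.
    added_noise = K_BOLTZMANN * t_atten * bandwidth_hz * (1 - 1 / loss)
    return attenuated_source_noise + added_noise


def attenuator_noise_figure_db(loss_db, t_atten=290.0, t_ref=290.0):
    """Noise figure (dB) of a matched attenuator at physical temperature t_atten."""
    loss = 10 ** (loss_db / 10)
    return 10 * math.log10(1 + (loss - 1) * t_atten / t_ref)


# 10 dB pad at 290 K: output noise equals the source noise, NF equals the loss.
print(attenuator_output_noise(10.0) / (K_BOLTZMANN * 290.0))
print(attenuator_noise_figure_db(10.0))
```

With the attenuator at the same temperature as the source, the output noise ratio comes out as 1 and the NF equals the loss in dB. Passing a lower `t_atten` shows the added noise (and hence the NF) dropping.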
EDIT: re: updated question.
(1) There is nothing special about 290 K except that it's a realistic temperature for the operation of a passive circuit. They chose it because the article quotes a noise floor (-174 dBm/Hz) which is correct for one specific temperature: yes, 290 K.
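You can check the link between the quoted noise floor and the temperature yourself. A quick sketch, assuming the usual thermal noise density kT expressed relative to 1 mW (the function name is mine):

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def thermal_noise_density_dbm_hz(temperature_k):
    """Available thermal noise power density kT, in dBm per Hz."""
    noise_w_per_hz = K_BOLTZMANN * temperature_k
    return 10 * math.log10(noise_w_per_hz / 1e-3)  # reference level is 1 mW


for temperature in (290.0, 300.0):
    density = thermal_noise_density_dbm_hz(temperature)
    print(f"{temperature:.0f} K: {density:.2f} dBm/Hz")
```

At 290 K this comes out at about -173.98 dBm/Hz, which rounds to the familiar -174 dBm/Hz. A slightly different temperature only moves the figure by a fraction of a dB.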
(2) While any resistance in the attenuator will contribute noise, I realise that it is not a satisfactory explanation as to why you get the same noise out of an attenuator, because (as Andy says) you could make a capacitive attenuator which is not a Johnson noise generator. So we have to look a little deeper, and remember these noise sources are the statistics of the individual electrons that make up the current.
So, let's say we build a (50 ohm in, 50 ohm out) attenuator, and attempt to cheat Johnson by using a capacitive divider. That implies a node within the attenuator which conducts some of the input current to ground. At this node, we have two current paths; a fraction of the current flows to the output, the rest to ground. What determines which path an individual electron will take? Essentially, chance. Collectively? Statistics. So this is a noise source.
Or let's just add series capacitance to provide enough attenuation: we thereby avoid dividing the current flow and eliminate the noise source, right? At the cost of reducing the signal current; our statistics now operate with a smaller sample size and consequently greater variance: more noise.
These results are the best you can do; there is no way round them.
I think that you do have to compare powers, and within a stated bandwidth. The noise is a power in a certain bandwidth; if you choose a smaller BW, the noise will be less!
If your signal is only 1 MHz wide, you don't need the full 100 MHz bandwidth (the LNA could be 100 MHz wide, but after mixing you would filter to a 1 MHz bandwidth and get rid of the extra noise).
So you cannot compare your 50 mV signal to the noise power (in a specified BW). You need a signal power (in a specified BW) and compare that to the noise (in the same BW).
Think of it like this: if your signal is only a 50 mV sinewave, it will use a very small BW (it is only one frequency). Now compare that to, for example, a WiFi signal, which uses many frequencies in a certain BW. Combined they can also peak at 50 mV, but they would contain a lot more power (and information!).
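As a rough sketch of that power-in-a-bandwidth comparison, here is a small calculation. It assumes a -174 dBm/Hz thermal floor, a 50 ohm system, and that the 50 mV is the peak amplitude of a sinewave; none of those are stated in the question, so adjust as needed. The function names are mine.

```python
import math

NOISE_FLOOR_DBM_HZ = -174.0  # assumed thermal noise density at about 290 K


def noise_power_dbm(bandwidth_hz, noise_figure_db=0.0):
    """Noise power in the given bandwidth, optionally degraded by a noise figure."""
    return NOISE_FLOOR_DBM_HZ + 10 * math.log10(bandwidth_hz) + noise_figure_db


def sine_power_dbm(v_peak, r_load=50.0):
    """Average power of a sinewave with peak amplitude v_peak across r_load."""
    power_w = v_peak ** 2 / (2 * r_load)
    return 10 * math.log10(power_w / 1e-3)


signal_dbm = sine_power_dbm(0.050)  # assumption: 50 mV peak across 50 ohm
for bandwidth in (1e6, 100e6):
    noise_dbm = noise_power_dbm(bandwidth)
    snr_db = signal_dbm - noise_dbm
    print(f"BW {bandwidth / 1e6:.0f} MHz: noise {noise_dbm:.1f} dBm, SNR {snr_db:.1f} dB")
```

Filtering from 100 MHz down to 1 MHz lowers the noise power by 20 dB (from -94 dBm to -114 dBm here) while the sinewave power stays the same, so the SNR improves by the same 20 dB.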
Best Answer
Yes, we want the output SNR as high as possible, because that means the best chance of recovering the message signal accurately.
But making the output SNR higher than the input SNR is not possible.
The input noise will be amplified just as much as the signal.
Plus, some additional noise will be added by our amplifier.
So the overall effect is that the output SNR is lower than the input SNR. (This is why the NF is positive when expressed in dB.)
But we want the SNR to be reduced by as little as possible.
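The reasoning above can be sketched in a few lines of dB arithmetic. This assumes the input noise is the reference thermal noise that the NF is defined against (otherwise the simple "add the NF" step does not hold), and the example levels, gain and NF are made-up illustrative values.

```python
def amplifier_output(signal_in_dbm, noise_in_dbm, gain_db, nf_db):
    """Return (signal_out_dbm, noise_out_dbm) for an amplifier stage.

    Assumes noise_in_dbm is the reference thermal noise used to define NF.
    """
    signal_out_dbm = signal_in_dbm + gain_db       # signal is amplified by the gain
    noise_out_dbm = noise_in_dbm + gain_db + nf_db  # noise is amplified, plus added noise
    return signal_out_dbm, noise_out_dbm


# Hypothetical example: -90 dBm signal, -114 dBm noise, 20 dB gain, 3 dB NF.
signal_in, noise_in = -90.0, -114.0
signal_out, noise_out = amplifier_output(signal_in, noise_in, gain_db=20.0, nf_db=3.0)

snr_in_db = signal_in - noise_in
snr_out_db = signal_out - noise_out
print(f"SNR in: {snr_in_db:.1f} dB, SNR out: {snr_out_db:.1f} dB")
```

The gain cancels out of the SNR entirely; the output SNR is lower than the input SNR by exactly the NF (24 dB in, 21 dB out in this example), which is why a low NF matters more than a high gain here.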