The problem with proximity is that it only takes location information from one receiver, so it is difficult to give a very precise location for the device itself. The only way to increase accuracy while sticking with this method is to add more nodes that measure the device. The nice thing about this method is that it works well in indoor environments, where you have a lot of objects in the way that cause difficulties when estimating the location of a device using distance estimation.
The problem with distance estimation is that it assumes the world is flat, with no objects to block signals or cause multipath, just like I talked about with the proximity method. If that truly were the case, distance estimation could be very precise, but the world isn't that nice to us.
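To see what a pure distance estimate looks like, here is a minimal sketch using the log-distance path-loss model; the reference power and exponent are illustrative guesses, not calibrated values:

    import math

    # Log-distance path-loss model: a common way to turn a received
    # signal strength (RSSI) into a distance estimate.
    def rssi_to_distance_m(rssi_dbm, rssi_at_1m_dbm=-40.0, path_loss_exp=2.0):
        # path_loss_exp = 2.0 is free space; indoors it's typically 2.7-4,
        # and multipath makes any single value a rough approximation.
        return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exp))

    print(f"{rssi_to_distance_m(-60.0):.1f} m")  # 10.0 m under these assumptions

The model only holds when the exponent actually describes your environment, which is exactly what walls and multipath break.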
So along comes scene analysis. This is essentially distance estimation, but it attempts to overcome the issues that pure distance estimation has. To do this, you have to train the system on what is actually happening in the real world: you stick devices at known locations and record the results as if you were doing distance estimation. The more points you do this for, the better the system can learn precisely where a device is.
The way these data points are used varies by application, but they are almost always fed into some sort of machine-learning algorithm. In your case it appears they are using k-nearest neighbor. Essentially, it takes the data that is received, compares it directly with all of the known sample points you collected, and reports the location of the closest known sample point(s). More advanced systems can predict location by interpolating between multiple sample points.
I have never actually worked with k-nearest neighbor before, so this is somewhat of a guess as to how it is being applied here, but I hope I was able to provide some insight into the issues with distance estimation that scene analysis attempts to overcome.
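To make the fingerprinting idea concrete, here is a minimal k-nearest-neighbor sketch; the survey points, RSSI readings, and choice of k are all made up for illustration:

    import math

    # Hypothetical training data: RSSI readings (dBm) from three receivers,
    # recorded with a device sitting at known (x, y) survey points.
    fingerprints = [
        ((-40, -70, -65), (0.0, 0.0)),
        ((-55, -50, -72), (3.0, 0.0)),
        ((-68, -48, -52), (3.0, 4.0)),
        ((-62, -71, -45), (0.0, 4.0)),
    ]

    def locate(reading, k=2):
        # "Distance" here is measured in signal space, not physical space.
        nearest = sorted(fingerprints,
                         key=lambda fp: math.dist(reading, fp[0]))[:k]
        x = sum(pos[0] for _, pos in nearest) / k
        y = sum(pos[1] for _, pos in nearest) / k
        return (x, y)

    print(locate((-50, -60, -70)))  # averages the two closest survey points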
I have an open-source, open-hardware sensor that would give you a working starting point: it is internet connected, transmits its temperature, humidity, and battery voltage every two minutes, and will last 3-5 years on 2xAA batteries. It is based on the M12 6LoWPAN module.
I'll try my best to touch on all of your questions.
Regarding the band tradeoff (433 MHz vs. 915 MHz vs. 2.4 GHz):
Range vs. antenna size is the clear tradeoff here. Free-space path loss is a function of wavelength, so lower frequencies travel much farther for the same attenuation. BUT, in order to capitalize on this you'll also need a suitable antenna, which also scales with wavelength. The 2.4 GHz antenna on the M12 takes about 2 sq. cm of PCB area.
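To put rough numbers on that tradeoff, here is a quick free-space path loss comparison (the 100 m distance is arbitrary, chosen just for illustration):

    import math

    def fspl_db(distance_m, freq_hz):
        # Free-space path loss: 20*log10(4*pi*d*f/c)
        return 20 * math.log10(4 * math.pi * distance_m * freq_hz / 3.0e8)

    for f_mhz in (433, 915, 2400):
        print(f"{f_mhz:5d} MHz: {fspl_db(100, f_mhz * 1e6):5.1f} dB at 100 m")
    # 433 MHz comes out ~15 dB better than 2.4 GHz, but a quarter-wave
    # antenna at 433 MHz is ~17 cm long vs. ~3 cm at 2.4 GHz.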
A second factor is licensing. 2.4 GHz can have unlicensed stations worldwide. 915 MHz is only unlicensed in the US (it's a GSM band everywhere else). I'm not sure about the restrictions on 433 MHz.
Data rate is also affected by frequency choice; per the Shannon–Hartley theorem, the wider channels available at higher frequencies let you cram in more data. That capacity isn't always spent on final data rate, though. 802.15.4, for instance, maps each 4-bit symbol onto a 32-chip pseudo-orthogonal sequence, so you have to corrupt several low-level chips to cause a symbol error. This allows 802.15.4 to operate under the noise floor (research suggests down to about -5 dB SNR) and makes it relatively robust to interference.
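As a rough sanity check on that claim (approximating an 802.15.4 channel as 2 MHz wide):

    import math

    def shannon_capacity_bps(bandwidth_hz, snr_db):
        # Shannon-Hartley: C = B * log2(1 + SNR)
        return bandwidth_hz * math.log2(1 + 10 ** (snr_db / 10))

    # Even at -5 dB SNR the theoretical capacity of a ~2 MHz channel is
    # well above 802.15.4's 250 kbps; the spreading code spends that
    # headroom on robustness instead of throughput.
    print(f"{shannon_capacity_bps(2e6, -5) / 1e3:.0f} kbps")  # ~790 kbps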
Now on to the next hard topic, low-power radio operation:
Compared to household battery sources (e.g. AA alkalines), even the "low-power" SoCs such as the MC13224V aren't very low power. The transmitters are around 30 mA at 2-3.5 V and the receivers are 25 mA or so. Without turning the radio off and putting the CPU to sleep, this load will drain 2 AAs in a few days. The high power consumption of the receiver often surprises people and is probably the biggest pain in developing low-power radio systems. The implication is that to run for years, you can almost never transmit or listen.
The key to getting "year-long" operation from 2xAA alkalines is to get the average current of the system below 50 µA. Doing so puts you at years of run-time and up against secondary effects from the batteries, such as self-discharge and the roughly 7-year shelf life of household batteries.
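The arithmetic behind that target, as a sketch (the capacity figure below is a typical alkaline AA rating, not a measured value; note 2xAA in series doubles voltage, not capacity):

    capacity_mah = 2500          # one AA alkaline, roughly
    avg_current_ma = 0.050       # the 50 uA average target
    hours = capacity_mah / avg_current_ma
    print(f"{hours / 24 / 365:.1f} years")  # ~5.7 years, before self-discharge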
The best way to get under 50 µA average is if your transceiver doesn't need to receive. If that's true, then you can "chirp" the data as quickly as possible and put the system into a low-power mode (say, ~10 µA) for the rest of the time. The TH12, for instance, transmits for about 10 ms, but there is other overhead in the system from processing time and setup time for the sensor involved. The details can be worked out with a current probe and a spreadsheet.
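A spreadsheet-style sketch of that budget; the phase durations and currents below are placeholders in the spirit of the TH12 numbers, not measured data:

    # Each phase of one report cycle: (label, duration_s, current_mA)
    phases = [
        ("sleep",        120.0,  0.010),  # low-power mode between reports
        ("sensor setup",   0.050, 3.0),   # wake, sample temperature/humidity
        ("transmit",       0.010, 30.0),  # the ~10 ms "chirp"
    ]

    period_s = sum(d for _, d, _ in phases)
    charge_mas = sum(d * i for _, d, i in phases)  # milliamp-seconds per cycle
    print(f"average current: {1000 * charge_mas / period_s:.1f} uA")  # ~13.7 uA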
From that type of analysis you can work out what the run-life is going to be (assuming you have an accurate discharge curve for your battery).
If you do need to receive data on the low-power side (e.g. to make a sleepy router in a mesh network), then the current state of the art focuses on time-division techniques. Some tightly synchronize the network, such as 802.15.4 beacons, and others use a "loose" system such as ContikiMAC (which can be easier to implement, especially if your hardware doesn't have a stable timebase).
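A rough sketch of why duty-cycled listening costs what it does; the sample length and sleep current here are illustrative guesses (8 Hz is ContikiMAC's usual default channel-check rate), not measured values:

    rx_ma = 25.0         # receiver current while sampling the channel
    sample_s = 0.002     # ~2 ms per wakeup for a couple of CCA checks
    wakeups_per_s = 8    # ContikiMAC-style channel-check rate
    sleep_ua = 10.0      # everything else asleep

    avg_ua = rx_ma * 1000 * sample_s * wakeups_per_s + sleep_ua
    print(f"{avg_ua:.0f} uA average just to keep listening")  # ~410 uA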
Regardless, my experience shows that these methods land around 400 µA average, which puts you in the "months to maybe a year" run-time range with 2xAAs.
Collisions:
My advice: don't worry about them for now. In other words, do ALOHA (your option #1): if you have data, send it. If it collides, then maybe resend it (this depends on your goals). If you don't need to ensure that every sample is received, then just try once and go to sleep right away.
You will find that the power consumption problem is so hard that the only solution will be a network that isn't transmitting much at all. If you just try, it will probably get through. If it doesn't, you can always try again later.
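In sketch form, the whole MAC layer can be about this simple (radio_send and radio_wait_ack are hypothetical stand-ins for whatever your stack provides, not real API calls):

    import random
    import time

    # Hypothetical radio primitives -- replace with your stack's calls.
    def radio_send(sample): ...
    def radio_wait_ack(timeout_s): return False

    def send_sample(sample, need_ack=False, retries=1):
        # Pure ALOHA: transmit when you have data, then go back to sleep.
        for _ in range(1 + retries):
            radio_send(sample)
            if not need_ack:                  # fire-and-forget: one try, done
                return True
            if radio_wait_ack(timeout_s=0.05):
                return True
            # Random backoff so two colliding nodes don't collide again.
            time.sleep(random.uniform(0.01, 0.1))
        return False  # drop it; the next sample is coming soon anyway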
If you do need to make sure every datagram gets through, then you will have to do some kind of ACK scheme. In the 6LoWPAN world you can use TCP, which will keep retrying until your battery is dead. There is also CoAP, which runs over UDP and has a retry mechanism (but doesn't promise delivery). Every choice here will impact run-time; if you are operating for years, the impact will be measured in months.
Your option #2 is built into 802.15.4 hardware as CCA (clear channel assessment). The idea is that the receiver turns on for 8 symbol periods and returns true or false, and then you can make a decision about what to do next. You can play with these schemes all day/week, but every time you do something like this you shave more weeks off the run-time. That's why I suggest starting simple for now; it will work quite well if you are trying for long run-times.
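For completeness, a sketch of option #2 layered on hardware CCA (again, cca_is_clear and radio_send are hypothetical stand-ins); note that every CCA check is receiver on-time, which is exactly the current you're trying to avoid:

    import random
    import time

    # Hypothetical radio primitives -- replace with your stack's calls.
    def cca_is_clear(): return True   # hardware 8-symbol energy detect
    def radio_send(sample): ...

    def csma_send(sample, max_tries=3):
        # Listen-before-talk: only transmit when the channel looks clear.
        for _ in range(max_tries):
            if cca_is_clear():
                radio_send(sample)
                return True
            time.sleep(random.uniform(0.001, 0.01))  # back off, try again
        return False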
I've seen a few wireless testbeds for 2.4 GHz (802.15.4) networks, including multi-hop links, deployed in both academic and industrial (though R&D) environments. None of these was for certifying link performance; they were more for generally assessing the feasibility of a given architecture with the to-be-tested hardware and protocol.
The networks were deployed to replicate a typical 'sound' installation: a backbone running through central spaces (like corridors), with line of sight between the backbone nodes, and several other nodes scattered around the rooms.
If you want to test the maximum reach of the nodes, you're better served by going to an open space and putting the nodes within line of sight. Even better if you can raise them as far as possible off the ground, which can be a source of reflection and/or attenuation. If instead you want to measure real-world performance in the indoor setting you describe, you can try the above approach, plus put two (or more) nodes in separate rooms with different kinds of separation: drywall, stone wall, glass, and whatever else you can come up with.
Also note that with higher-level protocols you may have some trouble checking for errors, as the protocol will correct them automatically and you will only be able to measure the effective throughput or a derived signal-strength metric.