I designed[1] something like this using I2C once. (Since I did it for work, I can't post the code.) As long as you have control over all the nodes (they're all MCUs programmed by you), this should work.
Basically, the devices are arranged in a daisy-chain using I2C as normal. In addition to the I2C, you have a point-to-point logic line, using two PIO pins per node. One pin ("upstream sense") is input-only and pulled up, while the other pin ("downstream sense") is output-only, but initially tri-stated (high-Z out) and optionally pulled up. Each node's upstream sense pin is connected to the downstream sense pin of the next chip upstream. The farthest-upstream and farthest-downstream pins are left unconnected. Optionally, each node can have an external FET which connects pull-up resistors to the I2C bus.
On power up, all nodes have their I2C ports as slaves with address 0 or some such (doesn't really matter), drive their downstream sense pins to 0, and wait for a fixed time (depends on how long it takes for all your nodes to power up and initialize). What they're looking to receive is an "all call" (broadcast) message.
Whichever node is farthest upstream will not see its upstream sense pulled low in this time. So it goes first (if pull-ups are FET-controlled, it turns its pull-up on), sets its port as a master, and broadcasts an all-call message identifying itself to the other nodes, including its address (whatever you want to use for the first one) and any other information identifying what it is to the other nodes. Then it waits for a fixed amount of time for another node (should be none, but who knows) to send an all-call message saying that they are in fact at the first address. If it gets such a message, it then repeats its identification, but with the next address. This cycle repeats until it finds an available address. (This pattern allows a node to reset and get its address back without confusing the bus.)
Once it is sure of its address, it sets it in the I2C peripheral and goes to slave mode, to listen for other nodes, and drives its downstream sense line high, which tells the next node downstream to go through the same process to get its address. At this point, it just listens for people trying to claim its address, and records the identification information of the other nodes. (Nodes also listen for other nodes' identification prior to getting a rising edge on upstream sense, building a network table, but they don't have a claimed address yet, so they don't check for collisions. When it comes time to claim an address, it can use the table data to pick a likely unclaimed address.)
After all this, everyone should have unique I2C addresses and be ready to go. Then you just use I2C as normal. (Needless to say, whatever initial address all nodes had could not be used post-configuration.) In our setup, all-call was only used for configuration, and direct addressing was only used for real work. If you want to use all-call after configuration, you'll need to design your all-call message to flag which mode it's in.
There's probably plenty that can be optimized here, but it should give you a start. We used this on a piggyback board for a half-brick power supply, so you could just snap together whatever bricks you needed (we added edge-mating connectors to our boards to carry I2C and the other lines) and then plug into a serial port on any one of the bricks to get voltage, current, and temperature information on all of them. It was pretty sweet and got our student (who did the heavy lifting) an A in senior lab. (Then he ran as fast as he could to grad school across the country...)
[1] By "designed" I mean I wrote up something similar to the text above, the 1% inspiration per Edison. The 99% perspiration was provided by my undergrad student.
The biggest drawback to this system I can see is that if the slaves
are all power cycled for some reason they will all collide when trying
to join the network in the grace period. This means it may take a very
long time until they all rejoin the network.
If you have a limited number of potential addresses (say 256) the master can always poll a currently unconnected slave address to see if it has sprung into action. So if you have 20 actives slaves, then the master polls a 23 rd slave each "round" and numerically increments the unconnected_slave_address for the next "round". It will only take a few seconds to do enough rounds to find a new slave and then the job is done.
No need to have a grace period; just one more slave slot will do the trick and when a new slave (or errant slave) is found it gets listed as "active".
And yes, RS485 is a suitable physical layer for this.
My plan is to use Attiny84 chips to implement this. Each board would
be fed with +15 VDC and +7 VDC. I can regulate this with 780x series
regulators to +5 VDC and +12 VDC for powering the chips and any other
sensors or relays I may wish.
I would use low drop-out regulators and I would give the drop-out voltage a little more headroom - there will be maybe a volt induction from your local home AC wiring and this means the regulators need a bit more space to regulate properly.
I would also consider using Manchester encoding so that you can high pass filter the data received without fear of corruption. 100 kohm biasing after the capacitor should be OK no matter how many slaves you have in place i.e. it won't destroy the impedance matching from end to end of the cable. I don't know what data rate you intend to run at but working at or above 9600 Baud seems sensible to help the high pass filter block any AC pertubations. Yes I know RS485 is a current loop but if you can inject a little bit more hardware security at the design stage it will pay dividends.
Designing for Manchester encoding also gives you a better chance to implement RF links should there be a need for any in the future.
Just a few thoughts on doing this if writing your own protocol.
Best Answer
To do this sort of dynamic addressing on a common bus you ultimately need to have one of two things:
If you don't have either of these two things you will need to hard code addresses into the master and slave devices.
The reason for this is that if you have an address space (of size N) that is too large to search efficiently by stepping through every N address, then any optimised search algorithm that could find the address in less than N steps must include the possibility of more than one slave transmitting a response at the same time. If the physical layer cannot handle this situation, then unfortunately you are out of luck. If the physical layer can detect collisions, then there are many options, such as having a random retry delay when a collision occurs (Ethernet), or using something like CAN bus where collisions are managed in a cooperative way.