"An introduction to asynchronous circuit design" by Davis and Nowick
(in particular, Figure 1 and Figure 2 and the nearby text)
describes two handshaking protocols as "pervasive".
The first is the 4-cycle protocol, also known as RZ (return-to-zero), 4-phase, or level signaling.
The second is the 2-cycle protocol, also known as transition, 2-phase, or NRZ (non-return-to-zero) signaling. It is similar but more complicated to implement, and it closely resembles the "data strobe encoding" used by SpaceWire and FireWire.
Either one sounds like it has most of the features you requested --
it's SPI-like in that there are exactly 4 signals, all 4 signals are one-way (no passive pull-ups), the master can pause the slave indefinitely until it is ready for the next bit from the slave, etc.
It also has a feature supercat requested that SPI doesn't have: the slave can pause the master indefinitely until it is ready for the next bit from the master.
I don't know of any chips that have the 4-cycle protocol built in, but it looks like it would be easy to bit-bang on a microcontroller or a CPLD.
In fact, it looks like it would be easier to bit-bang than SPI, since (like SPI) the master has no timing requirements, and (unlike SPI) the slave has no timing requirement either.
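To make the four phases concrete, here is a minimal Python simulation of one way the handshake could carry one bit in each direction. The signal names (`req`, `ack`, `m_data`, `s_data`) and the mapping of the 4 wires are my own assumptions, not taken from the Davis and Nowick paper, and the two sides are simulated sequentially in one function. In a real bit-banged version each side would busy-wait on the other side's line instead.

```python
from dataclasses import dataclass


@dataclass
class Wires:
    """Four one-way signals (names are hypothetical)."""
    req: int = 0     # driven by master
    ack: int = 0     # driven by slave
    m_data: int = 0  # master -> slave data
    s_data: int = 0  # slave -> master data


def exchange_bit(w, master_bit, slave_bit, trace):
    """One 4-phase (return-to-zero) handshake, one bit each way."""
    # Phase 1: master presents its bit, then raises req.
    w.m_data = master_bit
    w.req = 1
    trace.append((w.req, w.ack))
    # Phase 2: slave sees req high, samples m_data,
    # presents its own bit, then raises ack.
    got_by_slave = w.m_data
    w.s_data = slave_bit
    w.ack = 1
    trace.append((w.req, w.ack))
    # Phase 3: master sees ack high, samples s_data, lowers req.
    got_by_master = w.s_data
    w.req = 0
    trace.append((w.req, w.ack))
    # Phase 4: slave sees req low, lowers ack; both lines are idle again.
    w.ack = 0
    trace.append((w.req, w.ack))
    return got_by_master, got_by_slave


def exchange_byte(w, master_byte, slave_byte):
    """Exchange 8 bits, MSB first (bit order is an arbitrary choice)."""
    m_in = s_in = 0
    trace = []
    for i in range(7, -1, -1):
        gm, gs = exchange_bit(w, (master_byte >> i) & 1,
                              (slave_byte >> i) & 1, trace)
        m_in = (m_in << 1) | gm
        s_in = (s_in << 1) | gs
    return m_in, s_in, trace
```

Because every step waits for a level on the other side's line rather than for a clock edge, either side can stall at any phase for as long as it likes, which is the "no timing requirements" property mentioned above.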
Is it possible to use the 4-phase protocol for synchronous bit transfers, and somehow build a higher-level protocol on top of that to get the other things supercat wants -- byte alignment, start-of-command frame alignment, attention/busy/idle states, etc?
As I understand it, you want to send different data to each of the slaves, but the slaves don't have to send data back.
I2C is an addressed bus, so if you assign a different I2C address to each of the slaves, you need only two wires to send the data. If needed, you can read data back as well. The Arduino's AVRs have an I2C-compatible serial bus. And you can extend to more than 3 slaves without extra hardware, up to 112 with standard 7-bit addressing, since 16 of the 128 possible addresses are reserved.
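As a rough illustration of how I2C addressing works on the wire: the first byte after a START condition carries the 7-bit slave address in its upper bits and the read/write flag in bit 0 (0 = write, 1 = read). The sketch below only models that byte in Python; the function names and the slave addresses 0x10 to 0x12 are made up for the example.

```python
def i2c_address_byte(addr7, read):
    """Build the first byte of an I2C transfer: 7-bit address + R/W bit."""
    if not 0 <= addr7 <= 0x7F:
        raise ValueError("7-bit address expected")
    return (addr7 << 1) | (1 if read else 0)


def slave_matches(own_addr7, first_byte):
    """What the address-matching hardware in each slave effectively checks."""
    return (first_byte >> 1) == own_addr7


# Three slaves at hypothetical addresses; the master writes to the second one.
slaves = [0x10, 0x11, 0x12]
first = i2c_address_byte(0x11, read=False)
selected = [a for a in slaves if slave_matches(a, first)]
```

Every slave sees the same byte on the shared two wires, but only the one whose address matches responds, so no per-slave select line is needed.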
UARTs don't have addressing, so you would need either 3 UARTs (which the AVR doesn't have) or external logic to switch between UART lines (which costs money). Each additional slave means extra cost. Not recommended.
edit
As Chris says, you can use a UART to create a multidrop bus. You would then have to add addressing yourself, which makes the UART work a bit like I2C, only asynchronous and without the address-matching hardware that I2C has. So it is still not really an advantage.
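To give an idea of what "adding addressing" to a multidrop UART means in software, here is a small Python sketch of a made-up frame format: address byte, length byte, payload, then an 8-bit additive checksum. None of this is a standard; the layout and the function names are assumptions for illustration only.

```python
def build_frame(addr, payload):
    """Master side: wrap a payload in a hypothetical addressed frame."""
    if not 0 <= addr <= 0xFF:
        raise ValueError("address must fit in one byte")
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    body = bytes([addr, len(payload)]) + bytes(payload)
    checksum = sum(body) & 0xFF  # simple additive checksum
    return body + bytes([checksum])


def accept_frame(own_addr, frame):
    """Slave side: return the payload if the frame is valid and ours, else None."""
    if len(frame) < 3:
        return None
    addr, length = frame[0], frame[1]
    if len(frame) != length + 3:
        return None  # truncated or oversized frame
    if sum(frame[:-1]) & 0xFF != frame[-1]:
        return None  # checksum mismatch
    if addr != own_addr:
        return None  # addressed to another slave: ignore
    return frame[2:-1]
```

Every slave has to receive and parse every frame just to decide whether to ignore it, which is the work that I2C's address-matching hardware does for free.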
end of edit
SPI also uses shared lines for data: a single MOSI line, with the slaves' MISO lines connected together. To address each slave individually you'll need one SS (Slave Select) line per slave. So that's at least 5 I/Os: MOSI, SCK, and 3 \$\times\$ SS, plus MISO if you also want to read data from the slaves. Each additional slave adds 1 I/O pin on the master.
I think I2C is the best solution, since it requires the fewest wires. The protocol is a bit more complex than UART or SPI, but since the AVR has the hardware for it, it should be easy to use.
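The pin arithmetic can be written down as a tiny Python helper, purely to make the scaling visible. The function name and its parameters are hypothetical, and it only counts master pins under the wiring described here (one SS line per SPI slave, no external decoder).

```python
def master_pins(bus, n_slaves, read_back=False):
    """Count master I/O pins needed for n_slaves on the given bus."""
    if bus == "i2c":
        return 2  # SDA + SCL, regardless of slave count or direction
    if bus == "spi":
        # MOSI + SCK + one SS per slave, plus MISO if reading back
        return 2 + n_slaves + (1 if read_back else 0)
    raise ValueError("unknown bus: " + bus)


spi_write_only = master_pins("spi", 3)
spi_with_read = master_pins("spi", 3, read_back=True)
i2c_any = master_pins("i2c", 3, read_back=True)
```

For 3 slaves this gives 5 pins for write-only SPI, 6 with MISO, and 2 for I2C, and only the SPI count grows as slaves are added.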
Best Answer
You can use standard I2C or SPI. Master implementations exist for the Raspberry Pi, and slave implementations exist for AVR. A TinyAVR limits you to very basic processing, so you may want to opt for an AVR with more memory if that becomes a problem. You can also use RS485 single-master, multi-slave serial communication over the Pi's UART, provided RS485 interface chips are used on the Pi and on all AVRs. Of course, in that case you must either implement an existing standard protocol or create your own custom protocol.