I haven't had much personal experience with RTOSes other than QNX, which is too large for PICs and MSP430s. (QNX is great on the whole, but it's not cheap, and I have had a really bad experience with a particular board vendor and QNX's we-don't-care attitude toward systems other than their most common ones.)
Where you will benefit from an RTOS is in areas such as:
- thread management/scheduling
- inter-thread communications + synchronization
- I/O on systems with stdin/stdout/stderr or serial ports or ethernet support or a filesystem (not an MSP430 or PIC for the most part, except for the serial ports)
For the peripherals of a PIC or MSP430: for serial ports I'd use a ring buffer plus interrupts, something I write once per system and just reuse. For other peripherals I don't think you'd find much support from an RTOS, as they are so vendor-specific.
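To make the ring buffer idea concrete, here is a minimal sketch of a single-producer/single-consumer byte buffer in C. The names (`ring_buffer_t`, `rb_put`, `rb_get`, `RB_SIZE`) are made up for illustration, and it assumes one ISR is the only writer of `head`, the main loop is the only writer of `tail`, and single-byte index accesses are atomic on the target, which is typically the case on small 8/16-bit parts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size; must be a power of two so indices can wrap with a mask. */
#define RB_SIZE 16u

typedef struct {
    volatile uint8_t buf[RB_SIZE];
    volatile uint8_t head;  /* next write position, changed only by the producer (ISR) */
    volatile uint8_t tail;  /* next read position, changed only by the consumer (main loop) */
} ring_buffer_t;

/* Producer side: intended to be called from the receive ISR.
 * Returns false when the buffer is full, in which case the byte is dropped. */
bool rb_put(ring_buffer_t *rb, uint8_t byte)
{
    uint8_t next = (uint8_t)((rb->head + 1u) & (RB_SIZE - 1u));
    if (next == rb->tail) {
        return false;           /* full: one slot is always left empty */
    }
    rb->buf[rb->head] = byte;
    rb->head = next;            /* publish the byte only after it is stored */
    return true;
}

/* Consumer side: intended to be called from the main loop.
 * Returns false when there is nothing to read. */
bool rb_get(ring_buffer_t *rb, uint8_t *out)
{
    if (rb->head == rb->tail) {
        return false;           /* empty */
    }
    *out = rb->buf[rb->tail];
    rb->tail = (uint8_t)((rb->tail + 1u) & (RB_SIZE - 1u));
    return true;
}
```

In use, the UART receive ISR would call `rb_put` with the byte it just read from the data register, and the main loop would drain the buffer with `rb_get`. Because one slot is kept empty to distinguish full from empty, the buffer holds at most `RB_SIZE - 1` bytes.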
If you need timing that is rock-solid to the microsecond, an RTOS probably won't help. RTOSes have bounded timing, but typically do have timing jitter in their scheduling due to context-switching delays. QNX running on a PXA270 had jitter in the tens of microseconds typical, 100-200us maximum, so I wouldn't use it for stuff that has to run faster than about 100Hz or which needs timing much more accurate than about 500us. For that kind of stuff you will probably have to implement your own interrupt handling. Some RTOSes will play nicely with that, and others will make it a royal pain: your timing and their timing may not be able to coexist well.
If the timing/scheduling is not too complex, you may be better off using a well-designed state machine. I would highly recommend reading Practical Statecharts in C/C++ if you haven't already. We've used this approach in some of our projects where I work, and it has some real advantages over traditional state machines for managing complexity, which is really the only reason you need an RTOS.
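As a rough illustration of the event-driven style, here is a small flat state machine in C where each state is a handler function and a transition is just a pointer assignment. This is only a sketch: it is not the hierarchical statechart framework from the book, and the states, events, and names (`fsm_t`, `state_idle`, `state_blinking`, `EV_BUTTON`, `EV_TIMEOUT`) are invented for the example.

```c
#include <stdint.h>

/* Hypothetical events for the example. */
typedef enum { EV_BUTTON, EV_TIMEOUT } event_t;

typedef struct fsm fsm_t;
typedef void (*state_fn)(fsm_t *me, event_t ev);

struct fsm {
    state_fn state;        /* current state = the handler that gets the next event */
    uint8_t  blink_count;  /* extended state variable used by the blinking state */
};

void state_idle(fsm_t *me, event_t ev);
void state_blinking(fsm_t *me, event_t ev);

/* Idle: ignore everything except a button press, which starts blinking. */
void state_idle(fsm_t *me, event_t ev)
{
    if (ev == EV_BUTTON) {
        me->blink_count = 0;
        me->state = state_blinking;
    }
}

/* Blinking: count timeouts, return to idle after three or on a button press. */
void state_blinking(fsm_t *me, event_t ev)
{
    switch (ev) {
    case EV_TIMEOUT:
        if (++me->blink_count >= 3u) {
            me->state = state_idle;
        }
        break;
    case EV_BUTTON:
        me->state = state_idle;
        break;
    }
}

/* Run-to-completion dispatch: each event is fully handled before the next one. */
void fsm_dispatch(fsm_t *me, event_t ev)
{
    me->state(me, ev);
}
```

The main loop (or a timer ISR feeding an event queue) would call `fsm_dispatch` once per event. Since each handler runs to completion and never blocks, no per-task stacks or context switches are needed.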
Queues operate that way because that is a thread-safe transaction model for inter-task communication. You risk data corruption and/or ownership issues in any less-stringent scheme.
Are you copying the data into a buffer in memory and then passing a pointer in the queue elements, or trying to pass all the data in the queue elements themselves? If you're not passing pointers, switching to them will improve performance compared with passing one byte at a time through queue elements.
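To show the difference, here is a minimal sketch in C where each queue element is a small descriptor (pointer plus length) rather than a data byte. The queue itself is a plain-C stand-in with invented names (`msg_t`, `msg_queue_t`, `queue_send`, `queue_receive`); it is not any particular RTOS's API and is not thread-safe on its own. In a real system the RTOS queue primitive would carry the `msg_t`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One queue element describes a whole buffer instead of holding a single byte. */
typedef struct {
    const uint8_t *data;  /* points at the sender's buffer; nothing is copied */
    size_t         len;   /* number of valid bytes in that buffer */
} msg_t;

#define QUEUE_DEPTH 4u    /* hypothetical depth */

typedef struct {
    msg_t  items[QUEUE_DEPTH];
    size_t head;   /* index of the next free slot */
    size_t tail;   /* index of the oldest queued message */
    size_t count;  /* number of messages currently queued */
} msg_queue_t;

/* Queue a descriptor for the buffer. Returns false if the queue is full. */
bool queue_send(msg_queue_t *q, const uint8_t *data, size_t len)
{
    if (q->count == QUEUE_DEPTH) {
        return false;
    }
    q->items[q->head].data = data;
    q->items[q->head].len  = len;
    q->head = (q->head + 1u) % QUEUE_DEPTH;
    q->count++;
    return true;
}

/* Fetch the oldest descriptor. Returns false if the queue is empty. */
bool queue_receive(msg_queue_t *q, msg_t *out)
{
    if (q->count == 0u) {
        return false;
    }
    *out = q->items[q->tail];
    q->tail = (q->tail + 1u) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

With this scheme, a 64-byte packet costs one queue operation instead of 64. The trade-off is ownership: the sender must not reuse or modify the buffer until the receiver has finished with it, for example by returning it to a buffer pool.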
I have been using FemtoOS in a few projects and it works very well.