The PIC UART can be picky. Did you remember to check the Overrun Error (OERR) bit? Once OERR is set, the PIC will not receive any further UART data until the bit is cleared, which is done by clearing and then re-setting the receive enable bit (CREN).
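As a rough illustration of the OERR check, here is a minimal sketch in C. It assumes a PIC16/PIC18-style receiver where OERR is cleared by toggling CREN. The `fake_rcsta` struct and the `set_cren`/`get_oerr` helpers are hypothetical stand-ins so the logic can be compiled and tried on a PC; on the real part you would touch `RCSTAbits.CREN` and `RCSTAbits.OERR` from `<xc.h>` instead (check the register name for your specific device).

```c
#include <stdbool.h>

/* Hypothetical host-side model of the two receive-status bits involved.
   On real hardware these are RCSTAbits.CREN and RCSTAbits.OERR. */
typedef struct {
    bool cren;  /* continuous receive enable */
    bool oerr;  /* overrun error flag (read-only on hardware) */
} fake_rcsta_t;

static fake_rcsta_t fake_rcsta = { true, false };

/* Models the hardware behaviour: clearing CREN also clears OERR. */
static void set_cren(bool on)
{
    fake_rcsta.cren = on;
    if (!on) {
        fake_rcsta.oerr = false;
    }
}

static bool get_oerr(void)
{
    return fake_rcsta.oerr;
}

/* Call this from the receive path (or the main loop).
   Returns true if an overrun was found and cleared. */
bool uart_recover_overrun(void)
{
    if (!get_oerr()) {
        return false;   /* receiver is healthy, nothing to do */
    }
    set_cren(false);    /* hardware: RCSTAbits.CREN = 0, resets the receiver */
    set_cren(true);     /* hardware: RCSTAbits.CREN = 1, receiving resumes */
    return true;
}
```

Calling `uart_recover_overrun()` before each read of the receive register is one simple way to make sure a single missed byte does not lock up reception for good.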
EDIT: I was also thinking... perhaps you could try a sort of loopback? That is, cut the UART out of the loop, and when the PC sends anything over USB, just send it straight back. This would tell you whether the issue is on the UART side or the USB side of the PIC.
I'd say you're dreaming. The main problem will be the limited RAM.
In 2004, Eric Biederman managed to get a kernel booting with 2.5MB of RAM, with a lot of functionality removed.
However, that was on x86, and you're talking about ARM. So I tried to build the smallest possible ARM kernel, for the 'versatile' platform (one of the simplest). I turned off all configurable options, including the ones that you're looking for (USB, WiFi, SPI, I2C), to see how small it would get. Now, I'm just referring to the kernel here, and this does not include any userspace components.
The good news: it will fit in your flash. The resulting zImage is 383204 bytes.
The bad news: with 256kB of RAM, it won't be able to boot:
    $ size obj/vmlinux
       text    data     bss     dec     hex filename
     734580   51360   14944  800884   c3874 obj/vmlinux
The .text segment is bigger than your available RAM, so the kernel can't decompress, let alone allocate memory to boot, let alone run anything useful.
One workaround would be to use the execute-in-place support (CONFIG_XIP), if your system supports that (i.e., it can fetch instructions directly from flash). However, that means your kernel needs to fit uncompressed in flash, and 734kB > 700kB. Also, the .data and .bss sections total 66kB, leaving about 190kB for everything else (i.e., all dynamically-allocated data structures in the kernel).
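The arithmetic behind those numbers can be sketched in C as below. The struct and function names are made up for illustration, and the sketch only checks whether .text alone fits in flash, which is the same comparison made above. It also assumes "kB" means 1024 bytes; the conclusion is the same with 1000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Section sizes in bytes, as reported by `size`. */
typedef struct {
    uint32_t text;
    uint32_t data;
    uint32_t bss;
} image_size_t;

/* Result of the rough XIP feasibility check (hypothetical helper type). */
typedef struct {
    uint32_t static_ram;      /* .data + .bss, which must live in RAM */
    int64_t  ram_left;        /* RAM left for dynamic allocations; negative if none */
    bool     text_fits_flash; /* can .text stay uncompressed in flash? */
} xip_budget_t;

xip_budget_t xip_budget(image_size_t img, uint32_t ram_bytes, uint32_t flash_bytes)
{
    xip_budget_t b;
    b.static_ram = img.data + img.bss;
    b.ram_left = (int64_t)ram_bytes - (int64_t)b.static_ram;
    b.text_fits_flash = img.text <= flash_bytes;
    return b;
}
```

Feeding it the figures from the `size` output (text 734580, data 51360, bss 14944) with 256 * 1024 bytes of RAM and 700 * 1024 bytes of flash gives 66304 bytes of static RAM, 195840 bytes left over, and a .text that does not fit in flash.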
That's just the kernel. Without the drivers you need, or any userspace.
So, yes, you're going to need a bit more RAM.
If it's a small application, then there's a free version of IAR's development tools (Embedded Workbench for ARM, Kickstart edition). The significant limitation is that it will not link applications larger than 32kB.
Atmel also produce a USB stack (in At91lib), which provides a reasonably simple interface for sending/receiving on endpoints.