The trick here is PogoPins (Wikipedia)
Basically you make a jig that you drop the board into, often described as a bed-of-nails in-circuit tester/programmer; the board is then flashed without having to deal with the connector-mating aspect of the JTAG interface.
LadyAda did a tutorial, and so did SparkFun
rope data structure
I am fascinated by the rope data structure.
I have a hobby project trying to adapt it to a microcontroller with only a few bytes of RAM hooked to a huge flash memory, so that I can insert, delete, and otherwise arbitrarily edit variable-length text in huge text files, far too large to fit into RAM. Erasing the last half of the file and rewriting it to flash, shifted by one byte, every time I insert or delete a character in the middle of a multi-megabyte text file would be far too slow, but the rope data structure can do this much faster.
Since the rope data structure can represent such mutable random-access variable-length files as immutable fixed-length pieces, it seems to be a good match for flash memory -- all edits are written in a circular-log-like fashion.
Alas, all the bugs are not yet worked out in my code. :-(
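As a rough illustration of the "immutable pieces" idea (not the code from the hobby project above), here is a minimal in-RAM rope sketch in C. The names `rope`, `rope_leaf`, `rope_concat`, `rope_sub`, `rope_insert`, and `rope_at` are made up for this example, the leaves are variable-length rather than fixed-length, and there is no balancing. Nodes are only ever appended to a pool and never modified, which is the property that would map onto log-style flash writes.

```c
#include <assert.h>
#include <stddef.h>

/* A node is either a leaf (left == NULL) pointing at text it does not own,
 * or a concatenation of two children. Nodes are never modified once made. */
typedef struct rope {
    const struct rope *left, *right;
    const char *text;   /* leaf payload */
    size_t len;         /* total characters under this node */
} rope;

static rope pool[64];    /* append-only node store (hypothetical size) */
static size_t pool_used;

static const rope *new_node(const rope *l, const rope *r,
                            const char *text, size_t len)
{
    assert(pool_used < sizeof pool / sizeof pool[0]);
    rope *n = &pool[pool_used++];
    n->left = l; n->right = r; n->text = text; n->len = len;
    return n;
}

const rope *rope_leaf(const char *text, size_t len)
{
    return new_node(NULL, NULL, text, len);
}

const rope *rope_concat(const rope *a, const rope *b)
{
    return new_node(a, b, NULL, a->len + b->len);
}

/* Characters [start, start+len) as a new rope that shares existing leaves. */
const rope *rope_sub(const rope *r, size_t start, size_t len)
{
    assert(start + len <= r->len);
    if (len == r->len) return r;
    if (!r->left) return rope_leaf(r->text + start, len);
    size_t ll = r->left->len;
    if (start + len <= ll) return rope_sub(r->left, start, len);
    if (start >= ll) return rope_sub(r->right, start - ll, len);
    return rope_concat(rope_sub(r->left, start, ll - start),
                       rope_sub(r->right, 0, start + len - ll));
}

/* Insert `piece` at `pos`; the original rope `r` stays valid and unchanged. */
const rope *rope_insert(const rope *r, size_t pos, const rope *piece)
{
    return rope_concat(rope_concat(rope_sub(r, 0, pos), piece),
                       rope_sub(r, pos, r->len - pos));
}

/* Random access: walk down from the root to the leaf holding index i. */
char rope_at(const rope *r, size_t i)
{
    assert(i < r->len);
    while (r->left) {
        if (i < r->left->len) r = r->left;
        else { i -= r->left->len; r = r->right; }
    }
    return r->text[i];
}
```

An insert in the middle of the text only creates a handful of new nodes; none of the existing text is copied or shifted.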
fixed-length chronological logs
I did get a similar circular log system working, for a product I helped develop.
I simply wrote fixed-length records one after another, filling up flash as a circular array.
(With a completely blank flash, I started writing records about 3 blocks before the end of the array, so I could test the circular wrap-around after only a few records of data were stored, rather than starting at record zero and waiting for a month's worth of data to be written before finding out that there was a bug in my wrap-around code).
I made sure there were always at least 2 erased "erase blocks" ready to be written to.
After writing a record, if there were only 2 erased blocks after it, I unconditionally erased the oldest block of data -- the block immediately following the 2 erased blocks.
(Near the end of the flash memory, "after" means "wrap around to the beginning of flash memory").
(Perhaps a single erased block would have been adequate -- I forget why I thought I needed at least 2 and sometimes 3).
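The write-then-erase-ahead policy described above might look roughly like the following C sketch. It simulates flash with a RAM array, and every name and size in it (`flash_log`, `log_append`, `BLOCK_SIZE`, and so on) is an assumption made for illustration, not the code from the product. It also assumes the caller supplies records whose first two bytes are never 0xFFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    BLOCK_SIZE = 64,            /* hypothetical erase-block size */
    NUM_BLOCKS = 8,
    RECORD_SIZE = 16,           /* divides BLOCK_SIZE, so no record straddles */
    RECORDS_PER_BLOCK = BLOCK_SIZE / RECORD_SIZE,
    NUM_RECORDS = NUM_BLOCKS * RECORDS_PER_BLOCK
};

typedef struct {
    uint8_t mem[NUM_BLOCKS * BLOCK_SIZE];  /* RAM stand-in for the flash chip */
    size_t next;                           /* index of the next record slot */
} flash_log;

static int block_is_erased(const flash_log *log, size_t block)
{
    const uint8_t *p = log->mem + block * BLOCK_SIZE;
    return p[0] == 0xFF && p[1] == 0xFF;
}

static void erase_block(flash_log *log, size_t block)
{
    memset(log->mem + block * BLOCK_SIZE, 0xFF, BLOCK_SIZE);
}

/* Blank flash: start 3 blocks before the end so wrap-around is hit early. */
void log_init_blank(flash_log *log)
{
    memset(log->mem, 0xFF, sizeof log->mem);
    log->next = (NUM_BLOCKS - 3) * RECORDS_PER_BLOCK;
}

void log_append(flash_log *log, const uint8_t record[RECORD_SIZE])
{
    /* The record header must not look like erased flash. */
    assert(!(record[0] == 0xFF && record[1] == 0xFF));
    memcpy(log->mem + log->next * RECORD_SIZE, record, RECORD_SIZE);

    /* If the 3rd block after the one just written holds data, it is the
     * oldest data block: erase it so erased blocks always stay ahead. */
    size_t block = log->next / RECORDS_PER_BLOCK;
    size_t third = (block + 3) % NUM_BLOCKS;
    if (!block_is_erased(log, third))
        erase_block(log, third);

    log->next = (log->next + 1) % NUM_RECORDS;
}
```

With this policy there are 2 erased blocks ahead of the write position for the brief moment between writing the first record of a new block and finishing the erase, and 3 the rest of the time.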
I forget exactly how many records I put in each "erase block", but I made sure I never had a record straddle two erase blocks -- the first 2 bytes of every erase block of flash were either the "erased" value 0xFFFF, or the two bytes of a Fletcher-16 checksum (which is never 0xFFFF) in the header of each record.
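For reference, a straightforward (unoptimized) Fletcher-16 in C is sketched below; the function name `fletcher16` is just a label for this example. Because both running sums are reduced modulo 255, each byte of the result lies in 0..254, which is why the checksum can never equal 0xFFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 over `len` bytes. Both sums are kept modulo 255, so neither
 * byte of the returned checksum can be 0xFF. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint32_t sum1 = 0, sum2 = 0;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

For the five ASCII bytes of "abcde" this returns 0xC8F0.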
That made it quick to scan the next time it powered-up and find the head of the circular log -- I only had to look at the first two bytes of each erase block to distinguish between "erased" and "data" blocks.
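A power-up scan along those lines could be sketched as follows. The flash is passed in as a plain byte array, and the names `block_tag` and `find_newest_data_block` are hypothetical. The sketch assumes the erased blocks form one contiguous run (with wrap-around), so there is exactly one place where a data block is followed by an erased block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ERASED_MARK 0xFFFFu

/* First two bytes of a block; byte order is irrelevant for the 0xFFFF test. */
static uint16_t block_tag(const uint8_t *flash, size_t block_size, size_t block)
{
    const uint8_t *p = flash + block * block_size;
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns the index of the newest data block: the data block whose successor
 * (with wrap-around) is erased. Returns num_blocks if the flash is blank. */
size_t find_newest_data_block(const uint8_t *flash, size_t block_size,
                              size_t num_blocks)
{
    assert(block_size >= 2 && num_blocks >= 2);
    for (size_t b = 0; b < num_blocks; b++) {
        size_t next = (b + 1) % num_blocks;
        if (block_tag(flash, block_size, b) != ERASED_MARK &&
            block_tag(flash, block_size, next) == ERASED_MARK)
            return b;
    }
    return num_blocks;
}
```

Only two bytes per erase block are read, so the scan cost grows with the number of blocks rather than the number of records; finding the exact record slot then only requires looking inside that one block.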
(I was a little worried about "power failure in the middle of erasing a block" causing the first two bytes to be erased to 0xFFFF but leaving non-erased bytes in the middle of the block, so I wrote code for the microcontroller to check for this and restart the "erase a block" process).
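One way to detect such a half-erased block is sketched here; `block_needs_reerase` is a made-up name, and the block is passed in as a byte array rather than read over SPI. It flags a block whose first two bytes look erased but which still has a non-0xFF byte somewhere later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the block's header reads as erased (0xFFFF) but some later byte
 * is not 0xFF, i.e. an erase was probably interrupted partway through. */
bool block_needs_reerase(const uint8_t *block, size_t block_size)
{
    assert(block != NULL && block_size >= 2);
    if (block[0] != 0xFF || block[1] != 0xFF)
        return false;                 /* header present: a data block */
    for (size_t i = 2; i < block_size; i++)
        if (block[i] != 0xFF)
            return true;              /* erased header, leftover data */
    return false;                     /* fully erased */
}
```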
Please tell me if you find other flash-friendly data structures or file systems.
Best Answer
Although you reference a 1 MB SPI Serial Flash chip, I think it is much more common to see file systems implemented on removable SD cards. That has two advantages: much larger storage (GB instead of MB), and you can remove the SD cards and read/write them on a PC.
An additional advantage is that the name-brand SD card makers like Kingston and SanDisk provide wear-leveling as part of the SD card architecture (but don't count on that from less expensive SD card makers).
Microchip (and I am sure other vendors) has a library you can use to implement either a FAT16 or FAT32 filesystem on an SD card. It is described in application note AN1045, "Implementing File I/O Functions Using Microchip’s Memory Disk Drive File System Library". Obviously this is targeted at PIC microcontrollers.
The low-level read/write SD card sector routines in this library could probably be modified to work with a SPI serial flash, but you would need to do your own wear-leveling and bad-sector marking, which would add a lot of complexity.