Problem solved and I would like to share the solution:
Software Data Protection
This feature is available on some flashes that requires a software sequence at the beginning of each write cycle in order for a write to be performed. To enable the software data protection feature, a series of three-write commands to specific addresses with specific data must be performed. Once set, the same 3-byte code must begin each write request. This feature is permanently enabled in the flash I'm using.
So the solution is to add the following commands before any byte write operation not only the first one:
* flash_addr1 = 0xaa;
* flash_addr2 = 0x55;
* flash_addr3 = 0xa0;
Some types of non-volatile memory device use error-correcting logic which adds an extra few bits to each programmable chunk (e.g. 5 bits per 16, 6 per 32, 7 per 64, 8 per 128, etc.) Generally the error correction code is chosen so that all bits blank is a valid representation; in some cases, but not all, it may also be chosen such that all bits programmed is also a valid combination. For simplicity, I'll assume a code with which guards each group of 4 bits with 3 guard bits. Compute the three guard bits guard bit as being the xor of either data bits 0+1+3, 0+2+3, or 1+2+3. I'll also assume that a blank word is zero.
The 16 possible code values are thus
3210 ABC 3210 ABC 3210 ABC 3210 ABC
0000 000 0100 011 1000 111 1100 100
0001 110 0101 101 1001 001 1100 010
0010 101 0110 110 1010 010 1110 001
0011 011 0111 000 1011 100 1111 111
When a memory nybble is read, the system can see what bits ABC should be according to the table. If one of the seven bits in the word is misread, the combination of ABC bits that don't match the computed value will indicate which bit was wrong.
Suppose a memory system used the 16-bit code shown above, and one wanted to overwrite a byte value of 1110 (ECC bits 001) with a value of 1000 (ECC bits should be 111). The net effect would be that the system would write 1000 with ECC bits of 001. When the data is read back, the system would see that for a value of 1000, the ECC bits should be 111 but are instead 001. The fact that bits A and B are wrong means bit 0 of the data was wrong and should be flipped; the system would thus read the value as 1001 (whose ECC is correctly 001).
In most cases, there should be enough flexibility in the design of an error-correcting code to permit both all-bits-clear and all-bits-set to be regarded as valid combinations. Some systems do not do so, however. If an error-correcting code would require an all-bits-programmed word to have two or more of the ECC bits blank, then an attempt to obliterate a word which has those bits programmed would likely visibly fail; attempts to program many other values would likely yield a state which was only one bit error away from failure rather than two.
I really wish memory designers would allow for data to be obliterated even if they don't allow most other overwriting patterns. Especially with NAND flash, it would make some operations a lot easier.
Best Answer
First of all, the way flash works, in general, is that a write command can only change bits from one to zero. To return bits to one, you must erase an entire erase block. Therefore, you cannot just "write to it like a hard drive". On most flash devices, multiple writes can be made to the same block or even the same byte, again as long as you are only changing ones to zeros. (There are some flash devices where the manufacturer recommends against multiple writes to the same block, but this is rare.)
To answer your specific questions:
1) In general, the endurance is based on erase cycles on each block. You can write individual bits (to zero) as much as you want, but doing an erase to return bits to one will wear down those bits of flash.
2) One way is to use the first byte of each record as a status indicator. It starts, after erasure, as 0xFF. When you start writing the record, you write 0x7F to that byte (zero the high bit). Once you complete writing the record, you write 0x3F to that byte (zero the next highest bit). To mark a record as deleted, you write 0x00 to that byte. All of this takes advantage of the fact that you can always turn a bit from one to zero.
Later, when you're reading records, you only look at records with 0x3F in that byte. 0x7F means that the record writing was interrupted (power fail/reboot) and is invalid. Once an entire erase block has only 0x00, 0x7F and 0xFF blocks, it can be erased.
By the way, you can keep a map of used/free blocks in RAM so that you only have to scan when booting up.
3) See the top of this posting.