With 120GB of memory, it seems possible that your machine also has multiple physical CPUs and a Non-Uniform Memory Access (NUMA) architecture... and if that's the case, you could have plenty of available memory but still, counter-intuitively, not enough available memory.
If I'm on track so far, you could be running into this problem when MySQL tries to grow the memory allocated to the MEMORY table and encounters an error caused by the way Linux handles memory allocations on this architecture: by default it only considers the memory on the sticks directly connected to one particular CPU -- the one, somewhat arbitrarily chosen, from which the process's memory requests are serviced -- in spite of other physical memory being free elsewhere on the motherboard.
NUMA is fine in theory but perhaps not ideally suited to single processes needing massive amounts of memory, as is the case with MySQL... but there's a workaround.
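If you want to confirm that NUMA is actually in play before changing anything, numactl can report the node layout and how much memory is free on each node:

numactl --hardware

On a hypothetical two-node box the output includes a line like "available: 2 nodes (0-1)" followed by per-node "size" and "free" figures; one node sitting nearly empty while the other is almost exhausted is the telltale sign of the problem described above.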
The fix will be to modify the mysqld_safe script by adding this line:
cmd="/usr/bin/numactl --interleave all $cmd"
...immediately after this line...
cmd="$NOHUP_NICENESS"
The reasons are explained here, in an excellent article originally written to address why MySQL servers on this architecture were swapping heavily in spite of free memory. That swapping turned out to be a symptom of a larger issue, which the author later pointed out "is not entirely a swapping problem" and which, even with swap disabled, can include "failed memory allocations."
http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
I'm speculating, of course, as to whether this applies to your system, but it seems like a worthwhile possibility. I used exactly this fix when I encountered an issue on a new 128GB machine: I tried to provision a 64GB InnoDB Buffer Pool, and MySQL was unable to find 64GB free even though nothing else was running. The pieces fell into place when I realized the highest value I could successfully use was something less than 1/4 of the total memory on a 16-core machine with 4 physical processors ... sort of like what you seem to be bumping into.
Those ^@ characters are almost certainly binary zeroes. That is, xxd corruptfile | tail -3 will probably emit something like:
#######0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
#######0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
#######0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
The write was signaled to the kernel, but the contents were never flushed to disk. The file had already been extended in anticipation of the write, so it ended up inadvertently sparse, and the unwritten region reads back as zeroes.
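A quick way to sanity-check that theory, assuming GNU coreutils and the filename corruptfile from above, is to compare the file's apparent size with the space actually allocated to it:

ls -l corruptfile
du -h corruptfile
du -h --apparent-size corruptfile

If the allocated size reported by plain du is noticeably smaller than the apparent size, part of the file is a hole, and holes read back as zeroes -- consistent with the trailing ^@ you're seeing.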
This is especially likely if you're not using a journaling filesystem, as a journal should cause the write to be rolled back if it wasn't completed properly (as it wasn't, due to the crash).
Unless you are using INSERT DELAYED, inserts/updates/deletes are committed immediately. The key file may not have been written out completely, and if you have a caching controller card, it may have cached the write and told the OS it had been committed. Have you run CHECK TABLE / REPAIR TABLE against it? You might find that only the index is corrupt and that most of the records are still there.
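If you haven't done that yet, the check can be run from the mysql client, or offline with myisamchk while mysqld is stopped; the database, table, and datadir names below are placeholders:

mysql -e "CHECK TABLE mydb.mytable;"
mysql -e "REPAIR TABLE mydb.mytable;"
myisamchk --recover /var/lib/mysql/mydb/mytable.MYI

REPAIR TABLE (or myisamchk --recover) rebuilds the index from the data file, which is exactly what you want if only the .MYI is damaged.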
MyISAM is fast, but it lacks quite a few of the precautions needed to survive a power problem or server crash. InnoDB is a safer storage engine, but it can be slower depending on the types of queries you are doing. It uses row-level locking rather than table locking, but SELECT COUNT(*) requires a scan of the rows, whereas MyISAM can answer that from its internally stored row count.
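If InnoDB's extra safety looks worth the trade-off, converting an existing table is a single statement (take a backup first; mydb.mytable is again a placeholder):

mysql -e "ALTER TABLE mydb.mytable ENGINE=InnoDB;"

The conversion copies the whole table, so expect it to take a while and need extra disk space on a large table.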
You could set up replication to send the data to a mirror, but if it is in the same data center, you face the same possibility of it ending up with corrupt data.