pQd's estimate of 7PB seems reasonable, and that's a lot of data for a RDBMS. I'm not sure I've ever heard of someone doing 7PB with any shared disk system, let alone MySQL.
Querying this volume of data with any shared disk system is going to be unusably slow. The fastest SAN hardware maxes out at 20GB/sec even when tuned for large streaming queries. If you can afford SAN hardware of this spec you can affort to use something better suited to the job than MySQL.
In fact, I'm struggling to conceive of a scenario where you could have a budget for a disk subsystem of this spec but not for a better DBMS platform. Even using 600GB disks (the largest 15K 'enterprise' drive currently on the market) you're up for something like 12,000 physical disk drives to store 7PB. SATA disks would be cheaper (and with 2TB disks you would need around 1/3 of the number), but quite a bit slower.
A SAN of this spec from a major vendor like EMC or Hitachi would run to many millions of dollars. Last time I worked with SAN equipment from a major vendor, the transfer cost of space on an IBM DS8000 was over £10k/TB, not including any capital allowance for the controllers.
You really need a shared nothing system like Teradata or Netezza for this much data. Sharding a MySQL database might work but I'd recommend a purpose built VLDB platform. A shared nothing system also lets you use much cheaper direct-attach disk on the nodes - take a look at Sun's X4550 (thumper) platform for one possibility.
You also need to think of your performance requirements.
- What's an acceptable run time for a query?
- How often will you query your dataset?
- Can the majority of the queries be resolved using an index (i.e. are they going to look at a small fraction - say: less than 1% - of the data), or do they need to do a full table scan?
- How quickly is data going to be loaded into the database?
- Do your queries need up-to-date data or could you live with a periodically refreshed reporting table?
In short, the strongest argument against MySQL is that you would be doing backflips to get decent query performance over 7PB of data, if it is possible at all. This volume of data really puts you into shared-nothing territory to make something that will query it reasonably quickly, and you will probably need a platform that was designed for shared-nothing operation from the outset. The disks alone are going to dwarf the cost of any reasonable DBMS platform.
Note: If you do split your operational and reporting databases you don't necessarily have to use the same DBMS platform for both. Getting fast inserts and sub-second reports from the same 7PB table is going to be a technical challenge at the least.
Given from your comments that you can live with some latency in reporting, you might consider separate capture and reporting systems, and you may not need to keep all 7PB of data in your operational capture system. Consider an operational platform such as Oracle (MySQL may do this with InnoDB) for data capture (again, the cost of the disks alone will dwarf the cost of the DBMS unless you have a lot of users) and a VLDB platform like Teradata, Sybase IQ, RedBrick, Netezza (note: proprietary hardware) or Greenplum for reporting
Here is something interesting to consider: Backing up the mysql
database limits you greatly in that you can only restore such a database to the same version of mysql you ran the backup from. Here is why:
Here is mysql.user from MySQL 5.0.45
mysql> desc mysql.user;
+-----------------------+-----------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+-----------------------------------+------+-----+---------+-------+
| Host | char(60) | NO | PRI | | |
| User | char(16) | NO | PRI | | |
| Password | char(41) | NO | | | |
| Select_priv | enum('N','Y') | NO | | N | |
| Insert_priv | enum('N','Y') | NO | | N | |
| Update_priv | enum('N','Y') | NO | | N | |
| Delete_priv | enum('N','Y') | NO | | N | |
| Create_priv | enum('N','Y') | NO | | N | |
| Drop_priv | enum('N','Y') | NO | | N | |
| Reload_priv | enum('N','Y') | NO | | N | |
| Shutdown_priv | enum('N','Y') | NO | | N | |
| Process_priv | enum('N','Y') | NO | | N | |
| File_priv | enum('N','Y') | NO | | N | |
| Grant_priv | enum('N','Y') | NO | | N | |
| References_priv | enum('N','Y') | NO | | N | |
| Index_priv | enum('N','Y') | NO | | N | |
| Alter_priv | enum('N','Y') | NO | | N | |
| Show_db_priv | enum('N','Y') | NO | | N | |
| Super_priv | enum('N','Y') | NO | | N | |
| Create_tmp_table_priv | enum('N','Y') | NO | | N | |
| Lock_tables_priv | enum('N','Y') | NO | | N | |
| Execute_priv | enum('N','Y') | NO | | N | |
| Repl_slave_priv | enum('N','Y') | NO | | N | |
| Repl_client_priv | enum('N','Y') | NO | | N | |
| Create_view_priv | enum('N','Y') | NO | | N | |
| Show_view_priv | enum('N','Y') | NO | | N | |
| Create_routine_priv | enum('N','Y') | NO | | N | |
| Alter_routine_priv | enum('N','Y') | NO | | N | |
| Create_user_priv | enum('N','Y') | NO | | N | |
| ssl_type | enum('','ANY','X509','SPECIFIED') | NO | | | |
| ssl_cipher | blob | NO | | | |
| x509_issuer | blob | NO | | | |
| x509_subject | blob | NO | | | |
| max_questions | int(11) unsigned | NO | | 0 | |
| max_updates | int(11) unsigned | NO | | 0 | |
| max_connections | int(11) unsigned | NO | | 0 | |
| max_user_connections | int(11) unsigned | NO | | 0 | |
+-----------------------+-----------------------------------+------+-----+---------+-------+
37 rows in set (0.01 sec)
Here is mysql.user from MySQL 5.1.32
mysql> desc mysql.user;
+-----------------------+-----------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+-----------------------------------+------+-----+---------+-------+
| Host | char(60) | NO | PRI | | |
| User | char(16) | NO | PRI | | |
| Password | char(41) | NO | | | |
| Select_priv | enum('N','Y') | NO | | N | |
| Insert_priv | enum('N','Y') | NO | | N | |
| Update_priv | enum('N','Y') | NO | | N | |
| Delete_priv | enum('N','Y') | NO | | N | |
| Create_priv | enum('N','Y') | NO | | N | |
| Drop_priv | enum('N','Y') | NO | | N | |
| Reload_priv | enum('N','Y') | NO | | N | |
| Shutdown_priv | enum('N','Y') | NO | | N | |
| Process_priv | enum('N','Y') | NO | | N | |
| File_priv | enum('N','Y') | NO | | N | |
| Grant_priv | enum('N','Y') | NO | | N | |
| References_priv | enum('N','Y') | NO | | N | |
| Index_priv | enum('N','Y') | NO | | N | |
| Alter_priv | enum('N','Y') | NO | | N | |
| Show_db_priv | enum('N','Y') | NO | | N | |
| Super_priv | enum('N','Y') | NO | | N | |
| Create_tmp_table_priv | enum('N','Y') | NO | | N | |
| Lock_tables_priv | enum('N','Y') | NO | | N | |
| Execute_priv | enum('N','Y') | NO | | N | |
| Repl_slave_priv | enum('N','Y') | NO | | N | |
| Repl_client_priv | enum('N','Y') | NO | | N | |
| Create_view_priv | enum('N','Y') | NO | | N | |
| Show_view_priv | enum('N','Y') | NO | | N | |
| Create_routine_priv | enum('N','Y') | NO | | N | |
| Alter_routine_priv | enum('N','Y') | NO | | N | |
| Create_user_priv | enum('N','Y') | NO | | N | |
| Event_priv | enum('N','Y') | NO | | N | |
| Trigger_priv | enum('N','Y') | NO | | N | |
| ssl_type | enum('','ANY','X509','SPECIFIED') | NO | | | |
| ssl_cipher | blob | NO | | NULL | |
| x509_issuer | blob | NO | | NULL | |
| x509_subject | blob | NO | | NULL | |
| max_questions | int(11) unsigned | NO | | 0 | |
| max_updates | int(11) unsigned | NO | | 0 | |
| max_connections | int(11) unsigned | NO | | 0 | |
| max_user_connections | int(11) unsigned | NO | | 0 | |
+-----------------------+-----------------------------------+------+-----+---------+-------+
39 rows in set (0.00 sec)
Here is mysql.user from MySQL 5.5.12
mysql> desc mysql.user;
+------------------------+-----------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+-----------------------------------+------+-----+---------+-------+
| Host | char(60) | NO | PRI | | |
| User | char(16) | NO | PRI | | |
| Password | char(41) | NO | | | |
| Select_priv | enum('N','Y') | NO | | N | |
| Insert_priv | enum('N','Y') | NO | | N | |
| Update_priv | enum('N','Y') | NO | | N | |
| Delete_priv | enum('N','Y') | NO | | N | |
| Create_priv | enum('N','Y') | NO | | N | |
| Drop_priv | enum('N','Y') | NO | | N | |
| Reload_priv | enum('N','Y') | NO | | N | |
| Shutdown_priv | enum('N','Y') | NO | | N | |
| Process_priv | enum('N','Y') | NO | | N | |
| File_priv | enum('N','Y') | NO | | N | |
| Grant_priv | enum('N','Y') | NO | | N | |
| References_priv | enum('N','Y') | NO | | N | |
| Index_priv | enum('N','Y') | NO | | N | |
| Alter_priv | enum('N','Y') | NO | | N | |
| Show_db_priv | enum('N','Y') | NO | | N | |
| Super_priv | enum('N','Y') | NO | | N | |
| Create_tmp_table_priv | enum('N','Y') | NO | | N | |
| Lock_tables_priv | enum('N','Y') | NO | | N | |
| Execute_priv | enum('N','Y') | NO | | N | |
| Repl_slave_priv | enum('N','Y') | NO | | N | |
| Repl_client_priv | enum('N','Y') | NO | | N | |
| Create_view_priv | enum('N','Y') | NO | | N | |
| Show_view_priv | enum('N','Y') | NO | | N | |
| Create_routine_priv | enum('N','Y') | NO | | N | |
| Alter_routine_priv | enum('N','Y') | NO | | N | |
| Create_user_priv | enum('N','Y') | NO | | N | |
| Event_priv | enum('N','Y') | NO | | N | |
| Trigger_priv | enum('N','Y') | NO | | N | |
| Create_tablespace_priv | enum('N','Y') | NO | | N | |
| ssl_type | enum('','ANY','X509','SPECIFIED') | NO | | | |
| ssl_cipher | blob | NO | | NULL | |
| x509_issuer | blob | NO | | NULL | |
| x509_subject | blob | NO | | NULL | |
| max_questions | int(11) unsigned | NO | | 0 | |
| max_updates | int(11) unsigned | NO | | 0 | |
| max_connections | int(11) unsigned | NO | | 0 | |
| max_user_connections | int(11) unsigned | NO | | 0 | |
| plugin | char(64) | YES | | | |
| authentication_string | text | YES | | NULL | |
+------------------------+-----------------------------------+------+-----+---------+-------+
42 rows in set (0.01 sec)
If you attempt a restore of mysql.user to a version of MySQL it was not meant for, it will produce random permission problems. The way to backup the mysql user permissions in a version-agnostic way is to dump the user grants in SQL. That way, the user grants are completely portable. There are two ways to accomplish this:
OPTION #1 : Using MAATKIT
mk-show-grants will generate the SQL needed whatever mysql instance you connect to. (Keep in mind that MAATKIT is being migrated to the Percona Toolkit. This tool will most likely be called pt-show-grants).
OPTION #2 : Script the dumping of the SQL GRANTS
I have written my own emulation of mk-show-grants. It will leave out anonymous users. It looks like this:
mysql -hhostaddr -umyuserid -pmypassword --skip-column-names -A -e"SELECT CONCAT('SHOW GRANTS FOR ''',user,'''@''',host,''';') FROM mysql.user WHERE user<>''" | mysql -hhostaddr -umyuserid -pmypassword --skip-column-names -A | sed 's/$/;/g' > MySQLUserGrants.sql
Using either of these options will create a more stable backup of user grants.
ON A SEPARATE NOTE
Now if you are using this log-output option
[mysqld]
log-output=TABLE
the mysql database will populate the slow log (if enabled) as a mysql table in the mysql schema rather than a text file. So, doing physical backups will include such mysql table-based logs. Believe me, it is not worth the diskspace if the general log and slow query log are enabled and piling up in the mysql schema. Just stick with the MySQL Grants Dump Options.
UPDATE 2011-09-19 15:54 EDT
There is one very important factor in maintaining backups of MySQL permissions via SQL Grants:
Each user comes out with their password in some modified MD5 format. For mysql 4.0 and back, it is a 16-character hexadecimal string. For mysql 4.1+, it is 41 characters (An asterisk followed by a 40-character hexadecimal string).
Before you restore a SQL Grants dump, check the SQL Grants dump file for any 16-character hexadecimal passwords. If you see even one, you must add the following to /etc/my.cnf (or my.ini for Windows) on the mysql server you will restore to:
[mysqld]
old_password=1
The old_password directive permits 16-char and 41-char passwords to coexist and correctly authenticate in the same running mysql instance. Any passwords created going forward will be 16-characters.
MySQL restart is not required. Just run this:
SET GLOBAL old_password = 1;
Best Answer
Try doing it in two steps first:
So it backs up to a file and then pipe that into the second command:
If the reference error is in the mysqldump step either fix the views or drop it from the database. I can't imaging they're still used if they reference a non-existing table.