I read everywhere claims that Aurora is more performant. If that's the case, and cost differences are ignored, what would be the reason for anyone to use RDS MySQL? What are the technical differences between the two that would make someone not choose Aurora?
MySQL – AWS RDS MySQL vs Aurora
Tags: amazon-rds, amazon-web-services, mysql
Related Solutions
I have gained some perspective on this over the last few months, and I believe the following observations address the concerns above:
1) The comment from @Ross on the original posting is the key. T2 instances, no matter what scale and no matter whether they are EC2 or RDS, will stop performing when their CPU credits run out as the peak CPU demands continue.
2) The failure mode of a CMS web server we have seen most often is shown exactly by this condition: the CloudWatch graph dives toward zero when the CPU percentage needed by `httpd` processes exceeds the CPU percentage assigned to that instance type (see the doc link below).
3) The quick solution for a T2 instance that has exhausted CPU credits is to shut down, upgrade the instance type, and start up the instance again, which takes about 3-4 minutes. The most vital description of the capacities of different instance types is here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/t2-instances.html
4) Any production web server on AWS must have an Elastic IP address assigned in advance for this reason: if not, and the instance is rescaled, the IP address will change, leaving the web server inaccessible far beyond what would otherwise only be 3-4 minutes of downtime.
5) The only quick way to acquire more CPU credits is to upgrade the instance type; otherwise, credits accrue only at the baseline earn rate. The number of credits each T2 instance size can hold is described in the doc link above: it is equal to the credits that instance type earns in 24 hours.
6) The machine can be returned to its original scale during a bit of scheduled downtime (again, 3-4 minutes) after peak performance demands die down.
7) I/O activity hasn't caused any performance degradation for our web server in any peak periods so far. For General Purpose (gp2) EBS volumes, the number of IOPS is determined by volume size. Both the exact meaning of IOPS and that relationship are described here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-io-characteristics.html
8) Neither of the CloudWatch metrics Freeable Memory and DB Connections was of any use in anticipating or correcting performance problems in a web-server-intensive environment.
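The credit arithmetic behind points 1, 5, and 6 can be sketched in a few lines. The per-size earn rates below are the ones AWS documented for the original T2 family; verify them against the current docs before relying on them.

```python
# Sketch of T2 CPU-credit arithmetic. One CPU credit = one vCPU running at
# 100% for one minute. Earn rates per instance size are assumptions taken
# from the AWS docs for the original T2 family; double-check before use.

CREDITS_PER_HOUR = {
    "t2.nano": 3, "t2.micro": 6, "t2.small": 12,
    "t2.medium": 24, "t2.large": 36,
}

def max_credit_balance(instance_type: str) -> int:
    """The credit balance is capped at 24 hours' worth of earned credits."""
    return CREDITS_PER_HOUR[instance_type] * 24

def hours_until_exhausted(instance_type: str, vcpus: int, avg_cpu_pct: float) -> float:
    """Hours a full credit balance lasts under a sustained average CPU load.

    Credits are spent at (vcpus * avg_cpu_pct/100) * 60 per hour and earned
    back at the fixed per-size rate; the net drain empties the balance.
    """
    spend = vcpus * (avg_cpu_pct / 100.0) * 60   # credits burned per hour
    earn = CREDITS_PER_HOUR[instance_type]       # credits earned per hour
    net = spend - earn
    if net <= 0:
        return float("inf")                      # load is at or below baseline
    return max_credit_balance(instance_type) / net
```

For example, a t2.micro (1 vCPU) holds at most 144 credits; at a sustained 50% CPU it burns 30 credits per hour while earning 6, so a full balance lasts 6 hours before the CloudWatch graph "dives toward zero" as described above.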
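The gp2 size-to-IOPS relationship mentioned in point 7 is simple enough to state directly. The constants below (3 IOPS per GiB, a floor of 100, and a per-volume cap) reflect the EBS docs at the time of writing and are assumptions to re-verify; the burst behavior of small volumes is deliberately not modeled.

```python
# Sketch of the gp2 baseline-IOPS rule: 3 IOPS per GiB, floored at 100 and
# capped at a per-volume maximum (16,000 at the time of writing). Small
# volumes can also burst to 3,000 IOPS via I/O credits, not modeled here.

GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000

def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume of the given size."""
    return min(max(size_gib * GP2_IOPS_PER_GIB, GP2_MIN_IOPS), GP2_MAX_IOPS)

# 20 GiB hits the floor (100), 500 GiB scales linearly (1500),
# and a 10,000 GiB volume hits the cap (16,000).
```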
The problem is that the AWS RDS MySQL read-replica mechanism requires that the two schemas be identical.
Replication, in general, requires this.
But, if your definition of "close enough" is indeed close enough, this can be done in RDS for MySQL the same way you'd always do it.
create a DB instance ... and SET it to be the replica.
Well, that won't be the way you do it.
That isn't how you'd do it outside RDS, either. You always have to start with an identical replica, then change it, because replication by definition relies on a known, specific point in time where the data sets as well as schemata are identical, after which changes become possible to the extent that MySQL indeed considers them "close enough."
Examples of "close enough" might include a replica with more indexes or fewer ... but obviously not things like a new unique key or foreign key constraint on the replica where the data on the master violates the constraint ... and a collation change could cause an existing constraint to become violated without any actual data changing. (Or not, if you switched from case-insensitive to binary).
Adding tables to the replica is fine; dropping tables is not. It is even possible to add columns to or drop columns from tables, if and only if the remaining columns, by ordinal position beginning at the first column, are identical. That is, you can add columns to or remove columns from the right edge of the table; you can't change column order, but you can change data types if coercion is possible, such as increasing the length of a `VARCHAR`. You can rename columns as long as `BINLOG_FORMAT` on the master is set to `ROW`, which will often be the better choice when attempting such changes. In the case of RDS, the only alternative is `MIXED`; they wisely prevent strictly `STATEMENT`-based replication. Note that `BINLOG_FORMAT` on the replica itself is irrelevant here.
MySQL generally tries to implicitly do type conversions during replication. See also Replication with Differing Table Definitions on Master and Slave for the general MySQL perspective on this.
You may indeed get away with what you are planning. If not, you should find out rather quickly.
But at least with RDS, it's easy enough to try again if you break it... and, the master's integrity and performance are unaffected by a broken replica.
Here's your fix:
Read Replicas are designed to support read queries, but you might need occasional updates, such as adding an index to speed the specific types of queries accessing the replica. You can enable updates by setting the `read_only` parameter to 0 in the DB parameter group for the Read Replica.
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.MySQL
Unless this is your only replica, be sure to copy the parameter group to a new one, apply that one to the replica, then change the new group's setting. Leaving other replicas writable as a side effect is a recipe for trouble.
Once this is done and takes effect, the replica is writable. If your proposed changes are indeed "close enough," then after you log in to the replica directly, using the same credentials you'd use for the master, and make your changes, it will keep replicating.
Or, if not, replication will break.
There is no need to pause replication when making schema changes to a replica: MySQL automatically suspends the execution of replication events, using its normal locking mechanisms (such as table metadata locks), whenever a DDL operation in progress on the replica involves an object that the next replication event needs to access.
The `SHOW SLAVE STATUS` statement works identically to standard MySQL in RDS. If `Slave_IO_Running` and `Slave_SQL_Running` both show `Yes` and `Seconds_Behind_Master` is not `NULL`, then the replica is not broken; if `Seconds_Behind_Master` = 0, the replica is in real-time sync with the master (> 0 means it is lagging, trying to catch up).
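That health check can be wrapped in a few lines. The field names are the real `SHOW SLAVE STATUS` columns described above, but the returned status strings are my own invention, purely for illustration.

```python
def replica_health(status: dict) -> str:
    """Classify a replica from one row of SHOW SLAVE STATUS, passed as a dict.

    Mirrors the rule above: both threads must show "Yes" and
    Seconds_Behind_Master must be non-NULL for the replica to be healthy.
    """
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Master")
    if not (io_ok and sql_ok) or lag is None:   # NULL lag means broken/stopped
        return "broken"
    return "in-sync" if lag == 0 else f"lagging ({lag}s behind)"
```

For example, feeding it a row with both threads running and zero lag classifies the replica as in real-time sync; a stopped SQL thread (lag `NULL`) classifies it as broken.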
Question: would a long change (recollating the column will take on the order of an hour) cause problems on the master? I am thinking of massively backed-up replication journals, and an hour of traffic blocked on a lock in the read-only replica.
This will not be a problem with RDS, for two reasons.
The most significant reason is that MySQL replication uses two threads -- one to receive the logs from the master (the I/O thread) and one to execute them (the SQL thread) -- these are the threads whose "running" status I mentioned, above. When the replica is blocking the execution of events, due to local changes, it continues to receive them. As long as the replica doesn't run out of storage, everything continues as expected when the locks are released.
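The two-thread behavior can be illustrated with a toy model: a stand-in "I/O thread" keeps appending events to a relay log while the "SQL thread" is blocked by a local DDL, and the backlog drains once the lock is released. This is only an analogy built on Python threads and a queue, not MySQL's actual implementation.

```python
import queue
import threading
import time

relay_log = queue.Queue()            # stands in for the replica's relay log
applied = []                         # events the "SQL thread" has executed
ddl_in_progress = threading.Event()
ddl_in_progress.set()                # a local DDL is currently holding a lock

def io_thread():
    # Keeps receiving events from the master regardless of local locks.
    for n in range(5):
        relay_log.put(f"event-{n}")
        time.sleep(0.01)

def sql_thread():
    # Applies events only while no conflicting local DDL holds a lock.
    while len(applied) < 5:
        if ddl_in_progress.is_set():
            time.sleep(0.01)         # blocked, but the relay log keeps growing
            continue
        applied.append(relay_log.get())

receiver = threading.Thread(target=io_thread)
executor = threading.Thread(target=sql_thread)
receiver.start()
executor.start()
receiver.join()                      # all 5 events have been received...
backlog = relay_log.qsize()          # ...while none have been applied yet
ddl_in_progress.clear()              # the "DDL" finishes, lock released
executor.join()                      # backlog drains in order
```

When it runs, `backlog` is 5 (everything received, nothing applied) and, after the release, `applied` contains all five events in their original order, which is exactly the "continues to receive, then catches up" behavior described above.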
Additionally, though not terribly meaningful in this context, since the I/O thread will remain running: RDS doesn't use the standard `expire_logs_days` system variable to purge old binlogs from the master. Instead, it purges them itself -- as soon as none of your replicas will need them, but not before -- which is helpful, since they count against your storage allocation. (You can also configure RDS to leave them there longer, if desired.) I've stopped RDS replication entirely for over 24 hours and started it back up with no issues.
Best Answer
The technical differences are summarised nicely in this SlideShare - http://www.slideshare.net/AmazonWebServices/amazon-aurora-amazons-new-relational-database-engine
It's really quite a different architecture/implementation under the covers from standard MySQL, and one that is fundamentally closed-source.
Amazon are being coy about the extent to which the front end is a MySQL derivative or a complete rewrite that is protocol-compatible - see http://www.theregister.co.uk/2014/11/26/inside_aurora_how_disruptive_is_amazons_mysql_clone/?page=2 - but it looks likely it's at least a major fork with lots of new code. It will have different bugs from the main MySQL releases, which users will be reliant on Amazon to fix. A distributed transactional database backend is a complex thing to write, and while Amazon have some of the best engineers in the world for this sort of system, it's still quite new.
It relies on a completely new Amazon-specific multi-tenanted storage backend, and the updated software isn't freely available as open source, so you can't just download it and run it on your own servers. It may diverge over time (e.g. years) in terms of functional features supported, meaning that code developed against Aurora may no longer work against mainstream MySQL releases, posing a risk of increased lock-in to Amazon.
Regardless, especially if your application needs them, the performance, low replica lag, scalability, and recovery-time reductions over standard MySQL look pretty compelling in the short term. The lock-in and costs are certainly much lower than with, for example, Oracle's Exadata -- which is really the class of solution that Amazon are targeting.