PostgreSQL on Amazon EBS volume: realistic performance, or move to something more lightweight?

amazon-web-services mongodb postgresql

I'm working on a little research project, currently running as an instance on EC2, and I'm hoping to figure out whether I'm going down the right path. We, like a thousand other people, are using some of Twitter's streaming feeds to gather some data to have fun with. My database seems to be having trouble keeping up, and queries take what seems to be a very long time. I'm not a DBA by trade, so I'll just dump some info here and add more if need be.

System specs:

EC2 XL instance, 15 GB of RAM

EBS: four 100 GB volumes in RAID 0.

The stream we're getting works out to around 10k inserts per minute.

There are 3 main tables; the one holding the users we're tracking currently has somewhere in the neighborhood of 26M rows.

Is this volume of inserts too much to ask of EBS on this hardware? Should I take a look at something with less overhead, like MongoDB?

Best Answer

@Gnanam's link points to some good advice, particularly this description of a working setup. I see no reason to avoid using EBS, but treat an EBS volume as you would a single hard drive in a real server: prone to failure. Thus, you'll want a RAID level with good resistance to failure, so not RAID 0. And given your requirements, you want a RAID level that's also fast on write. So RAID 10 across 6-10 volumes seems like the best place to start.

As for actual performance, it's going to depend on your indexing requirements and the size and type of data you're inserting. The great thing about AWS is that it's relatively cheap to find out how a certain configuration will perform. So what you'll need to do is come up with some sample data and a way to simulate the incoming feed you're trying to process (a script that inserts the records one at a time and writes a log statement with a timestamp every X rows, for example). For your purposes it's probably okay if the sample data repeats over time, but make sure your script can run for at least an hour.
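A minimal sketch of such a load script, assuming psql is available and that a test table already exists. The function name gen_load, the table sample_tweets, and its columns are made up for illustration. It emits one INSERT per row plus a timestamped progress query every N rows, all as plain SQL on stdout:

```shell
# gen_load TOTAL LOG_EVERY
# Prints TOTAL single-row INSERT statements and, after every LOG_EVERY rows,
# a SELECT that reports the server time (epoch seconds) and the row count.
# Table and column names are hypothetical; adjust them to your schema.
gen_load() {
  total=$1
  log_every=$2
  i=1
  while [ "$i" -le "$total" ]; do
    printf "INSERT INTO sample_tweets (user_id, body) VALUES (%d, 'sample row %d');\n" \
      "$((i % 1000))" "$i"
    if [ $((i % log_every)) -eq 0 ]; then
      printf "SELECT extract(epoch FROM clock_timestamp()) AS logged_at, %d AS rows_so_far;\n" "$i"
    fi
    i=$((i + 1))
  done
}
```

Piping it into a single psql session, for example gen_load 600000 10000 | psql -qAt -d testdb > progress.log (testdb is a placeholder database name), should leave one "epoch|rows" line per checkpoint in the log. Because the statements run one after another in the same session, each timestamp is taken only after the preceding inserts have finished.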

Now, run that script against a PostgreSQL database set up on various EBS configurations, using snapshotting or Amazon's new CloudFormation service to produce reliably reproducible starting points, and measure how performance changes as you change the configuration (how it changes over time will be important as well). You might want to toss in single-volume and RAID 5 configurations just to compare.
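To compare configurations, the progress log can be reduced to an insert rate per interval. A rough sketch, assuming each log line has the form "epoch_seconds|rows_so_far" (that format and the function name rates are assumptions, not something the setup above prescribes):

```shell
# rates: reads "epoch_seconds|rows_so_far" lines on stdin and prints,
# for each line after the first, the row count and the rows per second
# achieved since the previous line.
rates() {
  awk -F'|' '
    NR > 1 && $1 > prev_t {
      printf "%d %.1f\n", $2, ($2 - prev_n) / ($1 - prev_t)
    }
    { prev_t = $1; prev_n = $2 }
  '
}
```

Running rates < progress.log for each EBS layout gives a column of rows-per-second figures that can be compared side by side. A rate that drops steadily as the row count grows suggests the configuration degrades over time rather than just being slow from the start.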

Related Solutions

Shrinking Amazon EBS volume size

I had the same question as you, so I worked out how to do it.

First, I did this from the Ubuntu 32-bit EBS-backed AMI from the US-East region; other OSes or images may work differently. However, I suspect that you should be OK as long as you are using an ext* filesystem. It might work on other filesystems, but you'll have to figure out how to resize them on your own.

The steps are basically:

Attach two volumes to a running instance, the first based on the snapshot you want to shrink, and the second a blank volume having the new size you want to shrink to.
Check the file system of the first volume and repair any errors.
Shrink the file system on the first volume so it is only as big as it needs to be to hold the data.
Copy the file system from the first volume to the second.
Expand the file system on the second volume to its maximum size.
Make sure everything looks good by checking the second volume for errors.
Take a snapshot of the second volume.
Create a machine image based on the snapshot of the second volume you just took.

You first need to get some information from the AMI you want to shrink. In particular, you need the snapshot ID, the kernel ID, and the ramdisk ID, if any (the image I shrunk didn't have a ramdisk). All this information should be available from the AWS Management Console, in the AMI window.

The kernel ID looks like aki-xxxxxxxx, the snapshot ID looks like snap-xxxxxxxx, and ramdisk IDs look like ari-xxxxxxxx.

Next, launch a Linux instance. I launched an Ubuntu instance. You can use a t1.micro instance if you like. It doesn't take much power to do these next steps.

After the machine is running, create a volume from the snapshot you wrote down in the first step and attach it to the instance. In my case, I attached it as /dev/sdf.

Then, create a new volume of the size you want. In my case, I created a 5GB volume, as that's the size I wanted to shrink it to. Don't create this new volume from a snapshot; we need a new blank volume. Next, attach it to the running instance. In my case, I attached it as /dev/sdg.

Next, SSH into the machine, but don't mount the attached volumes.

At this point, I erred on the side of paranoia, and I opted to check the file system on the large volume, just to make sure there were no errors. If you are confident that there are none, you can skip this step:

$ sudo e2fsck -f /dev/sdf

Next, I resized the file system on the large volume so that it was only as big as the data on the disk:

$ sudo resize2fs -M -p /dev/sdf

The -M shrinks it, and the -p prints the progress.

resize2fs should tell you how large the shrunken filesystem is. In my case, it gave me the size in 4K blocks.

We now copy the shrunken file system to the new disk. We're going to copy the data in 16MB chunks, so we need to figure out how many 16MB chunks we need to copy. This is where that shrunken file system size comes in handy.

In my case, the shrunk file system was just over 1 GB, because I had installed a lot of other programs on the basic Ubuntu system before taking a snapshot. I probably could have gotten away with just copying the size of the file system rounded up to the nearest 16MB, but I wanted to play it safe.

So, 128 times 16MB chunks = 2GB:

$ sudo dd if=/dev/sdf ibs=16M of=/dev/sdg obs=16M count=128

I copied in blocks of 16MB because with EBS, you pay for each read and write, so I wanted to minimize the number of them as much as possible. I don't know if doing it this way did so, but it probably didn't hurt.
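The chunk count passed to dd can be derived from the block count that resize2fs reports. A minimal sketch, assuming 4K blocks as above; the function name chunks_needed and the block count in the example are made up for illustration:

```shell
# chunks_needed BLOCKS [BLOCK_SIZE] [CHUNK_SIZE]
# Prints how many CHUNK_SIZE-byte chunks (default 16MB) are needed to cover
# BLOCKS file system blocks of BLOCK_SIZE bytes (default 4K), rounding up.
chunks_needed() {
  blocks=$1
  block_size=${2:-4096}
  chunk_size=${3:-16777216}
  echo $(( (blocks * block_size + chunk_size - 1) / chunk_size ))
}
```

For example, a file system of 262144 4K blocks (exactly 1GB) gives chunks_needed 262144 = 64, so count=64 would be the minimum to copy. Copying more than that, as I did with count=128, only costs extra reads and writes.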

We then need to resize the file system we just copied to the new volume so that it uses all the available space on the volume.

$ sudo resize2fs -p /dev/sdg

Finally, check it, to make sure everything is well:

$ sudo e2fsck -f /dev/sdg

That's all we need to do in this machine, though it couldn't hurt to mount the new volume, just as a test. However, this step is almost certainly optional, as e2fsck should have caught any problems.

We now need to snapshot the new volume, and create an AMI based on it. We're done with the machine, so you can terminate it if you like.

Make sure the small volume is unmounted if you mounted it, and then take a snapshot of it. Again, you can do this in the management console.

~~The final step requires the commandline ec2 tools.~~

EDIT:

Since this answer was posted the AWS console allows you to simply right click a snapshot and select Create Image from Snapshot. You will still need to select the appropriate Kernel ID. If it does not appear on the list make sure you've selected the appropriate architecture.

We use the ec2-register application to register an AMI based on the snapshot you just took, so write down that snapshot's snap-xxxxxxxx value.

You should then use a command like:

ec2-register -C cert.pem -K sk.pem -n The_Name_of_Your_New_Image \
  -d Your_Description_of_This_New_AMI --kernel aki-xxxxxxxx \
  -b "/dev/sda1=snap-xxxxxxxx" --root-device-name /dev/sda1

You of course need to replace the kernel ID with the one you wrote down at the beginning and the snapshot ID with the one you created in the previous step. You also need to point it at your secret key (called sk.pem above) and your X.509 cert (called cert.pem above). You can of course choose whatever you want for the name and description.

Hope this helps.

Switching from Amazon EC2 instance-store to EBS Volume

Basically, you just need to copy the running instance to an EBS volume. Before doing this, stop any services that change things on the filesystem (MySQL, etc.).

So create a volume, make sure it's in the same availability zone as your S3-backed instance, and attach it to that instance.

ec2-create-volume -s 10 -z us-east-1d
ec2-attach-volume vol-xxxxxxxx -i i-instance_id -d /dev/sdh

Copy everything over to the EBS volume and validate.

dd bs=65536 if=/dev/sda1 of=/dev/sdh
fsck /dev/sdh

Then mount the drive:

mkdir -m 000 /ebs
mount /dev/sdh /ebs

Make sure /ebs/etc/fstab won't try to mount anything that's not there, then unmount the drive:

umount /dev/sdh

You can then create a snapshot of that volume and ec2-register it as an AMI. You have to do this from the command line; I don't think you can register an AMI from a snapshot using the web interface.
