Add the CIDR block of your VPC to the ingress rules of your security group.
You will also need to ensure that egress rules are configured on your other security groups to allow outbound traffic from your instances. Again, you can limit them to the same CIDR block.
For example, if your VPC CIDR block was `10.0.0.0/16`, then:
- On your target security group, add an ingress rule on the desired port for `10.0.0.0/16`.
- On all possible source security groups, add an egress rule on the desired port for `10.0.0.0/16`.
However, to be more secure, I would recommend permitting traffic based on security group rather than CIDR block (a CLI sketch follows this list). For example:
- On your target security group, add an ingress rule on the desired port for the source security group.
- On your source security group, add an egress rule on the desired port for the target security group.
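If you prefer the CLI over the console, the two rules can be added roughly like this (the group IDs `sg-target` and `sg-source` and port 5432 are placeholders for your own values):

```
# Ingress on the target group: allow the source group on the desired port
aws ec2 authorize-security-group-ingress \
    --group-id sg-target \
    --protocol tcp --port 5432 \
    --source-group sg-source

# Egress on the source group: allow traffic towards the target group
aws ec2 authorize-security-group-egress \
    --group-id sg-source \
    --ip-permissions IpProtocol=tcp,FromPort=5432,ToPort=5432,UserIdGroupPairs='[{GroupId=sg-target}]'
```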
tl;dr
The huge amount of regional traffic was caused by an `apt-get update` on startup of the machines.
At first I suspected the software I run on the cluster, because it sends out a huge number of DNS requests - probably it does not do any DNS caching - and the DNS server is in another Availability Zone.
Full way to debug such stuff
I debugged this with a friend; here is how we arrived at the solution, so everyone with this issue can follow along:
First of all, from the billing dashboard you can see that the cost is $0.01 per GB. This matches the following items from the pricing page (which go into a bit more detail):
- Amazon EC2, Amazon RDS, Amazon Redshift and Amazon ElastiCache instances or Elastic Network Interfaces in the same Availability Zone
- Using a public or Elastic IP address
- Amazon EC2, Amazon RDS, Amazon Redshift and Amazon ElastiCache instances or Elastic Network Interfaces in another Availability Zone or peered VPC in the same AWS Region
Next I checked the AWS explanation of Availability Zones and Regions. What I have to pay for is definitely traffic that stays within the same region (`us-east-1` in my case). It can either be traffic passing from one AZ to another (which we knew before) or traffic using a public or Elastic IP address within the same AZ (which we also knew from the other Server Fault question). However, it now seems that this list is indeed exhaustive.
I knew I had:
- 6 EC2 machines in a cluster
- no RDS
- no Redshift
- no ElastiCache
- no Elastic IP address
Peered VPC
VPC is a product of its own, so go to the VPC console. There you can see how many VPCs you have. In my case it was only one, so peering is not possible at all. But you can still go to Peering Connections and check whether anything is set up there.
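For the record, the same check can be done quickly from the CLI (both commands are read-only):

```
# How many VPCs are there?
aws ec2 describe-vpcs --query 'Vpcs[].VpcId'

# Are any peering connections configured?
aws ec2 describe-vpc-peering-connections
```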
Subnets
From the Subnets section in VPC we also found an important clue for further debugging: the IP ranges of the different Availability Zones in `us-east-1`:
- `172.31.0.0/20` for `us-east-1a`
- `172.31.16.0/20` for `us-east-1b`
- `172.31.32.0/20` for `us-east-1e`
- `172.31.48.0/20` for `us-east-1d`
Since all my machines should be in `us-east-1d`, I verified that. And indeed they all had IPs starting with `172.31.48`, `172.31.51` and `172.31.54`. So far, so good.
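One way to double-check the AZ and private IP of every instance from the CLI:

```
aws ec2 describe-instances \
    --query 'Reservations[].Instances[].[InstanceId,Placement.AvailabilityZone,PrivateIpAddress]' \
    --output table
```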
tcpdump
This finally helped us set the right filters for tcpdump. Now that I knew which IPs I should be communicating with in order to avoid costs (only the network `172.31.48.0/20`), we set up a filter for `tcpdump`. This removed all the noise that had kept me from seeing the external communication. Before, I did not even know that communication with `[something].ec2.internal` could be the problem, since I did not know enough about Regions, AZs and their respective IP ranges.
First we came up with this tcpdump filter:
tcpdump "not src net 172.31.48.0 mask 255.255.240.0" -i eth0
This should show all traffic coming in from anywhere but `us-east-1d`. It showed a lot of traffic from my SSH connection, but I saw something weird flying by - an `ec2.internal` address. Shouldn't those all have been filtered out, since we no longer show AZ-internal traffic?
IP ip-172-31-0-2.ec2.internal.domain > ip-172-31-51-15.ec2.internal.60851
But this is not internal! It's from another AZ, namely `us-east-1a`. This is from the DNS system.
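The `172.31.0.2` address is the VPC's default DNS resolver (the base of the VPC network range plus two); the instances pick it up via DHCP, which a quick look at the resolver config can confirm:

```
# should list the VPC resolver, e.g. "nameserver 172.31.0.2"
cat /etc/resolv.conf
```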
I extended the filter to check how many of these messages occur:
sudo tcpdump "not src net 172.31.48.0 mask 255.255.240.0 and not src host $MY_HOSTNAME" -i eth0
I waited 10 seconds, stopped the capture, and there were 16 responses from DNS!
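Since that DNS server sits in another AZ, my first fix attempt was a local caching resolver on each node. The dnsmasq setup was roughly along these lines (a sketch; the exact options may differ from what I actually deployed):

```
sudo apt-get install dnsmasq

# /etc/dnsmasq.conf - answer locally, cache, and forward misses to the VPC resolver
listen-address=127.0.0.1
cache-size=1000
server=172.31.0.2

# /etc/resolv.conf - point the instance at the local cache
nameserver 127.0.0.1
```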
Next days, still the same problem
However, after installing dnsmasq nothing changed. There were still several GB of traffic whenever I used the cluster.
Day by day I removed more tasks from the cluster and finally tried one day without any startup scripts (no unexpected traffic!) and one day with only the startup scripts plus an immediate shutdown (traffic!).
Analyzing the startup script revealed that `apt-get update` and `apt-get install ...` are the only components pulling large files. Through some Googling I learned that Ubuntu indeed has a package repository inside AWS. This can also be seen from the `sources.list`:
http://us-east-1.ec2.archive.ubuntu.com/ubuntu/
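Where apt actually pulls from can be checked with a quick lookup (dig here; host or nslookup work just as well):

```
dig +noall +answer us-east-1.ec2.archive.ubuntu.com
```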
Resolving the hostname leads to the following IP addresses:
us-east-1.ec2.archive.ubuntu.com. 30 IN A 54.87.136.115
us-east-1.ec2.archive.ubuntu.com. 30 IN A 54.205.195.154
us-east-1.ec2.archive.ubuntu.com. 30 IN A 54.198.110.211
us-east-1.ec2.archive.ubuntu.com. 30 IN A 54.144.108.75
So I set up VPC Flow Logs and logged the cluster during boot time. Then I downloaded the log files and ran them through a Python script that sums up all bytes transferred to any of these 4 IP addresses - and the result matches my traffic. I had 1.5 GB of traffic during the last test with 3 clusters of 5 machines each, and according to the Flow Logs each machine pulls about 100 MB from the Ubuntu repository.
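The summing step itself is only a few lines. A sketch of what such a script can look like (not the exact script I ran; it assumes the default Flow Log record format and a hypothetical downloaded file named flowlog.txt):

```python
# Sums bytes exchanged with the Ubuntu mirror IPs, assuming the default
# VPC Flow Log format: version account-id interface-id srcaddr dstaddr
# srcport dstport protocol packets bytes start end action log-status
REPO_IPS = {
    "54.87.136.115", "54.205.195.154",
    "54.198.110.211", "54.144.108.75",
}

total = 0
with open("flowlog.txt") as f:          # hypothetical file name
    for line in f:
        fields = line.split()
        if len(fields) < 14 or not fields[9].isdigit():
            continue                     # skip headers / NODATA records
        src, dst, byte_count = fields[3], fields[4], int(fields[9])
        if src in REPO_IPS or dst in REPO_IPS:
            total += byte_count

print(f"{total / 1024 / 1024:.1f} MB exchanged with the Ubuntu repository")
```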
Best Answer
I run a website whose traffic comes entirely from India and have tested various options.
If the traffic is from India only, just go for the Asia Pacific (Singapore) region.
The latency from there is lowest and varies somewhere around 70-120 ms (measured from Delhi, India).
You will have to pay a few extra bucks compared to the N. Virginia region, but it's worth it.
The latency from the N. Virginia region is somewhere around 250-350 ms.