Add the CIDR block of your VPC to the ingress rules of your security group.
You will also need to ensure that egress rules are configured on your other security groups to allow outbound traffic from your instances. Again, you can limit them to the same CIDR block.
For example, if your VPC CIDR block was `10.0.0.0/16`, then:
- On your target security group, add an ingress rule on the desired port for `10.0.0.0/16`.
- On all possible source security groups, add an egress rule on the desired port for `10.0.0.0/16`.
However, to be more secure, I would recommend permitting traffic based on security group rather than CIDR block (a CLI sketch follows this list). For example:
- On your target security group, add an ingress rule on the desired port for the source security group.
- On your source security group, add an egress rule on the desired port for the target security group.
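If you prefer the CLI over the console, the two rules can be added roughly like this (the group IDs `sg-target` and `sg-source` and port 5432 are placeholders for your own values):

```
# Ingress on the target group: allow the source group on the desired port
aws ec2 authorize-security-group-ingress \
    --group-id sg-target \
    --protocol tcp --port 5432 \
    --source-group sg-source

# Egress on the source group: allow traffic towards the target group
aws ec2 authorize-security-group-egress \
    --group-id sg-source \
    --ip-permissions IpProtocol=tcp,FromPort=5432,ToPort=5432,UserIdGroupPairs='[{GroupId=sg-target}]'
```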
tl;dr
The huge amount of regional traffic was caused by an `apt-get update` on startup of the machines.
At first I suspected the software I run on the cluster, because it sends out a huge number of DNS requests - probably it does not do any DNS caching - and the DNS server is in another Availability Zone.
Full way to debug such stuff
I debugged this with a friend; here is how we arrived at the solution, so everyone with this issue can follow along:
First of all, from the billing dashboard you can see that the cost is $0.01 per GB. This matches the following items from the pricing page (which go into a bit more detail):
- Amazon EC2, Amazon RDS, Amazon Redshift and Amazon ElastiCache instances or Elastic Network Interfaces in the same Availability Zone
- Using a public or Elastic IP address
- Amazon EC2, Amazon RDS, Amazon Redshift and Amazon ElastiCache instances or Elastic Network Interfaces in another Availability Zone or peered VPC in the same AWS Region
Next I checked the AWS explanation of Availability Zones and Regions. What I have to pay for is definitely traffic that stays within the same region (`us-east-1` in my case). It can either be traffic passing from one AZ to another (which we knew before) or traffic using a public or Elastic IP address within the same AZ (which we also knew from the other Server Fault question). However, it now seems that this list is indeed exhaustive.
I knew I had:
- 6 EC2 machines in a cluster
- no RDS
- no Redshift
- no ElastiCache
- no Elastic IP address
Peered VPC
VPC is a product of its own, so go to the VPC console. There you can see how many VPCs you have. In my case it was only one, so peering is not possible at all. But you can still go to Peering Connections and check whether anything is set up there.
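For the record, the same check can be done quickly from the CLI (both commands are read-only):

```
# How many VPCs are there?
aws ec2 describe-vpcs --query 'Vpcs[].VpcId'

# Are any peering connections configured?
aws ec2 describe-vpc-peering-connections
```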
Subnets
From the Subnets section in VPC we also found an important clue for further debugging: the IP ranges of the different Availability Zones in `us-east-1`:
- `172.31.0.0/20` for `us-east-1a`
- `172.31.16.0/20` for `us-east-1b`
- `172.31.32.0/20` for `us-east-1e`
- `172.31.48.0/20` for `us-east-1d`
Since all my machines should be in `us-east-1d`, I verified that. And indeed they all had IPs starting with `172.31.48`, `172.31.51` and `172.31.54`. So far, so good.
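One way to double-check the AZ and private IP of every instance from the CLI:

```
aws ec2 describe-instances \
    --query 'Reservations[].Instances[].[InstanceId,Placement.AvailabilityZone,PrivateIpAddress]' \
    --output table
```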
tcpdump
This finally helped us set the right filters for tcpdump. Now that I knew which IPs I should be communicating with in order to avoid costs (only the network `172.31.48.0/20`), we set up a filter for `tcpdump`. This removed all the noise that had kept me from seeing the external communication. Before, I did not even know that communication with `[something].ec2.internal` could be the problem, since I did not know enough about Regions, AZs and their respective IP ranges.
First we came up with this tcpdump filter:
tcpdump "not src net 172.31.48.0 mask 255.255.240.0" -i eth0
This should show all traffic coming in from anywhere but `us-east-1d`. It showed a lot of traffic from my SSH connection, but I saw something weird flying by - an `ec2.internal` address. Shouldn't those all have been filtered out, since we no longer show AZ-internal traffic?
IP ip-172-31-0-2.ec2.internal.domain > ip-172-31-51-15.ec2.internal.60851
But this is not internal! It's from another AZ, namely `us-east-1a`. This is from the DNS system.
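The `172.31.0.2` address is the VPC's default DNS resolver (the base of the VPC network range plus two); the instances pick it up via DHCP, which a quick look at the resolver config can confirm:

```
# should list the VPC resolver, e.g. "nameserver 172.31.0.2"
cat /etc/resolv.conf
```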
I extended the filter to check how many of these messages occur:
sudo tcpdump "not src net 172.31.48.0 mask 255.255.240.0 and not src host $MY_HOSTNAME" -i eth0
I waited 10 seconds, stopped the capture, and there were 16 responses from DNS!
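Since that DNS server sits in another AZ, my first fix attempt was a local caching resolver on each node. The dnsmasq setup was roughly along these lines (a sketch; the exact options may differ from what I actually deployed):

```
sudo apt-get install dnsmasq

# /etc/dnsmasq.conf - answer locally, cache, and forward misses to the VPC resolver
listen-address=127.0.0.1
cache-size=1000
server=172.31.0.2

# /etc/resolv.conf - point the instance at the local cache
nameserver 127.0.0.1
```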
Next days, still the same problem
However, after installing dnsmasq nothing changed. There were still several GB of traffic whenever I used the cluster.
Day by day I removed more tasks from the cluster and finally tried one day without any startup scripts (no unexpected traffic!) and one day with only the startup scripts plus an immediate shutdown (traffic!).
Analyzing the startup script revealed that `apt-get update` and `apt-get install ...` are the only components pulling large files. Through some Googling I learned that Ubuntu indeed has a package repository inside AWS. This can also be seen from the `sources.list`:
http://us-east-1.ec2.archive.ubuntu.com/ubuntu/
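Where apt actually pulls from can be checked with a quick lookup (dig here; host or nslookup work just as well):

```
dig +noall +answer us-east-1.ec2.archive.ubuntu.com
```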
Resolving the hostname leads to the following IP addresses:
us-east-1.ec2.archive.ubuntu.com. 30 IN A 54.87.136.115
us-east-1.ec2.archive.ubuntu.com. 30 IN A 54.205.195.154
us-east-1.ec2.archive.ubuntu.com. 30 IN A 54.198.110.211
us-east-1.ec2.archive.ubuntu.com. 30 IN A 54.144.108.75
So I set up VPC Flow Logs and logged the cluster during boot time. Then I downloaded the log files and ran them through a Python script that sums up all bytes transferred to any of these 4 IP addresses - and the result matches my traffic. I had 1.5 GB of traffic during the last test with 3 clusters of 5 machines each, and according to the Flow Logs each machine pulls about 100 MB from the Ubuntu repository.
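The summing step itself is only a few lines. A sketch of what such a script can look like (not the exact script I ran; it assumes the default Flow Log record format and a hypothetical downloaded file named flowlog.txt):

```python
# Sums bytes exchanged with the Ubuntu mirror IPs, assuming the default
# VPC Flow Log format: version account-id interface-id srcaddr dstaddr
# srcport dstport protocol packets bytes start end action log-status
REPO_IPS = {
    "54.87.136.115", "54.205.195.154",
    "54.198.110.211", "54.144.108.75",
}

total = 0
with open("flowlog.txt") as f:          # hypothetical file name
    for line in f:
        fields = line.split()
        if len(fields) < 14 or not fields[9].isdigit():
            continue                     # skip headers / NODATA records
        src, dst, byte_count = fields[3], fields[4], int(fields[9])
        if src in REPO_IPS or dst in REPO_IPS:
            total += byte_count

print(f"{total / 1024 / 1024:.1f} MB exchanged with the Ubuntu repository")
```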
Best Answer
I run a website whose traffic comes entirely from India and have tested various options.
If the traffic is from India only, just go for the Asia Pacific (Singapore) region.
The latency from there is lowest and varies somewhere around 70-120 ms (measured from Delhi, India).
You will have to pay a few extra bucks compared to the N. Virginia region, but it's worth it.
The latency from the N. Virginia region is somewhere around 250-350 ms.