Software engineer here, not a ton of experience managing servers, but wanting to understand how auto-scale works.
Here's the background:
We have a stateless application running on the Azure cloud, which talks to an Azure SQL database behind the scenes. The database itself is geo-replicated across two different server regions.
We've set up auto-scale such that if server load exceeds 80%, we scale out and add an instance. When the load drops back below 50%, we scale back in. Scaling does not usually happen, but during periods of peak usage, the server will auto-scale.
Here's my question:
With auto-scale on, does Azure automatically handle any load balancing between the instances? I understand that Azure also has some load balancer products, but I'm trying to understand if we need them or not.
Without a load balancer explicitly set up, is scaling our instances pointless?
Best Answer
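If you are running a single instance of your web app, there is nothing to actually load balance. In that case, you would just set up scaling in the portal based on a particular metric. For example: if CPU > 80% for N minutes, scale out to M instances. Then add a second rule: when CPU drops below a lower threshold (such as the 50% you mention) for N minutes, scale back in to the original instance count. Keep a gap between the two thresholds; if both rules use 80%, the instance count can flap back and forth around that value.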
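To make the two rules above concrete, here is a minimal sketch in Python of how a scale-out / scale-in decision with separate thresholds could be evaluated. It is only an illustration, not how Azure implements autoscale internally. The 80% and 50% thresholds come from the question; the `AutoscaleRule` and `desired_instances` names and the instance limits are made up for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AutoscaleRule:
    """Hypothetical rule settings; thresholds taken from the question."""
    scale_out_above: float = 80.0  # average CPU % that adds an instance
    scale_in_below: float = 50.0   # average CPU % that removes an instance
    min_instances: int = 1         # assumed lower limit
    max_instances: int = 3         # assumed upper limit


def desired_instances(
    cpu_samples: Sequence[float],
    current: int,
    rule: AutoscaleRule = AutoscaleRule(),
) -> int:
    """Return the instance count after evaluating one window of CPU samples."""
    if not cpu_samples:
        return current  # no data in the window: leave the count unchanged
    average = sum(cpu_samples) / len(cpu_samples)
    if average > rule.scale_out_above:
        return min(current + 1, rule.max_instances)
    if average < rule.scale_in_below:
        return max(current - 1, rule.min_instances)
    # Between the two thresholds nothing changes, which prevents flapping.
    return current
```

With these settings, a window averaging above 80% moves one instance to two, and a later window averaging below 50% moves it back to one. Anything in between leaves the count alone.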
If you were to set up two instances and load balance them, you would have to add a load balancer on top of the instances. This is simple to do. From there you could set the same monitoring rules.
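As a rough picture of what the load balancer adds once there is more than one instance, the sketch below hands incoming requests to instances in rotation. This is plain round-robin, chosen for simplicity; real load balancers apply their own distribution rules and health checks. The instance names and the `round_robin` helper are hypothetical.

```python
from itertools import cycle
from typing import Iterable, Iterator


def round_robin(instances: Iterable[str]) -> Iterator[str]:
    """Yield instance names in rotation, one per incoming request."""
    return cycle(list(instances))


# Two hypothetical instances, as in the scaled-out case.
picker = round_robin(["instance-0", "instance-1"])

# Assign four incoming requests; they alternate between the two instances.
assignments = [next(picker) for _ in range(4)]
```

Without something playing this role, all traffic would keep arriving at the first instance and the second would sit idle.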
https://docs.microsoft.com/en-us/azure/architecture/best-practices/auto-scaling
https://azure.microsoft.com/en-us/features/autoscale/
https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/insights-autoscale-best-practices