The reason is that packets are properly routed to the on premises machines, but those same machines do not know where to send their answers back to Azure clients.
Correct, yes. But that does not mean you have to configure the route on every machine.
I just had the exact same issue and I solved it like this:
Environment:
1 Azure Subscription with 1 VNet with 1 Dynamic VPN Gateway.
1 RRAS server on premises behind my router, which forwards the IPsec ports to the RRAS server (I know, bad setup, NOT supported by Microsoft, but it still works if you do the correct port forwarding).
The default gateway on every on-premises machine is set to the router, not the RRAS server.
Situation:
Exactly the same as yours. The VPN is set up and connected.
Connecting to the Azure machines from the RRAS server works, and connecting to the RRAS server's IPs from the Azure machines works too.
What doesn't work is connecting to the Azure machines from any other on-premises machine, or connecting to any other on-premises machine from the Azure machines.
Resolution:
As you stated yourself, the on-premises machines don't know how to reach the Azure subnet. But instead of configuring this static route on every on-premises machine, I created just 1 static route on my router, pointing the Azure subnet to my RRAS server. Because every machine already uses the router as its default gateway, the router now hands Azure-bound traffic to the RRAS server. As if by magic, all connections from all on-premises machines to Azure and vice versa started working like a charm.
Of course, the smoothest solution would be to use your default gateway router itself to connect to the Azure network, as this would solve the "my normal default gateway doesn't know about Azure" problem.
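To make the effect of that single static route concrete, here is a minimal Python sketch of how a router picks a next hop by longest-prefix match. All addresses are hypothetical (an Azure subnet of 10.1.0.0/16, an on-premises LAN of 192.168.1.0/24, the RRAS server at 192.168.1.10), and next_hop is an illustrative helper, not part of any router's configuration language.

```python
import ipaddress

def next_hop(routes, destination):
    """Return the next hop of the most specific route containing destination."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        return None
    # Longest prefix wins, as in an ordinary IP routing table.
    return max(matches, key=lambda m: m[0])[1]

# Hypothetical addressing, chosen only for illustration.
AZURE_SUBNET = "10.1.0.0/16"
RRAS_SERVER = "192.168.1.10"

# Router table before the fix: only the LAN and the default route.
router_before = [("0.0.0.0/0", "internet"), ("192.168.1.0/24", "on-link")]
# Router table after adding the one static route described above.
router_after = router_before + [(AZURE_SUBNET, RRAS_SERVER)]

print(next_hop(router_before, "10.1.0.4"))  # internet
print(next_hop(router_after, "10.1.0.4"))   # 192.168.1.10
```

Before the route is added, replies destined for an Azure VM follow the default route out to the internet and are lost. Afterwards the router forwards them to the RRAS server, which owns the tunnel.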
Best Answer
Enable read caching on the disk(s) hosting the data files and TempDB.
The only way you can check this is by looking at the VM in the Azure portal; if you don't have access, you'll need to get someone else to look. Go to the VM and then the Disks tab, where each disk states which caching mode it is using.
Stripe multiple Azure data disks to get increased IO throughput.
Disk striping is done from inside the OS, either in Disk Management or in Storage Spaces. Look at the number of data disks configured in the Azure portal, then look at Storage Spaces or Disk Management to see how they are set up.
Format with documented allocation sizes.
Open an admin command prompt and run this command:
Look at the bytes per cluster value.
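The command itself did not survive in this post. One Windows command that prints a "Bytes Per Cluster" line for an NTFS volume is fsutil fsinfo ntfsinfo followed by the drive letter; whether that is the command the author meant is an assumption. The Python sketch below wraps it and pulls out the value. The helper names and the drive letter F: are hypothetical, and the parsing assumes English-language fsutil output.

```python
import re
import subprocess

def bytes_per_cluster(fsutil_output):
    """Extract the 'Bytes Per Cluster' value from fsutil text, or None if absent."""
    match = re.search(r"Bytes Per Cluster\s*:\s*(\d+)", fsutil_output)
    if match is None:
        return None
    return int(match.group(1))

def query_volume(drive):
    """Run fsutil for a drive such as 'F:' (Windows only, elevated prompt)."""
    result = subprocess.run(
        ["fsutil", "fsinfo", "ntfsinfo", drive],
        capture_output=True, text=True, check=True,
    )
    return bytes_per_cluster(result.stdout)
```

On the VM, query_volume("F:") returns the allocation unit size in bytes for that volume, which you can compare with the documented size. Microsoft's guidance for SQL Server data, log and TempDB volumes has commonly cited 64 KB (65536 bytes), but check the current documentation for your version.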