Assuming that all your back-end machines are "active" and able to respond to requests, all you really need is a load-balancer front-end.
A good load balancer will keep track of the number of connections going to each host and dynamically distribute new connections to avoid swamping any one of the back-end systems (by "good" I mean "expensive", like Cisco Content Switches/Content Switch Service Modules). Price goes hand in hand with features here: content switches are pretty high up on the solutions tier.
I've got no experience with HAProxy, but it sounds like it can do least-connection load balancing like content switches, so this would probably be a good choice (and at a much more attractive price point). I'm not sure if HAProxy can do source-tracking (send all connections from the same IP to the same back-end) though.
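For what it's worth, current HAProxy versions do support both least-connection balancing and source-IP persistence. A minimal backend sketch (server names and addresses are invented for illustration):

```
backend app_servers
    balance leastconn                          # new connections go to the least-loaded server
    stick-table type ip size 200k expire 30m   # remember which server each client IP got
    stick on src                               # pin each source IP to the same back-end
    server web1 10.0.0.11:443 check
    server web2 10.0.0.12:443 check
    server web3 10.0.0.13:443 check
```

The `expire` value controls how long a client stays pinned after its last connection, which matters for long-lived sessions.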
A few steps down, the pf firewall (or the pfSense customized distribution) can do load balancing (random or round-robin; I don't believe it can do "weighted least connections" as a balancing option like the content switches can). Source tracking is implemented in pf, though you may have to tune how long that information is retained to avoid problems with connections getting moved from one server to another.
If you're already using pf/pfSense as your firewall this is a no-cost option: we use this in my current deployment with good results, but our connections aren't as long-lived as yours.
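A pf.conf sketch of the round-robin balancing with source tracking described above (addresses and the interface macro are invented):

```
# Pool of back-end web servers (invented addresses)
web_servers = "{ 10.0.0.11, 10.0.0.12, 10.0.0.13 }"

# Redirect inbound HTTPS to the pool, round-robin, keeping each
# source IP mapped to the same server (sticky-address)
pass in on $ext_if proto tcp to ($ext_if) port 443 \
    rdr-to $web_servers round-robin sticky-address

# How long pf retains the source-tracking entry (seconds); tune this
# to avoid connections hopping between servers
set timeout src.track 600
```

The `src.track` timeout is the knob referred to above: too short, and returning clients may land on a different server mid-session.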
As far as I can see, this stems from a misconception about networking topics, including the OSI layers and communication methods.
To answer your main question briefly: how your load balancer treats connections depends entirely on your configuration and on how you have defined it to behave. But to see how load balancing really works at Layers 4 and 7 of the OSI model, please read the following:
First of all, about RST packets: what you observed is entirely normal. RST is used to reset a connection and can occur on either side, depending on what could not be completed, and whenever there is no more conversation happening between server and client while the connection has not yet been closed. To quote an answer from Quora: an RST packet is sent either in the middle of the 3-way handshake, when the server rejects the connection or is unavailable, or in the middle of data transfer, when either the server or the client becomes unavailable or rejects further communication without the formal 4-way TCP connection-termination process.
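You can observe the handshake-rejection case from above on any machine: connecting to a port with no listener makes the kernel answer the SYN with a RST, which Python surfaces as `ConnectionRefusedError`. A minimal sketch:

```python
import socket

# Bind to port 0 to let the OS pick a free port, then close it,
# so we have a local port that is guaranteed to have no listener.
s = socket.socket()
s.bind(("127.0.0.1", 0))
port = s.getsockname()[1]
s.close()

# The SYN to the closed port is answered with a RST by the kernel;
# Python raises ConnectionRefusedError as a result.
try:
    socket.create_connection(("127.0.0.1", port), timeout=2)
    print("connected (unexpected)")
except ConnectionRefusedError:
    print("got RST (connection refused)")
```

The mid-transfer RST case is harder to reproduce in a few lines, but it is the same packet doing the same job: tearing a connection down without the 4-way FIN exchange.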
Transmission Control Protocol (TCP) operates at the transport layer (Layer 4 in the OSI model). TCP provides reliable, ordered, and error-checked delivery of a stream of octets and creates a virtual connection between applications running on hosts communicating over an IP network. In other words, it provides a communication service at an intermediate level between an application program and the Internet Protocol, and since IP packets may be lost, corrupted, or arrive out of order, TCP has mechanisms for correcting these errors, turning the stream of IP packets into a reliable communication channel. Each application is assigned a TCP port number so that delivery reaches the correct application on hosts where many applications are running. For instance, standard TCP port 22 is assigned for contacting SSH servers (default ports can be changed in configuration files if needed).
In Layer 4 load balancing, the load balancer's IP address is advertised to clients for a web site or service, so, as you might have guessed, the destination address recorded in a client's request is the address of the load balancer. When the Layer 4 load balancer receives a request and makes its load-balancing decision, it also performs Network Address Translation (NAT) on the request packet, changing the recorded destination IP address from its own to that of the content server it has chosen on the internal network.

To make this concrete with your scenario: suppose one of the three servers behind your load balancer is a storage server, and a client wants to read a PDF or some other content on your website. The client's destination address is the public IP assigned to your load balancer; when the load balancer receives the request, it decides which server on its internal network the request should be mapped to, depending on the rules you have configured. Similarly, before forwarding server responses to clients, the load balancer changes the source address recorded in the packet header from the internal server's IP address (your storage server, in this example) to its own. (The destination and source TCP port numbers recorded in the packets are sometimes also rewritten in a similar way.)

A Layer 4 load balancer makes these routing decisions based on address information extracted from the first few packets in the TCP stream, and does not inspect packet content.
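The two NAT rewrites described above can be sketched in a few lines. All names and addresses here are invented, and a real balancer does this per-packet in the kernel or in hardware, not in Python; this only illustrates the bookkeeping:

```python
import itertools

BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # internal content servers
VIP = "203.0.113.10"                                # public address of the load balancer
rr = itertools.cycle(BACKENDS)                      # simple round-robin decision

def balance_request(packet):
    """Inbound: rewrite the destination (DNAT) from the VIP to a chosen backend."""
    assert packet["dst"] == VIP, "clients only ever address the VIP"
    packet["dst"] = next(rr)
    return packet

def balance_response(packet):
    """Outbound: rewrite the source (SNAT) from the backend back to the VIP."""
    assert packet["src"] in BACKENDS
    packet["src"] = VIP
    return packet

req = balance_request({"src": "198.51.100.7", "dst": VIP, "dport": 443})
resp = balance_response({"src": req["dst"], "dst": "198.51.100.7"})
print(req["dst"], resp["src"])
```

Note what is missing: nothing in this sketch reads the payload. That is exactly the Layer 4 property described above.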
And to answer "if the load balancer is responsible for resetting connections, won't I see RST packets on the server side, sent by the LB with a changed source IP?": you will only see RST packets on the back-end servers when their own connection with the load balancer needs to be reset, not when the client's connection with the load balancer is reset.
BTW, I highly recommend running tcpdump on your internal servers to see whether requests from the load balancer are actually arriving; that will show you what is going wrong and point you toward a fix. Don't overwhelm yourself with Wireshark: it is a brilliant tool, but you need to be familiar enough with it to understand what it is showing you.
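For example (the interface name, LB address, and port here are assumptions; substitute your own):

```
# Watch traffic arriving from the load balancer (10.0.0.1 assumed) on port 443
tcpdump -ni eth0 'src host 10.0.0.1 and tcp port 443'

# Show only packets with the RST flag set, regardless of source
tcpdump -ni eth0 'tcp[tcpflags] & tcp-rst != 0'
```

The second filter is handy for your exact question: run it on a back-end server and you can see directly whether RSTs are arriving from the load balancer's internal address.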
If you have more clients than backend servers, this might not be a problem. Try a least-connection algorithm, like "leastconn" from HAProxy. To make up an example: maybe your 10 switches stream metrics data via gRPC into 3 backend nodes of your monitoring platform, so every backend gets some work.
Even if you only have one connection, this still might not be a problem, assuming a single node can handle it. Effectively this becomes an active/passive configuration; whether keeping that other host idle is worth the expense is your decision.
That said, sometimes load balancers inspect the application at Layer 7. A common HTTP example is cookie affinity. Layer 7 is not required for long-lived connections, however.
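The cookie-affinity case mentioned above looks like this in HAProxy (backend and server names invented): the balancer inserts a cookie on the first response, and subsequent requests carrying that cookie are routed to the same server.

```
backend app_servers
    balance roundrobin
    cookie SRVID insert indirect nocache   # LB inserts the affinity cookie itself
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2
```

This requires the balancer to parse HTTP (Layer 7); for a raw long-lived TCP stream, the Layer 4 persistence mechanisms discussed earlier do the same job without payload inspection.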