Generally speaking, the direction of the traffic is determined by the location of the host establishing the connection.
In your example, as you indicated, the original VPN connection (#1) is inbound because the traffic originates at a host outside the network which establishes a connection to an internal resource.
Once this connection is established (#2), data flows in both directions, but it remains an inbound connection because it was initiated by the external host. All return traffic is considered part of the original connection.
When a client is connected to VPN, this establishes a "virtual" presence inside the network. So, for the VPN client accessing the web site (#3), the physical host uses the inbound connection to the virtual host that was already created, and the virtual host establishes an outbound connection through the firewall to the web site.
Again, this may not be universally true with all vendors, but generally this is how inbound/outbound connections are treated.
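The rule of thumb above can be sketched as a toy connection table. This is a hypothetical model, not any vendor's implementation: the direction of a connection is fixed by the location of the initiating host, and all later packets of that flow (in either direction) inherit it.

```python
# Toy connection-tracking sketch (hypothetical, not a real firewall):
# a flow's direction is set once, by whoever initiated it; return
# traffic is matched to the existing entry and inherits that direction.

conn_table = {}  # (src, dst) flow key -> "inbound" or "outbound"

def classify(src, dst, internal_hosts):
    """Return the direction of the connection this packet belongs to."""
    if (src, dst) in conn_table:      # forward direction of an existing flow
        return conn_table[(src, dst)]
    if (dst, src) in conn_table:      # return traffic of an existing flow
        return conn_table[(dst, src)]
    # New connection: direction is set by the initiating host's location.
    direction = "outbound" if src in internal_hosts else "inbound"
    conn_table[(src, dst)] = direction
    return direction

internal = {"10.0.0.5"}
# External host connects in (#1): inbound.
print(classify("203.0.113.7", "10.0.0.5", internal))  # inbound
# Replies from the internal host (#2) are part of that same inbound flow.
print(classify("10.0.0.5", "203.0.113.7", internal))  # inbound
```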
First note that all of these technical terms are subject to distortion by Marketing, so sometimes you have to read between the lines.
The term switch refers to a layer-2 device, so it forwards on layer-2 information. On Ethernet that is the 48-bit MAC address; with MPLS it is the label. The term router officially refers to a layer-3 device, so it forwards on layer-3 information (the 32-bit IPv4 or 128-bit IPv6 address, unless you have a very strange network). That is the primary difference between the terms.
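The distinction can be sketched with toy lookup tables (hypothetical ports, interfaces, and addresses): a switch does an exact match on the destination MAC, while a router does a longest-prefix match on the destination IP.

```python
# Toy sketch of the layer-2 vs layer-3 forwarding decision.
import ipaddress

# Layer 2: exact match on the 48-bit destination MAC.
mac_table = {"aa:bb:cc:dd:ee:ff": "port1", "11:22:33:44:55:66": "port2"}

def switch_forward(dst_mac):
    """Switch: look up the MAC; unknown destinations are flooded."""
    return mac_table.get(dst_mac, "flood")

# Layer 3: longest-prefix match on the destination IP address.
routing_table = [
    (ipaddress.ip_network("10.0.0.0/8"), "eth0"),
    (ipaddress.ip_network("10.1.0.0/16"), "eth1"),  # more specific route
]

def route_forward(dst_ip):
    """Router: pick the matching route with the longest prefix."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(net, iface) for net, iface in routing_table if addr in net]
    if not matches:
        return "drop"  # no route to host
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(switch_forward("aa:bb:cc:dd:ee:ff"))  # port1
print(route_forward("10.1.2.3"))            # eth1 (the /16 wins over the /8)
```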
How each device actually implements that forwarding is controlled partly by the protocols defined to exchange the relevant data and partly by the engineering choices of the device designer, and is mostly opaque from the outside. So, to learn how a particular product does its thing, you need more detail from the manufacturer than its legal department wants to divulge, which usually means an NDA (Non-Disclosure Agreement). Unless the device you care about is a general-purpose computer running Free or Open Source Software, in which case you read the source.
Best Answer
The architecture of SDN prevents stateful in-line processing of packet flows (setting aside dedicated firewall hardware, NFV, or experimental stateful switches).
Therefore anything that monitors the state of a flow requires that state to be held by the controller. This creates an unacceptable workload for the controller and increases the state held across (typically) multiple controllers. Consequently, most SDN firewall proposals are either static or dynamic packet filters, for example running a rule-checking algorithm before installing firewall rules in the switch.
Take, for example, closing a TCP flow. A traditional firewall observes the FIN handshake (2x FIN, 2x ACK) as it happens and closes the hole on seeing the last ACK packet. Rather than deal with this state, an OpenFlow switch uses a time-out, meaning the firewall hole is left open. An attacker can then exploit the switch's renewal of the time-out to keep the hole open until the attacker is finished.
For the controller to close the hole, it needs to see the FIN packets (increasing the number of rules in the rule space, and possibly requiring buffering in the switch) and the two responding ACKs, but obviously without seeing every ACK in the flow, and equally obviously while still allowing the flow to finish correctly.
Setting up a flow based on seeing a SYN packet or two is easy. Removing the rule so that your firewall stays in sync with the two end hosts is not so easy within the SDN architecture.
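The stateful close a traditional firewall performs can be sketched as a small state machine. This is a simplified illustration, not a real TCP implementation: it ignores simultaneous close, retransmissions, and TIME_WAIT, but it shows why only the teardown packets matter, not every ACK in the flow.

```python
# Toy sketch of tracking a TCP teardown so the firewall hole can be
# closed on the final ACK, instead of waiting for a time-out.

class FlowEntry:
    def __init__(self):
        self.state = "ESTABLISHED"

    def observe(self, flags):
        """Advance the teardown state machine on observed TCP flags."""
        if self.state == "ESTABLISHED" and "FIN" in flags:
            self.state = "FIN_1"      # first FIN seen
        elif self.state == "FIN_1" and "FIN" in flags:
            self.state = "FIN_2"      # second FIN seen
        elif self.state == "FIN_2" and "ACK" in flags:
            self.state = "CLOSED"     # final ACK: close the hole now
        # Ordinary ACKs in ESTABLISHED never change state, so the
        # firewall does not need to inspect every ACK in the flow.

flow = FlowEntry()
for pkt in [{"ACK"}, {"FIN", "ACK"}, {"ACK"}, {"FIN", "ACK"}, {"ACK"}]:
    flow.observe(pkt)
print(flow.state)  # CLOSED
```

An OpenFlow rule, by contrast, would simply carry an idle time-out and stay installed until no packets had matched it for that long, which is the hole described above.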
Of course YMMV depending on what you want your SDN packet-filter to do.