HLS and MPEG DASH are not particularly low latency as standard and the figures you are getting are not unusual.
Some examples from a publicly available DASH forum document (linked below) include:
Given the resources of some of these organisations, the results you have achieved are not bad!
There is quite a focus in the streaming industry at this time on enabling lower latency, the target being to come as close as possible to traditional broadcast latency.
One key component of the latency in chunked Adaptive Bit Rate (ABR, see this answer for more info: https://stackoverflow.com/a/42365034/334402 ) is the need for the player to receive and decode one or more segments of the video before it can display it. Traditionally the player had to receive the entire segment before it could start to decode and display it. The diagram from the first linked open source reference below illustrates this:
Low latency DASH and HLS leverage CMAF, 'Common Media Application Format' which breaks the segments, which might be 6 seconds long for example, into smaller 'chunks' within each segment. These chunks are designed to allow the player to decode and start playing them before it has received the full segment.
Other sources of latency in a typical live stream will be any transcoding from one format to another and any delay in a streaming server receiving the feed, from the webcam in your case, and encoding and packaging it for streaming.
There is quite a lot of good information available on low latency streaming at this time both from standards bodies and open source discussions which I think will really help you appreciate the issues (all links current at time of writing). From open source and standards discussions:
and from vendors:
Note - a common use case often quoted in the broadcast world is the case where someone watching a live event like a game may hear their neighbours celebrating a goal or touchdown before they see it themselves, because their feed has higher latency than their neighbours. While this is a driver for low latency, this is really a synchronisation issue which would require other solutions if a 'perfectly' synchronised solution was the goal.
As you can see low latency streaming is not a simple challenge and it may be that you want to consider other approaches depending on the details of your use case, including how many subscribers you have, whether some loss of quality if a fair trade off for lower latency etc. As mentioned by @user1390208 in the comments, a more real time focused video communication technology like WebRTC may be a better match for the solution you are targeting.
If you want to provide a service that provides life streaming and also a recording, you may want to consider using a real time protocol for the live streaming view and HLS/DASH streaming for anyone looking back through recordings where latency may not be important but quality may be more key.
Best Answer
Technologies and Requirements
The only web-based technology set really geared toward low latency is WebRTC. It's built for video conferencing. Codecs are tuned for low latency over quality. Bitrates are usually variable, opting for a stable connection over quality.
However, you don't necessarily need this low latency optimization for all of your users. In fact, from what I can gather on your requirements, low latency for everyone will hurt the user experience. While your users in control of the robot definitely need low latency video so they can reasonably control it, the users not in control don't have this requirement and can instead opt for reliable higher quality video.
How to Set it Up
In-Control Users to Robot Connection
Users controlling the robot will load a page that utilizes some WebRTC components for connecting to the camera and control server. To facilitate WebRTC connections, you need some sort of STUN server. To get around NAT and other firewall restrictions, you may need a TURN server. Both of these are usually built into Node.js-based WebRTC frameworks.
The cam/control server will also need to connect via WebRTC. Honestly, the easiest way to do this is to make your controlling application somewhat web based. Since you're using Node.js already, check out NW.js or Electron. Both can take advantage of the WebRTC capabilities already built in WebKit, while still giving you the flexibility to do whatever you'd like with Node.js.
The in-control users and the cam/control server will make a peer-to-peer connection via WebRTC (or TURN server if required). From there, you'll want to open up a media channel as well as a data channel. The data side can be used to send your robot commands. The media channel will of course be used for the low latency video stream being sent back to the in-control users.
Again, it's important to note that the video that will be sent back will be optimized for latency, not quality. This sort of connection also ensures a fast response to your commands.
Video for Viewing Users
Users that are simply viewing the stream and not controlling the robot can use normal video distribution methods. It is actually very important for you to use an existing CDN and transcoding services, since you will have 10k-15k people watching the stream. With that many users, you're probably going to want your video in a couple different codecs, and certainly a whole array of bitrates. Distribution with DASH or HLS is easiest to work with at the moment, and frees you of Flash requirements.
You will probably also want to send your stream to social media services. This is another reason why it's important to start with a high quality HD stream. Those services will transcode your video again, reducing quality. If you start with good quality first, you'll end up with better quality in the end.
Metadata (chat, control signals, etc.)
It isn't clear from your requirements what sort of metadata you need, but for small message-based data, you can use a web socket library, such as Socket.IO. As you scale this up to a few instances, you can use pub/sub, such as Redis, to distribution messaging throughout the servers.
To synchronize the metadata to the video depends a bit on what's in that metadata and what the synchronization requirement is, specifically. Generally speaking, you can assume that there will be a reasonable but unpredictable delay between the source video and the clients. After all, you cannot control how long they will buffer. Each device is different, each connection variable. What you can assume is that playback will begin with the first segment the client downloads. In other words, if a client starts buffering a video and begins playing it 2 seconds later, the video is 2 seconds behind from when the first request was made.
Detecting when playback actually begins client-side is possible. Since the server knows the timestamp for which video was sent to the client, it can inform the client of its offset relative to the beginning of video playback. Since you'll probably be using DASH or HLS and you need to use MCE with AJAX to get the data anyway, you can use the response headers in the segment response to indicate the timestamp for the beginning the segment. The client can then synchronize itself. Let me break this down step-by-step:
Date:
header can indicate the exact date/time for the start of the segment.Date:
header (let's say2016-06-01 20:31:00
). Client continues buffering the segments.00:00:00
on the video player is actualy2016-06-01 20:31:00
.This should meet your needs and give you the flexibility to do whatever you need to with your video going forward.
Why not [magic-technology-here]?
There are tradeoffs in every approach. I think what I have outlined here separates the concerns and gives you the best tradeoffs in each area. Please feel free to ask for clarification or ask follow-up questions in the comments.