Php – Long Polling/HTTP Streaming General Questions

comet, http-streaming, jquery, long-polling, php

I'm trying to build a theoretical web chat application with PHP and jQuery. I've read about long polling and HTTP streaming, and I managed to apply most of the principles introduced in the articles. However, there are two main things I still can't get my head around.

With Long Polling

How will the server know when an update has been sent? Will it need to query the database continually, or is there a better way?

With HTTP Streaming

How do I check for results while the Ajax connection is still active? I'm aware of jQuery's success function for Ajax calls, but how do I check the data while the connection is still ongoing?

I'll appreciate any and all answers, thanks in advance.

Best Answer

Yeah, Comet-like techniques usually blow your mind in the beginning; they make you think in a different way. Another problem is that there aren't many resources available for PHP, because everyone's doing their Comet in node.js, Python, Java, etc.

I'll try to answer your questions, and I hope it sheds some light on this topic for other people.

How will the server know when an update has been sent? Will it need to query the database continually, or is there a better way?

The answer is: in the most general case you should use a message queue (MQ). RabbitMQ or the Pub/Sub functionality built into the Redis store may be good choices, though there are many competing solutions available, such as ZeroMQ, Beanstalkd, etc.

So instead of continuously querying your database, you can subscribe to an MQ event and simply block until someone else publishes a message on the channel you subscribed to; the MQ then wakes you up and delivers the message. A chat app is a very good use case for understanding this functionality.

I should also mention that if you search for Comet chat implementations in other languages, you may notice simple ones that don't use an MQ. So how do they exchange information? Such solutions are usually implemented as standalone, single-threaded asynchronous servers, so they can store all connections in a thread-local array (or something similar), handle many connections in a single loop, and just pick one and notify it when needed. Asynchronous server implementations like these are a modern approach that fits the Comet technique really well. However, you're most likely implementing your Comet on top of mod_php or FastCGI; in that case this simple approach is not an option, and you should use an MQ.
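To make the single-process approach concrete, here is a minimal sketch in JavaScript (the language of the client examples further down), assuming a single-threaded event loop. The names `createPollRegistry`, `wait`, and `broadcast` are hypothetical and do not come from any library; a real server would call `wait` from its HTTP request handler.

```javascript
// Minimal sketch: a single-threaded server keeps every pending long poll
// in one in-memory array and answers them all when a message arrives.
// All names here are illustrative assumptions, not a real library API.
function createPollRegistry() {
  let pending = []; // one entry per open long-poll request

  return {
    // Register a waiting request. `respond` is called exactly once:
    // with the message, or with null if the timeout fires first.
    wait(respond, timeoutMs) {
      const entry = { respond, timer: null };
      entry.timer = setTimeout(() => {
        pending = pending.filter((e) => e !== entry);
        respond(null); // lets the client re-issue its poll
      }, timeoutMs);
      pending.push(entry);
    },

    // Deliver a message to every waiting request and clear the list.
    broadcast(message) {
      const waiting = pending;
      pending = [];
      for (const entry of waiting) {
        clearTimeout(entry.timer);
        entry.respond(message);
      }
      return waiting.length; // number of requests answered
    },

    size() {
      return pending.length;
    },
  };
}

// Usage: two clients are waiting, then someone posts a message.
const registry = createPollRegistry();
const received = [];
registry.wait((msg) => received.push("alice got: " + msg), 30000);
registry.wait((msg) => received.push("bob got: " + msg), 30000);
registry.broadcast("hello");
console.log(received);
```

This only works because every connection lives in the same process. Under mod_php or FastCGI each request runs in its own process, which is why an external MQ is needed there.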

It could still be very useful to understand how to implement a standalone asynchronous Comet server that handles many connections in a single thread. Recent versions of PHP support Libevent and socket streams, so it is possible to implement this kind of server in PHP as well. There's also an example available in the PHP documentation.

How do I check for results while the Ajax connection is still active? I'm aware of jQuery's success function for Ajax calls, but how do I check the data while the connection is still ongoing?

If you're doing your long-running polls with a usual Ajax technique such as plain XHR or jQuery Ajax, you don't have an easy way to transmit several responses in a single Ajax request. As you mentioned, you only have the 'success' handler, which deals with the response as a whole and not with its parts. As a workaround, people send only a single response per request and process it in the 'success' handler; after that they just open a new long-poll request. This is just how the HTTP protocol works.

It should also be mentioned that there are workarounds that implement streaming-like functionality, such as an infinitely long page in a hidden IFRAME or multipart HTTP responses. Both methods have drawbacks: the former is considered unreliable and can sometimes produce unwanted browser behavior such as an infinite loading indicator, while the latter lacks consistent and straightforward cross-browser support. Still, certain applications are known to rely on multipart responses successfully, falling back to long polling when the browser can't handle them properly.
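For illustration, here is a hedged sketch of a related streaming workaround: reading the partial body of a plain XHR while it is still loading. It assumes the server writes one message per line and flushes after each one, and that the browser exposes partial `responseText` during loading (older browsers may not). The names `createStreamReader` and `streamMessages` are hypothetical.

```javascript
// Minimal sketch: pull complete newline-delimited messages out of a
// response body that keeps growing while the connection stays open.
// Assumes one message per line; all names are illustrative.
function createStreamReader() {
  let offset = 0; // how much of the body has already been consumed

  // `bodySoFar` is the whole body received up to now (for XHR, this is
  // what `responseText` holds while the request is still loading).
  return function readNew(bodySoFar) {
    const lastNewline = bodySoFar.lastIndexOf("\n");
    if (lastNewline < offset) return []; // no complete new line yet
    const fresh = bodySoFar.slice(offset, lastNewline);
    offset = lastNewline + 1;
    return fresh.split("\n").filter((line) => line.length > 0);
  };
}

// Browser-side wiring (defined but not called here): XMLHttpRequest fires
// `progress` events as data arrives, so messages can be handled before
// the request completes.
function streamMessages(url, onMessage) {
  const xhr = new XMLHttpRequest();
  const readNew = createStreamReader();
  const drain = () => readNew(xhr.responseText).forEach(onMessage);
  xhr.onprogress = drain;
  xhr.onload = drain; // pick up anything that arrived with the final chunk
  xhr.open("GET", url);
  xhr.send();
  return xhr;
}

// The reader only returns lines that are complete.
const reader = createStreamReader();
console.log(reader("first\nsec"));          // "sec" is still incomplete
console.log(reader("first\nsecond\nthird\n"));
```

Because the body is kept in memory for the lifetime of the request, a sketch like this would still need to close and reopen the connection from time to time.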

If you'd like to handle multiple responses per single request/connection in a reliable way, you should consider a more advanced technology such as WebSocket, which is supported by most current browsers, or any platform that supports raw sockets (such as Flash, or a native mobile app, for instance).

Could you please elaborate more on message queues?

"Message queue" is a term that describes a standalone (or built-in) implementation of the Observer pattern (also known as 'Publish/Subscribe' or simply PubSub). If you develop a big application, having one is very useful: it allows you to decouple different parts of your system, implement an event-driven asynchronous design, and make your life much easier, especially in heterogeneous systems. It has many applications in real-world systems; I'll mention just a couple of them:

Task queues. Let's say we're writing our own YouTube and need to convert users' video files in the background. We obviously need a webapp with a UI to upload a movie and some fixed number of worker processes to convert the video files (maybe we would even need a number of dedicated servers where only our workers live). We would also probably have to write our workers in C to ensure better performance. All we have to do is set up a message queue server to collect and deliver video-conversion tasks from the webapp to our workers. When a worker spawns, it connects to the MQ and goes idle, waiting for new tasks. When someone uploads a video file, the webapp connects to the MQ and publishes a message with a new job. Powerful MQs such as RabbitMQ can distribute tasks evenly among the connected workers, keep track of which tasks have been completed, ensure nothing gets lost, and provide fail-over and even an admin UI to browse pending tasks and stats.
Asynchronous behavior. Our Comet chat is a good example. Obviously we don't want to poll our database periodically all the time (what's the use of Comet then? It would be no big improvement over periodic Ajax requests). We would rather have someone notify us when a new chat message appears, and a message queue is that someone. Let's say we're using the Redis key/value store; this is a really great tool that provides a PubSub implementation among its data store features. The simplest scenario may look like the following:
1. After someone enters the chat room, a new Ajax long-poll request is made.
2. The request handler on the server side issues a command to Redis to subscribe to the 'newmessage' channel.
3. Once someone enters a message into the chat, the server-side handler publishes a message to the Redis 'newmessage' channel.
4. Once a message is published, Redis immediately notifies all the pending handlers that subscribed to that channel earlier.
5. Upon notification, the PHP code that keeps the long-poll request open can complete the request with the new chat message, so all users are notified. They can read new messages from the database at that moment, or the messages may be transmitted directly inside the message payload.
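The steps above can be sketched as follows. This is a minimal illustration in JavaScript that uses an in-memory stand-in rather than a real Redis client, so `createBroker`, `handleLongPoll`, and the message shape are assumptions made for the example; only the 'newmessage' channel name comes from the scenario.

```javascript
// Minimal in-memory stand-in for a Pub/Sub broker such as Redis.
// It only illustrates steps 2-5 above; names are illustrative assumptions.
function createBroker() {
  const channels = new Map(); // channel name -> Set of subscriber callbacks

  return {
    // Step 2: a long-poll handler subscribes and then simply waits.
    subscribe(channel, callback) {
      if (!channels.has(channel)) channels.set(channel, new Set());
      channels.get(channel).add(callback);
      return () => channels.get(channel).delete(callback); // unsubscribe
    },

    // Steps 3-4: publishing wakes every subscriber of that channel.
    publish(channel, payload) {
      const subscribers = [...(channels.get(channel) || [])];
      subscribers.forEach((callback) => callback(payload));
      return subscribers.length; // how many handlers were notified
    },
  };
}

// Step 5: each handler answers its open request once, then unsubscribes;
// the client is expected to open a new long poll afterwards.
function handleLongPoll(broker, sendResponse) {
  const unsubscribe = broker.subscribe("newmessage", (chatMessage) => {
    unsubscribe();
    sendResponse(chatMessage);
  });
}

// Two clients are waiting when a third user posts a message.
const broker = createBroker();
const responses = [];
handleLongPoll(broker, (m) => responses.push("client A: " + m.text));
handleLongPoll(broker, (m) => responses.push("client B: " + m.text));
broker.publish("newmessage", { user: "carol", text: "hi all" });
console.log(responses);
```

Here the payload carries the chat message itself, which corresponds to the second option in step 5; the handlers could equally use the notification as a signal to read from the database.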

I hope my illustration is easy to understand. However, message queues are a very broad topic, so refer to the resources mentioned above for further reading.

Related Solutions

Php – How to implement basic “Long Polling”

It's simpler than I initially thought. Basically, you have a page that does nothing until the data you want to send is available (say, a new message arrives).

Here is a really basic example, which sends a simple string after 2-10 seconds. There is a 1 in 3 chance of returning a 404 error (to show error handling in the JavaScript example that follows).

msgsrv.php

<?php
if(rand(1,3) == 1){
    /* Fake an error */
    header("HTTP/1.0 404 Not Found");
    die();
}

/* Send a string after a random number of seconds (2-10) */
sleep(rand(2,10));
echo("Hi! Have a random number: " . rand(1,10));
?>

Note: On a real site, running this on a regular web server like Apache will quickly tie up all the "worker threads" and leave it unable to respond to other requests. There are ways around this, but it is recommended to write a "long-poll server" in something like Python's Twisted, which does not rely on one thread per request. CometD is a popular one (available in several languages), and Tornado is a newer framework made specifically for such tasks (it was built for FriendFeed's long-polling code). As a simple example, though, Apache is more than adequate! This script could easily be written in any language (I chose Apache/PHP as they are very common, and I happened to be running them locally).

Then, in JavaScript, you request the above file (msgsrv.php) and wait for a response. When you get one, you act upon the data. Then you request the file and wait again, act upon the data, and so on.

What follows is an example of such a page. When the page is loaded, it sends the initial request for the msgsrv.php file. If it succeeds, we append the message to the #messages div, then after 1 second we call the waitForMsg function again, which triggers the next wait.

The 1 second setTimeout() is a really basic rate limiter. The page works fine without it, but if msgsrv.php always returns instantly (with a syntax error, for example), you flood the browser with requests and it can quickly freeze up. This would be better done by checking whether the response contains valid JSON, and/or by keeping a running total of requests per minute/second and pausing appropriately.
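As a rough idea of what "keeping a running total and pausing appropriately" could look like, here is a minimal sliding-window sketch. The function name `createRateLimiter` and its parameters `maxRequests` and `windowMs` are made up for this example, and it assumes `now` never goes backwards between calls.

```javascript
// Minimal sketch of a requests-per-window limiter for the polling loop.
// `maxRequests` and `windowMs` are illustrative parameters.
function createRateLimiter(maxRequests, windowMs) {
  let timestamps = []; // send times of recent requests, oldest first

  // Returns how many ms to wait before sending the next request at
  // time `now`, and records the request at the time it will be sent.
  return function delayBeforeNext(now) {
    timestamps = timestamps.filter((t) => now - t < windowMs);
    let delay = 0;
    if (timestamps.length >= maxRequests) {
      // Wait until enough earlier requests have left the window.
      const oldest = timestamps[timestamps.length - maxRequests];
      delay = oldest + windowMs - now;
    }
    timestamps.push(now + delay);
    return delay;
  };
}

// Allow at most 2 requests per second.
const delayBeforeNext = createRateLimiter(2, 1000);
console.log(delayBeforeNext(0));  // first request goes out immediately
console.log(delayBeforeNext(10)); // second one too
console.log(delayBeforeNext(20)); // third has to wait for the window
```

In the page that follows, the fixed 1000 ms passed to setTimeout() in the success handler could be replaced by something like `delayBeforeNext(Date.now())`.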

If the request fails, the page appends the error to the #messages div, waits 15 seconds, and then tries again (identical to how we wait 1 second after each message).

The nice thing about this approach is that it is very resilient. If the client's internet connection dies, the request will time out, then try to reconnect. This is inherent in how long polling works; no complicated error handling is required.

Anyway, the long_poller.htm code, using the jQuery framework:

<html>
<head>
    <title>BargePoller</title>
    <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.min.js" type="text/javascript" charset="utf-8"></script>

    <style type="text/css" media="screen">
      body{ background:#000;color:#fff;font-size:.9em; }
      .msg{ background:#aaa;padding:.2em; border-bottom:1px #000 solid}
      .old{ background-color:#246499;}
      .new{ background-color:#3B9957;}
      .error{ background-color:#992E36;}
    </style>

    <script type="text/javascript" charset="utf-8">
    function addmsg(type, msg){
        /* Simple helper to add a div.
        type is the name of a CSS class (old/new/error).
        msg is the contents of the div */
        $("#messages").append(
            "<div class='msg "+ type +"'>"+ msg +"</div>"
        );
    }

    function waitForMsg(){
        /* This requests the url "msgsrv.php".
        When it completes (or errors), one of the handlers below runs. */
        $.ajax({
            type: "GET",
            url: "msgsrv.php",

            async: true, /* If set to non-async, browser shows page as "Loading.."*/
            cache: false,
            timeout:50000, /* Timeout in ms */

            success: function(data){ /* called when request to msgsrv.php completes */
                addmsg("new", data); /* Add response to a .msg div (with the "new" class)*/
                setTimeout(
                    waitForMsg, /* Request next message */
                    1000 /* ..after 1 second */
                );
            },
            error: function(XMLHttpRequest, textStatus, errorThrown){
                addmsg("error", textStatus + " (" + errorThrown + ")");
                setTimeout(
                    waitForMsg, /* Try again after.. */
                    15000); /* milliseconds (15seconds) */
            }
        });
    };

    $(document).ready(function(){
        waitForMsg(); /* Start the initial request */
    });
    </script>
</head>
<body>
    <div id="messages">
        <div class="msg old">
            BargePoll message requester!
        </div>
    </div>
</body>
</html>

Php – What are Long-Polling, Websockets, Server-Sent Events (SSE) and Comet

In the examples below the client is the browser and the server is the webserver hosting the website.

Before you can understand these technologies, you have to understand classic HTTP web traffic first.

Regular HTTP:

1. A client requests a webpage from a server.
2. The server calculates the response.
3. The server sends the response to the client.


Ajax Polling:

1. A client requests a webpage from a server using regular HTTP (see HTTP above).
2. The client receives the requested webpage and executes the JavaScript on the page, which requests a file from the server at regular intervals (e.g. 0.5 seconds).
3. The server calculates each response and sends it back, just like normal HTTP traffic.


Ajax Long-Polling:

1. A client requests a webpage from a server using regular HTTP (see HTTP above).
2. The client receives the requested webpage and executes the JavaScript on the page, which requests a file from the server.
3. The server does not immediately respond with the requested information but waits until there's new information available.
4. When there's new information available, the server responds with the new information.
5. The client receives the new information and immediately sends another request to the server, restarting the process.


HTML5 Server Sent Events (SSE) / EventSource:

1. A client requests a webpage from a server using regular HTTP (see HTTP above).
2. The client receives the requested webpage and executes the JavaScript on the page, which opens a connection to the server.
3. The server sends an event to the client when there's new information available.
- Real-time traffic from server to client, mostly that's what you'll need
- You'll want to use a server that has an event loop
- Connections with servers from other domains are only possible with correct CORS settings
- If you want to read more, I found these very useful: (article), (article), (article), (tutorial).
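As a small illustration of what travels over an SSE connection, the sketch below serializes one message in the `text/event-stream` format that the browser's `EventSource` parses. The field names (`event`, `id`, `data`) are part of that format; the function name `formatSseEvent`, the "chat" event name, and the `/events` URL in the comment are assumptions made for the example.

```javascript
// Minimal sketch: serialize one message in the text/event-stream format.
// The server would write this string to the open response and flush it.
function formatSseEvent({ data, event, id }) {
  const lines = [];
  if (event !== undefined) lines.push("event: " + event);
  if (id !== undefined) lines.push("id: " + id);
  // Each line of the payload needs its own "data:" field.
  for (const part of String(data).split("\n")) {
    lines.push("data: " + part);
  }
  return lines.join("\n") + "\n\n"; // a blank line ends the event
}

console.log(JSON.stringify(formatSseEvent({ event: "chat", data: "hi\nthere" })));

// Browser side (not run here; "/events" is a hypothetical URL):
//   const source = new EventSource("/events");
//   source.addEventListener("chat", (e) => console.log(e.data));
```

The browser joins the `data:` lines of one event with newlines again, so the listener in the comment would receive the two-line payload as a single string.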


HTML5 Websockets:

1. A client requests a webpage from a server using regular HTTP (see HTTP above).
2. The client receives the requested webpage and executes the JavaScript on the page, which opens a connection with the server.
3. The server and the client can now send each other messages when new data (on either side) is available.
- Real-time traffic from the server to the client and from the client to the server
- You'll want to use a server that has an event loop
- With WebSockets it is possible to connect with a server from another domain.
- It is also possible to use a third party hosted websocket server, for example Pusher or others. This way you'll only have to implement the client side, which is very easy!
- If you want to read more, I found these very useful: (article), (article) (tutorial).
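To show how little client code this needs, here is a minimal sketch of a chat client on top of the browser's WebSocket API. The JSON message shape, the function name `openChatSocket`, and the URL in the usage comment are assumptions for the example; the constructor is passed in as a parameter only so the sketch can also be exercised outside a browser.

```javascript
// Minimal sketch of a chat client using the WebSocket API.
// `SocketCtor` defaults to the browser's WebSocket constructor.
function openChatSocket(url, onChatMessage, SocketCtor = globalThis.WebSocket) {
  const socket = new SocketCtor(url);
  const outbox = []; // messages sent before the connection opened

  socket.onopen = () => {
    // Flush everything that was queued while connecting.
    outbox.splice(0).forEach((text) => socket.send(text));
  };
  // The server can push at any time; no new request is needed per message.
  socket.onmessage = (event) => onChatMessage(JSON.parse(event.data));

  return {
    send(message) {
      const text = JSON.stringify(message);
      if (socket.readyState === 1) socket.send(text); // 1 means OPEN
      else outbox.push(text);
    },
    close: () => socket.close(),
  };
}

// Browser usage (not run here; the URL is hypothetical):
//   const chat = openChatSocket("ws://localhost:8080/chat", (m) => console.log(m.text));
//   chat.send({ text: "hello" });
```

Unlike long polling, the same connection carries traffic in both directions, so there is no loop that re-issues a request after every message.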


Comet:

Comet is a collection of techniques that predate HTML5 and use streaming and long polling to achieve real-time applications. Read more on Wikipedia or this article.

Now, which one of them should I use for a realtime app (that I need to code)? I have been hearing a lot about WebSockets (with socket.io [a node.js library]), but why not PHP?

You can use PHP with WebSockets; check out Ratchet.