REST – HTTP Status Code for ‘Still Processing’


I'm building a RESTful API that supports queuing long-running tasks for eventual handling.

The typical workflow for this API would be:

  1. User fills in form
  2. Client posts data to API
  3. API returns 202 Accepted
  4. Client redirects user to a unique URL for that request (/results/{request_id})
  5. ~eventually~
  6. Client visits URL again, and sees the results on that page.

My trouble is on step 6. Any time a user visits the page, I send a request to my API (GET /api/results/{request_id}). Ideally, the task will have completed by then, and I'd return a 200 OK with the results of their task.

But users are pushy, and I expect many overzealous refreshes before the result has finished processing.

What is my best option for a status code to indicate that:

  • this request exists,
  • it's not done yet,
  • but it also hasn't failed.

I don't expect a single code to communicate all of that, but I'd like something that lets me pass metadata instead of having the client expect content.

It could make sense to return a 202, since that would have no other meaning here: it's a GET request, so nothing is possibly being "accepted." Would that be a reasonable choice?

The obvious alternative to all this — which functions, but defeats one purpose of status codes — would be to always include the metadata:

200 OK

{
    "status": "complete",
    "data": {
        "foo": "123"
    }
}

…or…

200 OK

{
    "status": "pending"
}

Then client-side, I would (sigh) switch on response.data.status to determine whether the request was completed.
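For reference, that client-side switch might look like the sketch below. It assumes the response body has already been parsed into a dict with the `status` and `data` fields shown above; the function name `handle_result` and the returned messages are hypothetical.

```python
from typing import Any, Dict


def handle_result(body: Dict[str, Any]) -> str:
    """Decide what to show based on the 'status' field of an always-200 response."""
    status = body.get("status")
    if status == "complete":
        # The task finished: render the payload.
        return f"result: {body['data']['foo']}"
    if status == "pending":
        # The task exists but is not done yet: ask the user to check back.
        return "still processing, please check back later"
    # Anything else is unexpected for this hypothetical API.
    raise ValueError(f"unknown status: {status!r}")


print(handle_result({"status": "complete", "data": {"foo": "123"}}))
print(handle_result({"status": "pending"}))
```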

Is this what I should be doing? Or is there a better alternative? This just feels so Web 1.0 to me.

Best Answer

HTTP 202 Accepted (HTTP/1.1)

You are looking for HTTP 202 Accepted status. See RFC 2616 (since obsoleted by RFC 7231 and then RFC 9110, which keep essentially the same definition):

The request has been accepted for processing, but the processing has not been completed.

HTTP 102 Processing (WebDAV)

RFC 2518 (WebDAV) defines HTTP 102 Processing:

The 102 (Processing) status code is an interim response used to inform the client that the server has accepted the complete request, but has not yet completed it.

but it has a caveat:

The server MUST send a final response after the request has been completed.

I'm not sure how to interpret the last sentence. Should the server avoid sending anything during the processing and respond only after completion? Or does it only require that the response end once the processing terminates? The latter could be useful if you want to report progress: send HTTP 102, then flush the response byte by byte (or line by line).

For instance, for a long but linear process, you can send one hundred dots, flushing after each character. If the client side (such as a JavaScript application) knows that it should expect exactly 100 characters, it can map them onto a progress bar shown to the user.

Another example concerns a process which consists of several non-linear steps. After each step, you can flush a log message which would eventually be displayed to the user, so that the end user could know how the process is going.
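The dot-counting idea from the first example can be sketched without the HTTP transport. Here `produce_dots` stands in for the server flushing one character per unit of work, and `progress_percent` for the client mapping the characters received so far onto a progress bar; both names are hypothetical, and the fixed total of 100 is taken from the example above.

```python
from typing import Iterator

EXPECTED_DOTS = 100  # client and server must agree on this total in advance


def produce_dots(total: int = EXPECTED_DOTS) -> Iterator[str]:
    """Stand-in for the server: yield one '.' per completed unit of work."""
    for _ in range(total):
        # ... one step of the long, linear process would run here ...
        yield "."


def progress_percent(received: str, total: int = EXPECTED_DOTS) -> int:
    """Stand-in for the client: map the characters received so far to a percentage."""
    return min(100, len(received) * 100 // total)


received = ""
for chunk in produce_dots():
    received += chunk
    # A real client would redraw its progress bar here.
print(progress_percent(received))  # 100
```

The scheme only works because the total is known up front; for the non-linear case, log lines replace dots and there is no meaningful percentage.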

Issues with progressive flushing

Note that while this technique has its merits, I wouldn't recommend it. One of the reasons is that it forces the connection to remain open, which can hurt service availability and doesn't scale well.

A better approach is to respond with HTTP 202 Accepted and then either let the client check back later to find out whether the processing has ended (for instance by repeatedly calling a given URI such as /process/result, which would respond with HTTP 404 Not Found or HTTP 409 Conflict until the process finishes and the result is ready), or notify the client when the processing is done, if you are able to call it back, for instance through a message queue service or WebSockets.
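The polling variant boils down to a loop that keeps asking for the result while the server answers 404 or 409 and stops on 200. To keep the sketch self-contained, `fetch` is a hypothetical callable returning a `(status_code, body)` pair rather than a real HTTP client, and `poll_interval` and `max_attempts` are assumed parameters.

```python
import time
from typing import Callable, Optional, Tuple

Response = Tuple[int, Optional[str]]

NOT_READY = {404, 409}  # codes meaning "the result is not ready yet"


def poll_result(
    fetch: Callable[[], Response],
    poll_interval: float = 1.0,
    max_attempts: int = 30,
) -> str:
    """Call fetch() until it returns 200, waiting poll_interval seconds between tries."""
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body or ""
        if status not in NOT_READY:
            # Anything other than "not ready" is treated as a real failure.
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(poll_interval)
    raise TimeoutError("result still not ready after polling")


# Usage with a fake server that becomes ready on the third call.
responses = iter([(404, None), (404, None), (200, "converted")])
print(poll_result(lambda: next(responses), poll_interval=0))  # converted
```

Capping the number of attempts keeps an overeager client from polling forever; in practice you would also grow the interval between attempts.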

Practical example

Imagine a web service which converts videos. The entry point is:

POST /video/convert

which takes a video file from the HTTP request and does some magic with it. Let's imagine that the magic is CPU-intensive, so it cannot be done in real-time during the transfer of the request. This means that once the file is transferred, the server will respond with an HTTP 202 Accepted with some JSON content, meaning “Yes, I got your video, and I'm working on it; it will be ready at some point in the future and will be available through the ID 123.”
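A framework-free sketch of that POST handler: it records the job in an in-memory queue and answers 202 with the job ID in a JSON body. `JOBS`, `QUEUE`, `convert_video_handler` and the `Location` header pointing at the status URI are illustrative assumptions, not part of the description above; a real service would persist the queue and hand the work to separate worker processes.

```python
import itertools
import json
from collections import deque
from typing import Deque, Dict, Tuple

_next_id = itertools.count(123)  # hypothetical ID source, starting at 123 to match the example
QUEUE: Deque[int] = deque()      # job IDs waiting for a worker
JOBS: Dict[int, dict] = {}       # job records by ID


def convert_video_handler(video: bytes) -> Tuple[int, Dict[str, str], str]:
    """Handle POST /video/convert: queue the work and return immediately with 202."""
    job_id = next(_next_id)
    JOBS[job_id] = {"status": "queued", "input": video}
    QUEUE.append(job_id)
    headers = {
        "Content-Type": "application/json",
        # Assumed convention: tell the client where to follow the processing.
        "Location": f"/video/status/{job_id}",
    }
    body = json.dumps({"id": job_id, "status": "queued"})
    return 202, headers, body


status, headers, body = convert_video_handler(b"...raw video bytes...")
print(status, headers["Location"], body)
# 202 /video/status/123 {"id": 123, "status": "queued"}
```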

The client can subscribe to a message queue to be notified when the processing finishes. Once it has finished, the client can download the processed video by going to:

GET /video/download/123

which leads to an HTTP 200.

What happens if the client queries this URI before receiving the notification? Well, the server will respond with HTTP 404 since, indeed, the video doesn't exist yet. It may currently be in preparation. It may never have been requested. It may have existed at some point in the past and been removed since. All that matters is that the resulting video is not available.
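Because the download endpoint only answers "is the video there?", its handler can stay very small. This sketch assumes a hypothetical `RESULTS` mapping from job ID to finished video bytes; a missing entry yields the same 404 whether the job is still running, was never submitted, or was cleaned up.

```python
from typing import Dict, Tuple

RESULTS: Dict[int, bytes] = {}  # finished videos by job ID (hypothetical storage)


def download_handler(job_id: int) -> Tuple[int, bytes]:
    """Handle GET /video/download/{id}: 200 with the video if it exists, else 404."""
    video = RESULTS.get(job_id)
    if video is None:
        # Not ready, never requested, or already removed: all look the same here.
        return 404, b""
    return 200, video


print(download_handler(123))  # (404, b'')
RESULTS[123] = b"converted video"
print(download_handler(123))  # (200, b'converted video')
```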

Now, what if the client cares not only about the final video, but also about the progress (which would be even more important if there is no message queue service or any similar mechanism)?

In this case, you can use another endpoint:

GET /video/status/123

which would return a response similar to this:

HTTP 200
{
    "id": 123,
    "status": "queued",
    "priority": 2,
    "progress-percent": 0,
    "submitted-utc-time": "2016-04-19T13:59:22"
}

Repeating the request over and over shows the progress until the task is done:

HTTP 200
{
    "id": 123,
    "status": "done",
    "progress-percent": 100,
    "submitted-utc-time": "2016-04-19T13:59:22"
}
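A sketch of a status handler that would produce responses like the two above, assuming a hypothetical in-memory `JOBS` dict and a hypothetical `mark_done` helper standing in for the worker. The field names are copied from the example JSON; answering 404 for an unknown ID is an assumption.

```python
import json
from typing import Dict, Tuple

JOBS: Dict[int, dict] = {
    123: {
        "id": 123,
        "status": "queued",
        "priority": 2,
        "progress-percent": 0,
        "submitted-utc-time": "2016-04-19T13:59:22",
    }
}


def status_handler(job_id: int) -> Tuple[int, str]:
    """Handle GET /video/status/{id}: describe the processing, not the video."""
    job = JOBS.get(job_id)
    if job is None:
        return 404, json.dumps({"error": "unknown job"})
    # The processing resource exists, so this is a 200 whatever its progress.
    return 200, json.dumps(job, indent=4)


def mark_done(job_id: int) -> None:
    """What a worker might do when it finishes (hypothetical helper)."""
    job = JOBS[job_id]
    job["status"] = "done"
    job["progress-percent"] = 100
    job.pop("priority", None)  # the second example above has no priority field


print(status_handler(123)[0])  # 200
mark_done(123)
print(json.loads(status_handler(123)[1])["status"])  # done
```

Note that the status code stays 200 throughout: progress lives in the body, because the resource being described (the processing) exists from the moment the job is queued.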

It is crucial to distinguish between these three types of requests:

  • POST /video/convert queues a task. It should be called only once: calling it again would queue an additional task.
  • GET /video/download/123 concerns the result of the operation: the resource is the video. The processing (what happened under the hood to prepare the actual result, before and independently of the request) is irrelevant here. It can be called once or several times.
  • GET /video/status/123 concerns the processing per se. It doesn't queue anything. It doesn't care about the resulting video. The resource is the processing itself. It can be called once or several times.
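The difference in how often each request may be repeated can be checked with a small in-memory model. Everything here (`Service`, its method names, and the synchronous `finish` call standing in for a worker) is hypothetical; the point is only that repeating the POST changes server state, while repeating either GET does not.

```python
import itertools
from typing import Dict, Optional


class Service:
    """Toy model of the three endpoints, without any HTTP layer."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.inputs: Dict[int, bytes] = {}   # job ID -> uploaded video
        self.jobs: Dict[int, str] = {}       # job ID -> processing status
        self.videos: Dict[int, bytes] = {}   # job ID -> finished video

    def post_convert(self, video: bytes) -> int:
        """POST /video/convert: every call queues a new task."""
        job_id = next(self._ids)
        self.inputs[job_id] = video
        self.jobs[job_id] = "queued"
        return job_id

    def get_status(self, job_id: int) -> Optional[str]:
        """GET /video/status/{id}: read-only view of the processing."""
        return self.jobs.get(job_id)

    def get_download(self, job_id: int) -> Optional[bytes]:
        """GET /video/download/{id}: read-only view of the result."""
        return self.videos.get(job_id)

    def finish(self, job_id: int) -> None:
        """Stand-in for a worker completing the job."""
        self.jobs[job_id] = "done"
        self.videos[job_id] = b"converted"


svc = Service()
first = svc.post_convert(b"clip")
second = svc.post_convert(b"clip")  # same payload, but a second task
print(first, second, len(svc.jobs))  # 1 2 2
print(svc.get_status(first), svc.get_status(first))  # queued queued
svc.finish(first)
print(svc.get_status(first), svc.get_download(first))  # done b'converted'
```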