REST – HTTP Status Code for ‘Still Processing’

http rest

I'm building a RESTful API that supports queuing long-running tasks for eventual handling.

The typical workflow for this API would be:

1. User fills in form
2. Client posts data to API
3. API returns 202 Accepted
4. Client redirects user to a unique URL for that request (/results/{request_id})
5. ~eventually~
6. Client visits URL again, and sees the results on that page.

My trouble is on step 6. Any time a user visits the page, I file a request to my API (GET /api/results/{request_id}). Ideally, the task will have been completed by now, and I'd return a 200 OK with the results of their task.

But users are pushy, and I expect many overzealous refreshes while the result is still being processed.

What is my best option for a status code to indicate that:

this request exists,
it's not done yet,
but it also hasn't failed.

I don't expect a single code to communicate all of that, but I'd like something that lets me pass metadata instead of having the client expect content.

It could make sense to return a 202, since that would have no other meaning here: it's a GET request, so nothing is possibly being "accepted." Would that be a reasonable choice?

The obvious alternative to all this — which functions, but defeats one purpose of status codes — would be to always include the metadata:

200 OK

{
    "status": "complete",
    "data": {
        "foo": "123"
    }
}

…or…

200 OK

{
    "status": "pending"
}

Then client-side, I would (sigh) switch on response.data.status to determine whether the request was completed.
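To make that alternative concrete, here is a minimal sketch of the client-side switch, written in Python for brevity. The function name view_for_response and the returned labels are hypothetical; only the "status" and "data" fields come from the example bodies above.

```python
import json


def view_for_response(status_code: int, body: str) -> str:
    """Map one poll of the results endpoint to what the page should show."""
    if status_code != 200:
        return "error"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "error"
    if not isinstance(payload, dict):
        return "error"
    status = payload.get("status")
    if status == "complete" and "data" in payload:
        return "results"   # render payload["data"]
    if status == "pending":
        return "pending"   # show a spinner and poll again later
    return "error"         # unknown or missing status
```

The branch on the body is exactly the part that a more specific status code would make unnecessary.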

Is this what I should be doing? Or is there a better alternative? This just feels so Web 1.0 to me.

Best Answer

HTTP 202 Accepted (HTTP/1.1)

You are looking for HTTP 202 Accepted status. See RFC 2616:

The request has been accepted for processing, but the processing has not been completed.

HTTP 102 Processing (WebDAV)

RFC 2518 suggests using HTTP 102 Processing:

The 102 (Processing) status code is an interim response used to inform the client that the server has accepted the complete request, but has not yet completed it.

but it has a caveat:

The server MUST send a final response after the request has been completed.

I'm not sure how to interpret the last sentence. Should the server avoid sending anything during the processing, and respond only after completion? Or does it only require that the response be ended once the processing terminates? The second reading could be useful if you want to report progress: send HTTP 102 and flush the response byte by byte (or line by line).

For instance, for a long but linear process, you can send one hundred dots, flushing after each character. If the client side (such as a JavaScript application) knows that it should expect exactly 100 characters, it can match it with a progress bar to show to the user.
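As a rough illustration of the dots idea, the sketch below models the server side as a Python generator that yields one character per finished step, and the client side as a function that turns the number of characters received into a percentage. The names progress_dots and percent_done are made up for this example, and no real HTTP framework is involved; in practice each yielded character would be written to the response and flushed.

```python
from typing import Callable, Iterator


def progress_dots(total_steps: int, do_step: Callable[[int], None]) -> Iterator[str]:
    """Run the work step by step, yielding one dot after each step."""
    for step in range(total_steps):
        do_step(step)   # one unit of the long, linear process
        yield "."       # a real server would flush this character immediately


def percent_done(chars_received: int, expected_chars: int = 100) -> int:
    """Client side: convert the count of received characters to a percentage."""
    return min(100, chars_received * 100 // expected_chars)
```

With 100 steps, each received dot maps directly to one percent on the progress bar.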

Another example concerns a process which consists of several non-linear steps. After each step, you can flush a log message which would eventually be displayed to the user, so that the end user could know how the process is going.

Issues with progressive flushing

Note that while this technique has its merits, I wouldn't recommend it. One of the reasons is that it forces the connection to remain open, which could hurt in terms of service availability and doesn't scale well.

A better approach is to respond with HTTP 202 Accepted and then do one of two things. Either let the user come back later to find out whether the processing has ended (for instance by repeatedly calling a given URI such as /process/result, which would respond with HTTP 404 Not Found or HTTP 409 Conflict until the process finishes and the result is ready), or notify the user when the processing is done, if you're able to call the client back, for instance through a message queue service or WebSockets.
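The polling variant can be sketched as a small loop. This is only an illustration under assumptions: fetch is a hypothetical callable that performs the GET and returns a (status code, body) pair, and the attempt limit and delay are arbitrary. Injecting fetch and sleep keeps the sketch independent of any particular HTTP library.

```python
import time
from typing import Callable, Optional, Tuple


def poll_until_ready(
    fetch: Callable[[], Tuple[int, Optional[bytes]]],
    max_attempts: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch() until it reports 200; treat 404 and 409 as 'not ready yet'."""
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status not in (404, 409):
            raise RuntimeError(f"unexpected status {status}")
        sleep(delay_seconds)   # wait before asking again
    return None                # gave up: the result never became available
```

A fixed delay is the simplest choice; a growing delay would reduce load from impatient clients.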

Practical example

Imagine a web service which converts videos. The entry point is:

POST /video/convert

which takes a video file from the HTTP request and does some magic with it. Let's imagine that the magic is CPU-intensive, so it cannot be done in real time during the transfer of the request. This means that once the file is transferred, the server will respond with an HTTP 202 Accepted with some JSON content, meaning “Yes, I got your video, and I'm working on it; it will be ready sometime in the future and will be available through the ID 123.”

The client can subscribe to a message queue to be notified when the processing finishes. Once it is finished, the client can download the processed video by going to:

GET /video/download/123

which leads to an HTTP 200.

What happens if the client queries this URI before receiving the notification? Well, the server will respond with HTTP 404 since, indeed, the video doesn't exist yet. It may currently be in preparation. It may never have been requested. It may have existed at some time in the past and been removed since. All that matters is that the resulting video is not available.

Now, what if the client cares not only about the final video, but also about the progress (which would be even more important if there is no message queue service or any similar mechanism)?

In this case, you can use another endpoint:

GET /video/status/123

which would return a response similar to this:

HTTP 200
{
    "id": 123,
    "status": "queued",
    "priority": 2,
    "progress-percent": 0,
    "submitted-utc-time": "2016-04-19T13:59:22"
}

Doing the request over and over will show the progress until it's:

HTTP 200
{
    "id": 123,
    "status": "done",
    "progress-percent": 100,
    "submitted-utc-time": "2016-04-19T13:59:22"
}

It is crucial to distinguish between those three types of requests:

POST /video/convert queues a task. It should be called only once: calling it again would queue an additional task.
GET /video/download/123 concerns the result of the operation: the resource is the video. The processing (that is, whatever happened under the hood to prepare the result, before this request and independently of it) is irrelevant here. It can be called once or several times.
GET /video/status/123 concerns the processing per se. It doesn't queue anything. It doesn't care about the resulting video. The resource is the processing itself. It can be called once or several times.
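The separation between the three requests can be modelled with a small in-memory sketch. Everything here is an assumption made for illustration: the VideoService class, its method names, and the finish hook standing in for the background worker are hypothetical, and each method simply returns a (status code, body) pair rather than a real HTTP response.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Task:
    status: str = "queued"
    progress_percent: int = 0
    result: Optional[bytes] = None


class VideoService:
    def __init__(self) -> None:
        self._tasks: Dict[int, Task] = {}
        self._next_id = 123

    def post_convert(self, video: bytes) -> Tuple[int, dict]:
        """POST /video/convert: queue a new task on every call."""
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = Task()
        return 202, {"id": task_id}

    def get_download(self, task_id: int) -> Tuple[int, Optional[bytes]]:
        """GET /video/download/{id}: the resource is the finished video."""
        task = self._tasks.get(task_id)
        if task is None or task.result is None:
            return 404, None   # not available, whatever the reason
        return 200, task.result

    def get_status(self, task_id: int) -> Tuple[int, dict]:
        """GET /video/status/{id}: the resource is the processing itself."""
        task = self._tasks.get(task_id)
        if task is None:
            return 404, {}
        return 200, {"id": task_id, "status": task.status,
                     "progress-percent": task.progress_percent}

    def finish(self, task_id: int, result: bytes) -> None:
        """Stand-in for the background worker completing a task."""
        task = self._tasks[task_id]
        task.status, task.progress_percent, task.result = "done", 100, result
```

Note that the download handler never looks at the task status, and the status handler never looks at the video: each endpoint answers only for its own resource.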

Regarding the example requests

/GoalTree/GetByDate?versionDate=...
/GoalTree/GetById?versionId=...

For the former, you said you always return the nearest revision to that date. It will never fail to return an object, so it should always return 200 OK. Even if this took a date range, and the logic were to return all objects within that timeframe, returning 200 OK with 0 results is fine, as that is what the request was for: the set of things that met the criteria.

However, the latter is different as you are asking for a specific object, presumably unique, with that identity. Returning 200 OK in this case is wrong as the requested resource doesn't exist and is not found.
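The difference between the two lookups can be sketched as follows. The REVISIONS table and the function names are invented for this example and do not reflect the real GoalTree API; each function returns a (status code, body) pair.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical data: version id -> date of that revision.
REVISIONS = {1: date(2020, 1, 1), 2: date(2020, 6, 1)}


def get_by_date(version_date: date) -> Tuple[int, int]:
    """Nearest revision to a date: there is always an answer, so always 200."""
    nearest = min(REVISIONS, key=lambda vid: abs((REVISIONS[vid] - version_date).days))
    return 200, nearest


def get_in_range(start: date, end: date) -> Tuple[int, List[int]]:
    """A set query: an empty set is still a valid answer, so still 200."""
    return 200, [vid for vid, d in sorted(REVISIONS.items()) if start <= d <= end]


def get_by_id(version_id: int) -> Tuple[int, Optional[int]]:
    """A specific resource: if it does not exist, that is a 404."""
    if version_id not in REVISIONS:
        return 404, None
    return 200, version_id
```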

Regarding choosing status codes

2xx codes tell a user agent (UA) that it did the right thing and the request worked. It can keep doing this in the future.
3xx codes tell a UA that what it asked for probably used to work, but that thing is now elsewhere. In future the UA might consider going straight to the redirect target.
4xx codes tell a UA that it did something wrong: the request it constructed isn't proper, and it shouldn't try again without at least some modification.
5xx codes tell a UA that the server is broken somehow. The query could still work in the future, so there is no reason not to try it again (except for 501, which is more of a 4xx issue).
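From the client's point of view, the rules of thumb above could be encoded roughly like this. The function name and the returned labels are hypothetical, and the mapping is deliberately coarse.

```python
def advice_for(status_code: int) -> str:
    """Coarse client-side reaction based on the status code family."""
    family = status_code // 100
    if family == 2:
        return "ok"                # the request worked; keep doing this
    if family == 3:
        return "follow-redirect"   # the thing lives elsewhere now
    if family == 4:
        return "fix-request"       # do not retry without changing the request
    if family == 5:
        # 501 Not Implemented will not start working just by retrying.
        return "fix-request" if status_code == 501 else "retry-later"
    return "unknown"
```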

You mentioned in a comment using a 5xx code, but your system is working. It was asked a query that doesn't work and needs to communicate that to the UA. No matter how you slice it, this is 4xx territory.

Consider an alien querying our solar system

Alien: Computer, please tell me all planets that humans inhabit.

Computer: 1 result found. Earth

Alien: Computer, please tell me about Earth.

Computer: Earth - Mostly Harmless.

Alien: Computer, please tell me about all planets humans inhabit, outside the asteroid belt.

Computer: 0 results found.

Alien: Computer, please destroy Earth.

Computer: 200 OK.

Alien: Computer, please tell me about Earth.

Computer: 404 - Not Found

Alien: Computer, please tell me all planets that humans inhabit.

Computer: 0 results found.

Alien: Victory for the mighty Irken Empire!

Request content-type of HTTP errors

RFC 2616 states: "Any response containing an entity-body MAY be subject to negotiation, including error responses." This means that it is acceptable to use the Accept header for this purpose, but that the server is not required to do so (it may choose to ignore the browser's preferences for error messages).

Note that typically, a browser would not send a header that looks like your examples: the usual way of using the Accept header is to list all formats you are capable of understanding in order of preference, not just a single format. The application to error messages is more apparent in this case, as rather than requesting "application/pdf", the browser would be more likely to request "application/pdf, text/html, text/plain", which the server is able to fulfill.
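As a sketch of how a server might honour that header for an error body, the function below picks the first format it supports from the client's list. The name pick_error_format is hypothetical, and the parsing is simplified: it honours q-values and list order but ignores wildcards such as */* and any other media type parameters.

```python
from typing import List, Optional


def pick_error_format(accept_header: str, supported: List[str]) -> Optional[str]:
    """Return the client's most preferred media type that the server supports."""
    candidates = []
    for position, part in enumerate(accept_header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:   # q=0 means "not acceptable"
            candidates.append((-quality, position, media_type))
    # Highest quality first; ties are broken by position in the header.
    for _neg_quality, _position, media_type in sorted(candidates):
        if media_type in supported:
            return media_type
    return None   # nothing matched; the server may fall back to its default
```

For the header from the paragraph above, a server that can only produce text/html and text/plain errors would answer with text/html.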