No, there's no pre-defined set of "must-have" HTTP status codes. Out of the 40-or-so status codes listed in RFC 2616, I put together a list of the codes that, in my experience, you should consider using to their full extent.
200, 201
302, 304
400, 403, 404
500
Eight status codes: not that bad. Depending on your particular application, there are others that may be helpful (202, 503, etc.), but the majority of status codes (particularly in the 1xx/4xx/5xx groups) are only useful in specific conditions or will be handled by your application server.
In short, use the codes that match the semantics of your application's API.
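If your API happens to be written in Python, one low-effort way to stay within the standard set is to refer to codes by name through the standard library's `http.HTTPStatus` enum rather than by bare number. A minimal sketch listing the eight codes above; `CORE_CODES` is just an illustrative name:

```python
from http import HTTPStatus

# The eight codes from the list above, referenced by name rather than by number.
CORE_CODES = [
    HTTPStatus.OK,                     # 200
    HTTPStatus.CREATED,                # 201
    HTTPStatus.FOUND,                  # 302
    HTTPStatus.NOT_MODIFIED,           # 304
    HTTPStatus.BAD_REQUEST,            # 400
    HTTPStatus.FORBIDDEN,              # 403
    HTTPStatus.NOT_FOUND,              # 404
    HTTPStatus.INTERNAL_SERVER_ERROR,  # 500
]

for status in CORE_CODES:
    # HTTPStatus is an IntEnum: each member compares equal to its number
    # and carries the standard reason phrase.
    print(f"{int(status)} {status.phrase}")
```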
To address your specific example of replacing 401/403 with 404: that's a special case where revealing the existence of a resource is itself a security risk, and few applications actually need this behavior. For example:
GET /users/admin/edit
should return 403.
GET /docs/top-secret/missile-launch-codes
should return 404.
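The rule in those two examples can be written down as a small decision function. This is a sketch under my own assumptions: the function name and its two flags are hypothetical, and a real application would derive them from its own authorization checks.

```python
from http import HTTPStatus


def status_for_denied_access(resource_exists: bool, existence_is_secret: bool) -> HTTPStatus:
    """Pick the status for a request the caller is not allowed to fulfil.

    If merely confirming that the resource exists would leak information,
    answer as if it did not exist at all.
    """
    if not resource_exists or existence_is_secret:
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.FORBIDDEN


# /users/admin/edit: everyone knows an admin page exists, so refusing is enough.
print(int(status_for_denied_access(True, False)))  # 403
# /docs/top-secret/missile-launch-codes: even admitting it exists is a leak.
print(int(status_for_denied_access(True, True)))   # 404
```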
Interesting question.
Basically, this comes down to classifying things in terms analogous to the OSI layers. HTTP is commonly defined as an application-layer protocol, and it is indeed a generic client/server protocol.
However, in practice the server is almost always a relaying device and the client is a web browser responsible for interpreting and rendering content: the server just passes things on to an arbitrary application, and that application sends back arbitrary scripts which the browser is responsible for executing. The HTTP interaction itself (the request/response forms, status codes, and so on) is mostly a matter of how to request, serve, and render arbitrary content as efficiently as possible without getting in the way. Many of the status codes and headers are indeed designed for these purposes.
The problem with piggybacking on HTTP to handle application-specific flows is that you're left with one of two options: 1) you make your request/response logic a subset of the HTTP rules; or 2) you reuse certain rules, and the separation of concerns tends to get fuzzy. This can look nice and clean at first, but I think it's one of those design decisions you end up regretting as your project evolves.
Therefore, I would say it is better to be explicit about the separation of protocols. Let the HTTP server and the web browser do their own thing, and let the app do its own thing. The app needs to be able to make requests and receive responses, and its logic for how to request and how to interpret the responses can be more (or less) complex than what HTTP provides.
The other benefit of this approach worth mentioning is that applications should, generally speaking, not depend on an underlying transport protocol (from a logical point of view). HTTP itself has changed in the past, and now HTTP/2 is arriving, following SPDY. If you view your app as no more than an HTTP functionality plugin, you might get stuck there when new infrastructure takes over.
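To make the separation concrete, here is a minimal sketch, assuming a toy funds-transfer operation: the application logic returns its own result type that knows nothing about HTTP, and a single adapter turns it into a wire message. `AppResult`, `transfer_funds` and `encode_message` are hypothetical names for illustration, not part of any framework.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AppResult:
    """Application-level outcome, defined without reference to HTTP."""
    ok: bool
    data: Any = None
    error: Optional[str] = None


def transfer_funds(balance: int, amount: int) -> AppResult:
    # Pure application logic: no status codes, headers or sockets here.
    if amount <= 0:
        return AppResult(ok=False, error="amount-not-positive")
    if amount > balance:
        return AppResult(ok=False, error="insufficient-funds")
    return AppResult(ok=True, data={"new_balance": balance - amount})


def encode_message(result: AppResult) -> str:
    # The only place that knows about the wire format. Moving from HTTP/1.1
    # to HTTP/2, WebSockets or a message queue means replacing this adapter,
    # not the logic above.
    return json.dumps({"ok": result.ok, "data": result.data, "error": result.error})
```

Whether the HTTP adapter also maps `AppResult` onto a status code is then a decision made in one place, rather than something scattered through the application logic.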
Don't invent status codes
You are not expected to invent your own response codes, since the point of the API is to use a standard interface any developer can understand.
The fact that you maintain both the API and its client is irrelevant: since anyone can trace the calls to the API, anyone can implement a different client. The point of using standard interfaces is also:
To facilitate maintenance. If your app is later maintained by your coworker, he can find his way with ease. He will be lost if it appears that HTTP 306 means success, HTTP 500 is “Not found” and HTTP 909 is “Internal Server Error”, unless the error is a FileNotFound exception, in which case it's HTTP 404.
To facilitate the work of your system administrator. Tools that deal with server logs use status codes to determine whether a request resulted in an error. Two concrete examples are:
Displaying the number of errors (HTTP 4xx and HTTP 5xx) for a given server, and
The tracking of hacking attempts. For instance, when I see a hacking attempt which results in HTTP 200, this immediately attracts my attention: it should be HTTP 4xx instead.
To avoid messing with your HTTP server. IIS, for instance, can easily be offended by some status codes: it will catch them and generate its own responses. Configuring both your app and IIS is not always easy (I spent several days banging my head against this issue when I started programming ASP.NET websites).
To avoid problems with proxies. For instance, if your API is hosted on a server accessed through Nginx, I'm not sure sysadmins will be happy to spend a few hours redoing all the configuration for your app.
A basic example: when configuring failover, I use the proxy_next_upstream directive, which indicates in which cases Nginx should switch to the failover machine. Looking at the documentation, I have the impression that you can't set the directive to, say, HTTP 200, so if your API uses HTTP 200 as “Fatal error, all data was corrupted so you may be better off using another mirror”, you're out of luck.
To avoid reinventing the wheel. Why would you waste your time standardizing your own set of codes when one already exists?
To leverage support from many tools. For instance, Fiddler relies on HTTP response codes to colorize the items, and I'm not sure it's easy to configure it to use different statuses. In the same way, API testing tools may rely on status codes as well (while cURL, conveniently, couldn't care less about the response status, Python's requests package, for instance, will follow a redirect in the presence of an HTTP 3xx code). By the way, browsers themselves interpret the response code:
AJAX requests returning HTTP 3xx may lead to automatic redirection.
HTTP 4xx and HTTP 5xx are displayed in red in browsers' developer tools.
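The sysadmin and tooling points above share a common mechanism: log dashboards, Fiddler and browser developer tools all group responses by the first digit of the status code. A minimal sketch of such an error counter, assuming access logs in the Common Log Format (real log tools and formats vary, and `status_class` and `count_errors` are hypothetical names):

```python
import re
from collections import Counter

# Matches the status field that follows the quoted request line in a
# Common Log Format entry, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326
CLF_STATUS = re.compile(r'"[^"]*" (\d{3}) ')


def status_class(status: int) -> str:
    """Map a status code to its class, i.e. its first digit."""
    return {1: "informational", 2: "success", 3: "redirection",
            4: "client error", 5: "server error"}.get(status // 100, "unknown")


def count_errors(log_lines):
    """Count 4xx and 5xx responses, the way a log dashboard might."""
    counts = Counter()
    for line in log_lines:
        match = CLF_STATUS.search(line)
        if match:
            cls = status_class(int(match.group(1)))
            if cls in ("client error", "server error"):
                counts[cls] += 1
    return counts


sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /api/products/1 HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2000:13:55:37 -0700] "GET /api/products/999 HTTP/1.1" 404 87',
    '10.0.0.3 - - [10/Oct/2000:13:55:38 -0700] "POST /api/orders HTTP/1.1" 500 64',
]
print(count_errors(sample))
```

Note that an API which reports failures as HTTP 200 is invisible to a counter like this one.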
Provide additional information
A response code other than HTTP 200 doesn't mean that you can't send JSON to the client as well (unless it's HTTP 204, in which case there should be no content). This means that you're free to give the client as much information as you want through JSON. Personally, I include:
The ID of the error in the form of a string, for instance price-range-invalid or product-not-found.
The URI containing additional help, when relevant (when the API is large enough to have individual pages for every error message), for instance http://example.com/api/v1/errors/price-range-invalid.
The description, for instance “The current price is outside the allowed range. It should be greater than 0 and less than or equal to 50000.”
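A sketch of an error response carrying those three fields, in Python. The JSON key names (`id`, `uri`, `description`) and the `error_response` helper are my own assumptions; the help URL reuses the example.com address above.

```python
import json
from http import HTTPStatus


def error_response(status: HTTPStatus, error_id: str, description: str,
                   help_base: str = "http://example.com/api/v1/errors/"):
    """Build a (status code, JSON body) pair for a failed request."""
    body = {
        "id": error_id,               # machine-readable identifier
        "uri": help_base + error_id,  # page with additional help
        "description": description,   # human-readable explanation
    }
    return int(status), json.dumps(body)


status, body = error_response(
    HTTPStatus.BAD_REQUEST,
    "price-range-invalid",
    "The current price is outside the allowed range. "
    "It should be greater than 0 and less than or equal to 50000.",
)
```

The status code still tells generic tools that the request failed; the body tells your own client why.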
See also how other APIs handle errors and what sort of information they provide. For instance, this is what Google+ API errors look like:
Don't provide too much information
Finally, make sure you don't include the exception itself (including the stack trace) in the details, since it may contain sensitive information and give a hacker a path to explore.
For instance, some PHP websites, when encountering a database-related error, just show the SQL query and the error to the user. For a hacker, this is very convenient; for a legitimate user, it is just unfriendly and unhelpful. Don't do that.
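One way to follow this advice, sketched in Python: log the full exception on the server and send the client only a generic message plus an opaque incident ID it can quote to support. The `handle` wrapper and the `incident` field are hypothetical; most web frameworks offer their own error-handler hooks for this.

```python
import json
import logging
import uuid

logger = logging.getLogger("api")


def handle(request_handler, *args):
    """Run a request handler; on failure, log the details and return a generic 500."""
    try:
        return 200, json.dumps(request_handler(*args))
    except Exception:
        incident = uuid.uuid4().hex
        # The full exception and stack trace go to the server log only.
        logger.exception("Unhandled error, incident %s", incident)
        body = {
            "id": "internal-error",
            "description": "An unexpected error occurred. Please try again later.",
            "incident": incident,  # lets support find the matching log entry
        }
        return 500, json.dumps(body)


def failing_query():
    # Simulates a database error whose message would leak the query.
    raise RuntimeError("syntax error in: SELECT password_hash FROM users")


status, body = handle(failing_query)
```

The client receives HTTP 500 and a harmless body; the SQL text stays in the server log.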