There's a general misconception (and misuse) associated with 403 Forbidden: it's not supposed to give anything away about what the server thinks about the request. It's specifically designed to say, "I get what you're requesting, but I'm not going to handle the request, no matter what you try. So stop trying." Any UA or client should interpret that to mean that the request will never work, and respond appropriately.
This has implications for clients handling requests on behalf of users: if a user isn't logged in, or mistypes, the client handling the request should reply, "I'm sorry, but I can't do anything" after the first time it gets the 403 and stop handling future requests. Obviously, if you want a user to still be able to request access to their personal information after a failure, this is a user-hostile behavior.
403 is in contrast to 401 Authorization Required, which does give away that the server will handle the request as long as you pass the correct credentials. This is usually what people think about when they hear 403.
It's also in contrast with 404 Page Not Found which, as others pointed out, is designed not only to say "I can't find that page" but to suggest to the client that the server makes no claims of success or failure for future requests.
With 401 and 404, the server doesn't say anything to the client or UA about how they should proceed: they can keep trying in hopes of getting a different response.
So 404 is the appropriate way to handle a page that you don't want to show to everyone, when you also don't want to give away anything about why you won't show it in certain situations.
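As a rough illustration of the retry expectations described above, here is a minimal Python sketch. The table and the function name may_retry are hypothetical, and the policy reflects this answer's reading of the three codes rather than anything a library enforces.

```python
from http import HTTPStatus

# Hypothetical policy table: how a rule-following client might react,
# per this answer's reading of the three status codes.
RETRY_POLICY = {
    HTTPStatus.FORBIDDEN: False,     # 403: "stop trying"
    HTTPStatus.UNAUTHORIZED: True,   # 401: retry with correct credentials
    HTTPStatus.NOT_FOUND: True,      # 404: no claim about future requests
}


def may_retry(status: int) -> bool:
    """Return True if a well-behaved client may repeat the request."""
    # Codes outside the table raise KeyError; this sketch only covers
    # the three codes discussed here.
    return RETRY_POLICY[HTTPStatus(status)]
```

A client built this way would give up after the first 403 but keep the door open after a 401 or 404.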
Of course, this assumes the client making the request cares for petty RFC flippancy. A malicious enough client isn't going to care about the status code returned except in an incidental manner. One will know it's a hidden user page (or a potential hidden user page) by comparing it to other, known user pages.
That is, let's say your handler is users/*. If I know users/foo, users/bar and users/baaz work, the server returning a 401, 403, or 404 for users/quux doesn't mean I'm not going to try it, especially if I have reason to believe there is a quux user. A standard example scenario is Facebook: my profile is private, but my comments on public profiles are not. A malicious client knows I exist even if you return 404 on my profile page.
So status codes aren't for the malicious use cases, they're for the clients playing by the rules. And for those clients, a 401 or a 404 response is most appropriate.
The whole of the Internet is built on conventions. We call them RFCs. While nobody will come and arrest you if you violate an RFC, you do run the risk that your service will not interoperate with the rest of the world. And if that happens, you run the risk of your startup not getting any customers, your business getting bad press, your stockholders revolting, your getting laid off permanently, etc.
HTTP status codes have their own IANA registry, each one traceable back to the RFC (or in one case, I-D) that defined it.
In the particular case of Twitter's strange 420 status code versus the standard 429 status code defined in RFC 6585, the most likely explanation is that the latter was only recently defined; the RFC dates to April 2012. We see that Twitter only uses 420 in the previous deprecated version 1 of its API; the current API version 1.1 actually uses the 429 status code. So it's clear that Twitter needed a status code for this and defined their own; once a standard one was available they switched to it.
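A client that has to talk to both old and new API versions might treat the two codes as equivalent. The following is only a sketch in Python; the constant and function names are made up for illustration, and 420 is handled purely because the deprecated API used it.

```python
from http import HTTPStatus

# Nonstandard code used by the deprecated v1 API (name is hypothetical).
LEGACY_RATE_LIMIT = 420


def is_rate_limited(status: int) -> bool:
    """Return True for the standard 429 or the legacy 420."""
    return status in (HTTPStatus.TOO_MANY_REQUESTS, LEGACY_RATE_LIMIT)
```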
Best practice, of course, is to stick as closely to the standards as possible. When you read RFCs, you will almost always find words like "MUST" and "SHOULD"; these have specific meanings when you are building your application, which you can find in RFC 2119.
No, there's no pre-defined set of "must-have" HTTP status codes. Out of the 40-or-so status codes listed in RFC 2616 I put together a list of codes that, in my experience, are the ones you should consider using to their full extent.
200, 201
302, 304
400, 403, 404
500
8 status codes, not that bad. Depending on your particular application, there are others that may be helpful (202, 503, etc) but the majority of status codes (particularly in the 1xx/4xx/5xx groups) are only useful in specific conditions or will be handled by your application server.
In short, use the codes that match the semantics of your application's API.
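For reference, the eight codes listed above can be paired with their standard reason phrases using Python's standard library. This is just a convenience sketch; the names CORE_CODES and describe are my own.

```python
from http import HTTPStatus

# The eight codes from the list above.
CORE_CODES = [200, 201, 302, 304, 400, 403, 404, 500]


def describe(codes):
    """Pair each numeric code with its standard reason phrase."""
    return [f"{code} {HTTPStatus(code).phrase}" for code in codes]


for line in describe(CORE_CODES):
    print(line)
```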
To address your specific example of replacing 401/403 with 404, that's a special case where revealing the existence of a resource is itself a security risk. There are few applications where this behavior is necessary.
GET /users/admin/edit should return 403.
GET /docs/top-secret/missile-launch-codes should return 404.
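One way to express that distinction in code is a small helper that picks the status for a denied request. This is a sketch only: SECRET_PREFIXES and denial_status are hypothetical names, and a real application would decide what counts as secret in its own way.

```python
from http import HTTPStatus

# Hypothetical: path prefixes whose very existence should not be revealed.
SECRET_PREFIXES = ("/docs/top-secret/",)


def denial_status(path: str) -> HTTPStatus:
    """Pick the status for a request the caller is not allowed to complete."""
    if path.startswith(SECRET_PREFIXES):
        # Hide the resource entirely.
        return HTTPStatus.NOT_FOUND
    # Admit the resource exists, but refuse access.
    return HTTPStatus.FORBIDDEN
```

With the two paths from the example, the first maps to 403 and the second to 404.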