I really like the first approach in general.
- it's simple to understand and implement
- it's secure (to my knowledge)
- it's a fairly common approach, which I've seen used in the past
One thing I don't see mentioned about the first approach that you should keep in mind: the timestamp used to hash the token needs an exceedingly short TTL (like 1 second), so you can verify the message wasn't sent with the same timestamp and token as a message from 12 hours earlier. Such a replayed message would obviously calculate as legit, but it is not in this case.
If these are the only two options you're considering, though, I'd just like to make sure you've looked at other approaches too, as there are many, more than I'm going to list in fact. These are some common auth approaches which are worth studying just to see if they might fit your purpose better; if nothing else, understanding them may give you some ideas to help tighten up whichever approach you do go with.
Do note, I am not a security expert.
OAuth/Federated
In this approach you have a 3rd-party guarantor: the consuming code requests the token/cert/what have you from them and passes that to you. At this point all you need to do is ask the 3rd party if the key you were given is legit.
Pro:
- Standards based
- Issues will be found by others on other people's systems, so you will find out when a vulnerability turns up
- Much less auth work will be needed by you
Con:
- You have to deal with a 3rd party servicer and their API, or create and host your own "3rd party" to segregate the auth out of your main service.
- Overkill for many services, but conceptually worth considering
Asynchronous Certificates
Here you would have your clients encrypt their communications with a public cert you shared with them when they created a user. On your side you would decrypt using the private key associated with their user. Generally you would initiate the communication with a challenge-response to show they can encrypt/decrypt as you expect, identifying them as who they claim to be. "Synchronous" approaches which don't use the challenge-response are possible, but they have slightly less security and some time synchronization issues which can make them trickier.
From Novell (yeah I know, Novell? Really?):
Tokens use a variable as the basis to generate the one-time password.
This variable is called the challenge. The two main methods for
determining the variable used to generate the password are
asynchronous or synchronous.
With the asynchronous or challenge-response method, the server
software sends the token an external challenge---a randomly generated
variable--- for the token device to encrypt. The token uses this
challenge variable, the encryption algorithm, and the shared secret to
generate the response---the correctly encrypted password.
With the synchronous method, the challenge variable used to generate
the password is determined internally by the token and the server. A
time counter, event counter, or time and event counter combination
within each device is used as the basis for the challenge variable.
Because the token and the server each separately and internally
determine the challenge variable from their own counters, it is very
important for their time counters and the event counters to stay
synchronized. Because it is so easy for the server and the token to
get out of sync, most implementations allow for a certain amount of
drift between the counters. Usually, a small range or window of these
counter values is used to compute the password. However, if the token
and server get out of sync beyond this window, a special procedure is
necessary to synchronize them.
Pro:
- Certificates have CA roots which make them trustworthy and difficult to forge
- There are standard facilities in operating systems for managing and maintaining cert stores easily
- Well-studied approach, lots of information available on it
- Expiry, along with a variety of other things, is a built-in facility of standard certificates; they are generally robust
Con:
- Certificates can be tricky to work with programmatically
- Depending on whether you require an external CA, may not be free
- May need to maintain cert stores manually to ensure expected root trusts are configured
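To make the asynchronous and synchronous methods from the quote a bit more concrete, here is a minimal sketch. It swaps in HMAC-SHA256 over a shared secret for the "encryption algorithm" the quote mentions and does not use certificates at all; every function name and the `window` parameter are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def respond(shared_secret: bytes, challenge: bytes) -> str:
    """Token side: derive the response from the challenge and the secret."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).hexdigest()


def new_challenge() -> bytes:
    """Server side (asynchronous): a random external challenge."""
    return secrets.token_bytes(16)


def check_async(shared_secret: bytes, challenge: bytes, response: str) -> bool:
    """Asynchronous: verify the response to the challenge the server sent."""
    return hmac.compare_digest(respond(shared_secret, challenge), response)


def check_sync(shared_secret: bytes, server_counter: int, response: str,
               window: int = 1) -> bool:
    """Synchronous: both sides derive the challenge from their own counter.

    A small window of counter values is accepted to tolerate drift.
    """
    for counter in range(server_counter - window, server_counter + window + 1):
        challenge = str(counter).encode()
        if hmac.compare_digest(respond(shared_secret, challenge), response):
            return True
    return False
```

In the synchronous case the counter could be an event count or a time step; if the client drifts beyond `window`, the check fails and some resynchronization procedure is needed, as the quote describes.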
NTLM
Don't laugh: if this is a smaller or internal-only service and you're in a Windows environment, there is nothing wrong with using standard NTLM authentication to guarantee access. Especially if you're working with IIS, this is hands down the simplest approach. It's easy to maintain and configure as well in a web.config.
Pro:
- Extremely easy to configure, implement, and maintain
Con:
- Minimal interoperability
- Not sufficient for public facing authentication
Nonces
When working with nonces in your authentication approach, you supply a method on the service to get a nonce. This method returns a unique arbitrary string or piece of data ("a nonce") on each request. Every request to the other methods now requires a nonce to be retrieved first and used in the crypto algorithm for the request. The value here is that the server keeps track of the nonces used and never allows reuse of a nonce. This completely prevents replay attacks, because once a request with one nonce is made, a request with that nonce can never be made again. As nonces are requested they're added to a list of available nonces; as they're used they're moved from the available list to the used list. When generating a nonce, you ensure that what you generate is not on the used list, so the available list will never again contain one of the old ones and no repeats can be made.
Pro:
- Thwarts replay attacks quite well
- Not altogether difficult to implement or understand
Con:
- Requires clients to make two requests for each actual request (though this may be lessened by requiring nonces for only certain requests)
- Requires management of nonces, which should be transactional
- Negatively affects performance by requiring the extra requests for nonces (transactionality further increases resource cost of working with nonces)
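The available/used bookkeeping described for nonces could look roughly like the sketch below. It is an in-memory illustration only: the `NonceStore` class and its method names are made up, and the lock stands in for the transactional storage a real multi-process service would need.

```python
import secrets
import threading


class NonceStore:
    """Tracks issued (available) and consumed (used) nonces."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._available: set[str] = set()
        self._used: set[str] = set()

    def issue(self) -> str:
        """Generate a nonce that has never been issued or used before."""
        with self._lock:
            while True:
                nonce = secrets.token_hex(16)
                if nonce not in self._used and nonce not in self._available:
                    self._available.add(nonce)
                    return nonce

    def consume(self, nonce: str) -> bool:
        """Move a nonce from available to used; False means reject the request."""
        with self._lock:
            if nonce not in self._available:
                return False  # unknown nonce or a replay of a used one
            self._available.remove(nonce)
            self._used.add(nonce)
            return True
```

A request handler would call `consume` on the nonce it received and refuse the request when it returns `False`. The used set grows without bound here; a real service would likely expire old entries, which this sketch leaves out.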
Best Answer
The short answer is: there is no adequate way to fully protect publicly accessible data from copying.
Cheap lossless copying is one of the main advantages of digital data. Anyone who can access your data is able to copy it. And it's not easy to get rid of this ability.
The nonce trick you suggest makes scraping a little bit more complex, but not impossible, and not even much harder. A bot can evaluate anything that can be evaluated by a web browser. And in fact, modern bots usually do. They may run a headless web browser (like PhantomJS) and "see" pages exactly as a user sees them. The most advanced bots emulate mouse clicks and randomize delays between actions, so it is very hard to distinguish them from humans.
And if your data is really public (there is no authentication), then there is no strong way to protect it from bots, although you can make their life a little bit harder. It is always a confrontation between a shop’s owner and a bot’s owner. The shop’s owner tries to make the bot more complex and expensive by making data harder to extract. The bot’s developer tries to make the bot more sophisticated. It continues until the bot or the protection mechanisms become too costly for someone’s business.
You can use several tricks, like data obfuscation, CAPTCHAs, nonces, and some heuristics to detect human activity. They will filter out most of the mass spiders that aren’t developed specifically for your web site. If someone targets your shop and develops a scraper specifically for it, then it is likely that you can’t protect against them.
So I think you should go to the light side and minimize costs by making your JSON API as simple and straightforward as possible.