API Design – API with Limits vs Site Crawling

api-design, web-api

I started working on a website for tracking and rating watched anime/manga/etc. and for getting recommendations. It should also have an API that provides info about series and other things.

On similar sites, I have noticed that using the API typically requires a token or some other form of auth, and comes with usage limits, even when it's only for reading info that is publicly available on the site.
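
As a concrete picture of what those limits usually look like, here is a minimal sketch of per-token rate limiting on the server side. The framework choice (Flask), the in-memory counter, the endpoint path, and the numbers are all assumptions for illustration, not how any particular site does it.

```python
# Hypothetical per-token rate limiting sketch (Flask, in-memory counters).
import time
from flask import Flask, request, jsonify, abort

app = Flask(__name__)

RATE_LIMIT = 60        # assumed: 60 requests...
WINDOW_SECONDS = 60    # ...per 60-second window, per token
_hits = {}             # token -> list of recent request timestamps

@app.route("/api/series/<int:series_id>")
def get_series(series_id):
    token = request.headers.get("Authorization", "")
    if not token:
        abort(401)     # no token, no API access

    now = time.time()
    recent = [t for t in _hits.get(token, []) if now - t < WINDOW_SECONDS]
    if len(recent) >= RATE_LIMIT:
        abort(429)     # too many requests in the current window
    _hits[token] = recent + [now]

    # hypothetical lookup; a real app would query its database here
    return jsonify({"id": series_id, "title": "Example", "score": 8.2})
```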

The problem is that you could circumvent all those limits by crawling the site directly.
Even if the format is less convenient, once a parser is in place that's no longer an obstacle. In fact, if the site uses client-side rendering, the info is already sent to the browser in a convenient format.
On the other hand, crawling puts more strain on the server: the info may be spread across multiple pages, requiring more requests, and each response carries data the client app doesn't need.
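
To make that contrast concrete, here is a rough sketch of the two access paths. All URLs, the token header, the page count, and the CSS selector are made up for illustration; `requests` and `BeautifulSoup` are just assumed tooling.

```python
# Sketch: one structured API request vs. crawling several HTML pages.
import requests
from bs4 import BeautifulSoup

# Via the API: a single request returning structured data.
series = requests.get("https://example.org/api/series/42",
                      headers={"Authorization": "token"}).json()

# Via crawling: the same information spread over several pages,
# each returning full HTML that still has to be parsed.
titles = []
for page in range(1, 4):
    html = requests.get(f"https://example.org/series/42/episodes?page={page}").text
    soup = BeautifulSoup(html, "html.parser")
    titles += [el.get_text(strip=True) for el in soup.select(".episode-title")]
```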

In the end, is there any point in restricting an API that serves info which is already publicly available on the site? Should there instead be an unrestricted, unauthenticated API for reading public info, to avoid needless hassle on both sides?
Or should the site itself have request limits, just like the API?
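
For the last option, the same throttling idea can be applied to ordinary page requests, keyed on client IP instead of an API token. This is only a sketch under the same assumptions as above (Flask, in-memory counters, made-up limits).

```python
# Hypothetical per-IP throttling for regular site pages (Flask).
import time
from flask import Flask, request, abort

app = Flask(__name__)

PAGE_LIMIT = 120       # assumed page-view budget per window
WINDOW_SECONDS = 60
_page_hits = {}        # client IP -> list of recent request timestamps

@app.before_request
def throttle_by_ip():
    ip = request.remote_addr or "unknown"
    now = time.time()
    recent = [t for t in _page_hits.get(ip, []) if now - t < WINDOW_SECONDS]
    if len(recent) >= PAGE_LIMIT:
        abort(429)     # crawler hitting pages too fast gets throttled too
    _page_hits[ip] = recent + [now]
```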

Best Answer

Having a public API for data access from your site is about making the data available in a convenient, supported, well-defined and always-up-to-date manner. It is a way for the site owner to say: 'Here is data I collect and own, but I want you to be able to use it, so I'm making it available. Oh, and I promise not to change the structure or do anything that might break your applications without clearly communicating about it.'
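
The 'supported and well-defined' part often shows up as a versioned endpoint whose response schema only grows, so existing clients keep working. The path and fields below are hypothetical, just to illustrate the idea.

```python
# Sketch of a versioned, stable public endpoint (Flask, made-up schema).
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/v1/series/<int:series_id>")
def series_v1(series_id):
    # v1 fields are frozen; new data is added under new keys or /api/v2,
    # and deprecations are announced rather than silently breaking clients.
    return jsonify({
        "id": series_id,
        "title": "Example",
        "episodes": 24,
        "mean_score": 8.2,
    })
```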

Crawling has some technical limitations, some very important legal considerations, AND is prone to breaking without any sort of notification from the owner of the data. Personally, I would not hesitate to consume a public JSON API if it has the data I need, but I'd be hard-pressed to start writing a crawler/parser to get it off a website...
