Given the GAE restrictions listed below, is it even possible to build a great social app (like Facebook) and host it on GAE?
In other words, is GAE an infrastructure capable of hosting an app used by 600 million active users?
Restrictions I've dug up from a couple of forums / blogs (please feel free to add to the list if you find anything missing):
HTTP Request/Response
- Max request size: 32 MB
- Max response size: 32 MB
- Every request must be answered within 30 seconds; otherwise GAE throws DeadlineExceededException
- Each cron job must finish within 10 minutes
- Cron jobs cannot utilize map reduce
- Every GET or POST to another site is aborted after 5 seconds. You can configure it to wait till 10 seconds max. (intermediate servers would be necessary to work with Twitter and Facebook many times)
- Clients cannot connect to GAE through FTP (only HTTP and HTTPS).
- No HTTPS for custom domains. Only for your-app-id.appspot.com domains.
- If you get an influx of users, you get an "over quota" error
Database
- Database behavior on the local development server is not the same as on the actual servers.
- GQL. Nothing else.
- No query can retrieve more than 1000 records (sucks seriously if you want to allow your client to have a one-click-go-offline-now button)
- If you need linear access to a massive amount of records to perform an operation, you are out of luck (Google's systems are massively clustered)
- Memcache values max size is 1 MB.
- Can't do simple text search
- You can't join 2 tables.
- Slow (You have to read about how to separate tables using inheritance so that you can search in a table, get the key and then obtain its parent in order to avoid deserialization performance)
- "Too many indexes" runtime exception
- An entity can at most have 5000 property values in an index
- Key names of the form __*__ (start and end with two underscores) are reserved, and should not be used by the application.
- Key names are limited to 500 bytes (UTF-8 encoded, I guess)
Language
- Python, Java, or Go (or languages that use the JVM, like Groovy, Scala, and others)
Server Issues
- No static IP (may have throttling and quota problems calling third party APIs)
- Each application is limited to 3000 files
- No control of OS or hardware running the web app
Best Answer
I think what annoys me about this question is that you've phrased it and loaded it with "facts" in an attempt to gather a definitive no.
The truth is that you could develop an App Engine app that replicates the features of Facebook, Twitter, or Tumblr. And assuming the app was well written, it would scale to support hundreds of millions of users. The main reason you wouldn't want to (which doesn't seem to be a consideration for you) is the cost of running a service that size on App Engine.
Also, I fail to see how any of the restrictions you've listed would prevent you from developing such an app.
HTTP Request/Response
-- if you are developing a social app that frequently needs to deliver pages larger than 10MB you are probably doing it wrong. Also, if you do need to deliver content larger than 32MB you can use the blobstore for files up to 2GB.
-- There is no way that Facebook, Twitter, or Tumblr are just taking user uploads and copying them to a folder. Not an issue.
-- If you need longer than 30 seconds to deliver results to a user's request you are probably doing it wrong.
-- If you can't divide a lengthy task into 10 minute chunks, A: you're probably doing it wrong and B: you can now move that task to a Backend instance, which doesn't have a time limit on requests.
Cron jobs cannot utilize map reduce - never used map reduce, but I think this requires a citation.
Every GET or POST to another site is aborted after 5 seconds. You can configure it to wait till 10 seconds max. (intermediate servers would be necessary to work with Twitter and Facebook many times) - True.
-- If a user-facing request to an external API is taking longer than 10 seconds it's probably a good idea to tell the user to retry anyway. If it's not a user-facing request you can automatically retry the task until the API responds.
-- Why is this an issue? Do you think any large-scale company deploys changes via FTP?
-- It's on the roadmap though.
-- If you properly budget your app you will never see an over quota error. The Royal Wedding site was hosted on App Engine and received 32,000 requests per second. No over quota errors. Also, ever seen the fail whale on Twitter, or the over capacity error on Tumblr? That's essentially their over quota error.
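To make the "10 minute chunks" advice above concrete, here is a minimal Python sketch of a resumable job. It is an illustration only: `process_in_chunks`, `handle`, and the 540-second default budget are made-up names and values, not App Engine APIs, and a plain list stands in for whatever you would actually iterate over.

```python
import time


def process_in_chunks(items, handle, start=0, budget_seconds=540.0,
                      clock=time.monotonic):
    """Process items[start:] until the time budget is used up.

    Returns the index to resume from, or None once every item is handled.
    `handle` is a hypothetical per-item callback; `clock` is injectable so
    the time budget can be simulated.
    """
    deadline = clock() + budget_seconds
    index = start
    while index < len(items):
        if clock() >= deadline:
            # Out of time: report where the next run should pick up.
            return index
        handle(items[index])
        index += 1
    return None
```

Each run returns the index to resume from, which you would persist and pass back as `start` on the next run; `None` means the job is finished. The budget is deliberately set below the 10 minute limit so the run can stop cleanly instead of being cut off.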
Database
-- If you mean running the datastore on your laptop is slower than running it on App Engine's cluster, then true, otherwise not true at all.
-- Most developers use db filters to query the datastore. Plus, you could equally say that MySQL allows "SQL. Nothing else."
-- The 1000 record limit was lifted a long time ago. Besides, show me any user-facing page on Facebook, Twitter, or Tumblr that requires more than 1000 records to render.
-- I'm not even sure what you're getting at here. Most people regard the speed of Google's massive cluster as a huge advantage of the system.
Memcache values max size is 10 MB. - Actually it's 1MB per memcache entry, same as every other memcache implementation.
Can't do simple text search - True.
-- It's a feature that's on deck. Most large sites don't do their own text search indexing.
-- App Engine developers need to adjust their thinking from single massive multi-join SQL query to several smaller individual queries, or denormalize data so that joins aren't needed.
-- translation/citation required.
-- There is a limit to the number of indexes in a single app. I've only seen academic research applications hit it though.
-- So if someone has more than 5000 friends they would need two entities in the friends group.
Key names of the form __*__ (start and end with two underscores) are reserved, and should not be used by the application. - True.
-- But so what?
-- Again, so what? Key names aren't for storing novellas, they're for uniquely identifying an entity.
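The "two entities in the friends group" idea above can be sketched roughly like this. It is only a sketch under assumptions: plain Python lists stand in for datastore entities, `shard_friend_ids` and `is_friend` are hypothetical helper names, and the 5000 figure is the per-entity limit quoted in the question.

```python
MAX_INDEXED_VALUES = 5000  # per-entity limit as quoted in the question


def shard_friend_ids(friend_ids, shard_size=MAX_INDEXED_VALUES):
    """Split a friend list into consecutive shards of at most shard_size.

    Each shard stands in for one entity holding part of the friend list.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return [friend_ids[i:i + shard_size]
            for i in range(0, len(friend_ids), shard_size)]


def is_friend(shards, candidate_id):
    """Check membership across all shards (one lookup per shard)."""
    return any(candidate_id in shard for shard in shards)
```

With 12,000 friends this gives three shards (5000, 5000, and 2000 ids), so the per-entity limit is never exceeded and a membership check touches at most three entities.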
Language
-- Actually you can also run any language that runs on the JVM, including PHP and JRuby. Not sure why it's an issue though, Python and Java are two powerful languages with lots of available tools, tutorials, and experienced programmers.
Server Issues
-- Most third party APIs are aware of App Engine and/or have a relationship with Google. A few times Twitter has accidentally blocked App Engine and it gets fixed within a few hours.
-- If you really need more than 3000 code files for your web application you can use zip imports (Also, you might be doing it wrong).
-- App Engine is a Platform as a Service. Not having to worry about servicing the OS or hardware is what people are paying for. This is the key advantage of App Engine, not a limitation.
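As for the zip imports mentioned under the 3000 file limit: Python can import modules straight from a zip archive placed on `sys.path`, so many source files can be shipped as a single file. A minimal sketch using only the standard library follows; the helper names `build_zip_package` and `import_from_zip` are made up for illustration, and how you would bundle and deploy the archive on App Engine is not shown.

```python
import importlib
import sys
import zipfile


def build_zip_package(zip_path, modules):
    """Write {module_name: source_code} into a zip as top-level .py files."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name, source in modules.items():
            archive.writestr(name + ".py", source)


def import_from_zip(zip_path, module_name):
    """Put the archive on sys.path and import one module from it."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    # Make sure the import system notices the newly added path entry.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

The archive counts as one file no matter how many modules it holds, which is the point of the workaround.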