Is GAE an infrastructure capable of hosting an app used by millions of active users?

google-app-engine google-cloud-datastore web-applications web-development websites

Given the restrictions of GAE listed below, is it even possible to build a great social app (like Facebook) by hosting it on GAE?

In other words, is GAE an infrastructure capable of hosting an app used by 600 million active users?

Restrictions I've dug up from a couple of forums / blogs (please feel free to add to the list if you find anything missing):

  1. HTTP Request/Response

    • Max request size: 32 MB
    • Max response size: 32 MB
    • All requests must respond within 30 seconds otherwise GAE will throw DeadlineExceededException
    • Each cron job must be executed within 10 minutes
    • Cron jobs cannot utilize map reduce
    • Every GET or POST to another site is aborted after 5 seconds. You can configure it to wait till 10 seconds max. (intermediate servers would be necessary to work with Twitter and Facebook many times)
    • Client can not connect to GAE through FTP (only HTTP and HTTPS).
    • No https for custom domains. Only for your-app-id.appspot.com domains.
    • If you get an influx of users, you get "over quota" error
  2. Database

    • Database behavior in local development is not the same as on the actual servers.
    • GQL. Nothing else.
    • No query can retrieve more than 1000 records (sucks seriously if you want to allow your client to have a one-click-go-offline-now button)
    • If you need linear access to a massive amount of records to perform an operation, you are out of luck (Google's systems are massively clustered)
    • Memcache values max size is 1 MB.
    • Can't do simple text search
    • You can't join 2 tables.
    • Slow (You have to read about how to separate tables using inheritance so that you can search in a table, get the key and then obtain its parent in order to avoid deserialization performance)
    • "Too many indexes" runtime exception
    • An entity can at most have 5000 property values in an index
    • Key names of the form __*__ (start and end with two underscores) are reserved, and should not be used by the application.
    • Key names are limited to 500 bytes (UTF-8 encoded, I guess)
  3. Language

    • Python or Java or Go (or languages that use the JVM, like Groovy, Scala and others)
  4. Server Issues

    • No static IP (may have throttling and quota problems calling third party APIs)
    • Each application is limited to 3000 files
    • No control of OS or hardware running the web app

Best Answer

I think what annoys me about this question is that you've phrased it and loaded it with "facts" in an attempt to gather a definitive no.

The truth is that you could develop an App Engine app that replicates the features of Facebook, Twitter, or Tumblr. And assuming the app was well written, it would scale to support hundreds of millions of users. The main reason you wouldn't want to (which doesn't seem to be a consideration for you) is the cost of running a service that size on App Engine.

Also, I fail to see how any of the restrictions you've listed would prevent you from developing such an app.

  1. HTTP Request/Response

    • Max request size: 10 MB - wrong, raised to 32MB.
    • Max response size: 10 MB - wrong, raised to 32MB.

    -- If you are developing a social app that frequently needs to deliver pages larger than 10MB you are probably doing it wrong. Also, if you do need to deliver content larger than 32MB you can use the blobstore for files up to 2GB.

    • You can't access the file system. (forget about saving uploads to filesystem) - wrong. You can read from the local file system and can upload and read/write files to the blobstore.

    -- There is no way that Facebook, Twitter, or Tumblr are just taking user uploads and copying them to a folder. Not an issue.

    • All requests must respond within 10 minutes otherwise GAE will throw DeadlineExceededException - wrong. It's 30 seconds actually.

    -- If you need longer than 30 seconds to deliver results to a user's request you are probably doing it wrong.

    • Each cron job must be executed within 30 seconds - wrong, it's 10 minutes.

    -- If you can't divide a lengthy task into 10 minute chunks, A: you're probably doing it wrong and B: you can now move that task to a Backend instance, which doesn't have a time limit on requests.

    • Cron jobs cannot utilize map reduce - never used map reduce, but I think this requires a citation.

    • Every GET or POST to another site is aborted after 5 seconds. You can configure it to wait till 10 seconds max. (intermediate servers would be necessary to work with Twitter and Facebook many times) - True.

    -- If a user-facing request to an external API is taking longer than 10 seconds it's probably a good idea to tell the user to retry anyway. If it's not a user-facing request you can automatically retry the task until the API responds.

    • Client can not connect to GAE through FTP (only HTTP and HTTPS). - True

    -- Why is this an issue? Do you think any large-scale company deploys changes via FTP?

    • No https for custom domains. Only for your-app-id.appspot.com domains. - True.

    -- It's on the roadmap though.

    • If you get an influx of users, you get "over quota" error - Half true.

    -- If you properly budget your app you will never see an over quota error. The Royal Wedding site was hosted on App Engine and received 32,000 requests per second. No over quota errors. Also, ever seen the fail whale on Twitter, or the over capacity error on Tumblr? That's essentially their over quota error.
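The point above about dividing a lengthy job into chunks that each fit inside the request deadline can be sketched in plain Python. This is only an illustration under stated assumptions: `process_chunk`, `run_all`, and the `handle` callback are hypothetical names, the `while` loop stands in for whatever task queue or cron mechanism would re-invoke the job, and nothing here is an App Engine API.

```python
def process_chunk(items, start, chunk_size, handle):
    """Process at most chunk_size items beginning at offset `start`.

    Returns the offset to resume from, or None once everything is done.
    (Hypothetical helper: a real job would persist this offset between runs.)
    """
    end = min(start + chunk_size, len(items))
    for item in items[start:end]:
        handle(item)
    return end if end < len(items) else None


def run_all(items, chunk_size, handle):
    """Stand-in for a task queue: each loop pass represents one separate request."""
    offset = 0
    runs = 0
    while offset is not None:
        offset = process_chunk(items, offset, chunk_size, handle)
        runs += 1
    return runs


if __name__ == "__main__":
    seen = []
    runs = run_all(list(range(10)), 4, seen.append)
    print(runs, seen)
```

Because each run only needs the resume offset, a run that is cut short can be retried without redoing the chunks that already finished, provided `handle` is safe to repeat for the items in the interrupted chunk.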

  2. Database

    • Database behavior in local development is not the same as on the actual servers. - False

    -- If you mean running the datastore on your laptop is slower than running it on App Engine's cluster, then true, otherwise not true at all.

    • GQL. Nothing else. - False

    -- Most developers use db filters to query the datastore. Plus, you could equally say that MySQL allows "SQL. Nothing else."

    • No query can retrieve more than 1000 records (sucks seriously if you want to allow your client to have a one-click-go-offline-now button) - False.

    -- The 1000 record limit was lifted a long time ago. Besides, show me any user-facing page on Facebook, Twitter, or Tumblr that requires more than 1000 records to render.

    • If you need linear access to a massive amount of records to perform an operation, you are out of luck (Google's systems are massively clustered)

    -- I'm not even sure what you're getting at here. Most people regard the speed of Google's massive cluster as a huge advantage of the system.

    • Memcache values max size is 10 MB. - Actually it's 1MB per memcache entry, same as every other memcache implementation.

    • Can't do simple text search - True.

    -- It's a feature that's on deck. Most large sites don't do their own text search indexing.

    • You can't join 2 tables. - True.

    -- App Engine developers need to adjust their thinking from a single massive multi-join SQL query to several smaller individual queries, or denormalize data so that joins aren't needed.

    • Slow (You have to read about how to separate tables using inheritance so that you can search in a table, get the key and then obtain its parent in order to avoid deserialization performance) - ???

    -- translation/citation required.

    • "Too many indexes" runtime exception - True

    -- There is a limit to the number of indexes in a single app. I've only seen academic research applications hit it though.

    • An entity can at most have 5000 property values in an index - True

    -- So if someone has more than 5000 friends they would need two entities in the friends group.

    • Key names of the form __*__ (start and end with two underscores) are reserved, and should not be used by the application. - True

    -- But so what?

    • Key names are limited to 500 bytes (UTF-8 encoded, I guess) - True

    -- Again, so what? Key names aren't for storing novellas, they're for uniquely identifying an entity.
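The remark above that a user with more than 5000 friends would need more than one entity can be sketched as follows. This is a minimal illustration, not datastore code: the dictionaries stand in for entities, and `shard_friend_ids`, `all_friend_ids`, and the key format are hypothetical names chosen for the example. Only the 5000 figure comes from the list above.

```python
MAX_INDEXED_VALUES = 5000  # per-entity limit quoted in the answer above


def shard_friend_ids(user_id, friend_ids, limit=MAX_INDEXED_VALUES):
    """Split one long friend list into several records, each within the limit."""
    shards = []
    for i in range(0, len(friend_ids), limit):
        shards.append({
            # Hypothetical key scheme: owner id plus shard number.
            "key": f"{user_id}:friends:{i // limit}",
            "friend_ids": friend_ids[i:i + limit],
        })
    return shards


def all_friend_ids(shards):
    """Reassemble the full friend list by reading every shard in order."""
    return [fid for shard in shards for fid in shard["friend_ids"]]


if __name__ == "__main__":
    shards = shard_friend_ids("u1", list(range(12001)))
    print([len(s["friend_ids"]) for s in shards])
```

Most users would fit in a single shard, so the extra reads only apply to the rare accounts with very long lists.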

  3. Language

    • python or java or Go (anything else would have to be translated to these languages) - Half true

    -- Actually you can also run any language that runs on the JVM, including PHP and JRuby. Not sure why it's an issue though, Python and Java are two powerful languages with lots of available tools, tutorials, and experienced programmers.

  4. Server Issues

    • No static IP (Throttling and quota problems calling third party APIs) - Half true

    -- Most third party APIs are aware of App Engine and/or have a relationship with Google. A few times Twitter has accidentally blocked App Engine and it gets fixed within a few hours.

    • Each application is limited to 3000 files - Half true

    -- If you really need more than 3000 code files for your web application you can use zip imports (Also, you might be doing it wrong).

    • No control of OS or hardware running the web app - True

    -- App Engine is a Platform as a Service. Not having to worry about servicing the OS or hardware is what people are paying for. This is the key advantage of App Engine, not a limitation.
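The zip import workaround mentioned for the 3000-file limit relies on Python's ability to import modules straight from a zip archive placed on `sys.path`. The sketch below shows that mechanism with the standard library only. The module name `greeting_helper`, the archive name `bundle.zip`, and the `build_bundle` helper are made up for the example, and how the archive is deployed with an app is outside what this shows.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_bundle(zip_path, modules):
    """Write {module_name: source} pairs into a single zip archive."""
    with zipfile.ZipFile(zip_path, "w") as bundle:
        for name, source in modules.items():
            bundle.writestr(name + ".py", source)


# Pack a (hypothetical) module into one archive, so many source files
# would count as a single file on disk.
tmp_dir = tempfile.mkdtemp()
bundle_path = os.path.join(tmp_dir, "bundle.zip")
build_bundle(bundle_path, {
    "greeting_helper": "def greet(name):\n    return 'hello ' + name\n",
})

# Putting the archive on sys.path lets the normal import machinery read from it.
sys.path.insert(0, bundle_path)
importlib.invalidate_caches()
greeting_helper = importlib.import_module("greeting_helper")

if __name__ == "__main__":
    print(greeting_helper.greet("gae"))
```

Only pure-Python modules can be loaded this way; compiled extension modules cannot be imported from a zip archive.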
