Async urlfetch on App Engine

asynchronous, google-app-engine, google-cloud-datastore, urlfetch

My app needs to do many datastore operations on each request. I'd like to run them in parallel to get better response times.

For datastore updates I'm doing batch puts, so many entities are written in a single call instead of one call each, which saves many milliseconds. App Engine allows up to 500 entities to be written in one batch put.
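Because of the 500-entity ceiling, a larger write has to be split into several batch calls. The following is a minimal sketch of that chunking, not App Engine code: `chunked`, `batch_put`, and `put_many` are hypothetical names, and `put_many` stands in for a real batch write (for example `db.put` called with a list). The batches here are sent one after another.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Per-call ceiling taken from the 500-entity figure in the text above.
MAX_BATCH = 500


def chunked(items: Sequence[T], size: int = MAX_BATCH) -> List[List[T]]:
    """Split items into consecutive lists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def batch_put(entities: Sequence[T], put_many: Callable[[List[T]], None]) -> int:
    """Write entities with one put_many call per chunk; return how many calls were made.

    `put_many` is a hypothetical stand-in for a real batch write.
    """
    calls = 0
    for batch in chunked(entities):
        put_many(batch)
        calls += 1
    return calls


if __name__ == "__main__":
    written: List[List[int]] = []
    n_calls = batch_put(list(range(1234)), written.append)
    print(n_calls, [len(b) for b in written])  # 3 [500, 500, 234]
```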

But I haven't found a built-in function that allows datastore fetches of different kinds to execute in parallel.

Since App Engine does allow urlfetch calls to run asynchronously, I created a getter URL for each kind that returns the query results as JSON-formatted text. Now my app can make async urlfetch calls to these URLs, which could parallelize the datastore fetches.
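The fan-out described above can be sketched with Python's standard library, with threads standing in for App Engine's urlfetch RPCs: every request is started before any result is awaited, and each JSON body is decoded as it is collected. The `/get/<kind>` URL scheme, `fetch_all_kinds`, and `fake_getter` are hypothetical; this is an illustration of the pattern, not the asker's code.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all_kinds(kinds: List[str],
                    fetch_text: Callable[[str], str]) -> Dict[str, list]:
    """Start one request per kind, then wait for each and decode its JSON body."""
    with ThreadPoolExecutor(max_workers=max(1, len(kinds))) as pool:
        # Issue every request first (the role create_rpc plays in the question).
        futures = {kind: pool.submit(fetch_text, "/get/" + kind) for kind in kinds}
        # Then block on each one in turn (the role of get_result()).
        return {kind: json.loads(f.result()) for kind, f in futures.items()}


if __name__ == "__main__":
    # Hypothetical getter: returns a one-row JSON list naming the requested kind.
    def fake_getter(url: str) -> str:
        return json.dumps([{"kind": url.rsplit("/", 1)[-1]}])

    print(fetch_all_kinds(["User", "Order"], fake_getter))
    # {'User': [{'kind': 'User'}], 'Order': [{'kind': 'Order'}]}
```

Total latency is then roughly that of the slowest getter rather than the sum of all of them, which is the point of the technique.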

This technique works well with small numbers of parallel requests, but App Engine throws errors when I try to run more than 5 or 10 of these urlfetch calls at the same time.

I'm only testing for now, so every urlfetch runs the same query. Since the calls work fine in small volumes but start failing with more than a handful of simultaneous requests, I suspect the problem lies with the async urlfetch calls themselves.

My questions are:

  1. Is there a limit to the number of urlfetch.create_rpc() calls that can run asynchronously?
  2. The synchronous urlfetch.fetch() function has a 'deadline' parameter that lets it wait up to 10 seconds for a response before failing. Is there any way to tell urlfetch.create_rpc() how long to wait for a response?
  3. What do the errors shown below mean?
  4. Is there a better server-side technique to run datastore fetches of different kinds in parallel?

      File "/base/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 501, in get_result
        return self.__get_result_hook(self)
      File "/base/python_lib/versions/1/google/appengine/api/urlfetch.py", line 331, in _get_fetch_result
        raise DownloadError(str(err))
    InterruptedError: ('The Wait() request was interrupted by an exception from another callback:', DownloadError('ApplicationError: 5 ',))
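Read literally, the last line says that while Wait() was collecting results, an exception raised for a different RPC (a DownloadError carrying "ApplicationError: 5") interrupted it, so one failed fetch appears to abort collection of all the others. The following is a minimal standard-library sketch of a more defensive collection loop, with threads standing in for urlfetch RPCs. It caps how many requests run at once, bounds each wait, and records a failure against its own request instead of letting it escape. `fetch_guarded` and `flaky` are hypothetical, and the defaults of 5 in flight and a 10-second deadline are illustrative numbers borrowed from the question, not documented App Engine limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Union

Outcome = Union[str, Exception]


def fetch_guarded(urls: Sequence[str],
                  fetch: Callable[[str], str],
                  max_in_flight: int = 5,
                  deadline: float = 10.0) -> List[Outcome]:
    """Fetch urls in parallel, returning each body or the exception it caused.

    - At most `max_in_flight` fetches run at once; the rest wait in a queue.
    - Each result is awaited for at most `deadline` seconds, counted from when
      the collection loop reaches it; a miss is recorded as a TimeoutError.
    - A failure is stored in that url's slot only, so it cannot stop the
      remaining results from being collected.
    """
    pool = ThreadPoolExecutor(max_workers=max_in_flight)
    try:
        futures = [pool.submit(fetch, url) for url in urls]
        outcomes: List[Outcome] = []
        for f in futures:
            try:
                outcomes.append(f.result(timeout=deadline))
            except Exception as exc:
                if not f.done():
                    # The wait timed out and the fetch is still running.
                    exc = TimeoutError("no result within %s s" % deadline)
                outcomes.append(exc)
        return outcomes
    finally:
        # Return without waiting for stragglers; their threads finish in the background.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    def flaky(url: str) -> str:
        # Hypothetical fetch that fails for one url, standing in for a single DownloadError.
        if url == "/q3":
            raise RuntimeError("simulated download error")
        return "ok " + url

    urls = ["/q1", "/q2", "/q3"]
    for url, outcome in zip(urls, fetch_guarded(urls, flaky)):
        print(url, repr(outcome))
```

Threads cannot be killed, so a timed-out fetch keeps running in the background; the caller simply stops waiting for it. Failed or timed-out slots can then be retried or reported individually.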

Best Answer

Since App Engine allows async urlfetch calls but does not allow async datastore gets, I was trying to use urlfetch RPCs to retrieve from the datastore in parallel.

The lack of async datastore gets is an acknowledged issue:

http://code.google.com/p/googleappengine/issues/detail?id=1889

And there's now a third-party tool that allows async queries:

http://code.google.com/p/asynctools/

"asynctools is a library allowing you to execute Google App Engine API calls in parallel. API calls can be mixed together and queued up and then all are kicked off in parallel."

This is exactly what I was looking for.
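The queue-then-run idea in the asynctools description can be illustrated with the standard library alone. This is not the asynctools API, whose class names are not reproduced here; `CallBatch` is a hypothetical helper, and the lambdas stand in for queries on different kinds.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple


class CallBatch:
    """Queue named zero-argument calls, then start them all at once.

    A standard-library illustration of queue-then-run; it is not the
    asynctools API.
    """

    def __init__(self) -> None:
        self._calls: List[Tuple[str, Callable[[], Any]]] = []

    def add(self, name: str, call: Callable[[], Any]) -> "CallBatch":
        """Queue a call under a unique name; returns self so calls can be chained."""
        if any(existing == name for existing, _ in self._calls):
            raise ValueError("duplicate call name: %r" % name)
        self._calls.append((name, call))
        return self

    def run(self) -> Dict[str, Any]:
        """Run every queued call in parallel and return the results keyed by name."""
        if not self._calls:
            return {}
        with ThreadPoolExecutor(max_workers=len(self._calls)) as pool:
            futures = {name: pool.submit(call) for name, call in self._calls}
            return {name: f.result() for name, f in futures.items()}


if __name__ == "__main__":
    batch = CallBatch()
    batch.add("users", lambda: ["alice", "bob"])   # stands in for a query on one kind
    batch.add("orders", lambda: [101, 102, 103])   # stands in for a query on another kind
    print(batch.run())  # {'users': ['alice', 'bob'], 'orders': [101, 102, 103]}
```

Keying results by name keeps each kind's rows separate regardless of which call finishes first.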
