Scala – the cost of creating actors in Akka

Tags: akka, performance, scala

Consider a scenario in which I am implementing a system that processes incoming tasks using Akka. I have a primary actor that receives tasks and dispatches them to some worker actors that process the tasks.

My first instinct is to implement this by having the dispatcher create an actor for each incoming task. After the worker actor processes the task it is stopped.

This seems to be the cleanest solution to me, since it adheres to the principle of "one task, one actor". The alternative would be to reuse actors, but this involves the extra complexity of cleanup and some pool management.

I know that actors in Akka are cheap, but I am wondering whether there is an inherent cost associated with repeatedly creating and stopping actors. Is there any hidden cost associated with the data structures Akka uses for the bookkeeping of actors?

The load should be of the order of tens or hundreds of tasks per second – think of it as a production webserver that creates one actor per request.

Of course, the right answer lies in the profiling and fine tuning of the system based on the type of the incoming load.
But I wondered if anyone could tell me something from their own experience?

LATER EDIT:

I should have given more details about the task at hand:

Only N active tasks can run at any given time. As @drexin pointed out, this would be easily solvable using routers. However, the execution of tasks isn't a simple run-and-be-done type of thing.
Tasks may require information from other actors or services and thus may have to wait and go to sleep. By doing so they release an execution slot. The slot can be taken by another waiting actor, which now has the opportunity to run. You could make an analogy with the way processes are scheduled on one CPU.
Each worker actor needs to keep some state regarding the execution of the task.

Note: I appreciate alternative solutions to my problem, and I will certainly take them into consideration. However, I would also like an answer to the main question regarding the intensive creation and deletion of actors in Akka.

Best Answer

You should not create an actor for every request. Instead, use a router to dispatch the messages to a dynamic number of actors; that's what routers are for. Read this part of the docs for more information: http://doc.akka.io/docs/akka/2.0.4/scala/routing.html
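A minimal sketch of the router approach, assuming classic Akka 2.4 or later, where pools are built with RoundRobinPool (in the 2.0.x docs linked above the equivalent was RoundRobinRouter applied via Props.withRouter). The Task message, the Worker and Dispatcher classes, and the pool bounds are all made up for illustration:

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.routing.{DefaultResizer, RoundRobinPool}

// Hypothetical task message; not part of Akka.
final case class Task(id: Int)

// Hypothetical long-lived worker that is reused for many tasks.
class Worker extends Actor {
  def receive: Receive = {
    case Task(id) => println(s"${self.path.name} processing task $id")
  }
}
object Worker {
  def props: Props = Props(new Worker)
}

class Dispatcher extends Actor {
  // The pool starts with 2 routees; the resizer may grow it up to 10
  // (both bounds are arbitrary values chosen for this sketch).
  private val workers: ActorRef = context.actorOf(
    RoundRobinPool(
      nrOfInstances = 2,
      resizer = Some(DefaultResizer(lowerBound = 2, upperBound = 10))
    ).props(Worker.props),
    name = "workers"
  )

  def receive: Receive = {
    // Hand the task to the pool, keeping the original sender.
    case task: Task => workers.forward(task)
  }
}

object RouterSketch extends App {
  val system = ActorSystem("tasks")
  val dispatcher = system.actorOf(Props(new Dispatcher), "dispatcher")
  (1 to 5).foreach(i => dispatcher ! Task(i))
  // The actor system keeps running until system.terminate() is called.
}
```

The workers here are created once and reused, so no actor is created or stopped per task.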

edit:

Creating top-level actors (system.actorOf) is expensive, because every top-level actor also initializes an error kernel, and those are expensive. Creating child actors (context.actorOf inside an actor) is much cheaper.
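If you do go with one actor per task, a rough sketch of the cheaper variant could look like this, assuming the classic Akka actor API. Task, TaskWorker and Dispatcher are hypothetical names; the only top-level actor is the dispatcher, and each worker is a child created with context.actorOf that stops itself when it is done:

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical task message; not part of Akka.
final case class Task(id: Int)

// Hypothetical short-lived worker: handles exactly one task, then stops.
class TaskWorker extends Actor {
  def receive: Receive = {
    case Task(id) =>
      println(s"${self.path} processing task $id")
      context.stop(self) // release the actor once the task is finished
  }
}
object TaskWorker {
  def props: Props = Props(new TaskWorker)
}

class Dispatcher extends Actor {
  def receive: Receive = {
    case task: Task =>
      // Child actor of the dispatcher, created through context.actorOf
      // rather than system.actorOf.
      context.actorOf(TaskWorker.props) ! task
  }
}

object PerTaskSketch extends App {
  val system = ActorSystem("tasks")
  // The dispatcher is the only top-level actor in this sketch.
  val dispatcher = system.actorOf(Props(new Dispatcher), "dispatcher")
  (1 to 3).foreach(i => dispatcher ! Task(i))
  // The actor system keeps running until system.terminate() is called.
}
```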

Still, I suggest you rethink this, because depending on how frequently actors are created and stopped, you will also put additional pressure on the GC.

edit2:

Most importantly, actors are not threads! Even if you create 1M actors, they will only run on as many threads as the pool has. Depending on the throughput setting in the config, every actor will process up to n messages before the thread is released to the pool again.

Note that blocking a thread (including sleeping) will NOT return it to the pool!
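For reference, the throughput setting mentioned above lives under akka.actor.default-dispatcher in the configuration. A small sketch of overriding it programmatically, assuming classic Akka 2.4 or later with the bundled Typesafe Config library; the value 10 and the object name are only illustrations:

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object ThroughputSketch extends App {
  // throughput = maximum number of messages one actor processes
  // before its thread moves on to the next actor.
  // The value 10 is an arbitrary choice for this sketch.
  val config = ConfigFactory
    .parseString("akka.actor.default-dispatcher.throughput = 10")
    .withFallback(ConfigFactory.load())

  val system = ActorSystem("tasks", config)

  // Read the effective value back from the running system's settings.
  println(system.settings.config.getInt("akka.actor.default-dispatcher.throughput"))

  system.terminate()
}
```

The same key can of course be set in application.conf instead of in code.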

Related Solutions

Javascript – the difference between call and apply

The difference is that apply lets you invoke the function with arguments as an array; call requires the parameters be listed explicitly. A useful mnemonic is "A for array and C for comma."

See MDN's documentation on apply and call.

Pseudo syntax:

theFunction.apply(valueForThis, arrayOfArgs)

theFunction.call(valueForThis, arg1, arg2, ...)

As of ES6, you can also spread the array for use with the call function; you can see the compatibilities here.

Sample code:

function theFunction(name, profession) {
    console.log("My name is " + name + " and I am a " + profession + ".");
}
theFunction("John", "fireman");
theFunction.apply(undefined, ["Susan", "school teacher"]);
theFunction.call(undefined, "Claude", "mathematician");
theFunction.call(undefined, ...["Matthew", "physicist"]); // used with the spread operator

Discovery of Akka actors in cluster

I'm working on a private project which is basically a very extended version of the chatroom example, and I also had startup problems with Akka and the whole "decentralized" way of thinking. So I can tell you how I "solved" my extended chatroom:

I wanted a server which could easily be deployed multiple times without much additional configuration. I am using Redis as the storage for all open user sessions (simple serialization of their ActorRefs) and for all chatrooms.

The server has the following actors:

WebsocketSession: holds the connection to one user, handles requests from the user, and forwards messages from the system.
ChatroomManager: the central broadcaster, which is deployed on every instance of the server. If a user wants to send a message to a chatroom, the WebsocketSession actor sends all the information to the ChatroomManager actor, which then broadcasts the message to all members of the chatroom.

So here is my procedure:

User A connects to server 1, which allocates a new WebsocketSession. This actor inserts its own absolute path into Redis.
User A joins chatroom X, which also inserts his absolute path (I use this as the unique ID of a user session) into Redis (each chatroom has a "connections" set).
User B connects to server 2 -> Redis
User B joins chatroom X -> Redis
User B sends a message to chatroom X as follows: user B sends his message over the WebSocket to his session actor, which (after some checks) sends an actor message to the ChatroomManager. This actor retrieves the user list of the chatroom from Redis (absolute paths used with Akka's actorFor method) and then sends the message to each session actor. These session actors then write to their WebSockets.

In each ChatroomManager actor I do some ActorRef caching, which gave additional speed. I think this differs from your approach, especially in that these ChatroomManagers handle requests for all chatrooms. But having one actor per chatroom is a single point of failure, which I wanted to avoid. It would also cause a lot more messages, e.g.:

User A and user B are on server 1.
Chatroom X is on server 2.

If user A wants to talk to user B, they would both have to communicate over the chatroom actor on server 2.

In addition, I used Akka features such as (round-robin) routers to create multiple instances of the ChatroomManager actor on each system to handle many requests.

I spent several days setting up the whole Akka remote infrastructure in combination with serialization and Redis. But now I am able to create any number of instances of the server application, which use Redis to share their ActorRefs (serialized as absolute paths with IP and port).

This may help you a little, and I'm open to new questions (please not about my English ;).
