C# Scalability – How to Avoid Chatty Interfaces


Background:
I am designing a server application and creating separate DLLs for different subsystems. To simplify things, let's say I have two subsystems: 1) Users 2) Projects

The public interface of Users has a method like:

User GetUser(int id);

And the public interface of Projects has a method like:

IEnumerable<User> GetProjectUsers(int projectId);

So, for example, when we need to display the users for a certain project, we can call GetProjectUsers and that will give back objects with sufficient info to show in a datagrid or similar.

Problem:
Ideally, the Projects subsystem shouldn't also store user info; it should store only the ids of the users participating in a project. To serve GetProjectUsers, it then needs to call GetUser on the Users subsystem for each user id stored in its own database. This requires a lot of separate GetUser calls, causing a lot of separate SQL queries inside the Users subsystem. I haven't tested this, but I expect such a chatty design to hurt the scalability of the system.
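To make the concern concrete, here is a minimal sketch of the chatty design. Everything in it is an assumption for illustration: the User record, the in-memory dictionaries standing in for the two databases, and the QueryCount counter, which pretends that each GetUser call costs one SQL query.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical user type (C# 9 record); the real User class is not shown here.
public record User(int Id, string Name);

// Hypothetical Users subsystem. The dictionary stands in for its SQL table.
public class UsersService
{
    private readonly Dictionary<int, User> _table;

    // Counts simulated SQL round trips.
    public int QueryCount { get; private set; }

    public UsersService(IEnumerable<User> users) =>
        _table = users.ToDictionary(u => u.Id);

    public User GetUser(int id)
    {
        QueryCount++; // one simulated query per call
        return _table[id];
    }
}

// Hypothetical Projects subsystem. It stores only user ids per project.
public class ProjectsService
{
    private readonly UsersService _users;
    private readonly Dictionary<int, List<int>> _memberIds;

    public ProjectsService(UsersService users, Dictionary<int, List<int>> memberIds)
    {
        _users = users;
        _memberIds = memberIds;
    }

    // One GetUser call per member: N members means N simulated queries.
    public IEnumerable<User> GetProjectUsers(int projectId) =>
        _memberIds[projectId].Select(id => _users.GetUser(id)).ToList();
}
```

With a project of three members, a single GetProjectUsers call leaves QueryCount at 3 in this sketch, which is the growth pattern I am worried about.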

If I put aside the separation of the subsystems, I could store all info in a single schema accessible by both systems, and Projects could simply do a JOIN to get all project users in a single query. Projects would also need to know how to generate User objects from the query results. But this breaks the separation, which has many advantages.

Question:
Can anyone suggest a way to keep the separation while avoiding all these individual GetUser calls during GetProjectUsers?


For example, one idea I had was for Users to give external systems the ability to "tag" users with a label-value pair, and to request users with a certain value, e.g.:

void AddUserTag(int userId, string tag, string value);
IEnumerable<User> GetUsersByTag(string tag, string value);

Then the Projects system could tag each user as they are added to the project:

usersService.AddUserTag(userId, "project id", myProjectId.ToString());

and during GetProjectUsers, it could request all the project users in a single call:

var projectUsers = usersService.GetUsersByTag("project id", myProjectId.ToString());

The part I'm not sure about is this: yes, Users stays agnostic of projects, but the information about project membership really ends up stored in the Users system, not in Projects. It just doesn't feel natural, so I'm trying to determine whether there's a big disadvantage here that I'm missing.
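For reference, a minimal in-memory sketch of what I mean by the tagging idea. Only AddUserTag and GetUsersByTag come from the signatures above; the User record, the AddUser helper, and the dictionary-based storage are assumptions made up for illustration (a real implementation would sit on top of the Users database).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical user type (C# 9 record).
public record User(int Id, string Name);

public class TaggingUsersService
{
    private readonly Dictionary<int, User> _users = new();

    // (tag, value) -> ids of the users carrying that tag.
    private readonly Dictionary<(string Tag, string Value), HashSet<int>> _tags = new();

    // Hypothetical helper so the sketch has some users to tag.
    public void AddUser(User user) => _users[user.Id] = user;

    public void AddUserTag(int userId, string tag, string value)
    {
        if (!_tags.TryGetValue((tag, value), out var ids))
        {
            ids = new HashSet<int>();
            _tags[(tag, value)] = ids;
        }
        ids.Add(userId);
    }

    // A single call returns every user with the given tag and value.
    // Assumes every tagged id was registered through AddUser first.
    public IEnumerable<User> GetUsersByTag(string tag, string value) =>
        _tags.TryGetValue((tag, value), out var ids)
            ? ids.Select(id => _users[id]).ToList()
            : new List<User>();
}
```

The sketch also shows what bothers me: the membership data lives in the _tags dictionary inside Users, so Projects no longer owns it.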

Best Answer

What is missing in your system is a cache.

You say:

However, this requires a lot of separate GetUser calls, causing a lot of separate SQL queries inside the User subsystem.

The number of calls to a method doesn't have to equal the number of SQL queries. Once you have fetched the information about a user, why would you query for the same information again if it hasn't changed? Quite possibly you can even cache all the users in memory, which would result in zero SQL queries (unless a user changes).
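As a rough illustration of that point, here is a minimal sketch of a caching wrapper around the Users subsystem. All the names are assumptions: IUsersService, SqlUsersService (a stand-in that only counts simulated queries) and CachedUsersService do not come from your code, and a real system would also need expiry or invalidation.

```csharp
using System.Collections.Concurrent;

// Hypothetical user type (C# 9 record).
public record User(int Id, string Name);

// Hypothetical interface for the Users subsystem.
public interface IUsersService
{
    User GetUser(int id);
}

// Stand-in for the real implementation; each call simulates one SQL query.
public class SqlUsersService : IUsersService
{
    public int QueryCount { get; private set; }

    public User GetUser(int id)
    {
        QueryCount++;
        return new User(id, $"user-{id}");
    }
}

// Decorator that answers repeated requests from memory.
public class CachedUsersService : IUsersService
{
    private readonly IUsersService _inner;
    private readonly ConcurrentDictionary<int, User> _cache = new();

    public CachedUsersService(IUsersService inner) => _inner = inner;

    // Only a cache miss reaches the inner service. Under contention the
    // factory may run more than once for the same key, which is harmless here.
    public User GetUser(int id) =>
        _cache.GetOrAdd(id, key => _inner.GetUser(key));
}
```

Projects keeps calling GetUser once per id, so the interface stays as chatty as before, but four calls for ids 1, 1, 1 and 2 produce only two simulated queries.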

On the other hand, by making Projects subsystem query both the projects and the users with an INNER JOIN, you introduce an additional issue: you are querying the same piece of information in two different locations in your code, making cache invalidation extremely difficult. As a consequence:

  • Either you will never introduce a cache later on,

  • Or you will spend weeks or months working out what should be invalidated when a piece of information changes,

  • Or you will add cache invalidation in the obvious locations and forget the others, resulting in hard-to-find bugs.
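By contrast, when every read and write of user data goes through the Users subsystem, invalidation has exactly one home. The sketch below assumes a hypothetical SaveUser method and uses in-memory dictionaries in place of the database; it is not thread-safe and is only meant to show where the invalidation lives.

```csharp
using System.Collections.Generic;

// Hypothetical user type (C# 9 record).
public record User(int Id, string Name);

public class UsersService
{
    // Stand-in for the SQL table owned by the Users subsystem.
    private readonly Dictionary<int, User> _database = new();
    private readonly Dictionary<int, User> _cache = new();

    // Counts simulated SQL reads.
    public int QueryCount { get; private set; }

    // Hypothetical write method: the only place where user data changes,
    // and therefore the only place that has to invalidate the cache.
    public void SaveUser(User user)
    {
        _database[user.Id] = user;
        _cache.Remove(user.Id);
    }

    public User GetUser(int id)
    {
        if (!_cache.TryGetValue(id, out var user))
        {
            QueryCount++; // cache miss: simulated SQL read
            user = _database[id];
            _cache[id] = user;
        }
        return user;
    }
}
```

If Projects ran its own JOIN against the user table, nothing in SaveUser could know about the copies of user data that Projects had cached, which is the situation described in the list above.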


Rereading your question, I notice a keyword I missed the first time: scalability. As a rule of thumb, you may follow this pattern:

  1. Ask yourself whether the system is slow (i.e. it violates a non-functional requirement of performance, or is simply a nightmare to use).

    If the system is not slow, don't worry about performance. Worry about clean code, readability, maintainability, testing, branch coverage, clean design, detailed and easy to understand documentation, good code comments.

  2. If it is slow, search for the bottleneck. You do that not by guessing, but by profiling. Profiling shows you the exact location of the bottleneck (when you guess, you will get it wrong nearly every time), so that you can focus on that part of the code.

  3. Once the bottleneck is found, search for solutions. You do that by guessing, benchmarking, profiling, writing alternatives, understanding compiler optimizations, understanding optimizations that are up to you, asking questions on Stack Overflow and moving to lower-level languages (including assembly, when necessary).
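A real profiler is the right tool for step 2, but for comparing alternatives in step 3 a crude timing helper is often enough to start with. The following is only a sketch: the Timing class and its AverageMilliseconds method are made-up names, and it relies on nothing beyond System.Diagnostics.Stopwatch.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action once as a warm-up, then `iterations` more times,
    // and returns the average elapsed milliseconds per timed run.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up so JIT compilation is not included in the timing

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

You could wrap a call such as GetProjectUsers in the action and compare the averages before and after a change. A number like this tells you whether the call is slow, not why, so it complements profiling rather than replacing it.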

What is the actual issue with the Projects subsystem asking the Users subsystem for info?

A possible future scalability issue? This is not an issue. Scalability may become a nightmare if you start merging everything into one monolithic solution or querying for the same data from multiple locations (as explained above, because of the difficulty of introducing a cache).

If there is already a noticeable performance issue, then, step 2, search for the bottleneck.

If it turns out that the bottleneck does exist, that it is caused by Projects requesting users through the Users subsystem, and that it sits at the database querying level, only then should you search for an alternative.

The most common alternative would be to implement caching, drastically reducing the number of queries. If you're in a situation where caching doesn't help, then further profiling may show you that you need to reduce the number of queries, add (or remove) database indexes, throw more hardware at the problem, or completely redesign the whole system.
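If profiling does point at the number of queries, one possible way to reduce it without merging the schemas is a batch lookup on the Users subsystem. This is only a sketch of that idea: the GetUsers method is hypothetical, the dictionary stands in for the SQL table, and QueryCount pretends that one batch call costs one query (think of a single SELECT with an IN clause).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical user type (C# 9 record).
public record User(int Id, string Name);

public class UsersService
{
    // Stand-in for the SQL table owned by the Users subsystem.
    private readonly Dictionary<int, User> _table;

    // Counts simulated SQL round trips.
    public int QueryCount { get; private set; }

    public UsersService(IEnumerable<User> users) =>
        _table = users.ToDictionary(u => u.Id);

    // Hypothetical batch method: one simulated query for any number of ids.
    // Duplicate ids are collapsed and unknown ids are skipped.
    public IReadOnlyList<User> GetUsers(IEnumerable<int> ids)
    {
        QueryCount++;
        return ids.Distinct()
                  .Where(_table.ContainsKey)
                  .Select(id => _table[id])
                  .ToList();
    }
}
```

Projects would pass the ids it stores for a project and get all the users back in one call, while still knowing nothing about how Users stores or builds them.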