Handling error messages from other services in a microservice architecture


Our company runs applications on a microservice architecture that includes thousands of services. I am working on a backend application "X" that talks to 50+ services. Frontend services call my service "X" to execute requests on other services.

Problem:

The front end wants to show user-friendly messages when something fails in other services.

Other services do not return user-friendly messages, and I cannot request changes from the other teams because there are so many of them.
There are no agreed error codes as such. Other services return a string error message, which is currently passed back to the UI. Sometimes the error messages are pointer references (bad code :/)

Possible Solution:

Check the error message string and keep a mapping in my service from it to a user-friendly message, falling back to a default error message when no custom mapping is found. But things can break if the callee service changes its error message.
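For reference, the mapping-with-fallback idea can be sketched in Java as below. This is a hypothetical sketch that assumes exact matching on the raw message. The class name `ErrorMessageTranslator` and the sample messages are invented for illustration.

```java
import java.util.Map;

// Hypothetical sketch of the question's "string mapping with fallback" idea.
// Keys are raw messages exactly as a callee returns them (an assumption).
final class ErrorMessageTranslator {
    static final String DEFAULT_MESSAGE = "Something went wrong. Please try again later.";

    // Raw message -> user-friendly message. This table is the brittle part:
    // a reworded raw message no longer matches and falls through to the default.
    private final Map<String, String> mappings;

    ErrorMessageTranslator(Map<String, String> mappings) {
        this.mappings = Map.copyOf(mappings);
    }

    String toUserMessage(String rawMessage) {
        if (rawMessage == null) {
            return DEFAULT_MESSAGE;
        }
        return mappings.getOrDefault(rawMessage, DEFAULT_MESSAGE);
    }
}
```

Suppose the callee rewords its message, for example from `ERR_ACCOUNT_LOCKED` to `Account locked`. The lookup then silently falls through to the default, which is exactly the brittleness described above.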

Any more ideas for a scalable and sustainable solution? Thanks!

Best Answer

Disclaimers

Our company runs applications on a microservice architecture that includes thousands of services. I am working on a backend application "X" that talks to 50+ services. Frontend services call my service "X" to execute requests on other services.

First of all, thousands of random services do not make an architecture a microservices architecture. A certain sense of a "whole" is still necessary, along with some arrangement among services: guidelines or rules of thumb.

Contextualize the backend within the 'whole'

I assume this backend is neither a gateway nor a proxy. It has its own business and a well-defined domain. So, from the point of view of other services, 'X' is a facade that eases access to this domain.

As a facade, 'X' is responsible for hiding implementation details (integrations, for instance). No implementation detail should reach other services, and this includes integration errors. Whatever happens inside 'X' is nobody else's business.

That said, it doesn't mean we cannot tell the user that something went wrong. We can, but we do it by abstracting away the details. We won't give the impression that something remote is failing. Quite the opposite: something in 'X' failed, and that's it.
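A minimal Java sketch of that facade behaviour follows. It assumes downstream failures surface as a hypothetical `RemoteCallException`. X wraps every integration failure in its own `XServiceException`, whose public message says only that X failed. The remote detail survives as the exception's cause, so it can be logged inside X. All names are illustrative.

```java
import java.util.function.Supplier;

// Hypothetical: any failure raised while talking to a downstream service.
class RemoteCallException extends RuntimeException {
    RemoteCallException(String message) {
        super(message);
    }
}

// X's own error type. Callers of X only ever see this, never the remote detail.
class XServiceException extends RuntimeException {
    static final String PUBLIC_MESSAGE = "X could not complete the request.";

    XServiceException(Throwable cause) {
        // Generic public message; the remote failure is kept only as the cause,
        // so it can be logged inside X without being sent to clients.
        super(PUBLIC_MESSAGE, cause);
    }
}

final class XFacade {
    // Runs one downstream call and translates its failure into X's own error.
    static <T> T callDownstream(Supplier<T> call) {
        try {
            return call.get();
        } catch (RemoteCallException e) {
            throw new XServiceException(e);
        }
    }
}
```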

Since we are speaking about thousands of possible integrations (50+ at the moment), the number of distinct possible errors is significant. If we map every single one to a custom message, the end user is going to be overwhelmed by so much (uncontextualized) information. If we map all the errors to a small set of custom errors, we are biasing the information, making it hard for us to track the problem down and solve it.

In my opinion, error messages should give the user the sense that there's something that can be done to amend the problem.

Nevertheless, if end users still want to know what's going on under the hood, there are better ways, for example, logs.

Accountability

Other services do not return user-friendly messages. It is not possible for me to request changes from other teams as there are several. There are no agreed error codes as such.

Other services return a string error message. Currently, it is passed back to the UI. Sometimes the error messages are pointer references (bad code :/)

As a developer, your responsibility is to present these arguments to the stakeholders. It's a matter of accountability. In my opinion, there's a lack of technical leadership, and that's a real problem when it comes to distributed systems.

There's no technical vision. If there were, services would be implemented following rules of thumb aimed at making the system scalable and easing integration among services. Right now, it looks like services spring up wildly.

If I were asked to do what you have been asked to do (and sometimes I have been), I would argue that turning the current anarchy into user-friendly messages is beyond the scope of X.

At the very least, "raise your hand": expose your concerns, present your alternatives, and let whoever is accountable decide.

Make your solutions valuable for the company

Check the error message string and have a mapping in my service to a user-friendly message. But things can break if the callee service changes its error message. Fall back to a default error message when a custom error mapping is not found.

You are right. That's a weak solution. It's brittle and inefficient in the medium to long run.

I also think it causes coupling, since changes in these strings might force you to refactor the mappings. It is not much of an improvement.

Any more ideas for a scalable and sustainable solution?

Reporting. Handle the errors, assign each one a code/ticket/ID, and report it. Then allow the front end to display the report, for instance by sharing a link to the reporting service.

Error. <A user-friendly and very generic error message>. Follow the link for further information.
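One way to sketch the reporting idea in Java, under three assumptions. The reporting service is an in-memory map. Ticket IDs are random UUIDs. `https://reports.example.com/errors/` is a placeholder base URL. None of these names come from a real product.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// One stored report: the raw failure exactly as X received it (hypothetical shape).
final class ErrorReport {
    final String id;
    final String sourceService;
    final String rawMessage;
    final Instant occurredAt;

    ErrorReport(String id, String sourceService, String rawMessage, Instant occurredAt) {
        this.id = id;
        this.sourceService = sourceService;
        this.rawMessage = rawMessage;
        this.occurredAt = occurredAt;
    }
}

// Hypothetical in-memory stand-in for a shared reporting service.
final class ErrorReportingService {
    // Placeholder base URL; a real deployment would use its own reporting endpoint.
    static final String REPORT_BASE_URL = "https://reports.example.com/errors/";

    private final Map<String, ErrorReport> reports = new ConcurrentHashMap<>();

    // Stores the failure untouched and returns the ticket ID that identifies it.
    String report(String sourceService, String rawMessage) {
        String id = UUID.randomUUID().toString();
        reports.put(id, new ErrorReport(id, sourceService, rawMessage, Instant.now()));
        return id;
    }

    Optional<ErrorReport> find(String id) {
        return Optional.ofNullable(reports.get(id));
    }

    // What the front end shows: a generic message plus a link to the report.
    static String userFacingMessage(String reportId) {
        return "Error. Something went wrong while processing your request. "
                + "Follow the link for further information: " + REPORT_BASE_URL + reportId;
    }
}
```

The front end only ever shows the generic message and the link. The raw string stays in the stored report, whether it is useful or just a pointer reference.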

This way, you can integrate as many services as you need, and you free yourself from the overhead of handling and translating random strings into new random, but user-friendly, strings.

The reporting service is reusable by the rest of the services, so if you have correlation IDs, it should be possible to give users a panoramic view of the errors and their causes. In distributed architectures, traceability is quite important.
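With correlation IDs, the same store can be queried per end-user request. Below is a hypothetical sketch that assumes every failing service reports the correlation ID propagated with the original request. `CorrelatedReport` and `ErrorTimeline` are invented names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical report: carries the correlation ID that travelled with the request.
final class CorrelatedReport {
    final String correlationId;
    final String service;
    final String message;

    CorrelatedReport(String correlationId, String service, String message) {
        this.correlationId = correlationId;
        this.service = service;
        this.message = message;
    }
}

final class ErrorTimeline {
    private final List<CorrelatedReport> reports = new ArrayList<>();

    void add(CorrelatedReport report) {
        reports.add(report);
    }

    // All failures that belong to the same end-user request, in arrival order.
    List<CorrelatedReport> forRequest(String correlationId) {
        List<CorrelatedReport> result = new ArrayList<>();
        for (CorrelatedReport r : reports) {
            if (r.correlationId.equals(correlationId)) {
                result.add(r);
            }
        }
        return result;
    }

    // Panoramic view: correlation ID -> services that failed while serving that request.
    Map<String, List<String>> failingServicesByRequest() {
        Map<String, List<String>> view = new LinkedHashMap<>();
        for (CorrelatedReport r : reports) {
            view.computeIfAbsent(r.correlationId, k -> new ArrayList<>()).add(r.service);
        }
        return view;
    }
}
```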

Later, the reporting service can be enhanced with as many mappings as you need to give readable and useful instructions about what to do if error X happens. Here, it doesn't matter at all if the strings change: what we have (store) is the final state of the report.

The reporting service will open the door to a possible normalization of the errors within the organization since the service will expose a public API (hence a contract).

Related Solutions

Spring4 – Error Handling and HTTP Status Codes for REST Services

For example, a 500 Internal Server Error could imply that the Apache server internally has some permission issue.

Not necessarily. Uncaught errors in the server-side application will cause the Java server to return a "controlled" 500 error.

From the web client's point of view, a 500 means:

-Something went wrong (somewhere) on the server-side. We don't know what. Don't retry the request-

For everything, a 200 should be returned by the application code.

The catalogue of status codes is wider, but 200 is basically the default code for saying: OK, everything went fine. I encourage you to look at the list of 2xx status codes to enrich the server-client communication.

For example, if any exception is thrown within a service method then a 500 is returned to the client. Is the HTTP response convention broken by Spring?

That's OK. When application errors reach the application server, the server catches them and returns the only reasonable status for uncontrolled errors: 500.

In that case, because an exception might be thrown by application code, shouldn't a 200 status code be returned?

That would be read in this way:

--the request successfully failed--

From the communication point of view, it doesn't seem effective to me. Usually, 5xx error codes mean "try again much later", while 4xx codes mean "try again, but this time do it right". If we return a 2xx code when the request didn't finish successfully, what are we communicating to the client?
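To make the machine-to-machine point concrete, here is a hypothetical Java sketch of a client that decides its next step from the status code alone, following the 4xx/5xx reading above. The enum and class names are assumptions.

```java
// Hypothetical client-side outcomes derived purely from the HTTP status code.
enum NextStep { DONE, FIX_REQUEST_AND_RETRY, RETRY_MUCH_LATER, UNEXPECTED }

final class StatusPolicy {
    static NextStep nextStep(int status) {
        if (status >= 200 && status < 300) {
            return NextStep.DONE;                  // 2xx: the request succeeded
        }
        if (status >= 400 && status < 500) {
            return NextStep.FIX_REQUEST_AND_RETRY; // 4xx: try again, but do it right
        }
        if (status >= 500 && status < 600) {
            return NextStep.RETRY_MUCH_LATER;      // 5xx: the server failed; try much later
        }
        return NextStep.UNEXPECTED;                // 1xx/3xx are out of scope for this sketch
    }
}
```

If the server answered 200 for everything, `nextStep` would always return `DONE`. The client would then be back to matching strings in the body.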

We might argue that the text message will tell the user what to do regardless of the HTTP status code, but what happens if there's no user? How would you program machine-to-machine communication if every call ends "successfully"? By matching strings? By declaring custom status codes? Wouldn't that add unnecessary complexity?

Should the application code catch the error, return a 200 status code and add a more business/application-specific message?

It depends. If you want your application to be a good citizen of the web, then no, you should not. It's good for web applications to make proper use of the web architecture. So, if you need to communicate an error (5xx, 4xx) along with a specific error message, then do it. Tell the web client: the request has not been processed due to the following errors.
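Below is a plain-Java sketch (no Spring involved) of translating application exceptions into an error status plus a specific message. In Spring, this kind of mapping is commonly written with `@ExceptionHandler` methods, for example in a `@ControllerAdvice` class. The exception types, messages, and the `ErrorResponse` shape are hypothetical. A database failure is mapped to 500.

```java
// Hypothetical application exceptions; names are illustrative only.
class InvalidOrderException extends RuntimeException {
    InvalidOrderException(String message) {
        super(message);
    }
}

class DatabaseUnavailableException extends RuntimeException {
    DatabaseUnavailableException(String message) {
        super(message);
    }
}

// An error status code plus a specific message for the body.
final class ErrorResponse {
    final int status;
    final String message;

    ErrorResponse(int status, String message) {
        this.status = status;
        this.message = message;
    }
}

final class ErrorMapper {
    static ErrorResponse toResponse(RuntimeException e) {
        if (e instanceof InvalidOrderException) {
            // The client sent something wrong: 4xx plus the reason it can act on.
            return new ErrorResponse(400, "The request has not been processed: " + e.getMessage());
        }
        if (e instanceof DatabaseUnavailableException) {
            // Our side failed: 500, without leaking the low-level database detail.
            return new ErrorResponse(500, "The request has not been processed: storage is temporarily unavailable.");
        }
        // Anything unexpected stays a plain 500, as the application server would return.
        return new ErrorResponse(500, "The request has not been processed due to an internal error.");
    }
}
```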

Let's say the application database is having issues? Or is it OK to return the 500 message?

It's OK. For this specific case, a 500 status code makes sense. But ultimately, it depends on the requirements and your preferences. If you are concerned about how to manage error handling with Spring Web, the following links might interest you: 1 or 2.

Error Handling Microservices – Upstreaming Microservices Errors

Exceptions should be treated just like domain models. Each service works with its own domain models and should have its own set of exception models as well. When communicating with external systems, the service should convert external exceptions to its domain exceptions as soon as possible. Basically, I'm saying go with solution #2.

Let's consider the communication from service A to service B. First of all, service A should define an interface that decouples the business logic from the implementation of requests to B. In your example, A is an account service and B is a user service, so let's call the interface UserService. This interface would have a set of (ideally) compiler-checked exceptions.

interface UserService
    def getUser(id): User throws UserNotFoundException, UserServiceException

You should implement an HTTP client for service B so that any service that depends on service B imports the common HTTP client. The error responses from requests to service B will be defined in this HTTP client component. That way, they're only defined once.

class BHttpClient
    def getUserById(id) = 
        response = http.get("/users/${id}").send
        if (response.status == 404) throw new UnknownUserException
        else if (response.status == 500) throw new InternalServerException
        else return json.parse[User](response.content)

HttpUserService, the implementation of UserService, will use that HTTP client to communicate with B. It should catch HTTP and transport exceptions from the client and wrap them in the appropriate "domain" exception.

class HttpUserService(client: BHttpClient) implements UserService
    def getUser(id) = 
        try {
            client.getUserById(id)
        } catch {
            case e: UnknownUserException => throw new UserNotFoundException(e)
            case e: InternalServerException => throw new UserServiceException(e)
        }

Cons: Service A will need to catch and expect that Service B is capable of returning a bunch of errors, adding coupling between Service A and B.

Service A will catch errors from the HTTP client in HttpUserService and wrap them in meaningful errors for service A. The business logic in service A is decoupled from service B through the UserService interface. HttpUserService is coupled to BHttpClient, but decoupled from service B because you can mock service B at the transport level.
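Here is a runnable Java rendering of the pseudocode above, as a sketch under three assumptions. `Transport` and `HttpResponse` are invented stand-ins for a real HTTP library. The response body is treated as a plain user name instead of JSON. Any status other than 200 or 404 is treated as a server error, whereas the pseudocode only mapped 500.

```java
// Invented transport abstraction so service B can be faked in tests.
final class HttpResponse {
    final int status;
    final String body;

    HttpResponse(int status, String body) {
        this.status = status;
        this.body = body;
    }
}

interface Transport {
    HttpResponse get(String path);
}

final class User {
    final String name;

    User(String name) {
        this.name = name;
    }
}

// Domain exceptions owned by service A (checked, so the compiler enforces handling).
class UserNotFoundException extends Exception {
    UserNotFoundException(Throwable cause) { super(cause); }
}

class UserServiceException extends Exception {
    UserServiceException(Throwable cause) { super(cause); }
}

interface UserService {
    User getUser(String id) throws UserNotFoundException, UserServiceException;
}

// Client-level exceptions describing B's responses, defined once in the shared client.
class UnknownUserException extends RuntimeException {}

class InternalServerException extends RuntimeException {
    InternalServerException(int status) { super("status " + status); }
}

final class BHttpClient {
    private final Transport transport;

    BHttpClient(Transport transport) { this.transport = transport; }

    User getUserById(String id) {
        HttpResponse response = transport.get("/users/" + id);
        if (response.status == 404) throw new UnknownUserException();
        if (response.status != 200) throw new InternalServerException(response.status);
        return new User(response.body); // assumption: body is the plain user name
    }
}

final class HttpUserService implements UserService {
    private final BHttpClient client;

    HttpUserService(BHttpClient client) { this.client = client; }

    @Override
    public User getUser(String id) throws UserNotFoundException, UserServiceException {
        try {
            return client.getUserById(id);
        } catch (UnknownUserException e) {
            throw new UserNotFoundException(e);
        } catch (RuntimeException e) {
            // InternalServerException and transport failures both become a domain error.
            throw new UserServiceException(e);
        }
    }
}
```

In a test, a lambda such as `path -> new HttpResponse(404, "")` can stand in for service B. The business logic only ever sees `UserNotFoundException` or `UserServiceException`.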

Even if you choose a different architecture like the one @Laiv describes in the comments, you'll still want to decouple yourself from the messages and events you receive. Do that by converting the message models and exceptions into domain models and exceptions in each service. I don't agree with @Laiv that it's as cut and dried as "either an asynchronous message architecture or you might as well implement a monolith". There are still big gains to be made with a synchronous, distributed, service-oriented architecture like the one you've described. The first and hardest step toward the right architecture is decoupling the components. By dividing into microservices early, you can more easily adopt an asynchronous approach later if you need it.