Java – How to design a complex REST API considering DB performance

databasehibernatejavaperformancerest

I've been following some tutorials on how to design REST APIs, but I still have some big questions marks. All these tutorials show resources with relatively simple hierarchies, and I would like to know how the principles used in those apply to a more complex one. Furthermore, they stay at a very high/architectural level. They barely show any relevant code, let alone the persistence layer. I'm specially concerned about database load/performance, as Gavin King said:

you will save yourself effort if you pay attention to the database at
all stages of development

Let's say my application will provide training for Companies. Companies have Departments and Offices. Departments have Employees. Employees have Skills and Courses, and certain Level of certain skills are required to be able to sign for some courses. The hierarchy is as as follows, but with :

-Companies
  -Departments
    -Employees
      -PersonalInformation
        -Address
      -Skills (quasi-static data)
        -Levels (quasi-static data)
      -Courses
        -Address
  -Offices
    -Address

Paths would be something as:

companies/1/departments/1/employees/1/courses/1
companies/1/offices/1/employees/1/courses/1

Fetching a resource

So ok, when returning a company, I obviously don't return the whole hierarchy companies/1/departments/1/employees/1/courses/1 + companies/1/offices/../. I might return a list of links to the departments or the expanded departments, and have to take the same decission at this level: do I return a list of links to the department's employees or the expanded employees? That will depend on the number of departments, employees, etc.

Question 1: Is my thinking correct, is "where to cut the hierarchy" a typical engineering decission I need to make?

Now let's say that when asked GET companies/id, I decide to return a list of links to the department collection, and the expanded office information. My companies don't have many offices, so joining with the tables Offices and Addresses shouldn't be a big deal. Example of response:

GET /companies/1

200 OK
{
  "_links":{
    "self" : {
      "href":"http://trainingprovider.com:8080/companies/1"
      },
      "offices": [
            { "href": "http://trainingprovider.com:8080/companies/1/offices/1"},
            { "href": "http://trainingprovider.com:8080/companies/1/offices/2"},
            { "href": "http://trainingprovider.com:8080/companies/1/offices/3"}
      ],
      "departments": [
            { "href": "http://trainingprovider.com:8080/companies/1/departments/1"},
            { "href": "http://trainingprovider.com:8080/companies/1/departments/2"},
            { "href": "http://trainingprovider.com:8080/companies/1/departments/3"}
      ]
  }
  "name":"Acme",
  "industry":"Manufacturing",
  "description":"Some text here",
  "offices": {
    "_meta":{
      "href":"http://trainingprovider.com:8080/companies/1/offices"
      // expanded offices information here
    }
  }
}

At the code level, this implies that (using Hibernate, I'm not sure how it is with other providers, but I guess that's pretty much the same) I won't put a collection of Department as a field in my Company class, because:

As said, I'm not loading it with Company, so I don't want to load it eagerly
And if I don't load it eagerly, I might as well remove it, because the persistence context will close after I load a Company and there is no point in trying to load it afterwards (LazyInitializationException).

Then, I'll put a Integer companyId in the Department class, so that I can add a department to a company.

Also, I need to get the ids of all the departments. Another hit to the DB but not a heavy one, so should be ok. The code could look like:

@Service
@Path("/companies")
public class CompanyResource {

    @Autowired
    private CompanyService companyService;

    @Autowired
    private CompanyParser companyParser;

    @Path("/{id}")
    @GET
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces(MediaType.APPLICATION_JSON)
    public Response findById(@PathParam("id") Integer id) {
        Optional<Company> company = companyService.findById(id);
        if (!company.isPresent()) {
            throw new CompanyNotFoundException();
        }
        CompanyResponse companyResponse = companyParser.parse(company.get());
        // Creates a DTO with a similar structure to Company, and recursivelly builds
        // sub-resource DTOs such as OfficeDTO
        Set<Integer> departmentIds = companyService.getDepartmentIds(id);
        // "SELECT id FROM departments WHERE companyId = id"
        // add list of links to the response
        return Response.ok(companyResponse).build();
    }
}

@Entity
@Table(name = "companies")
public class Company {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;

    private String name;

    private String industry;

    @OneToMany(fetch = EAGER, cascade = {ALL}, orphanRemoval = true)
    @JoinColumn(name = "companyId_fk", referencedColumnName = "id", nullable = false)
    private Set<Office> offices = new HashSet<>();

    // getters and setters
}

@Entity
@Table(name = "departments")
public class Department {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;

    private String name;

    private Integer companyId;

    @OneToMany(fetch = EAGER, cascade = {ALL}, orphanRemoval = true)
    @JoinColumn(name = "departmentId", referencedColumnName = "id", nullable = false)
    private Set<Employee> employees = new HashSet<>();

    // getters and setters
}

Updating a resource

For the update operation, I can expose an endpoint with PUT or POST. Since I want my PUT to be idempotent, I can't allow partial updates. But then, if I want to modify the company's description field, I need to send the whole resource representation. That seems too bloated. The same when updating an employee's PersonalInformation. I don't think it makes sense having to send all the Skills + Courses together with that.

Question 2: Is PUT just used for fine-grained resources?

I've seen in the logs that, when merging an entity, Hibernate executes a bunch of SELECT queries. I guess that's just to check if anything has changed and update whatever information needed. The upper the entity in the hierarchy, the heavier and more complex the queries. But some sources advise to use coarse grained resources. So again, I'll need to check how many tables are too much, and find a compromise between resource granularity and DB query complexity.

Question 3: Is this just another "know where to cut" engineering decission or am I missing something?

Question 4: Is this, or if not, what is the right "thinking process" when designing a REST service and searching for a compromise between resource granularity, query complexity and network chattiness?

Best Answer

I think you have complexity because you are starting with over-complication:

Paths would be something as:

companies/1/departments/1/employees/1/courses/1
companies/1/offices/1/employees/1/courses/1

Instead I would introduce simpler URL scheme like this:

GET companies/
    Returns a list of companies, for each company 
    return short essential info (ID, name, maybe industry)

GET companies/1
    Returns single company info like this:

    {
        "name":"Acme",
        "description":"Some text here"
        "industry":"Manufacturing"
        departments: {
            "href":"/companies/1/departments"
            "count": 5
        }
        offices: {
            "href":"/companies/1/offices"
            "count": 3
        }
    }

    We don't expand the data for internal sub-resources, 
    just return the count, so client knows that some data is present.
    In some cases count may be not needed too.

GET companies/1/departments
    Returns company departments, again short info for each department

GET departments/
    Here you need to decide if it makes sense to expose 
    a list of departments or not. 
    If not - leave only companies/X/departments method.

    Note, that you can also use query string to make this 
    method "searchable", like:
        /departments?company=1 - list of all departments for company 1
        /departments?type=support - all 'support' departments for all companies

GET departments/1
    Returns department 1 data

This way it answers most of your questions - you "cut" the hierarchy right away and you don't bind your URL scheme to the internal data structure. For example, if we know employee ID, would you expect to query it like employees/:ID or like companies/:X/departments/:Y/employees/:ID?

Regarding PUT vs POST requests, from your question it is clear that you feel the partial updates will be more efficient for your data. So I would just use POSTs.

In practice, you actually want to cache data reads (GET requests) and it is less important for data updates. And update are often can't be cached regardless of what type of request you do (like if server automatically sets the update time - it will be different for every request).

Update: regarding the right "thinking process" - since it is based on HTTP, we can apply the regular way of thinking when designing the web-site structure. In this case on the top we can have a list of companies and show a short description for each with a link to the "view company" page, where we show company details and links to offices / departments and so on.

Related Solutions

REST API – What Is the Limit on Resource Levels?

Yes,

GET /api/companies/1/employees

would be the way to get the employees of a company.

What exactly your employee resource offers would depend on some details of your application. For example you could have logins for admins that are directly related to a specific company. It this case it would be a possible shortcut the employees resource and directly go to an employee or list of employees. But in normal cases /api/employees would only work in combination with companies.

Very interesting question would be the nesting levels. This is one of the questions some people can discuss for hours. Ruby on Rails for example has now an option for "shallow" routes.

While there is no hard technical limit to the number of levels (max length of an URL maybe) I think two levels should be enough for most cases and deeper nesting should be avoided if possible since doing so will make your code better structured and maintainable.

Your example is quite typical and it is easy to show why you would not really need a third level for the tasks in most scenarios. Your tasks are directly related to an employee. So the entity resource somehow already carries the information about the company. (Selecting an employee automatically selects the company). So providing the company in the URL again somehow repeats information.

GET /api/employees/123/tasks

has all the information necessary to get to your data. So why make it more complex?

I'm not sure if I fully understand your last question.

Restricting access in this context is rather easy. Say you have a client who has only access to resources of the company with id 123. Then all you have to do is to start your controller code by assigning some "hard limits". If the logged in user is not allowed to access anything but one company, then create an obkect for that company and from there on use only this object for any firther access.

Assume we have a Rails projects. For this we have a model for each resource you mention. In addition we have an authentication system in place that gives us the currently logged in user. In this case it would be as easy as this to return something like the list of employees:

current_user.company.employees

Notice that it does not even try to use any company id used in the URL (which in case of Rails would be stored in params[:company_id]).

Try to get the tasks of a specific employee with

GET /api/companies/123/employees/456/tasks

current_user.company.employees.find(params[employee_id]).tasks

Notice again that this is going the "full path" instead of simply using

Employee.find(params[employee_id]).tasks

The first form will create a SQL query with a lot of LEFT JOINS that starts from user, goes through company, employees and tasks. And important in this case: It will fail if a single relation is missing.

Even if not in context of a login and permissions you can still use this to enforce that the user has a certain knowledge about the data. Though in this case you would maybe have to use deeper nested routes. Same example without login but the user making the request must know the correct company id. In this case:

GET /api/companies/123/employees/456/tasks

Company.find(params[:company_id]).employees.find(params[employee_id]).tasks

Again go the full way. If your framework does not have this kind of object relations simply nake sure that your SQL requests use JOINS in a similar fashion.

Rest – How to design a RESTful API with good performance

No one said you can't make a URI for /api/foos/?page=1. A container is not the same as the things it contains; a list of foos is a different resource from the individual foos.