Java – How to design a complex REST API considering DB performance

databasehibernatejavaperformancerest

I've been following some tutorials on how to design REST APIs, but I still have some big questions marks. All these tutorials show resources with relatively simple hierarchies, and I would like to know how the principles used in those apply to a more complex one. Furthermore, they stay at a very high/architectural level. They barely show any relevant code, let alone the persistence layer. I'm specially concerned about database load/performance, as Gavin King said:

you will save yourself effort if you pay attention to the database at
all stages of development

Let's say my application will provide training for Companies. Companies have Departments and Offices. Departments have Employees. Employees have Skills and Courses, and certain Level of certain skills are required to be able to sign for some courses. The hierarchy is as as follows, but with :

-Companies
  -Departments
    -Employees
      -PersonalInformation
        -Address
      -Skills (quasi-static data)
        -Levels (quasi-static data)
      -Courses
        -Address
  -Offices
    -Address

Paths would be something as:

companies/1/departments/1/employees/1/courses/1
companies/1/offices/1/employees/1/courses/1

Fetching a resource

So ok, when returning a company, I obviously don't return the whole hierarchy companies/1/departments/1/employees/1/courses/1 + companies/1/offices/../. I might return a list of links to the departments or the expanded departments, and have to take the same decission at this level: do I return a list of links to the department's employees or the expanded employees? That will depend on the number of departments, employees, etc.

Question 1: Is my thinking correct, is "where to cut the hierarchy" a typical engineering decission I need to make?

Now let's say that when asked GET companies/id, I decide to return a list of links to the department collection, and the expanded office information. My companies don't have many offices, so joining with the tables Offices and Addresses shouldn't be a big deal. Example of response:

GET /companies/1

200 OK
{
  "_links":{
    "self" : {
      "href":"http://trainingprovider.com:8080/companies/1"
      },
      "offices": [
            { "href": "http://trainingprovider.com:8080/companies/1/offices/1"},
            { "href": "http://trainingprovider.com:8080/companies/1/offices/2"},
            { "href": "http://trainingprovider.com:8080/companies/1/offices/3"}
      ],
      "departments": [
            { "href": "http://trainingprovider.com:8080/companies/1/departments/1"},
            { "href": "http://trainingprovider.com:8080/companies/1/departments/2"},
            { "href": "http://trainingprovider.com:8080/companies/1/departments/3"}
      ]
  }
  "name":"Acme",
  "industry":"Manufacturing",
  "description":"Some text here",
  "offices": {
    "_meta":{
      "href":"http://trainingprovider.com:8080/companies/1/offices"
      // expanded offices information here
    }
  }
}

At the code level, this implies that (using Hibernate, I'm not sure how it is with other providers, but I guess that's pretty much the same) I won't put a collection of Department as a field in my Company class, because:

  • As said, I'm not loading it with Company, so I don't want to load it eagerly
  • And if I don't load it eagerly, I might as well remove it, because the persistence context will close after I load a Company and there is no point in trying to load it afterwards (LazyInitializationException).

Then, I'll put a Integer companyId in the Department class, so that I can add a department to a company.

Also, I need to get the ids of all the departments. Another hit to the DB but not a heavy one, so should be ok. The code could look like:

@Service
@Path("/companies")
public class CompanyResource {

    @Autowired
    private CompanyService companyService;

    @Autowired
    private CompanyParser companyParser;

    @Path("/{id}")
    @GET
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces(MediaType.APPLICATION_JSON)
    public Response findById(@PathParam("id") Integer id) {
        Optional<Company> company = companyService.findById(id);
        if (!company.isPresent()) {
            throw new CompanyNotFoundException();
        }
        CompanyResponse companyResponse = companyParser.parse(company.get());
        // Creates a DTO with a similar structure to Company, and recursivelly builds
        // sub-resource DTOs such as OfficeDTO
        Set<Integer> departmentIds = companyService.getDepartmentIds(id);
        // "SELECT id FROM departments WHERE companyId = id"
        // add list of links to the response
        return Response.ok(companyResponse).build();
    }
}
@Entity
@Table(name = "companies")
public class Company {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;

    private String name;

    private String industry;

    @OneToMany(fetch = EAGER, cascade = {ALL}, orphanRemoval = true)
    @JoinColumn(name = "companyId_fk", referencedColumnName = "id", nullable = false)
    private Set<Office> offices = new HashSet<>();

    // getters and setters
}
@Entity
@Table(name = "departments")
public class Department {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;

    private String name;

    private Integer companyId;

    @OneToMany(fetch = EAGER, cascade = {ALL}, orphanRemoval = true)
    @JoinColumn(name = "departmentId", referencedColumnName = "id", nullable = false)
    private Set<Employee> employees = new HashSet<>();

    // getters and setters
}

Updating a resource

For the update operation, I can expose an endpoint with PUT or POST. Since I want my PUT to be idempotent, I can't allow partial updates. But then, if I want to modify the company's description field, I need to send the whole resource representation. That seems too bloated. The same when updating an employee's PersonalInformation. I don't think it makes sense having to send all the Skills + Courses together with that.

Question 2: Is PUT just used for fine-grained resources?

I've seen in the logs that, when merging an entity, Hibernate executes a bunch of SELECT queries. I guess that's just to check if anything has changed and update whatever information needed. The upper the entity in the hierarchy, the heavier and more complex the queries. But some sources advise to use coarse grained resources. So again, I'll need to check how many tables are too much, and find a compromise between resource granularity and DB query complexity.

Question 3: Is this just another "know where to cut" engineering decission or am I missing something?

Question 4: Is this, or if not, what is the right "thinking process" when designing a REST service and searching for a compromise between resource granularity, query complexity and network chattiness?

Best Answer

I think you have complexity because you are starting with over-complication:

Paths would be something as:

companies/1/departments/1/employees/1/courses/1
companies/1/offices/1/employees/1/courses/1

Instead I would introduce simpler URL scheme like this:

GET companies/
    Returns a list of companies, for each company 
    return short essential info (ID, name, maybe industry)
GET companies/1
    Returns single company info like this:

    {
        "name":"Acme",
        "description":"Some text here"
        "industry":"Manufacturing"
        departments: {
            "href":"/companies/1/departments"
            "count": 5
        }
        offices: {
            "href":"/companies/1/offices"
            "count": 3
        }
    }

    We don't expand the data for internal sub-resources, 
    just return the count, so client knows that some data is present.
    In some cases count may be not needed too.
GET companies/1/departments
    Returns company departments, again short info for each department
GET departments/
    Here you need to decide if it makes sense to expose 
    a list of departments or not. 
    If not - leave only companies/X/departments method.

    Note, that you can also use query string to make this 
    method "searchable", like:
        /departments?company=1 - list of all departments for company 1
        /departments?type=support - all 'support' departments for all companies
GET departments/1
    Returns department 1 data

This way it answers most of your questions - you "cut" the hierarchy right away and you don't bind your URL scheme to the internal data structure. For example, if we know employee ID, would you expect to query it like employees/:ID or like companies/:X/departments/:Y/employees/:ID?

Regarding PUT vs POST requests, from your question it is clear that you feel the partial updates will be more efficient for your data. So I would just use POSTs.

In practice, you actually want to cache data reads (GET requests) and it is less important for data updates. And update are often can't be cached regardless of what type of request you do (like if server automatically sets the update time - it will be different for every request).

Update: regarding the right "thinking process" - since it is based on HTTP, we can apply the regular way of thinking when designing the web-site structure. In this case on the top we can have a list of companies and show a short description for each with a link to the "view company" page, where we show company details and links to offices / departments and so on.