How to Paginate Local and Remote Data in Java

hibernate · java · pagination · rest

My team is currently facing a problem that we don't know how to tackle.

Some technical details: we use Java 8, Hibernate, Spring, MySQL, and AngularJS for the front-end.

We need to do pagination on a combined set of local and remote data. The approach we are taking is the following:

  1. Query our local database, where Hibernate gives us pagination, and retrieve the paginated records.
  2. Prepare the previous records to invoke a REST service to retrieve extra details.
  3. Merge all the details and render the data in a webpage.
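A minimal sketch of these three steps, with `loadLocalPage` standing in for the Hibernate-paginated query and `fetchRemoteDetails` for the REST client (both names hypothetical):

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch of steps 1-3: page locally, enrich remotely, merge.
public class PageEnricher {

    // Stand-in for the Hibernate-paginated query
    // (setFirstResult/setMaxResults against the local DB).
    static List<String> loadLocalPage(List<String> allCars, int page, int pageSize) {
        int from = page * pageSize;
        if (from >= allCars.size()) return Collections.emptyList();
        return allCars.subList(from, Math.min(from + pageSize, allCars.size()));
    }

    // Stand-in for the REST call that returns extra details per record id.
    static Map<String, String> fetchRemoteDetails(List<String> ids) {
        Map<String, String> details = new HashMap<>();
        for (String id : ids) details.put(id, "details-for-" + id);
        return details;
    }

    // Merge the local page with the remote details before rendering.
    static List<String> enrichedPage(List<String> allCars, int page, int pageSize) {
        List<String> pageRecords = loadLocalPage(allCars, page, pageSize);
        Map<String, String> details = fetchRemoteDetails(pageRecords);
        return pageRecords.stream()
                .map(id -> id + " / " + details.get(id))
                .collect(Collectors.toList());
    }
}
```

This works because the remote call only enriches records; it never removes any, so the local page boundaries stay valid.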

This approach works fine when the search only needs to enrich our local data with extra details from the remote service.

The problem arises when our local data can no longer serve as the pagination reference.

Example:
Let's say we are searching for all the SuperBrand cars in stock that use milk as fuel and have square wheels. In our local DB we have 10k cars, of which 2k are SuperBrand. So we have an initial volume of 2k local records that Hibernate paginates into 20 pages of 100 records each. However, the customer only wants the milk-fuelled, square-wheeled models. In that case, we need to invoke a third party that can tell us which of those 2k cars fulfil the search criteria. Since not every car runs on milk and has square wheels, this filtering disrupts the pagination. On top of that, the third party doesn't support pagination.

For this reason, we've dismissed the possibility of sending the data to the third party page by page, i.e., sending the 100 records of page 1 for checking, and when the customer selects page 2, sending the corresponding 100 records and re-checking. This is not ideal because, of the 100 records on page 1, maybe only 10 match the criteria, so we would display just 10 records on page 1. Perhaps all 100 records of page 2 fulfil the criteria and could be displayed in full; either way, pagination is already broken by page 1.

The other possibility is to send the whole 2k records, invoke the third party, reduce the volume to the records that fulfil the search criteria, and build a pagination mechanism on the server to hold and serve that data. Even if this would work conceptually, I'm still worried about performance for huge data volumes.
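A rough sketch of that second option, filtering the full local result set once and caching the reduced list per search so that subsequent pages are served from memory (all names hypothetical, and a real version would need cache eviction):

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical sketch: filter the full local result set once through the
// third party, cache the reduced list, then paginate it in memory.
public class FilteredResultCache {
    private final Map<String, List<String>> cache = new HashMap<>();

    // thirdPartyFilter stands in for the remote check (e.g. milk + square wheels).
    public List<String> page(String searchKey, List<String> localResults,
                             Predicate<String> thirdPartyFilter,
                             int page, int pageSize) {
        List<String> filtered = cache.get(searchKey);
        if (filtered == null) {
            // First request for this search: pay the full third-party cost once.
            filtered = new ArrayList<>();
            for (String record : localResults) {
                if (thirdPartyFilter.test(record)) filtered.add(record);
            }
            cache.put(searchKey, filtered);
        }
        int from = page * pageSize;
        if (from >= filtered.size()) return Collections.emptyList();
        return filtered.subList(from, Math.min(from + pageSize, filtered.size()));
    }
}
```

The trade-off is exactly the one raised above: the first request is as expensive as the whole result set, and the server now holds per-search state.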

Questions regarding the topic:

  • Is the previous option a valid approach?
  • Is there any possibility of streaming the data even with a remote service involved, instead of doing pagination (bear in mind that the searches are triggered in AngularJS through REST calls to our servers)?
  • Would anyone recommend another approach?

Best Answer

This is a difficult problem. Maybe we can simplify it?

The problem with standard paging is that you typically need to know how many results there are in total, so that you can display links to page 1, page 2 ... page X. Maybe you can simplify the UI so that you only have a "next" link to the next page; the customer doesn't see the total number of results (because it is not known). This lets us avoid calling the 3rd party service for all DB results.

Now, how does it work? Let's say the page size is 20. We get the request for page 1 and load, say, the first 100 results from the database satisfying the DB filter: more than we need, because we expect the 3rd party filter to reduce this number significantly. We send these 100 to the 3rd party service; it tells us that 12 of them satisfy the filter, so we need 8 more. We load another 100 results (with the correct offset) from the database and again call the 3rd party service; it tells us that 15 of those satisfy the filter. That's more than we need, so we take just 8 of those and display them.

Now we need to display the "next" link for the second page, but the trick is that we don't index the page in the URL request (like "?page=2"); instead we index the last displayed record (e.g. "?offset=155"). In this particular case, the offset should be the offset of the 20th element satisfying the condition (so something between 100 and 200).
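The loop above can be sketched roughly as follows; `loadBatch` stands in for the offset/limit DB query and `remoteFilter` for the third-party check (both hypothetical):

```java
import java.util.*;
import java.util.function.IntFunction;
import java.util.function.Predicate;

// Hypothetical sketch of the "load ahead, filter, stop when the page is full"
// loop. The returned nextOffset is the absolute DB offset to encode in the
// "next" link (e.g. ?offset=155) instead of a page number.
public class ForwardOnlyPager {
    public static final class Page {
        public final List<String> items = new ArrayList<>();
        public int nextOffset;
    }

    public static Page fetchPage(int startOffset,
                                 int pageSize,                        // e.g. 20
                                 IntFunction<List<String>> loadBatch, // DB query: offset -> batch of rows
                                 Predicate<String> remoteFilter) {    // 3rd-party check
        Page page = new Page();
        int offset = startOffset;
        while (page.items.size() < pageSize) {
            List<String> batch = loadBatch.apply(offset);
            if (batch.isEmpty()) break;                       // ran out of DB rows
            for (String record : batch) {
                offset++;                                     // track absolute DB position
                if (remoteFilter.test(record)) {
                    page.items.add(record);
                    if (page.items.size() == pageSize) break; // page is full
                }
            }
        }
        page.nextOffset = offset;
        return page;
    }
}
```

A real version would also have to decide whether to render the "next" link at all, e.g. by peeking ahead for one more matching record.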

This of course isn't perfect; it still has nasty performance worst cases, but at least they are not linear in the number of items in the database.
