Design – Implementing paging with multiple data sources

Tags: architecture, design, pagination

I have multiple data sources that I need to search across, returning the combined results to the client (a web app).

For example the sources are:

  1. an Elasticsearch index
  2. a SQL database

Is there an efficient way to perform paging across two sources? At the moment I search one source, use its results to narrow the set of items searched in the second, and only then apply paging.

Alternative options:

  • Ideally, I would like to move one source into the other, but for various reasons (e.g. space constraints, pricing) this does not seem to be a viable option.
  • Disabling the search until more refined criteria are entered, so the result set is guaranteed to be smaller and paging matters less.

Without paging, this part of the application performs poorly when the search criteria are broad.

Are there any established approaches to this kind of search?

Best Answer

Get the data into a single index

The simplest, no muss, no fuss solution.

But then again, why would anything Enterprise be simple?

Supply two result sets

The best you can do is to provide the first page of each source's answer. If either source runs dry, simply return its set as empty. Don't be tempted to pad the page with more results from the other source, because if the dry source suddenly fills up, your users will be confused as to why some results are repeated.
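As a rough sketch of what that could look like in the API layer (Python is used only for illustration; `search_two_result_sets` and the `(query, page, page_size)` search signature are assumptions, not part of any real client library), each source is paged independently and a dry source simply yields an empty list:

```python
from typing import Callable, Dict, List

# Hypothetical signature: (query, page, page_size) -> hits for that page.
SearchFn = Callable[[str, int, int], List[dict]]


def search_two_result_sets(
    es_search: SearchFn,
    sql_search: SearchFn,
    query: str,
    page: int,
    page_size: int,
) -> Dict[str, List[dict]]:
    """Return one independently paged result set per source.

    Neither set is padded from the other: a source that has run dry
    contributes an empty list for that page.
    """
    return {
        "elasticsearch": es_search(query, page, page_size),
        "sql": sql_search(query, page, page_size),
    }
```

The client then renders two lists (or two tabs), each with its own pager.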

Merge at the client

Alternatively, if you have some measure of control over the client, you can fetch pages from the API for both sources and use the quality metric to sort the returned data into pages for the user. Before emitting a page, you must hold the next item (or know you have hit the end of data) from both sources; otherwise the merge for that page cannot be trusted. This places some burden on the user's computer, so make sure their system is up to the intended load.
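A minimal sketch of that look-ahead merge follows. It is written in Python purely for illustration (a browser client would do the same thing in JavaScript), and it assumes each source returns `(score, id)` pairs already sorted best first, with scores that are comparable across sources. `BufferedSource`, `next_merged_page` and the `fetch_page(page_number, page_size)` callable are hypothetical names, not an existing API.

```python
from collections import deque


class BufferedSource:
    """Wraps one paged API source and keeps a look-ahead buffer."""

    def __init__(self, fetch_page, page_size):
        # Assumed callable: (page_number, page_size) -> list of (score, id).
        self._fetch_page = fetch_page
        self._page_size = page_size
        self._next_page = 0
        self._buffer = deque()
        self._exhausted = False

    def peek(self):
        """Return the next item, or None once the source has ended."""
        if not self._buffer and not self._exhausted:
            items = self._fetch_page(self._next_page, self._page_size)
            self._next_page += 1
            self._buffer.extend(items)
            # A short (or empty) page means there is nothing further to fetch.
            if len(items) < self._page_size:
                self._exhausted = True
        return self._buffer[0] if self._buffer else None

    def pop(self):
        return self._buffer.popleft()


def next_merged_page(sources, n):
    """Build the next page of up to n items by repeatedly taking the best head."""
    page = []
    while len(page) < n:
        # Every source must reveal its next item (or its end) before we choose.
        heads = [(s.peek(), s) for s in sources]
        heads = [(item, s) for item, s in heads if item is not None]
        if not heads:
            break
        _, best_source = max(heads, key=lambda h: h[0][0])
        page.append(best_source.pop())
    return page
```

Calling `next_merged_page` repeatedly on the same `BufferedSource` objects yields successive pages; unused look-ahead items stay buffered for the next call.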

Messy Hack - Here for completeness, avoid if at all possible.

There is a rather bad hack that you could do. It would provide almost the illusion of a unified data source. It is, however, woefully inefficient and breaks basic encapsulation. Add a parameter per data source to act as an item offset. To produce a page of N items, query each data source for N items starting at its offset. Merge these in the API and return the top N items, along with updated offsets for the next page.
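For completeness, the hack could be sketched like this. It assumes each source can be queried as `fetch(offset, limit)` and returns `(score, id)` pairs sorted best first, with scores comparable across sources; `fetch_merged_page` and `list_source` are hypothetical names used only for this illustration.

```python
from typing import Callable, List, Tuple

Item = Tuple[float, str]  # (score, id); a higher score ranks first
Fetch = Callable[[int, int], List[Item]]  # assumed: (offset, limit) -> items


def list_source(data: List[Item]) -> Fetch:
    """Stand-in for a real data source, backed by an in-memory sorted list."""
    return lambda offset, limit: data[offset:offset + limit]


def fetch_merged_page(
    sources: List[Fetch], offsets: List[int], n: int
) -> Tuple[List[Item], List[int]]:
    """Return the top n items across all sources plus the offsets for the next page."""
    candidates = []
    for idx, (fetch, offset) in enumerate(zip(sources, offsets)):
        # Over-fetch: n items from every source, though only n in total are kept.
        for item in fetch(offset, n):
            candidates.append((item, idx))
    # Python's sort is stable (also with reverse=True), so items from the same
    # source keep their original relative order when scores tie.
    candidates.sort(key=lambda c: c[0][0], reverse=True)
    top = candidates[:n]
    new_offsets = list(offsets)
    for _, idx in top:
        new_offsets[idx] += 1  # advance only the sources that contributed
    return [item for item, _ in top], new_offsets
```

The client has to send the whole offsets list back with every request, which is exactly the encapsulation leak described above, and every page costs N items per source even though only N are returned.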

Choices

Fight for a single index, and use the two result sets as the fallback. Seriously, hide the fact that you could merge the data at the client or at the API. You don't want to have to undo either of those later: the business team will expect you to do the same for every new data source, complain bitterly when it is no longer responsive, and complain again about the man-hours it costs to fix. It is simply best to deny them now and get the work done to support this properly going forward.
