C# – Is it possible to cache data on a REST service that returns paginated data

asp.net · c# · dapper · pagination · rest

Under the REST architecture principles, a RESTful application should be stateless, so each call to my ASP.NET 4 REST service (GET verb) is handled independently. The service pulls tens of thousands of records and paginates them in chunks of 10 (with OData v4), which keeps the UI lightweight because it only loads 10 records at a time. However, every time the user requests the next chunk of 10 records, the ASP.NET controller calls the read method on the data access layer (the Dapper micro-ORM), which pulls the same thousands of records all over again. Even though the controller only returns 10 records each time thanks to the OData pagination engine, the data access layer (Dapper) queries for the same thousands of records on every call, which is expensive and slow. I know I could modify the query that Dapper uses so the pagination filter goes down to the query level, but I find that's too much of a burden: the filter OData sends can be quite complex, and I don't have the luxury of building a semantic tree to generate the WHERE clause. Besides, isn't that OData's job in the first place? Isn't it possible to simply cache the thousands of records somewhere to avoid calling the database each time the same filter is requested over and over again?

Oh yes, and Entity Framework is an absolute no-go; Dapper is mandatory.

Best Answer

Isn't it possible to simply cache the thousands of records somewhere to avoid calling the database each time

Yes, but that is the exact opposite of being stateless, which is a core REST constraint. You'd be going against the main ideology of RESTful services.

Can it be done? On a technical level, if you really want, sure. But this is a case of the XY problem. Your proposed solution (Y - caching the data) is a workaround for the actual problem (X - paginating the query to lower the amount of data returned).

Storing the full dataset only solves half of the problem. If the user only looks at two pages on average, you still retrieved 20 useful rows and 9,980 useless ones (assuming 10,000 rows total). That means 99.8% of the retrieved data is never used, which is still a big waste of space.

Furthermore, you'd have to cache a separate dataset for every unique combination of filters. Which means that you're likely going to have a ton of duplicate data in memory, because a given record may appear in several differently filtered datasets.
Assuming you have a significant number of concurrent users and you cache all of their requests, it might actually be more efficient to simply store the entire table in memory once, just to get rid of the duplicates. (I'm not saying you should do this; I'm pointing out that caching everything is going to cause more problems than it's likely to solve.)

There are many reasons why you shouldn't be trying to cache the unpaginated data as a solution to a pagination problem.

I know I could modify the query that Dapper uses so the pagination filter goes down to the query level, but I find that's too much of a burden

Well, if you refuse to paginate the query, then you obviously can't expect to receive a paginated dataset. You're excluding the correct solution in favor of an easier, un-RESTful one, which is likely to create technical debt down the road.

Have a look at this pagination with Dapper example. You'll have to change the subquery to your own query.

SELECT  *
FROM    ( SELECT    ROW_NUMBER() OVER ( ORDER BY InsertDate ) AS RowNum, *
          FROM      Posts
          WHERE     InsertDate >= '1900-01-01'
        ) AS result
WHERE   RowNum >= 1   -- your pagination parameters
    AND RowNum <= 10  -- (both bounds inclusive)
ORDER BY RowNum
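
For reference, here's a minimal sketch of how that query could be executed through Dapper, with the row limits passed in as query parameters rather than hard-coded. The Post class, its columns, and the repository shape are my assumptions; adapt them to your actual schema:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using Dapper;

public class Post
{
    public long RowNum { get; set; }
    public string Title { get; set; }
    public DateTime InsertDate { get; set; }
}

public class PostRepository
{
    private readonly string _connectionString;

    public PostRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Only one page of rows ever leaves the database; the pagination
    // happens in the query itself, not in the application.
    public IEnumerable<Post> GetPage(int rowLimitLower, int rowLimitUpper)
    {
        const string sql = @"
            SELECT  *
            FROM    ( SELECT    ROW_NUMBER() OVER ( ORDER BY InsertDate ) AS RowNum, *
                      FROM      Posts
                      WHERE     InsertDate >= '1900-01-01'
                    ) AS result
            WHERE   RowNum >= @RowLimitLower
                AND RowNum <= @RowLimitUpper
            ORDER BY RowNum;";

        using (var connection = new SqlConnection(_connectionString))
        {
            // Dapper opens the connection if needed and buffers results by default.
            return connection.Query<Post>(
                sql,
                new { RowLimitLower = rowLimitLower, RowLimitUpper = rowLimitUpper });
        }
    }
}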

All you would need to do to implement this for yourself is figure out the row limits (1 and 10 in the example). These can be found by using simple calculations.

Note: I'm assuming that the pageNumber is 1-indexed, because that's generally how the UI portrays it. pageSize is 10 in your case.

var rowLimitLower = ((pageNumber - 1) * pageSize) + 1; // e.g. page 2, size 10 -> row 11
var rowLimitUpper = pageNumber * pageSize;             // e.g. page 2, size 10 -> row 20

Figure out which page number is being requested. Based on that (and a page size, which I infer is always 10 for your application), you can calculate and implement the needed pagination in the query itself, thus preventing the retrieval of many unnecessary rows.
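
To make that concrete, here's a minimal sketch of what the calling side might look like, assuming an ASP.NET Web API 2 controller and the hypothetical PostRepository from the sketch above. The route, connection string, and parameter names are placeholders:

using System.Web.Http;

public class PostsController : ApiController
{
    private const int PageSize = 10; // fixed page size, per your description

    private readonly PostRepository _repository =
        new PostRepository("your-connection-string-here");

    // e.g. GET /api/posts?pageNumber=3 returns rows 21..30
    public IHttpActionResult Get(int pageNumber = 1)
    {
        var rowLimitLower = ((pageNumber - 1) * PageSize) + 1;
        var rowLimitUpper = pageNumber * PageSize;

        return Ok(_repository.GetPage(rowLimitLower, rowLimitUpper));
    }
}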
