C# – Handling large datasets in Web API & OData

asp.net, asp.net-web-api, c#, odata, rest

I have been working with ASP.NET Web API over recent weeks with great success. It has really assisted me in producing an interface for mobile clients to program against over HTTP.

I reached a point where I need some assistance.

I have a new endpoint which will scan a database and could return 100K results. I am using OData to filter the data and return a paginated set of it.
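For context, the shape of the endpoint is roughly this (the Order entity and OrdersContext are invented names for the example; [Queryable] comes from the Web API OData package). The ToList() call is the expensive part: it materialises the full result set before OData filters it in memory.

```csharp
// Rough shape of the current endpoint. [Queryable] applies
// $filter/$orderby/$skip/$top to the returned IQueryable, and
// PageSize caps the rows in any single response.
using System.Data.Entity;
using System.Linq;
using System.Web.Http;

public class Order
{
    public int Id { get; set; }
    public string Customer { get; set; }
    public decimal Total { get; set; }
}

public class OrdersContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
}

public class OrdersController : ApiController
{
    private readonly OrdersContext _db = new OrdersContext();

    [Queryable(PageSize = 50)]
    public IQueryable<Order> Get()
    {
        // ToList() pulls all ~100K rows out of the database; OData then
        // filters and pages the list in memory on every request.
        return _db.Orders.ToList().AsQueryable();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _db.Dispose();
        base.Dispose(disposing);
    }
}
```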

As this could happen across multiple requests, I am concerned about performance. Returning 100K records from the database every time is not ideal. So I have some ideas.

The first is to cache the 100K results and let OData do its magic on them every time. I am working with the AppFabric distributed cache as it's a load-balanced environment. However, caching that amount of data in AppFabric could result in memory complications, so I think I am best avoiding this.
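To illustrate, the sketch below (reusing the invented types from above) parks the whole list in AppFabric and lets OData query it from memory; the cache key and ten-minute lifetime are also made up. It is exactly that Put of a 100K-item list that I suspect would cause trouble:

```csharp
// Sketch of option one: cache the full result set in AppFabric and
// serve OData queries from memory. Requires the AppFabric client
// assemblies (Microsoft.ApplicationServer.Caching); the key name and
// lifetime below are illustrative only.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ApplicationServer.Caching;

public class CachedOrderSource
{
    private static readonly DataCacheFactory Factory = new DataCacheFactory();
    private readonly DataCache _cache = Factory.GetDefaultCache();

    public IQueryable<Order> GetOrders()
    {
        var orders = _cache.Get("all-orders") as List<Order>;
        if (orders == null)
        {
            orders = LoadAllOrdersFromDatabase();                        // the expensive 100K-row read
            _cache.Put("all-orders", orders, TimeSpan.FromMinutes(10));  // serialises the whole list into the cache
        }
        return orders.AsQueryable();  // OData filters/pages this in memory
    }

    private List<Order> LoadAllOrdersFromDatabase()
    {
        using (var db = new OrdersContext())
        {
            return db.Orders.ToList();
        }
    }
}
```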

The next option is to forget about the magic of OData and pass the filters down to the database, returning only the required data each time. In other words, hit the DB on every request.
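If I've read the OData documentation correctly, that would look something like this (again with the invented types from above): ApplyTo composes the query options onto the EF queryable, so only the requested page ever leaves the database.

```csharp
// Sketch of option two: hand the OData query options to Entity
// Framework so $filter/$skip/$top become part of the generated SQL.
using System.Linq;
using System.Web.Http;
using System.Web.Http.OData.Query;

public class OrdersController : ApiController
{
    private readonly OrdersContext _db = new OrdersContext();

    public IQueryable<Order> Get(ODataQueryOptions<Order> options)
    {
        // No ToList() here: ApplyTo builds up the expression tree and
        // EF translates it into a single paged SQL statement.
        return options.ApplyTo(_db.Orders) as IQueryable<Order>;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _db.Dispose();
        base.Dispose(disposing);
    }
}
```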

I could look at using a caching handler like the one outlined in this article to cache in the HTTP cache -> http://byterot.blogspot.ie/2012/06/aspnet-web-api-caching-handler.html The drawback of this is that if the data gets updated via another system, which it may, the cached data is not expired.
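The only mitigation I can think of for that is to keep the cache lifetime short, so stale data is at least bounded. Something like this message handler (the 60-second max-age is an arbitrary figure):

```csharp
// Sketch: mark responses cacheable for a short, fixed window so that
// data updated by another system is stale for at most that long.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

public class ShortLivedCacheHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        response.Headers.CacheControl = new CacheControlHeaderValue
        {
            Public = true,
            MaxAge = TimeSpan.FromSeconds(60)  // arbitrary example window
        };
        return response;
    }
}
```

This would be registered in the usual way with GlobalConfiguration.Configuration.MessageHandlers.Add(new ShortLivedCacheHandler());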

Any other tips on how I might handle this scenario: a large amount of data, filtered with OData, in conjunction with Web API?

Best Answer

This is a question that's likely to result in a wide variety of answers. That said, let me put on my pre-MSFT hat and give you my two cents.

A lot of architecture questions are best answered with the consultant's answer: "It depends." In your case, it depends on a few specific things. Some developers have a problem with caching layers because they add extra things to think about. An ACID-compliant database buys you a lot of insurance that your window of eventual consistency stays, at worst, tightly bounded.

If it were me making this decision, I would be considering a few things:

  • How many rows am I returning on a regular basis?
  • Are they the same rows over and over?
  • How big is that in memory? (100k is really not that many rows; you're right about not wanting those 100k rows to hit the disk every time, but it's probably not a problem to keep them all in memory; SQL Server would probably do this for you anyway.)
  • What am I willing to deal with re: eventual consistency? Do I want some other software to deal with it? (What frequently scares people about caches are things like ensuring that invalidation and insertion get done properly and consistently from different applications/different places in the application.)

Given the information you've already provided (tiered architecture, willingness to try a distributed cache), I think you should pursue a caching layer. There are lots of good caches out there. AppFabric worked fine for us before I worked at Microsoft, and I've dealt with a variety of other caching layers as well.
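Whatever cache you pick, the shape I'd aim for is a simple read-through wrapper with a short TTL, so eventual consistency stays bounded even when another system writes to the database behind your back. A rough sketch (the ICache interface is illustrative; AppFabric's DataCache, Redis, and the like all fit behind it):

```csharp
// Sketch of a read-through cache wrapper. ICache is an illustrative
// abstraction, not a real library interface; any distributed cache can
// implement it. The TTL bounds how stale a cached result can get when
// another system updates the database directly.
using System;

public interface ICache
{
    object Get(string key);
    void Put(string key, object value, TimeSpan ttl);
}

public static class ReadThrough
{
    public static T GetOrLoad<T>(ICache cache, string key, TimeSpan ttl, Func<T> load)
        where T : class
    {
        var cached = cache.Get(key) as T;
        if (cached != null) return cached;

        var fresh = load();          // hit the database only on a cache miss
        cache.Put(key, fresh, ttl);  // staleness is bounded by ttl
        return fresh;
    }
}
```

The TTL is the knob: shorter bounds staleness more tightly, at the cost of more database reads on misses.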