Advantages of Non-Caching LINQ Implementations

Tags: frameworks, implementations, linq, .net

This is a known pitfall for people who are getting their feet wet using LINQ:

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    public static void Main()
    {
        IEnumerable<Record> originalCollection = GenerateRecords(new[] {"Jesse"});
        var newCollection = new List<Record>(originalCollection);

        Console.WriteLine(ContainTheSameSingleObject(originalCollection, newCollection));
    }

    private static IEnumerable<Record> GenerateRecords(string[] listOfNames)
    {
        return listOfNames.Select(x => new Record(Guid.NewGuid(), x));
    }

    private static bool ContainTheSameSingleObject(IEnumerable<Record>
            originalCollection, List<Record> newCollection)
    {
        return originalCollection.Count() == 1 && newCollection.Count() == 1 &&
                originalCollection.Single().Id == newCollection.Single().Id;
    }

    private class Record
    {
        public Guid Id { get; }
        public string SomeValue { get; }

        public Record(Guid id, string someValue)
        {
            Id = id;
            SomeValue = someValue;
        }
    }
}

This prints "False" because the sequence returned by Select is re-evaluated every time it is enumerated, so each pass over the collection creates brand-new Record objects with fresh Guids. To fix this, a simple call to ToList could be added at the end of GenerateRecords.
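A minimal sketch of that fix: materializing the sequence once with ToList means the Select projection runs exactly once, and every later enumeration sees the same Record instances, so the program prints "True".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    private class Record
    {
        public Guid Id { get; }
        public string SomeValue { get; }

        public Record(Guid id, string someValue)
        {
            Id = id;
            SomeValue = someValue;
        }
    }

    // ToList materializes the sequence: the Select lambda runs once per
    // name, and subsequent enumerations return the same cached objects.
    private static IEnumerable<Record> GenerateRecords(string[] listOfNames)
    {
        return listOfNames.Select(x => new Record(Guid.NewGuid(), x)).ToList();
    }

    public static void Main()
    {
        IEnumerable<Record> originalCollection = GenerateRecords(new[] { "Jesse" });
        var newCollection = new List<Record>(originalCollection);

        // Both collections now hold the same single Record instance.
        Console.WriteLine(originalCollection.Single().Id == newCollection.Single().Id); // True
    }
}
```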

What advantage did Microsoft hope to gain by implementing it this way?

Why wouldn't the implementation simply cache the results in an internal array? Deferred execution is part of what's happening here, but deferred execution could still be implemented without this re-evaluation behavior.

Once a given member of a collection returned by LINQ has been evaluated, what advantage is provided by not keeping an internal reference/copy, but instead recalculating the same result, as a default behavior?

In situations where the logic genuinely needs the same member of a collection recalculated over and over, it seems that could be requested through an optional parameter, with the default behavior doing otherwise. In addition, the speed advantage gained by deferred execution is ultimately undercut by the time it takes to continually recalculate the same results. Finally, this is a confusing stumbling block for those who are new to LINQ, and it can lead to subtle bugs in anyone's program.

What advantage is there to this, and why did Microsoft make this seemingly very deliberate decision?

Best Answer

What advantage was gained by implementing LINQ in a way that does not cache the results?

Caching the results would simply not work for everybody. As long as you have tiny amounts of data, great. Good for you. But what if your data is larger than your RAM?

It has nothing to do with LINQ, but with the IEnumerable<T> interface in general.

It is the difference between File.ReadAllLines and File.ReadLines. One will read the whole file into RAM, and the other will give it to you line by line, so you can work with large files (as long as they have line-breaks).
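A small sketch of that difference, using a temporary file just for illustration: ReadLines returns a lazy IEnumerable&lt;string&gt;, so composing it with Take(3) reads only the start of the file, while ReadAllLines would have loaded every line into a string[] up front.

```csharp
using System;
using System.IO;
using System.Linq;

class StreamingDemo
{
    static void Main()
    {
        // Temporary file used purely for demonstration.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, Enumerable.Range(1, 1_000_000).Select(i => $"line {i}"));

        // File.ReadLines is lazy: lines are pulled from disk only as they
        // are enumerated, so Take(3) never touches the rest of the file.
        var firstThree = File.ReadLines(path).Take(3).ToList();

        Console.WriteLine(firstThree.Count); // 3
        Console.WriteLine(firstThree[0]);    // line 1

        File.Delete(path);
    }
}
```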

You can easily cache everything you want by materializing your sequence, calling either .ToList() or .ToArray() on it. But those of us who do not want to cache it get the chance not to.

And on a related note: how do you cache the following?

IEnumerable<int> AllTheZeroes()
{
    while(true) yield return 0;
}

You cannot. That's why IEnumerable<T> exists as it does.