.NET LINQ – Reasoning Behind Naming Select (Map) and Aggregate (Reduce)

Tags: language-design, linq, .net

In other programming languages I have seen Map and Reduce, which are cornerstones of functional programming. I could not find any reasoning or history explaining why LINQ instead uses Aggregate (the same as Reduce) and Select (the same as Map).

I am asking because it took me a while to realize they are the same things, and I am curious about the reasoning behind the names.

Best Answer

This mostly comes down to the history of LINQ.

LINQ was originally intended to be SQL-like, and to be used (largely, though not exclusively) to connect to SQL databases. As a result, much of its terminology is based on SQL.

So, "select" came from the SQL select statement, and "aggregate" came from SQL aggregate functions (e.g., count, sum, avg, min, max).
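To make the correspondence concrete, here is a minimal sketch over a small in-memory array. The class name SelectAggregateDemo and its helper methods are made up for illustration; only Select, Aggregate, and ToList are actual LINQ operators.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectAggregateDemo
{
    // Select plays the role of Map: it projects each element to a new value.
    public static List<int> Squares(IEnumerable<int> source) =>
        source.Select(x => x * x).ToList();

    // Aggregate plays the role of Reduce (a left fold): it combines all
    // elements into one value, starting from the seed 0.
    public static int Total(IEnumerable<int> source) =>
        source.Aggregate(0, (acc, x) => acc + x);

    // The SQL-flavoured query syntax is translated by the compiler into
    // the same Select call used in Squares.
    public static List<int> SquaresViaQuery(IEnumerable<int> source) =>
        (from x in source select x * x).ToList();

    public static void Main()
    {
        var numbers = new[] { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(", ", Squares(numbers)));         // 1, 4, 9, 16
        Console.WriteLine(Total(numbers));                              // 10
        Console.WriteLine(string.Join(", ", SquaresViaQuery(numbers))); // 1, 4, 9, 16
    }
}
```

LINQ also ships specialised aggregates such as Sum, Count, Min, Max, and Average, which line up with the SQL aggregate functions listed above; Aggregate is the general form for cases those do not cover.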

For those who question the degree to which LINQ originally related to SQL, I'd refer to (for example) Microsoft's articles on Cω, a language devised by Microsoft Research that appears to be where most of the basics of LINQ were worked out before they were added to C# and .NET.

For example, consider an MSDN article on Cω, which says:

Query Operators in Cω

Cω adds two broad classes of query operators to the C# language:
- XPath-based operators for querying the member variables of an object by name or by type.
- SQL-based operators for performing sophisticated queries involving projection, grouping, and joining of data from one or more objects.

At least as far as I know, the XPath-based operators were never added to C#, leaving only the operators that were documented (before LINQ existed) as being based directly on SQL.

Now, it's certainly true that LINQ isn't identical to the SQL-based query operators in Cω. In particular, LINQ follows C#'s basic object and function-call syntax much more closely than Cω did. Cω queries followed SQL syntax even more closely, so you could write something like this (again, drawn directly from the article linked above):

    rows = select c.ContactName, o.ShippedDate
           from c in DB.Customers
           inner join o in DB.Orders
           on c.CustomerID == o.CustomerID;
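For comparison, here is a rough sketch of how a similar join might look in LINQ as it eventually shipped in C#. The Customer and Order records and the JoinDemo class are hypothetical in-memory stand-ins, not the Northwind types generated by sql2comega.exe, and the sketch assumes C# 9 or later for record types. Note that the select clause moves to the end and the join condition uses the equals keyword instead of ==.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for two Northwind tables (illustration only).
public record Customer(string CustomerID, string ContactName);
public record Order(string CustomerID, DateTime? ShippedDate);

public static class JoinDemo
{
    // Query syntax: still SQL-like, but from comes first and select last.
    public static List<(string ContactName, DateTime? ShippedDate)> ShippedRows(
        IEnumerable<Customer> customers, IEnumerable<Order> orders)
    {
        var rows = from c in customers
                   join o in orders on c.CustomerID equals o.CustomerID
                   select (c.ContactName, o.ShippedDate);
        return rows.ToList();
    }

    // Method syntax: the ordinary method call the query above compiles to.
    public static List<(string ContactName, DateTime? ShippedDate)> ShippedRowsViaMethods(
        IEnumerable<Customer> customers, IEnumerable<Order> orders) =>
        customers.Join(orders,
                       c => c.CustomerID,
                       o => o.CustomerID,
                       (c, o) => (c.ContactName, o.ShippedDate))
                 .ToList();

    public static void Main()
    {
        // Made-up sample rows, not real Northwind data.
        var customers = new[]
        {
            new Customer("C1", "Alice Example"),
            new Customer("C2", "Bob Example"),
        };
        var orders = new[]
        {
            new Order("C1", new DateTime(2005, 1, 15)),
            new Order("C1", null),
        };

        foreach (var row in ShippedRows(customers, orders))
        {
            Console.WriteLine($"{row.ContactName}: {row.ShippedDate:yyyy-MM-dd}");
        }
    }
}
```

Both methods return the same rows; the query form keeps the SQL vocabulary, while the method form shows the plain function calls underneath it.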

And yes, the same article does talk specifically about using the SQL-based queries to query data coming from actual SQL databases:

To connect to a SQL database in Cω, it must be exposed as a managed assembly (that is, a .NET library file), which is then referenced by the application. A relational database can be exposed to a Cω as a managed assembly either by using the sql2comega.exe command line tool or the Add Database Schema... dialog from within Visual Studio. Database objects are used by Cω to represent the relational database hosted by the server. A Database object has a public property for each table or view, and a method for each table-valued function found in the database. To query a relational database, a table, view, or table-valued function must be specified as input to the one or more of the SQL-based operators.

The following sample program and output shows some of the capabilities of using the SQL-based operators to query a relational database in Cω. The database used in this example is the sample Northwind database that comes with Microsoft SQL Server. The name DB used in the example refers to a global instance of a Database object in the Northwind namespace of the Northwind.dll assembly generated using sql2comega.exe.

So, yes, from the very beginning (or even before the beginning, depending on your viewpoint) LINQ was explicitly based on SQL, and intended specifically to allow access to data in SQL databases.