VB.NET – Group by a column, then add each group to a DataTable and then to a DataSet

dataset, datatable, group-by, linq, vb.net

What would be the best approach to this?

Here is my DataTable:

╔═══════════════╦═══════════════╦═══════════════╗
║ Product Name  ║ Product Price ║ Product Group ║
╠═══════════════╬═══════════════╬═══════════════╣
║ Skirt Red     ║            99 ║             1 ║
║ Jeans Blue    ║            49 ║             2 ║
║ Jeans Black   ║            49 ║             2 ║
║ Skirt Blue    ║            99 ║             1 ║
║ T-shirt White ║            20 ║             3 ║
║ T-shirt Green ║            20 ║             3 ║
║ Jeans Grey    ║            49 ║             2 ║
╚═══════════════╩═══════════════╩═══════════════╝

I will group this DataTable by the Product Group column using LINQ to produce the following groups:

Group #1
    ╔═══════════════╦═══════════════╗
    ║ Product Name  ║ Product Price ║
    ╠═══════════════╬═══════════════╣
    ║ Skirt Red     ║            99 ║
    ║ Skirt Blue    ║            99 ║
    ╚═══════════════╩═══════════════╝

Group #2
    ╔═══════════════╦═══════════════╗
    ║ Product Name  ║ Product Price ║
    ╠═══════════════╬═══════════════╣
    ║ Jeans Blue    ║            49 ║
    ║ Jeans Black   ║            49 ║
    ║ Jeans Grey    ║            49 ║
    ╚═══════════════╩═══════════════╝

Group #3
    ╔═══════════════╦═══════════════╗
    ║ Product Name  ║ Product Price ║
    ╠═══════════════╬═══════════════╣
    ║ T-shirt White ║            20 ║
    ║ T-shirt Green ║            20 ║
    ╚═══════════════╩═══════════════╝

Now the questions are:

1. How do I group by the Product Group column using LINQ? (Done)
2. How do I remove the Product Group column from the resulting groups?
3. How do I add each group to a separate DataTable and then add all tables to a single DataSet? (Done)
4. Suppose there are columns that I don't want to show in the resulting groups. How can I hide them?

Here is what I've tried so far:

    Dim ds As New DataSet
    Dim query = From r In bookedorders Group By key = r.Field(Of Integer)("productgroup") Into Group
    For Each grp In query
        Dim x As New DataTable
        x = grp.Group.CopyToDataTable()
        ds.Tables.Add(x)
    Next

This works, except I am not sure how to select specific columns; I don't want to show all columns in the resulting DataTables.

Best Answer

1. How do I group by the Product Group column using LINQ?

It sounds like what you are asking for isn't really a grouping problem. Grouping in LINQ is most useful when you want to combine the values from a column in some way (Sum, Average) and show one record per unique identifier. In your example, showing the average price per group would call for a group, but from your explanation it looks like you just need three different select statements that would look like:

From r in products where r.groupID = YourControlVariable select r.ProductName, r.ProductPrice

Here YourControlVariable would be each group ID in turn. This LINQ would give you the three tables that you outlined, and from there you could call the CopyToDataTable function on what the LINQ returned and assign its output to a temporary DataTable.

2. How do I remove the Product Group column from the resulting groups?

Refer to the LINQ from point 1: by listing only the columns you want to select, you leave out any columns that aren't listed.

3. How do I add each group to a separate DataTable and then add all tables to a single DataSet?

This would be fairly simple to accomplish in a loop if you know the group IDs you will be working with. You can create a temporary DataTable inside the loop and set it to contain the results of the LINQ from point 1. Once you have the results, you can name the table and add it to a DataSet. When the loop is finished the temporary variables are gone, but you can still reference the tables by name in the DataSet itself.

4. Suppose there are columns that I don't want to show in the resulting groups. How can I hide them?

Refer to point 1/2.

Edit:

As you've described the problem, I think using a Group By in the LINQ makes it unnecessarily complicated, in that you have to deal with anonymous types if you want to pull out individual properties. Here is a short snippet that gets all the distinct group IDs, then uses LINQ to pull out all products of each group, adds them to a table, and then adds the named table to the DataSet. You can remove the naming step or change the name to whatever you want; it doesn't really matter.

Dim ds As New DataSet
' Collect the distinct group IDs first
Dim groupIDs = From r In bookedorders Select r.Item("productgroup") Distinct
For Each r In groupIDs
    ' Pull out the products that belong to the current group
    Dim query = From t In bookedorders Where t.Item("productgroup") = r Select t.Item("ProductName"), t.Item("ProductPrice")
    If query IsNot Nothing AndAlso query.Any Then
        Dim tempDT As New DataTable
        tempDT.Merge(query.CopyToDataTable)
        ' A DataTable is named through TableName; it has no Name property
        tempDT.TableName = "ProductID" & r.ToString()
        ds.Tables.Add(tempDT)
    End If
Next

I've changed your example a little to use the Item collection, but that will only work if bookedorders is a DataTable.

Edit #2:

After thinking about it for a bit, the above example will still give you a collection of anonymous types as a result. So to get around that you could change the

Dim query = From t in bookedorders Where t.Item("productgroup") = r select t.Item("ProductName"), t.Item("ProductPrice")

line to look like:

Dim query = From t in bookedorders Where t.Item("productgroup") = r select t

and then the inside of your If statement will look like:

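Dim tempDT As New DataTable
tempDT.Merge(query.CopyToDataTable)
' Drop the grouping column; the name must match the column in your table ("productgroup" in the question)
tempDT.Columns.Remove("productgroup")
' A DataTable is named through TableName; it has no Name property
tempDT.TableName = "ProductID" & r.ToString()
ds.Tables.Add(tempDT)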

Again, this will only work if you are querying a DataTable with LINQ and getting an IEnumerable(Of DataRow) as your result, which is preferable.
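If you would rather keep the Group By query from the question, one possible sketch is to copy each group to a DataTable and then keep only the wanted columns with DataView.ToTable. The sample rows, the table names ("Group1", "Group2", ...) and the column names ProductName and ProductPrice are assumptions based on the question, not something confirmed by the asker's real schema. On .NET Framework this also assumes a reference to System.Data.DataSetExtensions.

```vbnet
Imports System
Imports System.Data
Imports System.Linq

Module GroupToDataSet
    Sub Main()
        ' Sample data mirroring the question's table (column names are assumptions)
        Dim bookedorders As New DataTable("bookedorders")
        bookedorders.Columns.Add("ProductName", GetType(String))
        bookedorders.Columns.Add("ProductPrice", GetType(Integer))
        bookedorders.Columns.Add("productgroup", GetType(Integer))
        bookedorders.Rows.Add("Skirt Red", 99, 1)
        bookedorders.Rows.Add("Jeans Blue", 49, 2)
        bookedorders.Rows.Add("Jeans Black", 49, 2)
        bookedorders.Rows.Add("Skirt Blue", 99, 1)

        Dim ds As New DataSet()

        ' Same grouping as in the question: one group of DataRows per productgroup value
        Dim query = From r In bookedorders.AsEnumerable()
                    Group By key = r.Field(Of Integer)("productgroup") Into Group

        For Each grp In query
            ' Copy the rows of this group into a table with all columns
            Dim full As DataTable = grp.Group.CopyToDataTable()

            ' Keep only the listed columns; False means duplicate rows are not removed
            Dim trimmed As DataTable = full.DefaultView.ToTable(
                "Group" & grp.key.ToString(), False, "ProductName", "ProductPrice")

            ds.Tables.Add(trimmed)
        Next

        ' Show what ended up in the DataSet
        For Each t As DataTable In ds.Tables
            Console.WriteLine(t.TableName & ": " & t.Rows.Count.ToString() &
                              " rows, " & t.Columns.Count.ToString() & " columns")
        Next
    End Sub
End Module
```

Listing the columns to keep, rather than removing the unwanted ones, means that columns added to the source table later will not show up in the group tables by accident.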

Related Solutions

C# – Datatable vs Dataset

It really depends on the sort of data you're bringing back. Since a DataSet is (in effect) just a collection of DataTable objects, you can return multiple distinct sets of data into a single, and therefore more manageable, object.
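As a rough illustration of "a collection of DataTable objects", the sketch below puts two unrelated tables into one DataSet and reads them back by name. The table and column names (Customers, Orders, Name, Total) are hypothetical and chosen only for the example.

```vbnet
Imports System
Imports System.Data

Module DataSetSketch
    Sub Main()
        ' Hypothetical tables, for illustration only
        Dim customers As New DataTable("Customers")
        customers.Columns.Add("Name", GetType(String))
        customers.Rows.Add("Alice")

        Dim orders As New DataTable("Orders")
        orders.Columns.Add("Total", GetType(Decimal))
        orders.Rows.Add(12.5D)
        orders.Rows.Add(7D)

        ' One DataSet carries both result sets
        Dim ds As New DataSet("Shop")
        ds.Tables.Add(customers)
        ds.Tables.Add(orders)

        ' Tables can be enumerated or looked up by name
        For Each t As DataTable In ds.Tables
            Console.WriteLine(t.TableName & ": " & t.Rows.Count.ToString() & " rows")
        Next

        Dim firstTotal As Decimal = CDec(ds.Tables("Orders").Rows(0)("Total"))
        Console.WriteLine("First order total: " & firstTotal.ToString())
    End Sub
End Module
```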

Performance-wise, you're more likely to get inefficiency from unoptimized queries than from the "wrong" choice of .NET construct. At least, that's been my experience.

MySQL – Retrieving the last record in each group – MySQL

MySQL 8.0 now supports windowing functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:

WITH ranked_messages AS (
  SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
  FROM messages AS m
)
SELECT * FROM ranked_messages WHERE rn = 1;

Below is the original answer I wrote for this question in 2009:

I write the solution this way:

SELECT m1.*
FROM messages m1 LEFT JOIN messages m2
 ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL;

Regarding performance, one solution or the other can be better, depending on the nature of your data. So you should test both queries and use whichever performs better on your database.

For example, I have a copy of the StackOverflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my Macbook Pro 2.40GHz.

I'll write a query to find the most recent post for a given user ID (mine).

First using the technique shown by @Eric with the GROUP BY in a subquery:

SELECT p1.postid
FROM Posts p1
INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
            FROM Posts pi GROUP BY pi.owneruserid) p2
  ON (p1.postid = p2.maxpostid)
WHERE p1.owneruserid = 20860;

1 row in set (1 min 17.89 sec)

Even the EXPLAIN analysis takes over 16 seconds:

+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| id | select_type | table      | type   | possible_keys              | key         | key_len | ref          | rows    | Extra       |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                       | NULL        | NULL    | NULL         |   76756 |             | 
|  1 | PRIMARY     | p1         | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY     | 8       | p2.maxpostid |       1 | Using where | 
|  2 | DERIVED     | pi         | index  | NULL                       | OwnerUserId | 8       | NULL         | 1151268 | Using index | 
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
3 rows in set (16.09 sec)

Now produce the same query result using my technique with LEFT JOIN:

SELECT p1.postid
FROM Posts p1 LEFT JOIN posts p2
  ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
WHERE p2.postid IS NULL AND p1.owneruserid = 20860;

1 row in set (0.28 sec)

The EXPLAIN analysis shows that both tables are able to use their indexes:

+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| id | select_type | table | type | possible_keys              | key         | key_len | ref   | rows | Extra                                |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
|  1 | SIMPLE      | p1    | ref  | OwnerUserId                | OwnerUserId | 8       | const | 1384 | Using index                          | 
|  1 | SIMPLE      | p2    | ref  | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8       | const | 1384 | Using where; Using index; Not exists | 
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
2 rows in set (0.00 sec)

Here's the DDL for my Posts table:

CREATE TABLE `posts` (
  `PostId` bigint(20) unsigned NOT NULL auto_increment,
  `PostTypeId` bigint(20) unsigned NOT NULL,
  `AcceptedAnswerId` bigint(20) unsigned default NULL,
  `ParentId` bigint(20) unsigned default NULL,
  `CreationDate` datetime NOT NULL,
  `Score` int(11) NOT NULL default '0',
  `ViewCount` int(11) NOT NULL default '0',
  `Body` text NOT NULL,
  `OwnerUserId` bigint(20) unsigned NOT NULL,
  `OwnerDisplayName` varchar(40) default NULL,
  `LastEditorUserId` bigint(20) unsigned default NULL,
  `LastEditDate` datetime default NULL,
  `LastActivityDate` datetime default NULL,
  `Title` varchar(250) NOT NULL default '',
  `Tags` varchar(150) NOT NULL default '',
  `AnswerCount` int(11) NOT NULL default '0',
  `CommentCount` int(11) NOT NULL default '0',
  `FavoriteCount` int(11) NOT NULL default '0',
  `ClosedDate` datetime default NULL,
  PRIMARY KEY  (`PostId`),
  UNIQUE KEY `PostId` (`PostId`),
  KEY `PostTypeId` (`PostTypeId`),
  KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
  KEY `OwnerUserId` (`OwnerUserId`),
  KEY `LastEditorUserId` (`LastEditorUserId`),
  KEY `ParentId` (`ParentId`),
  CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
) ENGINE=InnoDB;

Note to commenters: If you want another benchmark with a different version of MySQL, a different dataset, or different table design, feel free to do it yourself. I have shown the technique above. Stack Overflow is here to show you how to do software development work, not to do all the work for you.
