Python – extract unique elements from a 2D python list and put them into a new 2D list

listpythonunique

Right now I have a 2D list with three columns and numerous rows, each column contains a unique type of stuff. The first column is UserID, the second column is timestamp, the third column is URL. The list looks like this:

[[304070, 2015:01:01, 'http:something1'],
[304070, 2015:01:02, 'http:something2'],
[304070, 2015:01:03, 'http:something2'],
[304070, 2015:01:03, 'http:something2'],
[304071, 2015:01:04, 'http:something2'],
[304071, 2015:01:05, 'http:something3'],
[304071, 2015:01:06, 'http:something3']]

As you can see, there are some duplicate URLs, regardless of userID and timestamp.

I need to extract those rows which contain unique URLs and put them into a new 2D list.

For example, the second row, third row, forth row and fifth row all have the same URL regardless of userID and timestamp. I only need the second row (first one appears) and put it into my new 2D list. That being said, the first row has a unique URL and I will also put it into my new list. The last two rows ( sixth and seventh ) have the same URL, and I only need the sixth row.

Therefore, my new list should look like this:

[304070, 2015:01:01, 'http:something1'],
[304070, 2015:01:02, 'http:something2'],
[304071, 2015:01:05, 'http:something3']]

I thought about using something like this:

for i in range(len(oldList):
    if oldList[i][2] not in newList:
        newList.append(oldList[i])

but obviously this one does not work, becuase oldList[i][2] is an element, not in newList is checking the entire 2D list, ie, checking every row. Codes like this will just create an exact copy of oldList.

OR, I could just eliminate those rows having duplicate URLs, because using a for loop plus append operator on a 2D list with one million rows really would take a while.

Best Answer

A good way of going about this would be to use a set. Go through your list of lists one at a time, adding the URL to the set if it's not already there, and adding the full list containing that URL to your new list. If a URL is already in the set, discard the current list and move to the next one.

old_list = [[304070, "2015:01:01", 'http:something1'],
            [304070, "2015:01:02", 'http:something2'],
            [304070, "2015:01:03", 'http:something2'],
            [304070, "2015:01:03", 'http:something2'],
            [304071, "2015:01:04", 'http:something2'],
            [304071, "2015:01:05", 'http:something3'],
            [304071, "2015:01:06", 'http:something3']]
new_list = []
url_set = set()

for item in old_list:
    if item[2] not in url_set:
        url_set.add(item[2])
        new_list.append(item)
    else:
        pass

>>> print(new_list)
[[304070, '2015:01:01', 'http:something1'], [304070, '2015:01:02', 'http:something2'], [304071, '2015:01:05', 'http:something3']]

Related Solutions

Python – the difference between old style and new style classes in Python

From New-style and classic classes:

Up to Python 2.1, old-style classes were the only flavour available to the user.

The concept of (old-style) class is unrelated to the concept of type: if x is an instance of an old-style class, then x.__class__ designates the class of x, but type(x) is always <type 'instance'>.

This reflects the fact that all old-style instances, independently of their class, are implemented with a single built-in type, called instance.

New-style classes were introduced in Python 2.2 to unify the concepts of class and type. A new-style class is simply a user-defined type, no more, no less.

If x is an instance of a new-style class, then type(x) is typically the same as x.__class__ (although this is not guaranteed – a new-style class instance is permitted to override the value returned for x.__class__).

The major motivation for introducing new-style classes is to provide a unified object model with a full meta-model.

It also has a number of immediate benefits, like the ability to subclass most built-in types, or the introduction of "descriptors", which enable computed properties.

For compatibility reasons, classes are still old-style by default.

New-style classes are created by specifying another new-style class (i.e. a type) as a parent class, or the "top-level type" object if no other parent is needed.

The behaviour of new-style classes differs from that of old-style classes in a number of important details in addition to what type returns.

Some of these changes are fundamental to the new object model, like the way special methods are invoked. Others are "fixes" that could not be implemented before for compatibility concerns, like the method resolution order in case of multiple inheritance.

Python 3 only has new-style classes.

No matter if you subclass from object or not, classes are new-style in Python 3.

C# – Using LINQ to remove elements from a List

Well, it would be easier to exclude them in the first place:

authorsList = authorsList.Where(x => x.FirstName != "Bob").ToList();

However, that would just change the value of authorsList instead of removing the authors from the previous collection. Alternatively, you can use RemoveAll:

authorsList.RemoveAll(x => x.FirstName == "Bob");

If you really need to do it based on another collection, I'd use a HashSet, RemoveAll and Contains:

var setToRemove = new HashSet<Author>(authors);
authorsList.RemoveAll(x => setToRemove.Contains(x));

Best Answer

Related Solutions

Python – the difference between old style and new style classes in Python

C# – Using LINQ to remove elements from a List

Related Topic