SharePoint 2007: List theory question

sharepoint wss

I'm writing a solution around MOSS 2007 and storing fairly large quantities of data in a list.

My first question is: can lists handle large quantities of data, around 200,000 items? I've already read up on it, and it seems the limitation of lists is the number of items a view can display (2,000). So the question is: is this a recommendation or a real limitation? No documentation actually confirms this.

My second question: if it is a physical limitation on how many items a view can display, does this mean that it's impossible to check for duplicates in a SharePoint list that contains vast quantities of data?

To call wsList.getListItems you have to pass a view. If the list contains 100,000 records and the view can only contain 2,000, how is it possible to check for duplicates?

Thanks

Best Answer

Huge list performance

You may want to read "Scaling to Extremely Large Lists and Performant Access Methods" and "Best Practices for LARGE SharePoint Lists and Documents Libraries".

Something those articles do not mention is that you should avoid adding list items with SPList.Items.Add, because on a large list it carries a huge performance penalty (accessing SPList.Items retrieves the list's items first). Instead, build an efficient query that returns no items and add the new item to the resulting collection. (I read somewhere that the web services perform well when adding items, but I can't find that article any more.)

You can also see some tests (or other tests) on how huge lists perform.

As for duplicates

You may want to create a scheduled job (SPJobDefinition) that runs sometime at night and checks for duplicates.

Rather than looping over all SPListItems and querying the list once per item to check for duplicates, a better idea would probably be to get a DataTable (SPListItemCollection.GetDataTable()) for all items and use some technique to determine duplicates there.
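To make "some technique" a bit more concrete, here is a rough, language-neutral sketch in Python (not SharePoint code). It assumes the rows have already been pulled out of the list, either all at once or page by page, and reduced to plain dictionaries. The names iter_rows, find_duplicates, fetch_sample_page and the sample data are hypothetical, made up for illustration. The idea is a single pass that groups item IDs by the fields that define a duplicate, instead of running one query per item.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, object]
# A page fetcher takes a paging token (None for the first page) and returns
# that page's rows plus the token for the next page (None when finished).
PageFetcher = Callable[[Optional[int]], Tuple[List[Row], Optional[int]]]


def iter_rows(fetch_page: PageFetcher) -> Iterator[Row]:
    """Yield every row by following paging tokens until none is left."""
    token = None
    while True:
        rows, token = fetch_page(token)
        yield from rows
        if token is None:
            break


def find_duplicates(rows: Iterable[Row], key_fields: Tuple[str, ...]) -> Dict[tuple, List[object]]:
    """Group row IDs by the values of key_fields; keep only groups with 2+ IDs."""
    ids_by_key = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        ids_by_key[key].append(row["ID"])
    return {key: ids for key, ids in ids_by_key.items() if len(ids) > 1}


# Hypothetical in-memory data standing in for a large list.
SAMPLE = [
    {"ID": 1, "Title": "alpha"},
    {"ID": 2, "Title": "beta"},
    {"ID": 3, "Title": "alpha"},
    {"ID": 4, "Title": "gamma"},
    {"ID": 5, "Title": "beta"},
]


def fetch_sample_page(token: Optional[int], page_size: int = 2):
    """Stand-in for a paged read: returns a slice of SAMPLE and the next offset."""
    start = token or 0
    end = start + page_size
    next_token = end if end < len(SAMPLE) else None
    return SAMPLE[start:end], next_token


duplicates = find_duplicates(iter_rows(fetch_sample_page), ("Title",))
print(duplicates)  # {('alpha',): [1, 3], ('beta',): [2, 5]}
```

Because the rows are consumed page by page, no single read has to return the whole list. Memory use grows with the number of distinct key values, which should be manageable for a few hundred thousand items.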

As for views

Filter the items, order them so the relevant ones come first, and define your RowLimit. That's the key for views: you just need the most relevant items, don't you?