Maximum size of an indexing service catalog

capacity | indexing-service | windows-server-2008

Does anyone know what the maximum size of an Indexing Service catalog is on Windows Server 2008? We are having all sorts of problems with the index hanging and not processing new documents.

I just deleted the catalog and recreated it. I've added 4 of the folders that should be indexed, but have 8 more to add. The index has grown to ~3 GB for the 4 folders that are being indexed.

So far the Indexing service has stayed up and running for several days now. (Knock on wood.) I'm now thinking that the Indexing service doesn't like it when the network share that it is looking at fails over. The file server is an active/passive cluster, and all network shares are cluster resources within their own cluster group (a "clustered application" in Windows 2008 terms). The Indexing service is also a clustered resource within its own application so that it can fail over independently of the file shares.

From what I can tell, the indexing service only seems to really have a panic attack when one of the nodes fails over (granted, this happens every time Microsoft releases a patch, since the nodes reboot).

I'm considering putting a script in each clustered application which forces the indexing service to go offline and then come back online whenever any of the monitored network shares fail over (rough sketch below). If I go this route I'll have to be careful that, when multiple network shares fail over at once, they don't start failing while the index service is already in the middle of its own failover.
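
Something along these lines is what I have in mind; the UNC paths are just placeholders, and I'm assuming the Indexing Service still runs under its usual service name (cisvc), so treat it as a sketch rather than the final script:

```python
# Rough sketch: bounce the Indexing Service when monitored shares come back
# after a failover. Paths are placeholders; assumes the service name is cisvc.
import os
import subprocess
import time

MONITORED_SHARES = [
    r"\\fileserver\share1",   # placeholder UNC paths
    r"\\fileserver\share2",
]

def shares_online():
    return all(os.path.exists(share) for share in MONITORED_SHARES)

def restart_indexing_service():
    subprocess.call(["net", "stop", "cisvc"])
    # Give the shares a moment to settle after the failover before restarting.
    time.sleep(30)
    subprocess.call(["net", "start", "cisvc"])

was_online = shares_online()
while True:
    online = shares_online()
    if online and not was_online:
        # A share just reappeared, so assume a failover happened and restart.
        restart_indexing_service()
    was_online = online
    time.sleep(15)
```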

Best Answer

It's been awhile since you posted this question. Can you drop in an update on the behavior / performance you're seeing?

I hate to say it, but I'm going to guess that you're into "benchmark it yourself and see" territory. I'm not aware of any published "limits" on the Indexing Service. Indeed, the "Microsoft Index Server", which is an ancestor to the modern-day "Indexing Service", was specifically cited as having no built-in limits (see http://msdn.microsoft.com/en-us/library/dd582938(office.11).aspx for details) to numbers of documents or, presumably, catalog size. The behavior of the Indexing Service is highly dependent on the type and composition of the documents being indexed, so there isn't an easy "maximum size" number.

When you say "...there are ~500 files...", are you talking about 500+ files lying around in the catalog directory? That makes it sound like CiSvc isn't doing merges, for some reason. The vast majority of the files lying around should get merged into the main Catalog.WCI file and be deleted. There is a daily "master merge" that should be occurring, at minimum, to combine all the shadow indices created by the CiDaemon processes into the master index. Perfmon can show you more about what's going on inside.
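
If you'd rather not click through Perfmon, something like this will dump whatever Indexing Service counters your box exposes. I'm assuming the performance object is still named "Indexing Service" on 2008, so this is only a sketch; run `typeperf -q` with no object name to list everything if it isn't:

```python
# Quick-and-dirty dump of the Indexing Service perf counters via typeperf.
# Assumes the perf object is named "Indexing Service"; adjust if yours differs.
import subprocess

# List every counter path the Indexing Service object exposes...
output = subprocess.check_output(
    ["typeperf", "-q", "Indexing Service"], text=True
)
counters = [line for line in output.splitlines() if line.startswith("\\")]

# ...then take a single sample of each to see merge / word-list activity.
subprocess.call(["typeperf", "-sc", "1"] + counters)
```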

The rule of thumb for the index size we always used back in the NT 4.0 days was roughly 40% of the size of the corpus of documents being indexed. Does that jibe with the files you're indexing?
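
As a quick sanity check against that rule of thumb, here's the back-of-the-envelope math using the ~3 GB catalog figure you mentioned; the corpus size is a placeholder for whatever those 4 folders actually add up to on disk:

```python
# Back-of-the-envelope check of the old ~40% index-to-corpus rule of thumb.
corpus_gb = 7.5   # placeholder: total on-disk size of the 4 indexed folders
index_gb = 3.0    # the ~3 GB catalog from the question

ratio = index_gb / corpus_gb
print(f"index-to-corpus ratio: {ratio:.0%}")   # ~40% would match the NT 4.0 rule
```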

If you don't mind that searches can't span multiple catalogs (unless you code something up to submit the same search to multiple catalogs and aggregate the results yourself, as sketched below), you could break up your corpus into multiple catalogs if you start hitting performance issues.
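
Off the top of my head, the "aggregate it yourself" idea would look something like this, going through the old Indexing Service OLE DB provider (MSIDXS) via ADO. The catalog names and the query text are placeholders, and it assumes pywin32 is available, so take it as a sketch of the approach rather than production code:

```python
# Sketch: fan one query out over several catalogs and merge hits by Rank.
# Uses the Indexing Service OLE DB provider (MSIDXS) via ADO; needs pywin32.
import win32com.client

CATALOGS = ["Docs1", "Docs2", "Docs3"]   # placeholder catalog names
QUERY = "SELECT Path, Filename, Rank FROM SCOPE() WHERE FREETEXT('invoice')"

results = []
for catalog in CATALOGS:
    rs = win32com.client.Dispatch("ADODB.Recordset")
    rs.Open(QUERY, f"Provider=MSIDXS;Data Source={catalog}")
    while not rs.EOF:
        results.append((rs.Fields.Item("Rank").Value,
                        rs.Fields.Item("Path").Value))
        rs.MoveNext()
    rs.Close()

# Merge the per-catalog hits into one list, best Rank first.
for rank, path in sorted(results, reverse=True):
    print(rank, path)
```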

It's interesting, to me, to hear that you're using Indexing Service. It's venerable, dating all the way back to the Windows NT 4.0 Option Pack-- even further if you consider that it was part of the "Cairo" initiative way back (codename Tripoli, at that time). You're making me remember "master merges" and "shadow merges" and all kinds of little details of the old "Microsoft Index Server" that I thought I'd forgotten... >smile< It makes me sad that Microsoft didn't throw more effort at it, as a product, because it could've easily been the basis for an enterprise distributed search system. Oh, well... paths not taken, I suppose.

Edit:

You're in a scale territory that I've never used Indexing Service in before. Multiple catalogs (or even multiple instances of Indexing Service on multiple boxes) are probably your next place to go when perf suffers. Hopefully you don't need to go there.

I'm not sure how it "knows" to "panic" when the shares fail over, and I daresay it'd take looking at the source to figure out why. That sounds like one of those "Doctor, it hurts when I do this." "Well, don't do that." kind of things. To that end, your plan re: handling failover of shares is probably a good one.

A 30% or less index-to-corpus ratio is definitely better than what Microsoft always said to plan for, back in the day. Sounds like the files you're indexing, being mainly text, don't have the overhead of OLE properties to be cached like Office docs (which was, I believe, Microsoft's basis for the rule-of-thumb number of 40%). (As an aside, you can have your devs code filters for these various types of files and gain the ability to do property-specific searches, if you are so inclined. Show me all emails from xxxx, etc... heh heh. That will, of course, grow the property cache.)

The 500+ files in the catalog did finally clean up and merge, didn't they?

What does it do when it "panics", anyway? Does it just stop "seeing" new documents and indexing them?
