SharePoint Search: processing filenames containing underscores

Tags: search, sharepoint, sharepoint-2007

We use SharePoint Server 2007 to allow employees to search network file shares, but it seems that underscores in filenames are not treated as word separators when indexing the files.

As a result, a search for chocolate will:

  • match "chocolate milkshake.doc"
  • but not match "chocolate_cake.doc"

(Of course, this is a simplified example; in practice the content of the second file might include the word "chocolate" and match on that instead of the filename. But the problem itself is real enough, because a common scenario in a corporate environment is that a user knows part of the name of the file they are looking for and expects to see matching filenames at the top of the search results. Using underscores in filenames is also a widely used convention within our company.)

Underscores are not treated as word separators in the file content either, although this is less of a concern for us. The root cause of this problem is possibly related to the behaviour of the word breakers that SharePoint uses (i.e. the language-specific DLLs that implement the IWordBreaker interface), although I haven't confirmed this yet.
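To make the suspected behaviour concrete, here is a minimal Python sketch of two toy tokenizers. It is not SharePoint's word breaker, and the function names are made up for illustration; it only assumes that the indexer treats the underscore as a word character, much as the regex class `\w` does.

```python
import re

def tokens_underscore_as_word_char(name):
    # \w matches letters, digits AND the underscore,
    # so "chocolate_cake" survives as a single token.
    return [t.lower() for t in re.findall(r"\w+", name)]

def tokens_underscore_as_separator(name):
    # [^\W_] is "any word character except the underscore",
    # so the underscore now splits tokens like a space would.
    return [t.lower() for t in re.findall(r"[^\W_]+", name)]

def matches(query, name, tokenizer):
    # A whole-word search hits only if the query equals one of the tokens.
    return query.lower() in tokenizer(name)

for filename in ("chocolate milkshake.doc", "chocolate_cake.doc"):
    print(filename,
          matches("chocolate", filename, tokens_underscore_as_word_char),
          matches("chocolate", filename, tokens_underscore_as_separator))
```

Under the first tokenizer, "chocolate" is never indexed as a separate word for "chocolate_cake.doc", which would explain the missing match; under the second, both filenames match.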

Does anyone know of a workaround for this issue? I have tested with Search Server 2008 Express too (which is based on the same technology), and it is also affected. I do not know whether the problem is fixed in SharePoint 2010 or not.

Best Answer

As far as I know, underscores are not treated as delimiters, and several threads on social.technet seem to confirm this. That being the case, you'll need a partial/wildcard search to match 'chocolate' against 'chocolate_cake.doc', which the Core Results web part won't do. However, there's a CodePlex web part for 2007 that does just that.

FYI, the 2010 version of this same web part notes that SharePoint 2010 adds wildcard searches, provided the user types the asterisk.
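As a rough illustration of why the asterisk matters, here is a small Python sketch of wildcard matching against indexed tokens. It is a toy model, not the web part's or SharePoint's implementation; `tokenize` and `wildcard_match` are hypothetical names, and the tokenizer assumes the underscore is kept as part of the word.

```python
import re
from fnmatch import fnmatchcase

def tokenize(name):
    # Assumption: the underscore counts as a word character,
    # so "chocolate_cake" is indexed as one token.
    return [t.lower() for t in re.findall(r"\w+", name)]

def wildcard_match(query, name):
    # Without an asterisk the pattern must equal a whole token;
    # with a trailing asterisk it behaves as a prefix match.
    pattern = query.lower()
    return any(fnmatchcase(token, pattern) for token in tokenize(name))

print(wildcard_match("chocolate", "chocolate_cake.doc"))
print(wildcard_match("chocolate*", "chocolate_cake.doc"))
```

In this model, the plain query misses "chocolate_cake.doc" while "chocolate*" finds it, which mirrors the need for the user to type the asterisk.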