Removing Paths/Landing Pages From SharePoint Search Results


We've been asked by a client to stop a number of pages from showing up on their public website's search results page. I've gone into the SSP (Shared Services Provider) and created crawl rules to exclude these pages. That seemed to work, but landing pages still show up in their "www.domain.com/sitearea/" form, although not in their "www.domain.com/sitearea/pages/default.aspx" form.

For each page of this type we created one rule to "Exclude" the "aspx" path and another rule to include the "/" path but "Follow links on the URL without crawling the URL itself". We tried adding rules to exclude the "/" form instead, but that excluded every result underneath that path as well.

Does anybody know how to remove both the "area/pages/default.aspx" and the "area/" paths from the search results?

I'm not sure if it's the "done thing" to ask two questions in one, but this one is in a similar vein, so hopefully it's OK. Does anyone know of a tool (or whether it is even possible) that would let site admins exclude pages from search results without going through the SSP/crawl rules? I know they can do it at the site level, but is there anything that enables this at the page level, through either Page or Site Settings?

Best Answer

I'm not sure I understand: are we talking about excluding pages from public search engines like Google, or from an internal SharePoint-specific search function?

Well, in both cases robots.txt should work for keeping webpages out of search engine indexes. I'm no SharePoint expert, but a quick search suggests that SharePoint Search does obey robots.txt, so this would be my first pick.

Here is the main documentation for the robots.txt format. This document from Microsoft seems to describe SharePoint Search management quite well. It says:

SharePoint Portal Server 2003 and SharePoint Server 2007 automatically obey the restrictions that are contained in the Robots.txt file.

I take this, again, to mean that SharePoint Search will obey a robots.txt file.
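One caveat worth checking before relying on robots.txt: `Disallow` rules are prefix matches on the URL path, so they raise the same folder-versus-page problem as the crawl rules. Here is a minimal sketch that uses Python's standard `urllib.robotparser` to test two hypothetical robots.txt files against the URLs from the question (`www.domain.com` and `sitearea` are the question's placeholders, and `allowed_urls` is an illustrative helper). It only shows how a standards-following parser reads the rules; whether SharePoint's crawler matches paths the same way is an assumption worth verifying.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt variants for www.domain.com. "sitearea" is the
# placeholder section name used in the question, not a real path.
EXACT_PAGE = """\
User-agent: *
Disallow: /sitearea/pages/default.aspx
"""

WHOLE_FOLDER = """\
User-agent: *
Disallow: /sitearea/
"""

URLS = [
    "http://www.domain.com/sitearea/",
    "http://www.domain.com/sitearea/pages/default.aspx",
    "http://www.domain.com/sitearea/pages/other.aspx",
]


def allowed_urls(robots_txt: str, urls: list[str]) -> dict[str, bool]:
    """Parse robots.txt text (no network access) and report, for each URL,
    whether a generic crawler ("*") may fetch it. Matching is a plain
    prefix comparison on the URL path, as urllib.robotparser implements it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch("*", url) for url in urls}


if __name__ == "__main__":
    for name, text in [("EXACT_PAGE", EXACT_PAGE), ("WHOLE_FOLDER", WHOLE_FOLDER)]:
        print(name)
        for url, ok in allowed_urls(text, URLS).items():
            print("  allowed" if ok else "  blocked", url)
```

With the exact-page rule, the bare `/sitearea/` URL is still allowed because it does not start with `/sitearea/pages/default.aspx`; with the folder rule, every page under `/sitearea/` is blocked. Some crawlers (Google's, for example) also support a `$` end-of-URL anchor that would let a rule target `/sitearea/` alone, but the standard-library parser does not, and I would not assume SharePoint does either.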

If your site is publicly accessible, you might also want to open a Google Webmaster Tools account. It has some useful tools for troubleshooting crawling issues and for checking how your robots.txt will apply to your site.