GitHub Wiki – How to Make Crawlable by Search Engines

Tags: github, search-engine, wiki

While using the W3C link checker, I discovered that my GitHub wiki can't be crawled:

https://github.com/aegif/CmisSync/wiki/Getting-started-with-CmisSync-development
Status: (N/A) Forbidden by robots.txt

This is unfortunate, as I would like people to easily find this Wiki on search engines.

QUESTION: How can I make my GitHub wiki crawlable by search engines?
Or am I mistaken and GitHub's robots.txt is actually OK?

Best Answer

GitHub's robots.txt explicitly disallows crawling of wiki pages; for example, in the Googlebot section:

User-agent: Googlebot
Allow: /*/*/tree/master
Allow: /*/*/blob/master
...
Disallow: /*/*/wiki/*/*

Since this is the site-wide robots.txt, there is no way around it.
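To see how rules like `Disallow: /*/*/wiki/*/*` apply, here is a minimal sketch of wildcard matching, assuming the common extension where `*` matches any sequence of characters (it ignores `$` end-anchors and precedence between Allow/Disallow; the example paths are hypothetical):

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check a robots.txt path rule against a URL path, treating '*'
    as 'any sequence of characters' (a simplified sketch)."""
    # Escape the literal segments, rejoin them with '.*' for each
    # wildcard, and anchor the pattern at the start of the path.
    regex = "^" + ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(regex, path) is not None

# Hypothetical user/repo paths, for illustration only:
print(rule_matches("/*/*/wiki/*/*", "/user/repo/wiki/Home/_history"))  # True
print(rule_matches("/*/*/tree/master", "/user/repo/tree/master/src"))  # True
print(rule_matches("/*/*/wiki/*/*", "/user/repo/issues"))              # False
```

A real crawler would also weigh competing `Allow` and `Disallow` rules (typically the longest matching rule wins), but the sketch shows why wiki URLs fall under the disallowed pattern.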

It is an interesting choice, since GitHub describes wikis as a place to "share long-form content about your project". Because public wikis are, by default, editable by any user, this is perhaps a heavy-handed protection against spammers.