BY BLACK DUCK

Ohloh and the Long Tail

We have always felt that Ohloh indexes the most important open source projects. Recognizing that “most important” is a subjective qualifier, we recently decided to alter our policy to simply index all open source projects. To achieve this, we’ve deployed a crawler to fetch the “very long tail” of less-visible open source projects.

Our crawler works in tandem with the existing openly-editable project wiki, and it treats human edits as the most important source of information. It never overwrites anyone's edits; it only fills in gaps where information is missing. It also remembers what it has crawled and keeps those facts up to date. If anyone corrects information the crawler provided, the crawler stops updating that information and preserves the manual change.
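The post doesn't say how the crawler keeps track of which facts it owns. One plausible way to get the behavior described above is to tag every field with its provenance: the crawler may write a field only if it is empty or was last written by the crawler itself, and a human edit permanently takes ownership of that field. The sketch below illustrates that policy; `ProjectRecord`, `apply_crawl` and `human_edit` are hypothetical names, not Ohloh's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical provenance tags for each stored fact.
HUMAN = "human"
CRAWLER = "crawler"


@dataclass
class FieldValue:
    value: str
    source: str  # HUMAN or CRAWLER: who last wrote this field


@dataclass
class ProjectRecord:
    fields: Dict[str, FieldValue] = field(default_factory=dict)

    def human_edit(self, name: str, value: str) -> None:
        """A wiki edit always wins and marks the field as human-owned."""
        self.fields[name] = FieldValue(value, HUMAN)

    def apply_crawl(self, crawled: Dict[str, str]) -> None:
        """Merge freshly crawled facts without ever touching human edits."""
        for name, value in crawled.items():
            current = self.fields.get(name)
            if current is None or current.source == CRAWLER:
                # Fill a gap, or refresh a fact the crawler supplied earlier.
                self.fields[name] = FieldValue(value, CRAWLER)
            # Otherwise a human owns this field: keep it and stop updating it.

    def get(self, name: str) -> Optional[str]:
        fv = self.fields.get(name)
        return fv.value if fv is not None else None


# Usage: the crawler fills gaps, a human corrects one fact, and a later
# crawl refreshes only the facts the crawler still owns.
p = ProjectRecord()
p.apply_crawl({"homepage": "http://old.example.org", "license": "GPL"})
p.human_edit("license", "GPL-2.0")
p.apply_crawl({"homepage": "http://new.example.org", "license": "GPL"})
print(p.get("homepage"))  # http://new.example.org
print(p.get("license"))   # GPL-2.0
```

Keeping provenance per field rather than per project means one manual correction doesn't freeze every other crawled fact about that project.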

So far we’ve indexed hundreds of thousands of projects from major forges. We believe this will make Ohloh a more valuable tool in discovering and evaluating open source choices.

  • http://echoreply.us tinkertim (Tim Post)

    Is there a list of forges that are crawled? I’m assuming sf.net and github are on the list, what about other smaller (but popular) forges like ShareSource?

  • http://blog.sfartz.com ObjectIsAdvantag (ObjectIsAdvantag)

    Is CodePlex crawled?
    Can you make the list of crawled forges public?

  • jnareb (Jakub Narębski)

    If you are crawling repo.or.cz, git.kernel.org and gitweb.freedesktop.org, or searching the web for gitweb / cgit installations, what enhancements would you like to see (for example, in the form of microformats such as the rel=vcs-* microformat proposal) to make finding projects easier?

  • http://nascent.freeshell.org he_the_great (he_the_great)

    Hey, it has been three years. Who do you crawl?