BY BLACK DUCK

Measuring Project Activity

We like to tout how many projects we have in Ohloh. It is right on the home page – today, the number is 550,869. That’s a lot of projects! By any measure, the world of FOSS is a big one. GitHub has over 4.7 million repositories. SourceForge hosts almost 325,000 projects. But really, is the raw number of projects in Ohloh a good measure of the actual magnitude of FOSS development efforts?

As of the end of March, Ohloh has code analysis for 271,372 projects – i.e. there has been a working code repository at some point in the project’s history on Ohloh. That is a little under half the projects. Yup – just half the projects on Ohloh have ever had code associated with them. That cuts things down considerably! So, how many of the projects with code on Ohloh have recent activity? Let’s have a look:

  • 96,824 with a commit in the past 2 years.
  • 46,883 with a commit in the past year.
  • 29,303 with a commit in the past 6 months.
  • 21,251 with a commit in the past 3 months.
  • 12,870 with a commit in the past month.
  • 5,629 with a commit in the past week.
  • 1,224 with a commit in the past day (3/30-3/31, a weekend)

This is still a mammoth amount of development activity! And, there is no denying the value of all that code out there, even the code that is not being actively developed now. This source code commons is an amazing gold mine of value, built over decades by developers around the world, and available under FOSS licenses to be forged into new code, new projects, and new innovations by future developers, some of whom haven’t even learned to program yet – or even been born.

But the real work of FOSS is happening on a small subset of this code commons at any given time. For the sake of discussion, let’s focus attention on just the active projects, and let’s define “active” as having had a commit in the last year. 46,883 projects have had a commit in the past year – just 17.3% of the projects with a code analysis.

This analysis confirms the conventional wisdom that FOSS plants many seeds, but only a small percentage really take root and thrive over time.

How many of these active projects have a team of developers working on them? If we look at the all-time number of committers for these active projects, we can see another important factor at work. Let’s take the most generous definition of “community” we can imagine – at least two developers working on a project. How many active projects have a community? A little over half of the active projects have never had more than a single committer. Only 49.3% of active projects have had at least two committers over the lifetime of their code repositories. This is 8.5% of all analyzed projects, and just 4.2% of all the projects on Ohloh.
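The percentages above follow directly from the counts quoted in this post. As a quick check, here is a minimal Python sketch that recomputes them. The variable names and the `pct` helper are made up for illustration; only the four input figures come from the post.

```python
# Figures quoted in the post; everything else here is derived from them.
TOTAL_PROJECTS = 550_869      # all projects listed on Ohloh
ANALYZED_PROJECTS = 271_372   # projects with a code analysis
ACTIVE_PROJECTS = 46_883      # projects with a commit in the past year
COMMUNITY_SHARE = 0.493       # share of active projects with 2+ committers


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Estimated number of active projects with a community of at least two.
community_projects = round(ACTIVE_PROJECTS * COMMUNITY_SHARE)

funnel = {
    "analyzed / total": pct(ANALYZED_PROJECTS, TOTAL_PROJECTS),
    "active / analyzed": pct(ACTIVE_PROJECTS, ANALYZED_PROJECTS),
    "community / analyzed": pct(community_projects, ANALYZED_PROJECTS),
    "community / total": pct(community_projects, TOTAL_PROJECTS),
}

if __name__ == "__main__":
    print(f"active projects with a community: {community_projects}")
    for label, value in funnel.items():
        print(f"{label}: {value}%")
```

Because the community count is derived from the rounded 49.3% figure, it is an estimate rather than an exact Ohloh number.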

That’s right – just 4.2% of projects on Ohloh, or a little more than 23,000 projects, are active and have a community of at least two. Most of the “famous” projects we all know about are in this group – very few projects indeed are highly used but have such a mature code base that there are no commits at all to their repositories.

In my presentation at the Linux Collaboration Summit last week where I discussed these results, I proposed a new metric which I called “liveness” that captures this concept. Wouldn’t it be useful to have a score for projects that places them on a continuum in sensible relation to each other, that spreads out the values so that well-known very active projects do not all bunch up at the high end of the scale, and lesser-known but still active projects still have a meaningful spread? Such a score would let developers gauge the relative energy going into projects – a key factor in assessing whether to adopt a project, either as a contributor or user.

Projects with no activity in the past year, or with fewer than two committers, get a liveness score of zero. For the rest, we weight more recent months’ activity higher and less recent activity lower. Using an exponential time-weighted decay, and normalizing the score such that the most “live” project (the Linux Kernel of course!) is at 1000, we get scores for well-known projects that seem to pass a “sniff test” – they line up roughly with expectations.
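To make the idea concrete, here is a rough Python sketch of a score along these lines. It is not the actual calculation behind the numbers discussed here. The post does not specify the decay rate, the unit of activity, or the scaling, so the three-month half-life, the use of monthly commit counts, the linear scaling to 1000, and all function names are assumptions made purely for illustration.

```python
import math

HALF_LIFE_MONTHS = 3.0  # assumption: a month's weight halves every 3 months
MAX_SCORE = 1000        # the most "live" project is normalized to this value


def raw_liveness(monthly_commits, committers):
    """Exponentially time-weighted activity for one project.

    monthly_commits[0] is the most recent month; only the past 12 months
    are considered. Projects with fewer than two all-time committers, or
    with no commits in the past year, score zero.
    """
    recent = monthly_commits[:12]
    if committers < 2 or sum(recent) == 0:
        return 0.0
    decay = math.log(2) / HALF_LIFE_MONTHS
    return sum(count * math.exp(-decay * age)
               for age, count in enumerate(recent))


def liveness_scores(projects):
    """Map {name: (monthly_commits, committers)} to scores in 0..MAX_SCORE."""
    raw = {name: raw_liveness(commits, committers)
           for name, (commits, committers) in projects.items()}
    top = max(raw.values(), default=0.0)
    if top == 0:
        return {name: 0 for name in raw}
    return {name: round(MAX_SCORE * value / top)
            for name, value in raw.items()}


# Hypothetical projects, for illustration only.
example = {
    "busy": ([100] * 12, 50),                 # steady activity all year
    "fading": ([0] * 6 + [100] * 6, 5),       # active only 6+ months ago
    "solo": ([100] * 12, 1),                  # single committer
    "stale": ([0] * 12, 5),                   # no commits in the past year
}
scores = liveness_scores(example)
```

With linear scaling like this, scores would likely bunch up near zero for most projects, so a real version would probably need some further transformation to spread the values out.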


The audience at my talk had some excellent feedback. First, “liveness” is a value-laden term – it implies that projects are either alive or dead, and who wants to even look at dead projects? So they suggested I come up with a better name. Several developers also commented that basing an activity score on commits has some inherent problems: some projects make only a few large commits to their main trunk, with much of the development activity going on in branches. If all commits are counted equally, such projects will have a skewed score that is too low by comparison to projects that have many smaller commits. They suggested that perhaps some blend of LOC delta and commit count might yield a more useful metric.
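One way such a blend might look is sketched below in Python. This is only an illustration of the suggestion, not a worked-out proposal. The equal weighting, the log damping of both inputs, the reading of “LOC delta” as total lines added plus removed, and the name `blended_activity` are all assumptions.

```python
import math


def blended_activity(commits, loc_delta, commit_weight=0.5):
    """Blend commit count and lines changed into one activity number.

    Both inputs are log-damped (an assumption of this sketch) so that a
    single huge commit or a flood of tiny ones cannot dominate the result.
    loc_delta is taken to be lines added plus lines removed.
    """
    if not 0.0 <= commit_weight <= 1.0:
        raise ValueError("commit_weight must be between 0 and 1")
    commit_term = math.log1p(max(commits, 0))
    loc_term = math.log1p(abs(loc_delta))
    return commit_weight * commit_term + (1.0 - commit_weight) * loc_term


# Two hypothetical projects that changed the same 60,000 lines:
# one merged 3 large commits to trunk, the other made 300 small commits.
few_large = blended_activity(commits=3, loc_delta=60_000)
many_small = blended_activity(commits=300, loc_delta=60_000)
```

Counting commits alone, the first project would show 1% of the second project’s activity. With this blend the gap is much narrower, although the project with many small commits still ranks higher.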

What do you think? Would you want to see Ohloh report on such a metric? What would you call it? How would you like to see it presented? Any ideas on how to keep projects from artificially inflating their score, or on how Ohloh could filter out such spurious activity? Please comment!

About Rich Sands

I'm the Director of Developer Communities at Black Duck Software, and the product manager for Ohloh.
  • Marc Laporte

    Hi!

    How about calling it the “Ohloh project activity index”? (“Ohloh index” for short)

    And it could become a composite score which has other metrics factored in, such as number of users having this project in their stack. If a mature project is “very stable” and doesn’t have a lot of recent commits but deserves recognition, it presumably has a decent user base.

    I suggest this be added as a sort option for tags. So say I want to see the most lively VOIP projects, I should click on a button/link here: https://www.ohloh.net/tags/voip  (currently, it’s sorted by number of users)
    Keep up the great work!

    M ;-)

  • Nohbdy

    About this “2 or more people” for a team (and also to get a non-zero liveliness score)…

    How fair is this to, say, the next “Notch Persson”?

    One-man teams shouldn’t be lesser, it is the future after all.

  • Pingback: Open Source by numbers: Measuring activity of FLOSS projects — 48techblog

  • Pype

    I noted that your activity analysis only used CVS statistics on one of my SourceForge projects, and failed to detect the switch to SVN. I bet other projects may similarly “suffer” from under-estimated activity.

    • richsands

      Hi Pype,

      Ohloh does not detect SCM switches like the one you describe. Which project is it? You can easily add new enlistments or remove old ones on your project’s enlistments page. Let us know if you need any help with that, and thanks.

      • Pype

         That’s dsgametools (http://www.ohloh.net/p?q=dsgametools). It was on Ohloh before I got in, so I have no idea how it got added. Imho, it would make sense if you checked commits/activity on *all possible* SCMs, and not only on the one that was in use the first time the project was listed on Ohloh, as you can’t be sure that people will survey and update their own project here too, leading to misleading statistics (and I understood you’re doing research out of those stats).

  • Pingback: A Beehive of Activity | Ohloh Meta