We published this article originally on Opensource.com.
This article was co-authored by Vinod Ahuja, Don Marti, Georg Link, Matt Germonprez, and Sean Goggins.
Conventional metrics of open source projects lack the power to predict their impact. The bad news is, there is no significant correlation between open source activity metrics and project impact. The good news? There are paths forward.
Let’s start with some questions: How do you measure the impact of your open source project? What value does your project provide to other projects? How is your project important within an open source ecosystem? Can you predict your project’s impact using open source metrics that you can follow day to day?
If these questions resonate, chances are you care about measuring the impact of your open source project. On Opensource.com, we have already learned about measuring the project’s health, the community manager’s performance, the tools available for measuring, and the right metrics to use—and we understand that not all metrics are to be trusted.
While all these factors are critical in building a comprehensive picture of open source project health, there is more to the story. Indeed, many metrics fail to provide the information we need in a timely fashion. We want to use predictive metrics on a daily basis—metrics that are correlated with, and that act as predictors of, the outcomes and impact metrics that we care about.
Most open source project metrics focus on project metadata, such as contributor and commit counts, without addressing whether the project impacts a broader open source ecosystem. Unfortunately, a project that has a great number of contributors and an active flow of contributions may not be, and might never be, relevant to other projects in an open source ecosystem. To better understand the impact of a project, it is important to consider the broader context of an open source ecosystem. This article introduces the V-index as a measure of impact (see Regression Analysis of Open Source Project Impact: Relationships with Activity and Rewards).
Who cares about project impact?
Sponsors of open source projects care about their impact. A foundation that’s hosting an open source project likely wants it to be widely used, for example, or an organization that’s paying developers to work on a project will want to ensure that their efforts are making a difference. Consequently, software developers or project managers may need to use metrics to make the case that the time and effort spent on an open source project is creating real value for their employer.
Open source project members also care about the impact of their project. High-impact projects can be a source of pride and motivation for developers. Within the open source ecosystem, it means that people are interested in new development and ready to report bugs. High impact means that projects need the code base to be maintained and vulnerabilities to be addressed, which is an incentive to support project members.
Open source project impact
An effective way to understand an open source project’s impact is through its software libraries. A software library certainly impacts the projects in which it is used, and popular libraries have also changed the way software is developed by providing functionality across a variety of software projects.
The jQuery/Bootstrap example demonstrates how software libraries can have an impact. Within the open source ecosystem, jQuery is an upstream project to Bootstrap, which itself is an upstream project to many websites and web frameworks, as shown below:
Many metrics are being developed to measure the impact of an open source project. These include the number of users, downloads, installs, mentions in media (e.g., blogs, news, YouTube videos, and job postings), the availability of commercial offerings, and the number of add-on products. But such metrics isolate impact within that specific project and don’t fully demonstrate the impact of a software library within an open source ecosystem.
To measure the impact of an open source project within the open source ecosystem, let’s borrow a metric from academia: the h-index. This determines the impact of an author through the relationship of how many publications he or she has produced, and how many other authors have cited these publications. We propose, therefore, that a project’s impact in an open source ecosystem can be determined by downstream dependencies (i.e., how many downstream open source projects use them and how often those downstream projects are themselves used).
A downstream dependency exists when a software library is used within another piece of software. The V-index, which encapsulates our proposed measure of impact, is the maximum number of first-order downstream dependencies that themselves have at least an equal number of second-order downstream dependencies. The first-order dependency is the number of open source projects that use the library. The second-order downstream dependency is determined by how often a first-order dependent project is used within other open source projects.
The V-index is elaborated in three different scenarios:
First-order dependencies Second-order dependencies First-order dependencies Second-order dependencies First-order dependencies Second-order dependencies Dependency 1 0 Dependency 1 4 Dependency 1 40 Dependency 2 0 Dependency 2 4 Dependency 3 0 Dependency 3 4 Dependency 4 0 Dependency 4 4
Project A has a V-index of 0.
The project has four projects that depend on it. No other project depends on these projects. The V-index of Project A is 0 because zero first-order dependencies have any second-order dependencies.
Project B has a V-index of 4.
The project has four projects that depend on it. Each of these projects has four projects that depend on them. The V-Index of Project B is four because each of the four first-order dependencies have at least four second-order dependencies.
Project C has a V-index of 1.
The project has one project that depends on it. This project has 40 projects that depend on it. The V-Index of Project C is 1 because it has one first-order dependency that has at least one second-order dependency.
Looking at a practical example, jQuery has a V-index of 98. It has 13,848 first-order dependencies, of which Bootstrap is one, with 5,005 second-order dependencies. Of the 13,848, only 98 first-order dependencies have 98 or more second-order dependencies, as shown below:
Increase impact with new metrics
How do you increase your open source project’s impact? Well, you need to convince other projects to use your project. Unfortunately, there is no single activity that will make this happen. However, there are steps you can take to make a project impactful, and there are ways to measure how well you do them. Let’s look at which of these measures are correlated with impact.
We summarize the findings below based on previous correlation analysis. The correlation analysis used a sample of metrics for three kinds of open source metrics:
- Activity metrics measure metadata such as contributor or commit counts. Project contributors can increase these metrics by doing more work on the project and getting more people involved.
- Reward metrics measure how well the project is meeting contributor’s expectations. They may improve with faster acceptance of contributions.
- Impact metrics measure the impact on users and other projects.
The V-index was developed to measure impact metrics. The correlation was tested for 604 projects that were started in 2014 or 2015, that used the Rust programming language, that were listed in GHTorrent and Libraries.io (the data sources), and that had at least one downstream dependency.
The findings show that none of the conventional open source activity metrics correlate with impact. This lack of predictive activity metrics means that we have no good predictors to manage our open source projects.
Does this mean all is lost? We think not. Several open source projects are building next-generation metrics that project sponsors, maintainers, and downstream users might be able to rely on in the future. Here are four paths to finding the predictive metrics we need to boost the impact of our open source projects:
1. Add software quality metrics
The first idea is to combine open source activity metrics with conventional software engineering metrics, such as code coverage. Conventional open source activity metrics focus heavily on the development dynamics within the project. The focus on activity metrics excludes software quality factors, which might be more important for people choosing a software library. Conventional open source activity metrics make it difficult to distinguish productive activity from unproductive activity. Combining a software engineering metric with an open source activity metric could make the latter more valuable.
2. Understand the user community
The second idea involves using natural language processing to determine the sentiment within an open source project, especially where users of the software participate. Conventional open source activity metrics rely only on metadata. Knowing the number of interactions does not help us understand the quality and substance of community. FOSS Heartbeat, while currently not maintained, offers a solution.
3. Market mechanisms
The third idea is to draw a connection between impact and the value of a software library. Existing valuation methods focus on the project itself (i.e., development costs) rather than the value others derive from it. A problem that open source faces is the absence of price signals that can inform the value users receive from a software library. To draw a connection between impact and value, we need new market mechanisms, like the ones proposed by Bugmark.
4. Shared understanding of metrics
The fourth idea is to build more knowledge in the open source ecosystem about how metrics can help us understand the impact and health of open source projects. The Linux Foundation initiated the CHAOSS (Community Health Analytics Open Source Software) project to bring open source projects and other stakeholders together to build a shared understanding of metrics and of the software tools to capture and analyze said metrics. This blog post is based on research conducted as part of the CHAOSS project.
This article is based on the whitepaper Regression Analysis of Open Source Project Impact: Relationships with Activity and Rewards by Vinod K. Ahuja. Graphics were prepared by Kevin M. Lumbard. This work is supported by Mozilla and the Alfred P. Sloan Foundation.