New Paper: “Eight Observations and 24 Research Questions About Open Source Projects: Illuminating New Realities”

I am excited about this paper because we point out ways in which open source is evolving. And let me tell you, open source is changing a lot. This is relevant for researchers because it shapes the stories we can tell and the kinds of questions that are most interesting to ask. In fact, we identify 24 research questions we find intriguing.

Paper Abstract:

The rapid acceleration of corporate engagement with open source projects is drawing out new ways for CSCW researchers to consider the dynamics of these projects. Research must now consider the complex ecosystems within which open source projects are situated, including issues of for-profit motivations, brokering foundations, and corporate collaboration. Localized project considerations cannot reveal broader workings of an open source ecosystem, yet much empirical work is constrained to a local context. In response, we present eight observations from our eight-year engaged field study about the changing nature of open source projects. We ground these observations through 24 research questions that serve as primers to spark research ideas in this new reality of open source projects. This paper contributes to CSCW in social and crowd computing by delivering a rich and fresh look at corporately-engaged open source projects with a call for renewed focus and research into newly emergent areas of interest.

Read more…
This paper is open access and available from the ACM Digital Library.

Full reference:

Germonprez, M., Link, G. J. P., Lumbard, K., & Goggins, S. (2018). Eight Observations and 24 Research Questions About Open Source Projects: Illuminating New Realities. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), 57:1–57:22. https://doi.org/10.1145/3274326

How to measure the impact of your open source project

We published this article originally on Opensource.com.

This article was co-authored by Vinod Ahuja, Don Marti, Georg Link, Matt Germonprez, and Sean Goggins.

Conventional metrics of open source projects lack the power to predict their impact. The bad news is, there is no significant correlation between open source activity metrics and project impact. The good news? There are paths forward.

Let’s start with some questions: How do you measure the impact of your open source project? What value does your project provide to other projects? How is your project important within an open source ecosystem? Can you predict your project’s impact using open source metrics that you can follow day to day?

If these questions resonate, chances are you care about measuring the impact of your open source project. On Opensource.com, we have already learned about measuring the project’s health, the community manager’s performance, the tools available for measuring, and the right metrics to use—and we understand that not all metrics are to be trusted.

While all these factors are critical in building a comprehensive picture of open source project health, there is more to the story. Indeed, many metrics fail to provide the information we need in a timely fashion. We want to use predictive metrics on a daily basis—metrics that are correlated with, and that act as predictors of, the outcomes and impact metrics that we care about.

Most open source project metrics focus on project metadata, such as contributor and commit counts, without addressing whether the project impacts a broader open source ecosystem. Unfortunately, a project that has a great number of contributors and an active flow of contributions may not be, and might never be, relevant to other projects in an open source ecosystem. To better understand the impact of a project, it is important to consider the broader context of an open source ecosystem. This article introduces the V-index as a measure of impact (see Regression Analysis of Open Source Project Impact: Relationships with Activity and Rewards).

Who cares about project impact?

Sponsors of open source projects care about their impact. A foundation that’s hosting an open source project likely wants it to be widely used, for example, or an organization that’s paying developers to work on a project will want to ensure that their efforts are making a difference. Consequently, software developers or project managers may need to use metrics to make the case that the time and effort spent on an open source project is creating real value for their employer.

Open source project members also care about the impact of their project. High-impact projects can be a source of pride and motivation for developers. Within the open source ecosystem, high impact means that people are interested in new development and ready to report bugs. It also means that downstream projects need the code base to be maintained and vulnerabilities to be addressed, which gives them an incentive to support project members.

Open source project impact

An effective way to understand an open source project’s impact is through its software libraries. A software library certainly impacts the projects in which it is used, and popular libraries have also changed the way software is developed by providing functionality across a variety of software projects.

For example, the Bootstrap library revolutionized website interfaces and has become a de facto standard. But Bootstrap depends on another widely used library: jQuery. jQuery simplifies the use of JavaScript in website development. The impact of jQuery on Bootstrap, and on web development as a whole, cannot be overstated, and this impact is evident in the library dependency relationship between the two.

The jQuery/Bootstrap example demonstrates how software libraries can have an impact. Within the open source ecosystem, jQuery is an upstream project to Bootstrap, which itself is an upstream project to many websites and web frameworks, as shown below:

Downstream dependency depiction for jQuery and Bootstrap

Figure 1: An open source project dependency within an open source ecosystem: The jQuery project is the upstream project to Bootstrap and many other projects, which themselves may be upstream to more projects. (Graphic by Kevin M. Lumbard, licensed CC-BY-SA-3.0. River delta background by Messer Woland, licensed CC-BY-SA-3.0. Logos are property of respective right owners.)

Measuring impact

Many metrics are being developed to measure the impact of an open source project. These include the number of users, downloads, installs, mentions in media (e.g., blogs, news, YouTube videos, and job postings), the availability of commercial offerings, and the number of add-on products. But such metrics isolate impact within that specific project and don’t fully demonstrate the impact of a software library within an open source ecosystem.

To measure the impact of an open source project within the open source ecosystem, let’s borrow a metric from academia: the h-index. It determines the impact of an author through the relationship between how many publications the author has produced and how often other authors have cited those publications. We propose, therefore, that a project’s impact in an open source ecosystem can be determined by its downstream dependencies (i.e., how many downstream open source projects use it and how often those downstream projects are themselves used).

V-index

A downstream dependency exists when a software library is used within another piece of software. The V-index, which encapsulates our proposed measure of impact, is the maximum number of first-order downstream dependencies that themselves have at least an equal number of second-order downstream dependencies. The first-order dependency is the number of open source projects that use the library. The second-order downstream dependency is determined by how often a first-order dependent project is used within other open source projects.

The V-index is elaborated in three different scenarios:

Scenario A

| First-order dependencies | Second-order dependencies |
| --- | --- |
| Dependency 1 | 0 |
| Dependency 2 | 0 |
| Dependency 3 | 0 |
| Dependency 4 | 0 |

Scenario B

| First-order dependencies | Second-order dependencies |
| --- | --- |
| Dependency 1 | 4 |
| Dependency 2 | 4 |
| Dependency 3 | 4 |
| Dependency 4 | 4 |

Scenario C

| First-order dependencies | Second-order dependencies |
| --- | --- |
| Dependency 1 | 40 |

Project A has a V-index of 0.

The project has four projects that depend on it. No other project depends on these projects. The V-index of Project A is 0 because zero first-order dependencies have any second-order dependencies.

Project B has a V-index of 4.

The project has four projects that depend on it. Each of these projects has four projects that depend on them. The V-index of Project B is 4 because each of the four first-order dependencies has at least four second-order dependencies.

Project C has a V-index of 1.

The project has one project that depends on it. That project has 40 projects that depend on it. The V-index of Project C is 1 because it has one first-order dependency that has at least one second-order dependency.
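The three scenarios can be worked through as an h-index-style calculation over the second-order dependency counts. The sketch below is a minimal illustration; the function name and input format are our own, not part of the original analysis:

```python
def v_index(second_order_counts):
    """Compute the V-index: the largest v such that at least v
    first-order dependencies each have at least v second-order
    dependencies (an h-index over dependency counts)."""
    counts = sorted(second_order_counts, reverse=True)
    v = 0
    # Walk the sorted counts; position i + 1 is the number of
    # first-order dependencies considered so far.
    for i, c in enumerate(counts):
        if c >= i + 1:
            v = i + 1
        else:
            break
    return v

# Each list holds the second-order dependency counts of a
# project's first-order dependencies, per the scenarios above:
print(v_index([0, 0, 0, 0]))  # Project A → 0
print(v_index([4, 4, 4, 4]))  # Project B → 4
print(v_index([40]))          # Project C → 1
```

Sorting in descending order makes the cutoff easy to find: the V-index is the last position where the count still meets or exceeds the position.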

Looking at a practical example, jQuery has a V-index of 98. It has 13,848 first-order dependencies, of which Bootstrap is one, with 5,005 second-order dependencies. Of the 13,848, only 98 first-order dependencies have 98 or more second-order dependencies, as shown below:

V-index graphical depiction

Figure 2: V-index of jQuery: The x-axis represents the downstream open source projects (first-order downstream dependencies) sorted by the number of their own downstream dependencies. The y-axis represents the number of downstream dependencies of each first-order open source project on the x-axis (second-order downstream dependencies). The V-index is the number of first-order downstream dependencies that have at least the same number of second-order downstream dependencies. (Graphic by Kevin M. Lumbard, licensed CC-BY-SA-3.0. Logos are property of respective right owners.)

Increase impact with new metrics

How do you increase your open source project’s impact? Well, you need to convince other projects to use your project. Unfortunately, there is no single activity that will make this happen. However, there are steps you can take to make a project impactful, and there are ways to measure how well you do them. Let’s look at which of these measures are correlated with impact.

We summarize the findings below, based on a previous correlation analysis that used a sample of three kinds of open source metrics:

  1. Activity metrics measure metadata such as contributor or commit counts. Project contributors can increase these metrics by doing more work on the project and getting more people involved.
  2. Reward metrics measure how well the project is meeting contributors’ expectations. They may improve with faster acceptance of contributions.
  3. Impact metrics measure the impact on users and other projects.

The V-index was developed to measure impact metrics. The correlation was tested for 604 projects that were started in 2014 or 2015, that used the Rust programming language, that were listed in GHTorrent and Libraries.io (the data sources), and that had at least one downstream dependency.

The findings show that none of the conventional open source activity metrics correlate with impact. This lack of predictive activity metrics means that we have no good predictors to manage our open source projects.

Does this mean all is lost? We think not. Several open source projects are building next-generation metrics that project sponsors, maintainers, and downstream users might be able to rely on in the future. Here are four paths to finding the predictive metrics we need to boost the impact of our open source projects:

1. Add software quality metrics

The first idea is to combine open source activity metrics with conventional software engineering metrics, such as code coverage. Conventional open source activity metrics focus heavily on the development dynamics within the project. The focus on activity metrics excludes software quality factors, which might be more important for people choosing a software library. Conventional open source activity metrics make it difficult to distinguish productive activity from unproductive activity. Combining a software engineering metric with an open source activity metric could make the latter more valuable.

2. Understand the user community

The second idea involves using natural language processing to determine the sentiment within an open source project, especially where users of the software participate. Conventional open source activity metrics rely only on metadata. Knowing the number of interactions does not help us understand the quality and substance of a community. FOSS Heartbeat, while currently not maintained, offers one solution.
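To make the idea concrete, a sentiment pass over community comments can be sketched with a toy lexicon-based scorer. The word lists and example comments below are invented for illustration; a real analysis would use a trained model, as tools like FOSS Heartbeat did:

```python
# Toy lexicon-based sentiment scoring over project comments.
# The word lists and example comments are illustrative only.
POSITIVE = {"thanks", "great", "works", "helpful", "love"}
NEGATIVE = {"broken", "crash", "frustrating", "bug", "fails"}

def comment_sentiment(comment):
    """Score a comment: +1 per positive word, -1 per negative word."""
    words = [w.strip(".,!?") for w in comment.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

comments = [
    "Thanks, this release works great",
    "The installer is broken and fails on Windows",
]
scores = [comment_sentiment(c) for c in comments]
print(scores)  # → [3, -2]
```

Aggregating such scores over time would yield a community-sentiment signal that goes beyond raw interaction counts.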

3. Market mechanisms

The third idea is to draw a connection between impact and the value of a software library. Existing valuation methods focus on the project itself (i.e., development costs) rather than the value others derive from it. A problem that open source faces is the absence of price signals that can inform the value users receive from a software library. To draw a connection between impact and value, we need new market mechanisms, like the ones proposed by Bugmark.

4. Shared understanding of metrics

The fourth idea is to build more knowledge in the open source ecosystem about how metrics can help us understand the impact and health of open source projects. The Linux Foundation initiated the CHAOSS (Community Health Analytics Open Source Software) project to bring open source projects and other stakeholders together to build a shared understanding of metrics and of the software tools to capture and analyze said metrics. This blog post is based on research conducted as part of the CHAOSS project.

Acknowledgments

This article is based on the whitepaper Regression Analysis of Open Source Project Impact: Relationships with Activity and Rewards by Vinod K. Ahuja. Graphics were prepared by Kevin M. Lumbard. This work is supported by Mozilla and the Alfred P. Sloan Foundation.

New Paper: “Open Data Standards for Open Source Software Risk Management Routines: An Examination of SPDX”

I presented our paper Open Data Standards for Open Source Software Risk Management Routines: An Examination of SPDX at the ACM GROUP conference in Florida. GROUP is a single-track conference with a great group of participants. I enjoyed the interactions and presentations. GROUP is definitely worth attending again. Also, single-track conferences may be my new preference, because I do not have to decide which of several interesting sessions to go to.

Paper Abstract:

As the organizational use of open source software (OSS) increases, it requires the adjustment of organizational routines to manage new OSS risk. These routines may be influenced by community-developed open data standards to explicate, analyze, and report OSS risks. Open data standards are co-created in open communities for unifying the exchange of information. The SPDX® specification is such an open data standard to explicate and share OSS risk information. The development and subsequent adoption of SPDX raises the questions of how organizations make sense of SPDX when improving their own risk management routines, and of how a community benefits from the experiential knowledge that is contributed back by organizational adopters. To explore these questions, we conducted a single case, multi-component field study, connecting with members of organizations that employed SPDX. The results of this study contribute to understanding the development and adoption of open data standards within open source environments.

Read more…
The paper is Open Access and is available in the ACM Digital Library.

Full reference:

Gandhi, R., Germonprez, M., & Link, G. J. P. (2018). Open data standards for open source software risk management routines: an examination of SPDX. In Proceedings of ACM GROUP ’18 (pp. 219–229). Sanibel Island, Florida, USA: ACM. https://doi.org/10.1145/3148330.3148333

New Paper: “Contemporary Issues of Open Data in Information Systems Research: Considerations and Recommendations”

We hosted a workshop in Dublin before ICIS 2016. The workshop was on open data in information systems research. I led the write-up of our workshop report and am proud to say that we published it in the Communications of the Association for Information Systems journal.

Paper Abstract:

Researchers, governments, and funding agencies are calling on research disciplines to embrace open data – data that is publicly accessible and usable beyond the original authors. The premise is that research efforts can draw and generate several benefits from open data, as such data might provide further insight, enabling the replication and extension of current knowledge in different contexts. These potential benefits, coupled with a global push towards open data policies, brings open data into the agenda of research disciplines – including Information Systems (IS). This paper responds to these developments as follows. We outline themes in the ongoing discussion around open data in the IS discipline. The themes fall into two clusters: (1) The motivation for open data includes themes of mandated sharing, benefits to the research process, extending the life of research data, and career impact; (2) The implementation of open data includes themes of governance, socio-technical system, standards, data quality, and ethical considerations. In this paper, we outline the findings from a pre-ICIS 2016 workshop on the topic of open data. The workshop discussion confirmed themes and identified issues that require attention in terms of the approaches that are currently utilized by IS researchers. The IS discipline offers a unique knowledge base, tools, and methods that can advance open data across disciplines. Based on our findings, we provide suggestions on how IS researchers can drive the open data conversation. Further, we provide advice for the adoption and establishment of procedures and guidelines for the archival, evaluation, and use of open data.

Full reference:

Link, G. J. P., Lumbard, K., Conboy, K., Feldman, M., Feller, J., George, J., … Willis, M. (2017). Contemporary Issues of Open Data in Information Systems Research: Considerations and Recommendations. Communications of the Association for Information Systems, 41(Article 25), 587–610. Retrieved from http://aisel.aisnet.org/cais/vol41/iss1/25/

Master Thesis Published in Journal

I am ecstatic about publishing my master thesis in a journal. I thank my co-authors who mentored me throughout the master thesis process and helped me achieve this goal. When I started the thesis project, I aimed for a conference publication and never dreamed that I would produce journal-quality research on my first attempt. The paper title is Anchored Discussion: Development of a Tool for Creativity in Online Collaboration.

Paper Abstract:

Open innovation and crowdsourcing rely on online collaboration tools to enable dispersed people to collaborate on creative ideas. Research shows that creativity in online groups is significantly influenced by the interaction between group members. In this paper, we demonstrate how theory can be effectively used to design and evaluate a tool for creative online collaboration. Specifically, we use the body of knowledge on creativity support systems to inform the development of a tool to support anchored discussions. Anchored discussions represent a new mode for creative interaction. In anchored discussion every comment is tied to some aspect of an idea. We evaluated the anchored discussion tool in a laboratory experiment, which generated insights for additional and refined research. Our results indicate that anchored discussion leads to a more structured discussion amongst group members and consequently to more creative outcomes. In a post session survey, participants made several suggestions on how to improve anchored discussion. This paper concludes that anchored discussion is promising as a new tool to aid online groups in creative collaboration. This paper extends a previous version presented at CRIWG 2015 [Link, 2015].

Read more…
The full paper is available open access from the J.UCS website.

Full reference:

Link, G. J. P., Siemon, D., de Vreede, G.-J., & Robra-Bissantz, S. (2016). Anchored Discussion: Development of a Tool for Creativity in Online Collaboration. Journal of Universal Computer Science, 22(10), 1339–1359. https://doi.org/10.3217/jucs-022-10-1339