Software and free and open source software in open science

From NI4OS wiki

Software is one of the pillars of open science, along with publications and data. Software is ubiquitous nowadays – it is used as a general scientific tool that facilitates research and supports and integrates various elements of open science (OS). Sometimes software is an integral part of a particular scientific investigation, essential for its conduct, reproducibility and follow-up research. It may even be a primary outcome of the research.

Software is used to process data. Research data can be better understood with software that provides an analytical or comparative view of it. Software also facilitates access not only to research data but also to data about research. It runs the open science infrastructure, including repositories, archives, catalogues, databases and platforms for collaboration. The software-based services and infrastructure of OS are so important that it is safe to say that OS would not exist today without software and, to a large extent, without free and open source software (FOSS).

Despite many differences in their topics, methods and culture, open science and open software work harmoniously towards similar goals. Therefore, contemporary researchers must be familiar with the specifics of science-related software, its use in open science and especially the use of FOSS.

FOSS solutions and platforms make up a large part of the generic and domain-specific solutions in NI4OS-Europe. Consequently, end-users and service and resource providers are asking what the best approach to FOSS in their work is and how OS and FOSS relate to each other. The issues they are most concerned about are the proper use of FOSS, licensing, referencing, preservation and sustainability. These issues are relevant to research software used or developed in research communities as well as to the original NI4OS-Europe developments. This report aims to provide clear, basic guidance on the use of FOSS in research, OS and related services, currently and in the near future, within the NI4OS-Europe community and beyond.

Use of software in science

Software in science is typically split into research software and software in research.

Research software includes source code, algorithms, scripts, computational workflows and executables created during the research process or with research as its purpose. It is typically domain-specific. Research also directly employs computational and data processing workflows, which may combine several pieces of software and other resources, and self-contained executable notebooks that integrate data, analytical methods, procedures, code, results, visualizations and narrative. Research software can be treated as an OS resource that is managed together with other parts of research but it still needs to be distinguished as a separate entity of a special type. Furthermore, all its elements may be aggregated in a standalone OS research product, especially if it is to be used by several research projects.

Software in research includes software that is used before, during or after research and that was specifically created to assist, track, share or manage research or to facilitate participation in OS. Such software is often provided as a service or locally used tool that interacts with external services. The provided services may combine several software components or involve human contribution or intervention. Sometimes, even a general-purpose software or service can be adopted by a community within a discipline and thus become a tool used in research.

A distinction between research software and software in research is often difficult to make, as some services are exclusively used in specific scientific domains and communities. Also, specialised research software may be integrated with more general services and resources.

Finally, both software in research and research software are just a thin layer on top of a huge stack of other software and platforms that research depends upon. Although the top layer fully depends on it, this other software is not considered software in research or research software. Still, the licences and rules imposed on that software may be applicable, given the nature of the dependence.

Publications are typically issued only once and are used as such for a long time, as they may be updated only with minor corrections. Even though publications may be overridden, relativized or partially replaced by subsequent research and publications, they remain usable and used in their original form. Research-related products other than software are typically linked to individual publications. Datasets can be reused in many research efforts but, apart from format adaptations, are substantially unchanged, even if combined and used together with other datasets. On the other hand, software is constantly modified – even small changes in it may have a huge impact on its functioning. While the lifetime of one version of the software is generally shorter than that of data [], the opposite is true if its life cycle and patterns of use are considered across several versions.

Research software often has a life cycle that is independent of the research in question and of the rules for managing scientific artifacts, and it involves people who are not directly involved in that research. Therefore, software usually needs to be handled differently than publications or data. However, these particularities are not yet consistently taken into account, as software is perceived and managed differently from research. As researchers, groups and organisations use software to support their activities and needs and have different priorities and capabilities from service developers or supporters, they also handle various aspects of software management differently. The primary aspect of all software is human, while machine interpretation and execution are only parts of its life cycle, which is primarily characterised by human reading, analysis and occasional modification. Changing and maintaining software is often very difficult and time-consuming. The code can be seen as a capture of valuable knowledge in which the target discipline, logic, processing steps and technical implementation are integrated. There is also a sophisticated and evolving developer community with its own culture, where expectations, conventions, standards and preferences change over time. Coding practices and practices related to documentation, testing and quality assurance vary between scientific disciplines and researchers and often differ from those of professional software engineers, who are themselves not uniform in their approach. Licences and copyright apply and must be respected, even if the software is FOSS.

The most successful software products are often used for many years and often go through dozens or even hundreds of changes and versions, especially if an agile development process and automated deployment are used. Software is managed over years by many people and through steps and processes that are not part of the scientific process; the organisations that manage, maintain and modify it may also change. During this time, the earlier versions can become completely unusable or even dangerous. The history of software changes is critical to code understanding and maintenance. As a piece of code evolves, the causes, reasons and intentions of a change, together with all affected pieces of code, are often key to its interpretation and the basis for further modification. The amount of source code in successful software projects can increase dramatically and is often measured in millions of lines. At the same time, the code is fragile due to the many dependencies and mistakes in understanding and implementing interactions between components that are easy to make but extremely difficult to detect. A small error in a single line of code can invalidate or compromise a large system and affect many related software components. In addition to the code itself, software metadata, configuration and customisations, licensing information and corresponding service policies need to be defined, prepared, published and tracked. Therefore, the structure of software is much more complex than that of a typical dataset, as it contains many dependencies and other relationships, most of which are not directly related to the subject of research.

Software used in open science or by research infrastructures and services is not necessarily free and open source. All types of software (private, FOSS and proprietary) are usually kept and managed in software repositories that are conceptually and functionally quite different from the repositories used in open science. The scientific community expects research software artifacts to be properly archived, referenced, identified, described, cited and credited, but also discoverable, visible, accessible and reused as needed. Software quality, reproducibility and traceability need to be managed differently from other, less changeable research-related products. Software also needs a policy framework for dissemination, reuse, evaluation and recognition that includes funding and software-related incentives, a sustainability framework with organisational arrangements, legal instruments and economic models. Furthermore, both the scientific and software communities would benefit from a strategic framework that would combine different approaches and methods for consolidation across scientific disciplines and technical communities, which should include harmonisation, technology transfer and collaboration with industry.

FOSS and open science

Open source software is a key element of many OS tools and services. These services support OS and contribute to its digital ecosystem by enabling, supporting or streamlining the exchange and use of research information and shared data. Open science and open software share the same ethos and drive for openness of knowledge and information sharing, partake in opportunities and support each other, but are essentially independent and have a different practical focus. In particular, open source and permissive FOSS licences are often used in open science but are sometimes not perceived as essential to it. Although open source software is popular because of its potential to be more easily applied and maintained in the scientific community, the actual settings and tools needed may differ and therefore include proprietary solutions. About half of the tools used in science are FOSS or at least free to use [SOSP-FR report], and even in the social sciences, the numbers of paid and free tools are similar [1], which would rather be expected of the natural and technical sciences.

Both FOSS and proprietary software used in science often use FOSS components. The concern about access to and licensing of software is initially related to guidance on how research software should be deposited [2], cited and referenced [3]. Software Heritage (SWH), together with HAL [4], is one of the related supporting initiatives and services that aim to address software in open science. Nevertheless, most researchers rely on GitHub, while some use Zenodo as a comprehensive depositing service. Depositing practices are associated with specifying the software location and providing licences, other metadata and citation information that make research software findable and accessible.

Poor citation practices contribute to inadequate visibility and accessibility of research software. The software tools and packages used are often not mentioned or are insufficiently identified in academic papers, even when their names are unique enough. Often, researchers mention the software they use only in the methodology section or in footnotes. This complicates finding tool-related research and does not provide direct credit. Authors are also often reluctant to go into such “technical details”, and even when they mention the software used, reviewers may ask them to remove that part. Creators of some tools ask for a specific paper to be cited, which helps prospective users find the tool and gives credit to its developers. However, this is not enough. Software references are still not standardised and refer to many kinds of sources, predominantly via URLs, a practice that is neither persistent nor interoperable in the long run. The FORCE11 Software Citation Working Group defined the basic software citation principles []:

  • Importance – Software should be considered a legitimate and citable product of research.
  • Credit and attribution – Software citations should facilitate giving scholarly credit and legal attribution to all contributors to the software in a suitable way.
  • Unique identification – A citation should include identification that is machine actionable, globally unique, and interoperable and is recognized by at least the discipline community.
  • Persistence – Identifiers and metadata should persist beyond the lifespan of the software.
  • Accessibility – Citations should facilitate access to the software and associated metadata, documentation and other materials necessary for informed use.
  • Specificity – Citations should facilitate identification of, and access to, the used version of the software.
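To make the principles above more concrete, the following is a minimal sketch of how a citation record could be checked against the four principles that can be tested mechanically. The class `SoftwareCitation`, its field names and the example values are hypothetical illustrations, not part of the FORCE11 principles or of any existing tool. Importance and persistence depend on community practice and on the identifier provider, so they are not checked here.

```python
from dataclasses import dataclass, field


@dataclass
class SoftwareCitation:
    """Hypothetical record of the metadata a software citation might carry."""
    name: str
    authors: list[str] = field(default_factory=list)
    version: str = ""       # the exact version used in the research
    identifier: str = ""    # persistent identifier, e.g. a DOI or SWHID
    access_url: str = ""    # where the software and its documentation live

    def unmet_principles(self) -> list[str]:
        """Return the checkable principles this record does not yet support."""
        checks = {
            "Credit and attribution": bool(self.authors),
            "Unique identification": bool(self.identifier),
            "Accessibility": bool(self.access_url),
            "Specificity": bool(self.version),
        }
        return [principle for principle, ok in checks.items() if not ok]

    def render(self) -> str:
        """Render a simple human-readable reference string."""
        title = self.name + (f" (version {self.version})" if self.version else "")
        parts = [", ".join(self.authors), title, self.identifier]
        return ". ".join(part for part in parts if part) + "."


# Example values are placeholders, not a real tool or a real DOI.
citation = SoftwareCitation(
    name="ExampleTool",
    authors=["A. Researcher", "B. Developer"],
    version="1.2.0",
    identifier="doi:10.0000/example",
)
print(citation.unmet_principles())  # ['Accessibility']
print(citation.render())
```

A record that cites only a tool name would fail all four checks, which is the situation described above for many academic papers.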

Starting from these principles, additional guidelines on how to cite software were developed [Recognizing the value of software: a software citation guide].

It should be noted that the backgrounds, attitudes, motivations and goals of researchers and professional software developers differ. As both groups develop and operate research-related software, these differences should be accounted for.

Even the most popular licences differ significantly. Licences that are popular in the OS world are sometimes applied to software, but the most frequently used software licences were designed to support the goals of the open-source movement and were developed independently of, and with different objectives than, the licences typically used in OS.

Software developers typically lack the time, effort and knowledge needed to address FAIR for software, and even IPR and licensing in general. A joint RDA/FORCE11/ReSA working group on FAIR for Research Software (FAIR4RS), established in 2020, reviewed and redefined the FAIR guiding principles for software and related computational code-based research products and published its adaptation of the general FAIR principles for research software in March 2022. Although minimal software metadata has long been discussed, defined and collected by often-used registries, the first comprehensive guide for describing and cataloguing software materials was developed in 2020 and published in February 2022 by the Software Preservation Network’s Metadata Working Group [].

The association between FOSS and OS extends to broader guidelines on FOSS and licence usage and governance, but also to the elaboration and establishment of a strategic orientation in the management of tools and services. This relationship also includes shared governance: ensuring continual investment in software development, investment pay-off, control over software evolution, and its long-term usability, maintenance and sustainability. Therefore, the management of software as a scientific asset is becoming a critical part of OS governance that should be fused with related practices from software engineering, where IPR governance, FOSS licensing and licence compatibility are very important and topical subjects.

The many existing links between software and various aspects of research, such as the access to source code provided by FOSS and the need to access other software for reproducibility, illustrate why strengthening the productive relationship between research and software communities is so important. As many international organisations and collaborations call for closer use of FOSS in research and supporting infrastructure, much more needs to be done in terms of recognising software as fundamentally different from research data and establishing the associated conventions, infrastructure, rules, evaluation and support. At the same time, the members of the software community who participate in OS development need to firmly adhere to its requirements and practices, which they may try to evade out of a desire to circumvent obstacles or a sense of entitlement.

Software in OS practices (SOSP-FR report)

The report "State of open science practices in France" [SOSP-FR, in French: Pratiques et usages des outils numériques dans les communautés scientifiques en France] from 2022 describes the practices and use of digital tools in scientific communities in France. The study is based on a survey of 1089 researchers in various fields.

About two-thirds of the respondents use free and open source software (FOSS), and the same proportion uses paid software. Other types of software are used far less. These include partially or completely free proprietary software (21%) and software created for or resulting from research (17%). In particular, respondents who are 35 years old or younger are more inclined to FOSS. Physics, mathematics and computer science in particular are decidedly inclined towards FOSS, with this inclination amounting to 80% for the latter two fields. Literature, social sciences, humanities and life sciences use FOSS and paid software in equal proportions, while chemistry, engineering and medicine lean more towards paid software (at about 55%).

General authoring tools (such as MS Excel and Word) are used most often. This is followed by more technical languages and platforms commonly used in data science and analysis: R, Python and MATLAB. The reason for the high frequency of general tools (word processing, spreadsheets, visualisation and presentation) is that they are used by all disciplines. Like the general public, researchers most often use common office applications. The versatility and neutrality of the analytical platforms are also the reasons for their high ranking. The two most widely used tools for data analysis, Excel and R, symbolise the two main directions of digital data analysis. Excel is a long-established and popular general-purpose paid software, while R is an open-source programming language and environment for statistical computing and graphics. However, R is newer, less commonly used and its audience is still limited to (often younger) researchers.

The only well-ranking discipline-specific tool is the FOSS geographic information system QGIS. It is followed, with almost the same usage as R, by researcher-developed software, in-house software of the organisation or laboratory, and LibreOffice as another office suite. This strongly indicates that researchers and organisations do not want to develop software unless they are forced to do so. It is also interesting to note that all locally developed software is mentioned much less frequently than R, Python or MATLAB, implying that using them is not considered programming.

  • MS Excel: 219
  • MS Word: 143
  • R: 112
  • Python: 105
  • MATLAB: 80
  • QGIS: 49
  • Software designed by user: 35
  • LibreOffice: 33
  • Internal software of the organisation: 33
  • ImageJ: 32
  • LaTeX: 27
  • FileMaker: 27
  • Origin: 26
  • Photoshop: 26
  • LimeSurvey: 26
  • PowerPoint: 23
  • Illustrator: 23
  • SPSS: 22
  • ArcGIS: 19
  • Access: 18
  • C / C++: 18
  • SAS: 16
  • Zotero: 15
  • LabVIEW: 15
  • NVivo: 14
  • Sphinx: 14
  • Stata: 14
  • Prism: 13
  • RStudio: 13
  • Oxygen: 12
  • GraphPad: 11
  • Mathematica: 10
  • Iramuteq: 10
  • ChemDraw: 10
  • GIMP: 10
  • Inkscape: 10

Next are several commonly needed task-specific tools that can be used across communities. They are used for image processing, database management, publication preparation, surveys, statistical analysis, data acquisition and instrument automation and control. The most commonly mentioned open source tools are ImageJ, LaTeX, LimeSurvey, Zotero, Sphinx, RStudio, Iramuteq (in French), GIMP and Inkscape. Proprietary tools include FileMaker, Origin, Photoshop, PowerPoint, Illustrator, SPSS, ArcGIS, Access, SAS, LabVIEW, NVivo, Stata, Prism & GraphPad, OXYGEN (from DEWETRON), Mathematica and ChemDraw. The list also includes C and C++, which are far from their former glory. Most of these tools and platforms are decades old, so researchers stick with what they know and are comfortable with. Some of these tools may be part of the shared culture and are hard to replace. Zotero, used for managing bibliographic data and related materials, is the first software mentioned that is directly related to OS.
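As a rough illustration of the balance between the two groups, the figures from the list above can be tallied using the open source and proprietary grouping given in the preceding paragraph. This is only a sketch: it assumes that the numbers in the list are counts of mentions by respondents, it covers only the task-specific tools named in that paragraph, and it leaves out C / C++, which is a language rather than a tool of either kind. The function name `tally` is illustrative.

```python
# Figures copied from the list above (assumed to be numbers of mentions).
mentions = {
    "ImageJ": 32, "LaTeX": 27, "FileMaker": 27, "Origin": 26, "Photoshop": 26,
    "LimeSurvey": 26, "PowerPoint": 23, "Illustrator": 23, "SPSS": 22,
    "ArcGIS": 19, "Access": 18, "SAS": 16, "Zotero": 15, "LabVIEW": 15,
    "NVivo": 14, "Sphinx": 14, "Stata": 14, "Prism": 13, "RStudio": 13,
    "Oxygen": 12, "GraphPad": 11, "Mathematica": 10, "Iramuteq": 10,
    "ChemDraw": 10, "GIMP": 10, "Inkscape": 10,
}

# Grouping as given in the text; every other tool is treated as proprietary.
open_source = {
    "ImageJ", "LaTeX", "LimeSurvey", "Zotero", "Sphinx",
    "RStudio", "Iramuteq", "GIMP", "Inkscape",
}


def tally(counts, open_names):
    """Sum the mentions separately for open source and proprietary tools."""
    totals = {"open source": 0, "proprietary": 0}
    for tool, count in counts.items():
        group = "open source" if tool in open_names else "proprietary"
        totals[group] += count
    return totals


totals = tally(mentions, open_source)
print(totals)  # {'open source': 157, 'proprietary': 299}
```

Under these assumptions, the proprietary task-specific tools collect roughly twice as many mentions as the open source ones, even though nine of the 26 tools in this group are open source.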

Almost no software tied to a specific scientific field is mentioned in the joint list. Such tools are too diverse and dispersed across communities. Nevertheless, some community-related patterns can be identified for the tools listed above:

  • The physical sciences, mathematics, computer science and engineering tend to use the tools that they can adapt to their needs (MATLAB, Python, LaTeX, software developed by researchers).
  • The humanities, arts and social sciences make extensive use of software from two major vendors, Microsoft and Adobe; they also use QGIS and SPSS. The social sciences also use R.
  • Biology and chemistry often use software specialised in image processing and graphical presentation. The life sciences also use R.
  • In medicine, there are few specific tools. Graphical and statistical tools are most commonly used (ImageJ, SAS and GraphPad products such as Prism).

The authors of the study also emphasise in their analysis that:

  • Python users also frequently use Linux (49%, compared to 22.5% in the surveyed population) and prefer open and free software (91.2%, compared to 69.5% of all respondents). They are more often men (68.2%, compared to 51.9% of men among all respondents).
  • Programmatic solutions and collaborative environments and tools are mainly used by younger researchers who work in small groups (2 to 5 members) and usually also use Linux.
  • The use of FOSS software is related to its free availability, especially for young researchers and those who do not have sufficient financial resources (SOSP-FR highlights this factor for the humanities, arts and social sciences, but it is generally applicable). This could be a stronger driver for FOSS than use in open science.
  • Not participating in the use of digital tools that are popular in the wider community ties researchers to proprietary environments. Linux is popular, but software vendors should not disregard other systems.
  • Using open scientific tools and information or open source software does not mean consciously participating in either movement. Users are not necessarily aware that the practices or rules they follow belong to open science. The use of FOSS tools is both opportunistic and due to accepting the open source philosophy, but it is unclear which of these two factors is more important.

Based on the differences between age groups, the authors of the report hypothesise that the dissemination and use of research tools driven by teaching influence later digital research practices. Software such as GitLab, programming languages and free software such as R are present in university education because teachers tend to base training on freely available software due to the limited availability of licences in the educational context. Thus, open source tools are becoming default research environments for future young researchers. Such tools, as well as collaborative environments and executable notebooks (also known as computable documents; as provided by Jupyter Notebook/JupyterLab), may therefore soon have an even greater impact on research results publication and communication practices. These tools are also made available to scientific communities through research infrastructures and online services. It would therefore be beneficial to better map and quantify accessibility and training needs related to research infrastructures for those who are not involved in large research collaborations or are at the beginning of their careers. The use of software tools and especially open source software and research infrastructures should be assessed independently of open science and related digital practices.

However, based on our less formal December 2022 survey, the use of FOSS in academic training is considered less important by researchers and developers than factors such as availability without the need to pay, the natural link of open source to open science, and the principles and development model of FOSS. The use of FOSS in academic training belongs to a second group of motivators, which also includes the avoidance of bureaucratic obstacles related to procurement, use by the community and maintainability.

Searching for research software

A good list of research software registries, organised by scientific field and other criteria, can be found at []. It covers:

  • Astrophysics
  • Computational Fluid Dynamics
  • Grid Computing middleware
  • Earth Sciences
  • Humanities
  • Life Sciences / Biology / Medical
  • Mathematics
  • Machine Learning
  • Nano Technology
  • Social and Ecological Sciences
  • Generic tools
  • Registries by country
  • Registries by organization
  • Registries by programming language

When it comes to general tools for scholarly publishing, an analysis and catalogue of open source publishing tools and platforms can be found at []. A white paper produced by the OPERAS Special Interest Group on Tools Research and Development for Scholarly Communication is available at [].

Tools for social sciences

This section describes open source tools for the social sciences, as this is an area where FOSS is less commonly used.

The Directory for Social Sciences summary [] and associated white paper [] describe and list many tools for social science research and the trends associated with them. For example, organisations are joining consortia to support the development and sustainable management of these tools. The number of research tools available has grown rapidly since 2004, from around 50 to more than 400 at the time of writing in 2019, likely due to researchers adopting digital tools and software development skills and advances in tools' usability and accessibility. The number of paid and free tools is similar, but the number of free tools is growing slightly faster. This could be due to the adoption of open source and a greater number of individuals developing their tools. This directory currently contains about 600 software packages/tools at [].

Surveying and sourcing participants

There are also many free and paid online platforms.

Annotating, labelling and coding text

Open-source tools for text annotation include

Social media research

Some free and paid tools can be found at [5].

Most of these tools work with Twitter. Facebook and Instagram have more active users, but Twitter offered (until 9 February 2023) an API that made its data much more accessible than that of other platforms. Moreover, LinkedIn and Facebook even prohibit the use of their APIs for research purposes. Nevertheless, there are research tools that provide access to the content of various social media platforms. Facebook's reputational issues have led the company to launch the non-profit partnership Social Science One and provide selected researchers with access to its data through grants. Similarly, LinkedIn has launched the Economic Graph Research Program.

More than half of the social media tools are free, either in the form of applications or as freely available packages on GitHub, and are usually open source. These tools provide analytics, data collection, monitoring, network visualisation, platform management, sentiment analysis, text analysis and visualisation. Only a few tools have limited free functionality.

Free tools that can access multiple social media platforms:

  • NodeXL, Social Media Research Foundation
  • SMaPP Toolkit, New York University
  • Vader, MIT
  • Social Feed Manager, George Washington University Libraries
  • Webometric Analyst, University of Wolverhampton
  • Just Twitter

A very popular commercial tool is NVivo. Some Twitter-specific tools are:

  • academictwitteR, The University of Edinburgh
  • DocNow, Shift Design and University of Maryland, University of Virginia
  • rtweet, University of Missouri

Recommendations on research software and engineering in open science

  • Software must be recognised as a first-class citizen of the research ecosystem and appropriate software-related research practices need to be put in place.
  • Since the primary motivations for accepting OS and using FOSS are different, it is best to promote them by emphasising both the practical benefits and the deeper motivations. For some researchers, one may work better than the other, while for those influenced by both, the synergistic effect can be multiplicative.
  • To encourage the adoption of FOSS, start with task-oriented OS tools for which habits are not very strong and which are available for different platforms, and wait to see if their growing adoption will be followed by the use of more general tools such as LibreOffice, which currently challenges established proprietary packages.
  • Initiatives aimed at women in science should also popularise FOSS. Women are less likely to use it, so the payoff can be higher.
  • Promote the use of Linux, as its regular use is closely related to the orientation towards open software and the use of the collaborative tools associated with it.
  • As a dynamic entity, software needs to be appropriately cited and identified in references in a way that links research and software more practically and reliably, including the used and actual software versions in dedicated software repositories.
  • Establish and promote a dedicated infrastructure for research software, as developers need to use different tools and services together with the services used in OS.
  • Standardise and automate the interaction between repositories, cross-referencing and checks, and the exchange of metadata and provenance information.
  • Harmonise and update existing tools, and develop new tools to address emerging issues.
  • Modern software engineering and runtime management put data, software, configuration and other elements of the execution context into containers and use technologies for continuous deployment and high availability. This requires additional technical skills in both software engineering and infrastructure management, and a more effective combination of the skills of researchers, software developers and other IT professionals.
  • Researchers need to be familiar with the governance practices of the software industry; software engineers need to be trained in the requirements and practices of OS; both groups need to be trained in the upcoming norms for software in OS.

FOSS licences

This and the next section are based on work carried out by the author of this report as part of the GN4 Phase 3 project funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 856726 (GN4-3).

The issues of software licences, their compatibility and selection are complex and therefore often neglected even by professional software developers, let alone those who develop software to support their actual work, such as research.

  • Our work is increasingly dependent on Free and Open Source Software (FOSS), which always comes with a specific FOSS licence.
  • FOSS licences help keep FOSS software alive.
  • Licence compliance is important for legal reasons and to ensure better cooperation.
  • Licensing considerations become much more important when software is distributed, as FOSS licences attach certain conditions to distribution.

Trust in research is based on the peer review system, which requires the ability to reproduce the experiment or analysis described. As the software is often a fundamental part of this work, its validity cannot be confirmed without the ability to access and use it. Furthermore, since part of the analysis process is codified in the software, the trustworthiness of the analysis and conclusions may depend on the ability to examine and reuse not only the input data but also the software used and its internal logic. This does not mean that the source code has to be reviewed by the reviewers – it is usually sufficient to make it available for an independent audit, which is a side effect of shared software development, use and maintenance for commonly used software tools with a large community of users and maintainers.

As pointed out in [], the lack of access to the underlying software makes it much more difficult to develop new results based on existing research, but the practices of open source software and the increasing acceptance of it have enabled the general open science agenda. The principles of open science affect the life cycle of research, the way science is conducted and its results – including software – are published, evaluated, discovered and monitored.

The frequency of use of FOSS licences is tracked in the Black Duck Top List [].

Rank  Licence          Usage  Licensing risk
1     MIT              32%    Low
2     GPL 2.0          18%    High
3     Apache 2.0       14%    Low
4     GPL 3.0          7%     High
5     BSD 2            6%     Low
6     ISC              5%     Low
7     Artistic (Perl)  4%     Medium
8     LGPL 2.1         4%     High
9     LGPL 3.0         2%     High
10    Eclipse (EPL)    1%     Medium
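The shares in the table can be added up per risk class to see how much of the listed usage falls into each one. The following is a minimal Python sketch with the rows copied from the table; the names `TOP_LICENCES` and `usage_by_risk` are illustrative and not part of any tool mentioned in this report.

```python
from collections import defaultdict

# Rows copied from the table above: (licence, usage share in %, licensing risk).
TOP_LICENCES = [
    ("MIT", 32, "Low"),
    ("GPL 2.0", 18, "High"),
    ("Apache 2.0", 14, "Low"),
    ("GPL 3.0", 7, "High"),
    ("BSD 2", 6, "Low"),
    ("ISC", 5, "Low"),
    ("Artistic (Perl)", 4, "Medium"),
    ("LGPL 2.1", 4, "High"),
    ("LGPL 3.0", 2, "High"),
    ("Eclipse (EPL)", 1, "Medium"),
]


def usage_by_risk(rows):
    """Sum the usage percentages of all licences in each risk class."""
    totals = defaultdict(int)
    for _name, share, risk in rows:
        totals[risk] += share
    return dict(totals)


print(usage_by_risk(TOP_LICENCES))  # {'Low': 57, 'High': 31, 'Medium': 5}
```

The ten listed licences cover 93% of the tracked usage, so the remaining share belongs to licences outside the top ten.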

Advantages of FOSS

The market is supposed to equalise the total costs of equivalent software products. Nevertheless, proprietary software vendors often claim that after a while their products are easier and cheaper to use than FOSS. Some organisations do not want to invest in the expertise and costs they often associate with using and maintaining FOSS, even if the effort required is negligible. They often want to have an external organisation they can hold accountable for software risks and problems. However, experience has shown that it is very difficult to hold a commercial software provider liable unless the software has been customised for a single client and very strict contractual commitments have been made.

On the other hand, the features of FOSS make the software very attractive to many users and organisations. In addition, many commercial developers of open source software often offer additional services as part of their business model. There are even third-party companies that offer support for open source software.

The open source model is very different from that of traditional licensed proprietary and commercial software because it allows both distribution and modification. Distribution is also unrestricted for freeware (free) and shareware (free trial), at least for the binary form, but the ability to modify the source code is crucial and has had a huge impact, bringing several significant improvements over traditional proprietary licensing models.

Availability and lower costs – For sophisticated or specialised requirements such as those encountered in research, new technologies and heavy workloads, the market is not large enough for a suitable commercial product to be available and last. The immediate and free availability of open source also brings the need either to develop in-house expertise, which organisations and individuals often want to do anyway, or to hire external support and maintenance. Many users – especially advanced ones – want to customise the software and are often happy to debug and maintain it. In research and education, the obligation to pay for software, the overheads associated with obtaining and renewing commercial licences, and the obligations involved are barriers that students, early career researchers or small research groups cannot overcome. Access to source code can also shorten their learning curve and enable them to use the technology more effectively. In addition, the cost of highly specialised or tailor-made software is often unacceptable even for large organisations, especially when a working solution can be achieved by combining and customising several freely available building blocks. These factors have led to the widespread adoption of open source in more advanced and specialised communities, the emergence of entirely new usage scenarios, and the introduction of large toolsets or architectures with many components based on microservices, alongside the commercialisation and componentisation of resources and runtime environments.

Innovation and flexibility – The easy access to source code allows developers to add the features they need and bring in new ideas, algorithms and usage scenarios. They are not constrained by the organisational, commercial or strategic limitations of proprietary software vendors, where users would have to wait for the vendor to decide to provide and implement what they need. The more useful a new feature or solution is, the more developers and companies will participate and contribute. The more programmers contribute, the better, more useful and more valuable the result will be. This has been shown to work even when contributors are not required to share their work. Moreover, people who inherently need a certain improvement are more likely to provide a creative and innovative solution than those who are just paid to produce something.

Security and reliability through transparency – When source code is available, many people can review and debug it. It is more likely that security and functional flaws will be uncovered and suboptimal solutions or vulnerabilities will be fixed. Contributors are aware that other experts will look at and review their code, and are therefore more likely to hold to higher quality standards. Any new reviewer can see what has been done before and what went wrong. If they try to fix the problem, they may have a greater incentive to find a better solution than someone who has had to adapt to strict deadlines and other constraints or priorities. Also, someone who is in the right context is often in a better position to identify and fix a problem or bug than the original author because of their prior knowledge, experience or previous involvement with a similar problem. In many open source projects, designated reviewers check changes before they are incorporated into the main codebase. They are not testers, but experts who care about the quality of the code and can work at their own pace. All these factors greatly improve the security and reliability of software, and they are a major reason why the vast majority of services on the Internet today run on Linux. When a security vulnerability is discovered in an active open source project, interested community members usually fan out and patch it quickly. However, only those who keep up to date with the latest recommended versions, including the libraries they depend on, benefit from this dynamic activity.

Longevity – Commercially licensed software may be discontinued by its vendor who no longer supports it. The vendor may go out of business without selling its software to another company. In such situations, there is no way to update the discontinued software, fix bugs or adapt it to new applications or platforms. Support, patches and other related services are no longer available. Therefore, its usability can deteriorate quite quickly and the user has to decide when to invest in and switch to new software rather than live with the growing problems. In contrast, open source software can evolve continuously because anyone can access and contribute to the source code. Even after it has not been used for a while, anyone can revive, adapt, repair or repurpose it. Many useful and widely used FOSS have large, active and stable user communities whose members include individuals and research groups, but also large and small companies.

Types of FOSS licences

Open source licences allow software to be freely used, modified and redistributed []. Although many “free software” and “open source software” licences are recognised by the Free Software Foundation (FSF) and approved by the Open Source Initiative (OSI), only a few of them are popular, widely used or have strong communities. However, a single software project may contain several components, which in turn include other components, so there may be many software licences. To be recognised as open by OSI, software licences must meet some criteria []. But even open licences differ in terms of the rights and restrictions they contain. These often relate to derivative works, such as the modification of the original code or its use by the new code. Based on the scope of the code to which they apply, there are two main groups of FOSS licences – permissive and copyleft licences, which differ in whether modifications or the code using the licensed code must be published under the same licence or a similar licence may be used. The applicability of a licence may be limited to modifications of existing files, additions or even any use of a software library. However, all FOSS licences require, to some extent, the disclosure of existing and new code to external users or when the software is distributed, but not for private use. Most licences require that the licence text and copyright notice accompany the licensed material. Some also require documentation of changes made to the licensed material. It should also be noted that some licences also consider access via the network as use, which includes the right to obtain the source code.

The open source rules are designed so that those who receive copies of the software must themselves be able to redistribute the original and create derivative works from it while allowing others to do the same. Some licences prevent open source code from “getting closed” and require that users and contributors to the code accept the values of open source by redistributing their modifications or additions (derivative works) on the same terms as the original. This means that those who receive copies of these works must be able to redistribute the original and make derivatives under the same conditions.

Unlike the Creative Commons CC-ND and CC-NC licences, an open source software licence must allow modifications or commercial uses to be considered truly open. If the licence prohibits the use of the licensed material and derivatives for commercial or (for example) military purposes, it is not considered a free software licence because it restricts who can use a program or for what. The OSI does not allow discrimination against any person, group, or line of work in the use of the software, so it can be used for any purpose, including any business.

Portions of new software may modify or extend fragments of existing software, which is similar to the creation of “derivatives” in CC licences. To preserve the integrity of the original work and ensure its maintenance, open source licences often require that derivative works be distributed under the same conditions under which the licensee was permitted access to the original work, such as the source code used (incorporated, copied, or modified) and the use of the resulting components, such as software libraries. While the CC-ND licences allow sharing and reuse of content on condition that it remains unchanged, software may be used in many ways without requiring modification or even actual incorporation of the software used. A piece of software may depend on other software by relying on its definitions, specifications and interfaces or by invoking them through dynamic or static linking, network communication and various types of interfaces and connectors. Therefore, software licences differ from those for other types of works in that they focus on the different ways in which software is used and, when a particular type of use occurs, how it affects the licensing of other software that uses it and the extent of that impact. These conditions will include all the terms and obligations set out in the licence of the software being used. If the requirements are few, the licence is called permissive, as opposed to restrictive, or, more correctly, copyleft. If the scope is narrow, it does not extend to all extensions of the work used or all software that uses the licensed material. Instead, it is limited to requiring the availability of modifications to the original work or the modified existing files. The term used in this case is ‘weak copyleft’. When the scope is broad, the licence of the component used and its associated terms must be applied to all software that uses it. This is referred to as ‘strong copyleft’.

Public domain licences offer the most permissive model. Anyone can modify and use software without any restrictions. But even if a component is free and without legal restrictions, one should always make sure it is secure before including it in the codebase.

Permissive licences contain minimal requirements on how the software may be modified or redistributed. Users do not have to republish the changes they make and usually only need to credit the original authors. They contain a disclaimer and often require that modifications be described. This type of licence is used by almost two-thirds of the open source software in circulation []. Permissive licences are popular because of the flexibility they offer to those using such licensed software and the low IPR risk. These licences include the MIT (the most popular, short and simple), Apache 2.0 (requires notice of changes, grants a licence to patents unless challenged in court, and mentions the preservation of trademark rights), BSD (some versions require the inclusion of a disclaimer) and ISC (which, along with its OpenBSD variant, is a further simplification of MIT and BSD). Artistic Licence (used for Perl and in several variants of versions 1.0 and 2.0) is permissive but includes compensation for damage.

Copyleft licences, also known as reciprocal licences or restrictive, protective and even viral licences, allow the modification of the code and the distribution of new works based on it as long as the requirements for redistribution under the same conditions are met. This is to ensure that the rights from which the user or modifier has benefited are preserved in derivative works by prohibiting contributors from appropriating their modifications, which would place them in an asymmetrical position vis-à-vis upstream contributors. This usually means that anyone who modifies the code must also release their modifications under the same licence. In commercial settings, copyleft licences are often considered riskier, as they can limit potential business value or jeopardise the secrecy of intellectual property. Taken together, copyleft licences are used by more than a third of open source software.

Weak copyleft licences have a library or file scope. Examples are the LGPL (Lesser GNU General Public License; 2.1 cleans up the text of 2.0 and allows dynamic linking without enforcing copyleft; 3.0 allows patent use, is not compatible with LGPL 2.0 but is compatible with Apache 2.0 and the end user must be able to install a modified version – it prohibits closed devices, DRM or hardware encryption or patent retaliation); EPL (Eclipse Public License 1.0 and 2.0); MPL (Mozilla Public License 1.0, 1.1 and 2.0 – it is simple, allows static linking and licence variants with additional conditions); Ms-PL (Microsoft Public License), Ms-RL (Microsoft Reciprocal License) and CDDL (Common Development and Distribution License 1.0 and 1.1). These licences require only the release of the modified code, allowing the use of open source libraries in proprietary software. MPL, Ms-RL and CDDL require this only for modifications to existing files. Libraries under LGPL, EPL and Ms-RL allow proprietary licences for the code that uses them, but the original licence extends to new files in a modified library.

On the other hand, strong copyleft licences often require the release of the entire project or product under a licence that is the same or similar to that of the work used. Among copyleft software, the use of strong copyleft licences clearly predominates. These licences aim to get everyone on the same page and stop the ‘free-riding’ that is still possible with permissive and weak copyleft licences. By introducing these restrictions, the creators of strong copyleft licences wanted to expand the presence of open source software, ensure the sustainability of the open source software ecosystem and strengthen the open source software movement. The most common and widely used licence is the GPL (GNU General Public License; 2.0 is more commonly used; 3.0 grants the use of patents, is compatible with Apache 2.0 and the end user must be able to install modified software).

AGPL (Affero General Public License 3.0) is similar to GPL, but it is also network protective. Use over a network is considered distribution, so modified code must be available to external users. It is becoming increasingly popular because it closes the ‘ASP/SaaS loophole’ of the GPL that allows software under the GPL to be used without disclosure since SaaS software by its nature is not distributed to users. The AGPL is, as its preamble states, “specifically designed to ensure cooperation with the community in the case of network server software”.
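The scope distinctions described above can be summarised in a small lookup table. The sketch below is a simplification written for illustration: the `Scope` levels and the `LICENCE_SCOPE` mapping are assumptions derived from the grouping in this section (licence families without version detail), not an authoritative legal classification.

```python
from enum import IntEnum


class Scope(IntEnum):
    """How far the reciprocity obligation of a licence reaches (illustrative levels)."""
    NONE = 0                        # permissive: no obligation to publish changes
    FILE_OR_LIBRARY = 1             # weak copyleft: modified files or the library itself
    WHOLE_WORK = 2                  # strong copyleft: the entire project or product
    WHOLE_WORK_AND_NETWORK_USE = 3  # network protective: also triggered by use over a network


# Licence families as grouped in the text above; versions are ignored in this sketch.
LICENCE_SCOPE = {
    "MIT": Scope.NONE,
    "Apache": Scope.NONE,
    "BSD": Scope.NONE,
    "ISC": Scope.NONE,
    "LGPL": Scope.FILE_OR_LIBRARY,
    "EPL": Scope.FILE_OR_LIBRARY,
    "MPL": Scope.FILE_OR_LIBRARY,
    "CDDL": Scope.FILE_OR_LIBRARY,
    "GPL": Scope.WHOLE_WORK,
    "AGPL": Scope.WHOLE_WORK_AND_NETWORK_USE,
}


def broadest_scope(licences):
    """Return the widest reciprocity scope among the licences of the components used."""
    return max((LICENCE_SCOPE[name] for name in licences), default=Scope.NONE)


print(broadest_scope(["MIT", "MPL"]).name)          # FILE_OR_LIBRARY
print(broadest_scope(["MIT", "LGPL", "GPL"]).name)  # WHOLE_WORK
```

The component with the broadest scope sets the minimum obligations for the combined work, which is why a single strong copyleft dependency matters more than many permissive ones.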

Source-available and ‘fauxpen’ licences

There are also non-FOSS restrictive licences, often presented or perceived as similar to FOSS, but which impose restrictions that prevent them from being considered open source according to the Open Source Initiative (OSI) and free according to the Free Software Foundation. Source available licences (or shared source licences) are proprietary licences that allow the source code to be viewed and, in some cases, modified and redistributed. They make the code available for viewing to facilitate scenarios such as inspection, understanding how it works, debugging, integration, or testing of external components. Examples of such restrictive licences include Business Source License (BSL), Microsoft Limited Public License (Ms-LPL), Microsoft Limited Reciprocal License (Ms-LRL), and Microsoft Reference Source License (Ms-RSL). Some of these licences grant rights only to developers of Microsoft Windows-based software, while the Ms-RSL allows viewing of source code for reference and debugging purposes.

Under the most restrictive of these licences, the user has no rights to use, redistribute, modify or (sometimes) even compile the code. On the other hand, FOSS is not just about access to the source code, but about full freedom to use it, even for commercial or objectionable purposes, as long as the same freedom is preserved for those who use or even pay for the code in question.

‘Fauxpen’ licences are similar to source-available licences. They are presented as open, but a closer look reveals that the licensed software or product is actually under the strict control of the vendor. These hybrid licences are intentionally deceptive and confusing. The Server Side Public License (SSPL) is a strong copyleft licence that requires the public release of the source code of service management layers when a service is provided. This prevents cloud providers from offering software licensed under the SSPL to third parties as a service, as they must release all source code, APIs and other software required by a user to run an instance of their service under the SSPL. The SSPL also makes it impossible to use the Linux kernel, which is under the incompatible GPLv2-only licence. Thus, the SSPL discriminates against a particular field of use. ELv2 (Elastic License v2) is a non-copyleft licence that prohibits making the products available to others as a managed service, circumventing the functionality of the licence keys, or removing or disabling features protected by licence keys.

Open Source Rule 6 (“no discrimination against fields of endeavour”) and the FSF’s Freedom Zero (“the freedom to run the program as you wish, for any purpose”) indicate that ‘fauxpen’ and source-available licences are not FOSS. Vendors of software who switch to such licences effectively convert projects that started as open source to proprietary licences and admit that their business models are not compatible with open source. They claim that they want to protect their work from unfair exploitation by cloud providers and other free riders who would use their software without paying for its creation and maintenance. At the same time, they appropriate the contributions of outside developers who have donated their time and energy by contributing to projects when they were open source. In addition, these companies often use code from other open source projects to run their business.

When a vendor switches to a proprietary commercial licence, it can determine the timing, terms and costs. The future cost of the software and even its future availability are uncertain, as with any other proprietary software. When formerly open software is embedded or modified in a proprietary product, users must agree to the terms of a proprietary licence, be left with an unmaintained version, or fork the last open version of software and bear the associated burden of maintenance forever.

Products subject to this bait-and-switch became popular because they were marketed as FOSS, as developers prefer to have control over what runs in their programs and fix it or have other people fix it, even though they are not affiliated with the original maker of the tool or component. In addition, such platforms are gaining traction because they tend to offer free, one-stop solutions, while expensive licences for commercial alternatives often add up and open source replacement solutions are less integrated. Even developers in large enterprises prefer to use FOSS rather than go through the slow, bureaucratic and multi-layered approval and procurement processes.

The only advantage of the earlier FOSS status is the ability to fork the last open version, but even this window of opportunity can effectively close after some time if the community stays with the vendor and its proprietary changes. Forks are also difficult because of the resources required and the need to change brands, while people do not easily switch from one brand to another.

All this means that access to updates to software under permissive licences and those with the ‘sublicense’ option can be volatile in the long run if the software is controlled by a single company, as the cases of Elasticsearch and MongoDB show. This is why it is so important to choose software that is guaranteed, or, at least, highly likely to remain FOSS.

Copyrights, patents and warranties

Copyright is a form of intellectual property that allows the creator of an original work to license that work to the extent governed by copyright law. No registration or official notice is required to declare copyright in a work, only a clear and visible statement of copyright and a definition of its subject. Copyright can also be easily transferred to another subject, typically by contract or statement. Since open source software licences already by definition make the work available to everyone under clear conditions, the issue is not copyright as such but the actual details of the licensing conditions. The licence is affected by whether copyright is stated in the licence text and how the text of the licence and the copyright notice (containing the original copyright and attribution notices) should be included and presented in the licensed material. The licence may require the copyright notice for source code only, or may also require it for binaries.

Patents are a much more complex form of intellectual property. An organisation or individual that has invented something substantial, new and useful proves this through a regulated, expensive and time-consuming patent registration procedure. If this process is successful, the patent holder is granted the right that excludes others from making, using, selling, offering or making available the patented invention for a predetermined period (e.g., 20 years), and fees may be charged for using the patent. A patent can earn the owner the associated royalties in the form of financial compensation for the use of the invention, while its infringements are internationally enforceable and can be prosecuted in court. Patent owners try to extend the boundaries of their patent while seeking infringements to maximise royalties and penalties to cover the costs of invention development, filing or acquiring and maintaining the patent, scanning for infringements and litigation. Therefore, there is always a risk of possible and even unwitting infringement of a patent by the licensor of software and subsequently by its licensees.

Some licences describe how to deal with potentially applicable patents and royalties, which removes at least some of the patent-related uncertainty. A licence may state that it does not grant rights to contributors’ patents, or it can explicitly grant contributors’ patent rights. Both models remove some uncertainty, but they do not solve the problem of patenting. The latter approach is an attempt to prevent the appropriation of innovation and software through patents. But no software licence would protect a licensee from being sued by a third-party patent holder, since licensors can only license works that they own. Because software patents are often too vague, abstract and ambiguous, they can easily be used as a weapon; they may even protect concepts or methods of interacting with a system. Neither the licensor nor the licensee may be aware of such a patent, so a patent troll or a competitor with a corresponding patent can appear at almost any moment.

Even if a patent holder has licensed the patent for use in open source software, or the applied FOSS licence waives all patent-related obligations, that patent may later be narrowed or revoked through litigation by a holder of a competing patent. In this case, even the software licensee who has fully complied with the terms of the original licence and the licensor’s patent may be held liable for infringement of a competing patent if it continues to use affected software. Since the narrowing or cancellation of the original patent would not only affect its owner but also its licensees, licensees may wish to participate in the protection of the licensor’s patent. Since this can be expensive, licensees should only engage in such an endeavour if their business would be seriously affected by the competing patent holder’s claims.

Other constraints and rights

Most licences require that the licence text and copyright notice accompany the licensed material. Some also require documenting the changes made to the licensed material.

Some licences:

  • Describe the circumstances under which the source code must be made available
  • Indicate whether the changes must be documented
  • Describe permission or prohibition to use contributors’ names, trademarks or logos
  • Declare whether they include a limitation of liability. Some explicitly state that there is no warranty or guarantee for the use of the code, so the author cannot be held liable for damages or if the code does not work well in a particular case.
  • Are particular about the use of software or even restrict the type or field of usage (e.g., by prohibiting commercial use, military use or use over a network), which prevents them from being considered true FOSS licences.
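The requirement that a copyright notice and licence reference accompany the licensed material can be checked mechanically. The sketch below assumes that source files carry a conventional "Copyright <year>" line and an `SPDX-License-Identifier:` tag near the top; this header convention is an assumption made for illustration, as this report does not prescribe one, and `check_header` is a hypothetical helper.

```python
import re

# Matches e.g. "Copyright 2021", "Copyright (c) 2021" or "Copyright © 2021".
COPYRIGHT_RE = re.compile(r"copyright\s+(?:(?:\(c\)|©)\s+)?\d{4}", re.IGNORECASE)
# Captures whatever follows the SPDX tag on the same line.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S.*)")


def check_header(text, max_lines=20):
    """Report whether the first lines of a file carry a copyright notice and a licence tag."""
    head = "\n".join(text.splitlines()[:max_lines])
    match = SPDX_RE.search(head)
    return {
        "copyright_notice": bool(COPYRIGHT_RE.search(head)),
        "licence": match.group(1).strip() if match else None,
    }


sample = (
    "# Copyright (c) 2021 Example Author\n"
    "# SPDX-License-Identifier: MIT\n"
    "print('hello')\n"
)
print(check_header(sample))  # {'copyright_notice': True, 'licence': 'MIT'}
```

A check of this kind only confirms that notices are present; whether the stated licence is the correct one for the file still has to be verified by a person.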

Contributor agreements

Copyleft licences in principle prevent code from being incorporated into or relicensed as proprietary code. However, a licence change may still be possible, as contributor agreements open a loophole. The terms that are typically used are Contributor License Agreement, Copyright Transfer Agreement or Copyright Assignment Agreement. These agreements are used by organisations that own or use contributions as custodians of software. They often involve a transfer of copyright. However, if these agreements involve the transfer of unrestricted reproduction rights, permit unrestricted distribution, or expressly permit relicensing and even sublicensing, the contributed code may be relicensed at the discretion of the custodians regardless of the copyright.


Software relicensing is done for commercial reasons or to improve licence compatibility.

  • In the first case, the change is typically towards a proprietary licence, often a ‘fauxpen’ or source available licence. The consequences of such relicensing are the elimination of some previously allowed usages, the appropriation of prior contributions and the restriction of access to further improvements of the software.
  • Relicensing for better licence compatibility is done when the current licence is incompatible with those of other jointly used components so that a larger combined work could be licensed.

Relicensing is possible:

  • Due to the previous use of a permissive FOSS licence or another licence that allows sublicensing;
  • When the contributors grant the custodian organisation the right to sublicense or relicense software through contributor or copyright agreements which allow redistributing the work under a different licence;
  • If the owner of the proprietary code so decides.

Adding an alternative licence is not relicensing, as the old licence remains fully valid for those who choose it. Multi-licensing is therefore a better way to improve licence compatibility than relicensing. Furthermore, it is not necessary for all contributors to have previously signed a licence or agreement. Licences in the style of ‘Or later’ are a concisely expressed form of multi-licensing in which all subsequent versions of said licence are accepted in advance, including those that do not currently exist.
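Multi-licensing and ‘or later’ clauses can be modelled as a set of grants, any one of which a recipient may choose. The following Python sketch is illustrative only: the `Grant` class and `available_under` function are hypothetical names, and versions are compared as simple number tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One licence under which a work is offered (hypothetical model)."""
    name: str
    version: tuple          # e.g. (2, 0)
    or_later: bool = False  # True for 'or later' style grants

    def covers(self, name, version):
        """True if this grant lets the recipient use the work under name/version."""
        if name != self.name:
            return False
        if self.or_later:
            # Later versions are accepted in advance, including ones not yet published.
            return version >= self.version
        return version == self.version


def available_under(grants, name, version):
    """A multi-licensed work is available under a licence if any of its grants covers it."""
    return any(grant.covers(name, version) for grant in grants)


gpl2_only = [Grant("GPL", (2, 0))]
gpl2_or_later = [Grant("GPL", (2, 0), or_later=True)]
dual = [Grant("MPL", (2, 0)), Grant("GPL", (2, 0), or_later=True)]

print(available_under(gpl2_only, "GPL", (3, 0)))      # False
print(available_under(gpl2_or_later, "GPL", (3, 0)))  # True
print(available_under(dual, "MPL", (2, 0)))           # True
```

Adding a grant never removes an existing one, which mirrors the point that the old licence remains fully valid for those who choose it.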

Managing FOSS licences

Use of FOSS licences depending on project intent

For internal use

  • One can use any FOSS without worrying about licences – the creators own their code and do not share it with others, which is acceptable under all FOSS licences.
  • The code is kept private, but purely internal use rarely stays that way – the use of software can easily evolve into sharing or use in commercial contexts where other parties are involved.
  • What if the creators later decide to offer the software to others? Without considering the licences of the components used, they may end up with components with incompatible licences and not be able to choose one for their product/project. Therefore it is important to:
    • Start early to consider licences and the general attitude towards FOSS licences.
    • Find out about the licences of the components used and determine which licences are acceptable for the project.
    • Determine the potential future licence if the way software is used changes.

Sharing software with someone

  • With permissive licences of components, modifiers do not have to provide source code.
  • For copyleft components, access to some or all of the source code must be permitted.
  • When sharing, the same or a compatible licence must be used for the modified code or even the entire project.
  • With several strong copyleft components, creators may not be able to pick a licence compatible with all components.
  • Licence compatibility has become an important and very topical issue in the wider software community.
  • One should think twice about software under a permissive licence that is effectively controlled by a single entity, especially if software may be used in a service. Some modifiers or their customers might therefore prefer copyleft so that they would be more protected from licence changes.

For a service, the provider is safe if it uses any FOSS except software under the AGPL or a similar licence. Even that is acceptable if the provider does not mind users getting its code or is not interested in relying on cloud providers, who may be prohibited from hosting a service based on such software. The same applies to ‘fauxpen’ licences such as SSPL or ELv2.
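
The three usage scenarios above can be condensed into a rough rule of thumb. The sketch below only illustrates the reasoning of this section and is not legal advice; the function name and the four licence categories are assumptions made for the example.

```python
# Licence categories as used in this section (simplified for illustration).
COPYLEFT = {"weak copyleft", "strong copyleft", "network copyleft"}


def must_offer_source(scenario, licence_type):
    """Rule of thumb: does this use of a component trigger source-code obligations?

    scenario:     "internal", "sharing" or "service"
    licence_type: "permissive", "weak copyleft", "strong copyleft"
                  or "network copyleft" (AGPL-like)
    """
    if scenario == "internal":
        # Nothing is shared with others, which is fine under all FOSS licences.
        return False
    if scenario == "sharing":
        # Copyleft components require access to some or all of the source code.
        return licence_type in COPYLEFT
    if scenario == "service":
        # Only network-protective licences extend the obligation to services.
        return licence_type == "network copyleft"
    raise ValueError(f"unknown scenario: {scenario}")


for scenario in ("internal", "sharing", "service"):
    for licence_type in ("permissive", "strong copyleft", "network copyleft"):
        answer = must_offer_source(scenario, licence_type)
        print(f"{scenario:8} {licence_type:16} -> {answer}")
```

The table printed by the loop makes the asymmetry visible: a strong copyleft component matters when software is shared, but only a network-protective licence matters when the software is merely offered as a service.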

Impact of licences on community, quality, longevity and sustainability

Projects often follow a natural cycle of creation, a burst of intense activity, a long period of steady use and productivity, and fading as they are replaced by new projects in the same space but with a more advanced technological base; this happens through the slow or rapid migration of the community. Factors that influence the sustainability and longevity of software are often analysed [,]. The longer a project exists, the more likely it is to persist. The activity of the community (number of contributions and active contributors) and the quality of the core members are more important for the sustainability of software than the size of the user base. An analysis of Ohloh data, now at [], across a large number of FOSS projects [] shows that:

  • The larger the project, the more likely it is that its licensing issues were resolved and a licence specified. The portion of projects without a specified licence decreases with the number of monthly committers – it starts at about 50% for a single committer, drops to 40% for five, and stabilises at about 20% for projects with more than contributors.
  • Permissively licensed projects are evenly distributed regardless of size. They start at 20% for up to 10 monthly contributors, peak at about 25% for 20-30 contributors, and then return to baseline.
  • The use of copyleft licences coincides with the size of the active community. It starts at about 20%, increases to about 35% after 10 committers, and ends at about 40% for projects with many contributors.

The lack of a clear licence is an indication that developers consider licensing unimportant, confusing or too time-consuming for their purpose. Such projects are usually not long-lasting and do not build a large community.

The utility of software is maximised when the largest possible number of users can appropriate its benefits. But FOSS, like many other parts of the digital infrastructure, suffers from a free-rider problem: “Resources are offered for free, and everybody (whether individual developer or large software company) uses them, so nobody is incentivised to contribute back, figuring that somebody else will step in.” []. Free riders have a competitive advantage because they did not have to invest in the initial development and can instead invest in developing additional benefits and services. While free riders do not exclude others from using the code because it is not an exclusive resource, they can diminish the original creator’s access to users. Users become customers and customers are an exhaustible shared resource because they tend to become attached to a provider. Customers contribute to the provider’s revenue in several ways determined by its business model. Although most see free riding as deeply unfair, it is still better to have someone using your open source software than someone else’s. The presence of free riders makes it more likely that others will also use the software and that some of them will contribute. Therefore, free riders, including competitors who capitalise on the work of others, can have a positive overall effect by acting as intermediaries towards other contributors and customers. A large user community brings contributors, paying customers and even sponsors who would otherwise not show up. However, for this to happen, competition from free riders must not be allowed to suffocate the main contributor. This means that the original contributor’s offering (beyond software) must be exclusive in some way to create an incentive for users and customers to join.

  • Permissively licensed software can start small, stay that way, or increase in activity, but seems somewhat constrained by the optional return of modifications to software and the community.
  • Weak copyleft licences are suitable for libraries and other components where popularity and usefulness would be significantly affected by the expansive licensing rules of strong copyleft licences.
  • Strong copyleft licences are suitable for large or standalone projects such as operating systems and specialised or productivity tools.

There are a growing number of companies whose business model is based on FOSS. This model is called commercial open source software. Their commercial offerings usually take the form of proprietary or closed-source intellectual property that may include a combination of premium features and hosted services that provide performance, scalability, availability, productivity and security assurances. This is referred to as the ‘open core business model’. Some of them also offer professional services, including maintenance and support assurances.

The obligation to keep all modifications under the same or a compatible copyleft licence works exceptionally well for projects like the Linux kernel. This is especially the case when the licence does not preclude the use of the software to run other software that is under other licence types. Therefore, the use of a copyleft licence can be a great advantage for software, especially if it does not restrict its use in normal usage scenarios. For this reason, weak copyleft licences such as the LGPL have been developed for uses where it is more important to increase the number of contributors (as in research software) than to increase the popularity of software by making the terms of use as liberal as possible or to maintain the competitive advantage by keeping the code proprietary. Furthermore, a combination of (often copyleft) open source with additional proprietary add-on components or services is a frequently used approach that reconciles openness and sustainability.

If a sufficiently large member of the community has enough influence over the platform, it may decide to fork it under a proprietary ‘fauxpen’ licence that significantly limits its use, while redirecting most of the current users to the fork and taking full control of the new developments. The inadequacy of the original business model or the emergence of competing vendors is the reason why some open source producers, who were seen by the communities as their guardians, are making this move. Of course, this is only possible with permissive licences, which usually allow appropriation through relicensing. The appropriator does not even have to be the main creator of the software, but the one to whom most users refer, for example by offering support or popular commercial add-ons. A similar result can be caused by the extensive use of software as part of a cloud offering, where the cloud provider effectively distributes and monetises open source software without making any meaningful contribution to it, or by providing proprietary add-ons that are usually limited to facilitating access to the platform within its cloud offering. The original provider with the ‘open core’ business model may then move to a network protective or ‘fauxpen’ licence. However, some projects with permissive licences, such as the Apache web server, have extremely long lifespans and large communities.

It is not possible to empirically determine whether software longevity benefits more from copyleft or permissive licences, as this tends to depend on other circumstances. The choice of a licence that supports software sustainability and longevity depends primarily on the attitude of the developers and the community, as well as the primary use scenarios. Of course, such a choice may not be possible at all due to organisational, funder or dependency requirements. Interestingly, when this choice is available, the actual selection depends more on intrinsic motivations and perceptions of fairness than on extrinsic motivations such as expectations of reputation or economic gain []. On the other hand, if developers invest in interoperability and open standards, this can significantly boost the acceptance of the project, regardless of the licence.

Multi-licensing under permissive and copyleft, or copyleft and proprietary terms is also a viable solution, as it increases licence compatibility if software is to be combined with other components to create new products. At the same time, it allows for a wider range of users and stimulates future contributions, at least to some extent.

Not to be neglected is also the user base, which contributes significantly to the sustainability of permissive FOSS. Users determine the functionality, identify the bugs and set the direction of a project to meet their needs. This can lead to sophisticated products that 'just work' without much configuration and customisation, as long as the target audience is large enough and other factors contribute to the product's ecosystem. That said, it is often very difficult to determine the size and engagement of the user community. What is often much easier to assess (in selection) but difficult to stimulate (in development) is the broader ecosystem around a project created by the involvement of other providers who may offer support, consultancy, customisation, hosting or bundling with their products or services.

In addition to the often-emphasised doubts about the business models of FOSS, some authors [Lanier, Jaron, You Are Not a Gadget: A Manifesto] decry the expropriation of intellectual production through open source and open content as a form of ‘digital Maoism’ that stifles small business and destroys middle-class opportunities to finance content creation, resulting in a concentration of wealth among a few companies and individuals who set themselves up as concentrators of content and services. However, instead of FOSS, this criticism should rather be directed at the centralisation of distribution and advertising platforms and the model of "free services" paid for by the resale of personal data, user profiles and targeted marketing. The big concentrators rely on FOSS like everyone else, but their core components are always proprietary. On the other hand, large tech companies often create FOSS platforms and tools for consumers and developers to tie customers to their ecosystems and technologies. Typical examples include some very popular application development tools, runtime environments, non-SQL data storage and processing platforms, and AI platforms that are usually conveniently connected to the companies' cloud offerings. So whenever a large technology company offers an elegant FOSS component or platform, developers should think twice about becoming ‘products’ and part of the company's camp again.

Despite the success of FOSS, scaling and sustaining open source projects remains a challenge.

Licence selection and compliance

Copyleft licences provide stability in licensing, while permissive software can be forked and relicensed by a key contributor or a company that provides popular free or commercial services or products based on that software. Such an organisation can also strongly influence the development of software and usage patterns.

  • The options available may be prescribed or recommended by the institution, project management or funder.
  • The constraints of other involved parties and coauthors must be respected.
  • The constraints imposed by the dependencies' licences must be respected.
  • There may be some typical and established software licensing practices in the community.

Where these constraints leave room for choice, the decision rests on the personal preferences and attitudes of software authors, who should also take into account desirable public messages and non-mandatory institutional, project or funder preferences regarding software licensing and open source.

The choice is usually quite simple. Most often, existing constraints dictate the type of licence. If institutional or other policies prohibit the use of copyleft licences, this also means that the software may not use components under such licences. However, if it is allowed and such components are needed and useful, then a compatible copyleft licence must be used.

A relatively free choice is possible when all important components used have either permissive or weak copyleft licences. If components with weak copyleft licences are modified, these modifications must retain the original licence or use a compatible one.
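
The reasoning in the last two paragraphs can be written down as a small decision function. This is a minimal sketch under stated assumptions: the function name and the category labels are invented for the example, and a policy that prohibits copyleft is read conservatively as covering weak copyleft components too.

```python
def licence_choice(component_types, copyleft_allowed=True):
    """Summarise the licence options for a project, following the text above.

    component_types:  licence categories of the important components, e.g.
                      ["permissive", "weak copyleft", "strong copyleft"]
    copyleft_allowed: False if an institutional or other policy bans copyleft
    """
    uses_copyleft = any("copyleft" in kind for kind in component_types)
    if not copyleft_allowed and uses_copyleft:
        # Assumption: the ban is applied to every copyleft component.
        return "conflict: copyleft components may not be used"
    if "strong copyleft" in component_types:
        return "use a copyleft licence compatible with those components"
    # Only permissive or weak copyleft components remain; modified weak
    # copyleft components still keep their own (or a compatible) licence.
    return "relatively free choice of licence"


print(licence_choice(["permissive", "weak copyleft"]))
print(licence_choice(["permissive", "strong copyleft"]))
print(licence_choice(["strong copyleft"], copyleft_allowed=False))
```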

Steps for managing software licences

Collect and document information

  • Note the licence of the ‘product’ (whole bundle of created components) or the ‘project’ (one program or standalone component), if specified
  • Create an open source inventory of the components used
  • Identify vulnerable open source components (for removal or replacement)
  • Identify obsolete open source libraries (for replacement)
  • Identify the licences of the components used (in-licences)
  • Clarify ambiguities or doubts, e.g., when using or changing libraries
    • A tool may not be able to identify a licence correctly – software composition analysis (SCA) tools may indicate likely licences or ambiguous licence versions
    • Information about the licence used may be incorrect, unclear or contradictory
    • Some licences may be recognised under multiple names
    • Some (permissive) licences (BSD, Artistic...) have unnumbered variants or are sometimes edited by the authors
    • The applicability of 'or later' licences may be unclear or edited in the licence text
  • Document collected information – SCA tools provide this through reports, UI and data exports
  • Document decisions - some of them can be refined during remediation
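
Several of the ambiguities listed above (multiple names for one licence, unnumbered BSD variants, unclear 'or later' applicability) can be illustrated with a toy inventory. The sketch below is not how any particular SCA tool works; the alias table, the component names and the declared licence strings are hypothetical, and only the SPDX identifiers on the right-hand side are real.

```python
# Hypothetical alias table: declared licence strings -> candidate SPDX identifiers.
ALIASES = {
    "apache license 2.0": ["Apache-2.0"],
    "apache 2.0": ["Apache-2.0"],
    "mit license": ["MIT"],
    "mit": ["MIT"],
    # 'or later' applicability is not clear from this string alone.
    "gplv3": ["GPL-3.0-only", "GPL-3.0-or-later"],
    # Unnumbered BSD: at least these two variants are possible.
    "bsd": ["BSD-2-Clause", "BSD-3-Clause"],
}


def identify_licence(declared):
    """Return (licence, status); status is identified, ambiguous or unknown."""
    key = " ".join(declared.lower().split())
    candidates = ALIASES.get(key, [])
    if len(candidates) == 1:
        return candidates[0], "identified"
    if candidates:
        return " or ".join(candidates), "ambiguous"
    return declared, "unknown"


# Hypothetical inventory: component name and the licence string found for it.
inventory = [
    ("web-framework", "Apache License 2.0"),
    ("numeric-lib", "BSD"),
    ("solver", "GPLv3"),
    ("in-house-fork", "Custom licence, see README"),
]

for component, declared in inventory:
    licence, status = identify_licence(declared)
    print(f"{component}: {licence} ({status})")
```

Entries reported as ambiguous or unknown are the ones that need a documented decision or a question to the component authors.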


  • Make initial improvements
    • Refine partial licence information
    • Update or replace vulnerable open source components
    • Update outdated open source libraries (where possible)
    • Ask component authors to clarify the licence or relicense their component
    • Pay for the proprietary licensed software you need
  • Choose a product/project licence (out-licence) that is compatible with key dependencies
  • Choose between dual licences of components and record decisions
  • Identify the remaining incompatible licences
  • Decide what to do with the components that use these licences and record decisions
    • Remove the component and corresponding functionality
    • Replace it with an existing equivalent
    • Move the component to the server side (central service)
    • Write an in-house replacement
  • Enforce compliance with the open source licence (e.g., provide all necessary compliance artefacts)
  • Accept some risks
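
Choosing between the dual licences of components and identifying the remaining incompatible ones can be sketched as follows. The accepted set is a hypothetical policy for a product released under GPL-3.0-or-later, and the component names are invented; a real project would take both from its own licence review and inventory.

```python
# Hypothetical policy: in-licences accepted for a product released under
# GPL-3.0-or-later. A real list would come from the project's licence review.
ACCEPTED_IN_LICENCES = {
    "MIT", "BSD-3-Clause", "Apache-2.0",
    "LGPL-3.0-or-later", "GPL-2.0-or-later", "GPL-3.0-or-later",
}

# Hypothetical components with the alternative licences their authors offer.
components = {
    "db-driver": ["LicenseRef-Proprietary", "GPL-2.0-or-later"],
    "search-engine": ["SSPL-1.0", "Elastic-2.0"],
    "ui-toolkit": ["MIT"],
}


def choose_in_licence(offered, accepted):
    """Return the first offered licence that the project accepts, else None."""
    for licence in offered:
        if licence in accepted:
            return licence
    return None


# Record one decision per component; None marks a remaining incompatibility.
decisions = {
    name: choose_in_licence(offered, ACCEPTED_IN_LICENCES)
    for name, offered in components.items()
}
incompatible = sorted(name for name, choice in decisions.items() if choice is None)

for name, choice in decisions.items():
    print(f"{name}: {choice or 'no acceptable licence'}")
print("Still to be resolved:", ", ".join(incompatible) or "none")
```

Components left in the last line are the candidates for removal, replacement, a move to the server side or an in-house rewrite.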

Create compliance artefacts (to ensure compliance)

  • As required by the applied policy

Software composition analysis (SCA) and licence selection tools

Several available tools analyse the dependencies of software projects, the libraries used, their licences and the licences declared or distributed with the source code. Ideally, the software composition and the licences of components and parts of the code should be continuously monitored as part of the build process.
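
As a very small illustration of such monitoring, the sketch below lists the licence information declared by the Python packages installed in the current environment, using only the standard library module importlib.metadata. It is far less thorough than a real SCA tool: it reads declared metadata only, does not inspect source files, and the function names are assumptions made for the example.

```python
from importlib import metadata


def licence_from_metadata(meta):
    """Collect licence hints from one package's metadata."""
    hints = []
    for field in ("License-Expression", "License"):
        value = (meta.get(field) or "").strip()
        if value and value.upper() != "UNKNOWN":
            # Some packages paste the whole licence text; keep the first line.
            hints.append(value.splitlines()[0])
    for classifier in meta.get_all("Classifier") or []:
        if classifier.startswith("License ::"):
            hints.append(classifier.split("::")[-1].strip())
    return hints


def scan_environment():
    """Return (package name, licence hints) for every installed distribution."""
    rows = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or "unknown"
        rows.append((name, licence_from_metadata(dist.metadata)))
    return sorted(rows, key=lambda row: row[0].lower())


if __name__ == "__main__":
    for name, hints in scan_environment():
        print(f"{name}: {', '.join(hints) or 'NO LICENCE DECLARED'}")
```

Run as a build step, a script like this could fail the build when a package declares no licence, which is one way to keep the inventory current.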

Commercial SCA services

FOSS SCA solutions

Licence selection tools

Sustainability of FOSS in science

Researchers are often overwhelmed by the variety of software tools available, unsure of their quality and questioning the quality of their calculations, so they resort to developing tools tailored to their specific use cases. This leads to a large number of tools and packages that have limited support, a short lifespan and a small number of users. Often developers simply do not want to look elsewhere because they are paid to code, can justify it, or think their case is unique or very special. Releasing software as open source mitigates, but does not solve this problem.

Despite the success of FOSS, scaling and sustaining open source projects remains a challenge. Sometimes researchers or developers manage to maintain their tools as a side project (and sometimes build a whole community) while keeping the tool free and open source. An example of this is Gephi. Small open-source communities can also rely on volunteers and self-governance.

It is more interesting to look at large open source projects to see how individuals and companies that make a living from FOSS can be financially sustainable. More and more researchers are finding ways to develop tools that are both open source and can generate revenue:

  • Organisations can be funded through institutional membership and fundraising. Lyrasis, with its open source platforms ArchivesSpace, CollectionSpace, DSpace, Fedora and VIVO, has more than 1000 members and launched the DSpace Development Fund (DDF) in 2022.
  • Some projects follow an open core model (e.g., RapidMiner) by licensing parts of the code that allow scaling to enterprise levels.
  • Based on its strong community and the influence that comes from its open source software, an organisation can offer commercial services. Lyrasis provides certification of partners and other providers, hosting, consultancy, training, digitisation, preservation and fiscal services. It also mediates in content creation and acquisition and grant applications.
  • The main contributor to some open source software may also use it or related services as a means of promotion and visibility, or to engage in a larger collaboration. This requires funding for the operation, support, maintenance and development of software and services from other sources, such as its other businesses, institutional budgets, projects or national grants for scientific infrastructures.
  • The project may offer its solution as a service to be used within a larger scientific infrastructure, platform or collaboration, and in return share part of its revenue, regardless of its business model.
  • Access to venture capital (VC) and private investors may be appropriate for teams that intend to commercialise or profit from tools that cover a wider market than the academic sector.

Scientific open source and related services cannot expect to generate enough traffic to sustain themselves on advertising. Crowdsourcing is suitable for projects that deal with the research of diseases or therapies for these diseases. It is also possible to raise small donations from interested individuals or large companies if the topic is attractive and many people are interested or passionate about it, such as climate change, astronomy or long-standing mathematical problems. This can be extended further by encouraging the public to participate, by offering individuals the chance to become citizen scientists, or by making their computing resources available, as is the case with distributed computing projects. Examples include Folding@home, iThena, PrimeGrid, World Community Grid (WCG), Rosetta@home, Cosmology@Home, SETI@Home, (CPDN) and LHC@home, most of which are based on the LGPL-licensed Berkeley Open Infrastructure for Network Computing (BOINC) []. These platforms do not seek to monetise even a part of the resources they receive, as this would repel their contributors, especially those who provide resources that belong to someone else (e.g. their employer). It should also be noted that the processing load on these platforms is often intentionally proprietary to protect the integrity of computation, as analyses performed may be vulnerable to bombardment by fake results from tampered worker nodes, which would then require verification in a controlled environment.

The open source operating models described above offer different advantages and disadvantages and require different levels of commitment. Therefore, there is no single approach that will solve the sustainability challenge. However, options that have already been successfully implemented, recipes and examples from a wider OS community and especially COSS (Commercial Open Source Software) [Popp, Karl. (2015). Best practices for commercial use of open source software: Business models, Processes and Tools for Managing Open Source Software] are helpful. The Software Sustainability Institute provides guidance and support around open source code for research. The Apereo Foundation is a membership organisation that provides guidance and incubation opportunities to teams working on open source technologies for learning and research in higher education.

Managing a growing FOSS project that is changing its business model can be extremely difficult. Self-governance, centralisation of originally distributed projects and privatisation or commercialisation require clear rules for membership, contribution and appropriation rights. In the case of self-governance, these rules require oversight and enforcement by an external agent or several members of the group, or an agent for centralisation and privatisation.

Open source software in science has a smaller user base, but may be more attractive for institutional sponsors. There are also many examples of organisations forming consortia or setting up a non-profit organisation to support the development and sustainable management of scientific open source tools. It is also good to note that open source software for science can easily emerge in the context of international or university research projects and, after the initial incubation, maturation and testing, can continue to operate in a broader environment. The software tools developed within NI4OS-Europe (LCT, RoLECT and RePol) are examples of this. In the long term, FOSS, currently being challenged by COSS, may prove more appropriate for scientific software than for use in the commodified environment of commercial cloud-based services.

FAIR-related guidelines for software creators

The list offered here is a mix of "Four Simple Recommendations to Encourage Best Practices in Research Software" [] and "Five Recommendations for FAIR Software" []:

  1. Make the source code public and use a publicly available, versioned repository from the beginning (related is the practice of depositing software in archives due to changes in journal policies – the primary goal here is the reproducibility of results by preserving the research environment. For this reason, software tends to be deposited in specialised repositories that have been developed and evolved independently of scientific ones. These platforms offer long-term benefits and support the improvement of software as living products maintained by multiple contributors by providing specific features, access mechanisms and integrations). But even putting software on GitHub does not do much for reusability without a clear licence and README information as primary enablers and indicators of reusability.
  2. Adopt a licence and meet the licensing requirements of all dependencies and contributors.
  3. Provide basic metadata by registering software in a relevant community registry to make it easy to find (it is sometimes described in the registry documentation, but you can check if this could be done with a tool for populating the registry).
  4. Create clear and transparent workflows for contributions, communication and governance.
  5. Enable citation of software using services that meet the software citation requirements.
  6. Meet the standards of the domain community. The minimum is to adhere to the conventions expected by the community in terms of formats used to read and write data, but also in terms of features provided, terminology, conventions and practices of other domains. Even if this is not directly needed for the immediate purpose, it increases adoption and reuse, and the chances for sustainability and external contributions. Do not limit yourself to software metadata and the community's minimal expectations on documentation.
  7. Use a software quality checklist to assess components and your research software [6]:
    1. Community support and adoption (with factors such as popularity, reputation, size, communication channels and participation)
    2. Documentation
    3. Costs (licence, training, support, etc.)
    4. Licensing conditions
    5. Operational characteristics such as independence from other software, development language, portability, compliance and testability
    6. Maturity level
    7. Quality aspects such as reliability, performance, modularity, maintainability, code quality and architecture
    8. Perceived risks related to confidentiality, integrity, availability, etc.
    9. Trustworthiness of components, architecture and platform, the reputation of the provider, third-party evaluations
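
One simple way to apply the checklist is to score each of the nine criteria and flag the weak ones. The 0 to 5 scale, the threshold, the short criterion names and the scores of the example component are all assumptions made for this sketch; the checklist itself does not prescribe a scoring scheme.

```python
# Short names for the nine checklist criteria, in the order listed above.
CRITERIA = [
    "community", "documentation", "costs", "licensing", "operational",
    "maturity", "quality", "risks", "trustworthiness",
]


def assess(scores, minimum=2):
    """Return (average score, criteria scoring below the minimum)."""
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {', '.join(missing)}")
    average = sum(scores[name] for name in CRITERIA) / len(CRITERIA)
    weak = [name for name in CRITERIA if scores[name] < minimum]
    return round(average, 2), weak


# Hypothetical scores (0 = poor, 5 = excellent) for one candidate component.
candidate = {
    "community": 4, "documentation": 3, "costs": 5, "licensing": 5,
    "operational": 3, "maturity": 4, "quality": 3, "risks": 1,
    "trustworthiness": 4,
}

average, weak = assess(candidate)
print(f"average score: {average}, needs attention: {', '.join(weak) or 'nothing'}")
```

A single low score, such as the perceived risks here, can matter more than the average, which is why the weak criteria are reported separately.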

Reporting on open source in NI4OS-Europe Agora

Data on the use of open source software and technologies are collected in NI4OS-Europe Agora according to the EOSC Portal Profiles v4.00 specification [] in the field labelled ERP.MTI.5 or "Open Source Technologies" (in "EOSC Resource Profile Tables / Data Model / Maturity Information / Open Source Technologies"). This field is used to create a "List of open source technologies included in the resource". The specification states that this field is for specific technologies and not for general technologies such as HTTP or Linux. This field is optional and can contain multiple values of up to 100 characters. The validation criterion is quite simple: "Check that the technologies mentioned/projects exist".

However, as the use of free and open source software (FOSS) is an important factor in the promotion of Open Science and related services, we are asking you, in the context of NI4OS-Europe, to report in more detail and more extensively on the use of FOSS by your services. Therefore, please provide a one-line description for each major component of your service. If possible, follow the identifying name of your component (version numbers should rather be avoided) with the comma-separated name or SPDX code of the corresponding software licence []. If you deem it necessary, you can also specify corresponding URLs; however, make sure that the entire line is no longer than 100 characters. If there is enough space, you can also provide a short description of the purpose of the software by separating it with ‘ – ’. Here are some examples of valid descriptions:

To assist you in providing this information, below are the names, licences, URLs and uses of “Open Source Technologies” most frequently mentioned in the NI4OS-Europe Agora (as of July 2022):

Name | Licence | Purpose
DSpace | BSD-3-Clause (permissive) | Repository
PostgreSQL | PostgreSQL (permissive) |
Apache HTTP Server | Apache License 2.0 (permissive) | Web server
Java EE (now Jakarta EE) | Eclipse Public License 2.0 (weak copyleft) or GNU General Public License 2 with the GNU Classpath Exception (weak copyleft) | Enterprise Java
 | Server Side Public License (SSPL) v1.0 after October 16, 2018 (proprietary); GNU AGPL v3.0 before October 16, 2018 (network protective strong copyleft) | Non-relational DB
MySQL | GPL v2+ (strong copyleft) or proprietary | Relational database
Spring Boot Framework | Apache License 2.0 (permissive) | Application framework
Angular | MIT (permissive) | Application framework
Apache Tomcat | Apache License 2.0 (permissive) | Web server
Google BERT | Apache License 2.0 (permissive) | Language model for NLP
Numpy | BSD (permissive) | Scientific computing library
OpenStack | Apache License 2.0 (permissive) | Cloud platform
Python | PSF Licence Agreement, GPL compatible (permissive) | Programming language
PyTorch | BSD (permissive) | Machine learning framework
Scikit-Learn | BSD 3-Clause ("New" or "Revised") (permissive) | Machine learning library
TensorFlow | Apache License 2.0 (permissive) | Machine learning platform/library

NI4OS-Europe service administrators often specify ‘Linux’ and ‘XML’. As mentioned earlier, please do not indicate the use of such generic or general-purpose technologies.
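
The one-line format requested above (component name, comma, licence name or SPDX code, optional purpose after ‘ – ’, at most 100 characters, no generic technologies) can be checked with a short helper. This is one plausible reading of the description rather than an official Agora validator; the function name and the exact layout of the line are assumptions, and optional URLs are left out for brevity.

```python
MAX_LENGTH = 100  # limit for one value of the "Open Source Technologies" field
GENERIC_TECHNOLOGIES = {"http", "linux", "xml"}  # named in the text as too general


def format_entry(name, licence, purpose=""):
    """Build 'Name, LICENCE – purpose' and enforce the 100-character limit."""
    if name.strip().lower() in GENERIC_TECHNOLOGIES:
        raise ValueError(f"{name} is too generic to be reported")
    entry = f"{name.strip()}, {licence.strip()}"
    if purpose:
        candidate = f"{entry} – {purpose.strip()}"
        # The purpose is optional, so it is dropped if the line gets too long.
        if len(candidate) <= MAX_LENGTH:
            entry = candidate
    if len(entry) > MAX_LENGTH:
        raise ValueError(f"entry has {len(entry)} characters, limit is {MAX_LENGTH}")
    return entry


print(format_entry("DSpace", "BSD-3-Clause", "Repository"))
print(format_entry("Apache Tomcat", "Apache-2.0", "Web server"))
```

Using SPDX codes instead of full licence names leaves more of the 100 characters for the purpose.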

Sometimes the primary software product used by your service may rely on other software for which there are some alternatives. Please try to indicate these too in separate entries under “Open Source Technologies”. For example, repositories based on DSpace and its supporting tools use several components where there are some choices. Highlighted are the components chosen for the repositories managed by the UoB, along with their possible alternatives:

  • Java environment: OpenJDK (GPL-2.0-only with linking exception) instead of Oracle's Java (Oracle No-Fee Terms and Conditions (NFTC))
  • Web server: Apache Tomcat (Apache License 2.0), Jetty (Apache License 2.0 and Eclipse Public License 1.0), or Caucho Resin (GPLv3 or proprietary)
  • Relational database: PostgreSQL (PostgreSQL License, similar to BSD or MIT); a less preferred alternative would be Oracle (Oracle Database XE, without stored Java procedures and limited in terms of data size and use of only one core)
  • Reverse proxy: NGINX (2-clause BSD licence or proprietary) or Apache (Apache License 2.0)
  • Non-relational DB: Solr (Apache License 2.0) instead of Elasticsearch (Elastic License 2.0 (ELv2) or Server Side Public License (SSPL), which are a source-available ‘fauxpen’ licence and a common proprietary licence)


  1. A Fresh Look at FAIR for Research Software
  2. Barthonnat, Céline, Blotière, Emilie, Gingold, Arnaud, Mas, François-Xavier, Stanić, Nikola, Pierno, Alessandro, Szulińska, Agnieszka, Armando, Lorenzo, Pochet, Bernard, de Santis, Luca, MacGregor, James, Pozzo, Riccardo, & Pogačnik, Aleš. (2021). OPERAS SIG on Tools for Open Scholarly Communication: White Paper 2021. Zenodo
  3. Black Duck Open Hub
  4. Chue Hong, N. P., Katz, D. S., Barker, M., Lamprecht, A-L, Martinez, C., Psomopoulos, F. E., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., Honeyman, T., et al. (2022). FAIR Principles for Research Software version 1.0. (FAIR4RS Principles v1.0). Research Data Alliance.
  5. Discover your next tool for social media analysis – A list of tools and software to support the collection and analysis of social media data (dataset)
  6. Donnie Berkholz, The size of open-source communities and its impact upon activity, licensing, and hosting, April 22, 2013
  7. Duca, D. (2019), The ecosystem of technologies for social science research (dataset). doi: 10.5281/zenodo.3555207
  8. Duca, D., & Metzler, K. (2019). The ecosystem of technologies for social science research (White paper). London, UK: Sage. doi: 10.4135/wp191101
  9. Duca, D., Developing a comprehensive directory of tools and technologies for social science research methods
  10. Five recommendations for FAIR software
  11. Ford Foundation, Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure
  12. HAL
  13. Hasselbring, Wilhelm, Carr, Leslie, Hettrick, Simon, Packer, Heather and Tiropanis, Thanassis. "From FAIR research data toward FAIR and open research software" it – Information Technology, vol. 62, no. 1, 2020, pp. 39-47.
  14. How to evaluate the sustainability of an open source project, January 22, 2014
  17. Ben Rometsch, Interview with Joseph "JJ" Jacks: Founder and General Partner, OSS Capital’s Vision for Open Source Software, May 25, 2021
  18. Jackson, 2019
  19. Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software (version 1). F1000Research 2017, 6:876
  20. John W Maxwell, Erik Hanson, Leena Desai, Carmen Tiampo, Kim O'Donnell, Avvai Ketheeswaran, Melody Sun, Emma Walter, Ellen Michelle, A Landscape Analysis of Open Source Publishing Tools and Platforms, Mind the Gap, Simon Fraser University, July 2019
  21. Katz DS, Chue Hong NP, Clark T et al. Recognizing the value of software: a software citation guide (version 2). F1000Research 2021, 9:1257
  22. Kiselka, B. (2015). Software project longevity – a case study on open source software development projects [Master Thesis, Technische Universität Wien]. reposiTUm.
  23. Lamprecht, Anna-Lena et al., Towards FAIR principles for research software, Data Science, vol. 3, no. 1, pp. 37-59.
  24. Lanier, Jaron, You Are Not a Gadget: A Manifesto. New York, Vintage Books, 2011.
  25. Mariannig Le Béchec, Aline Bouchard, Philippe Charrier, Claire Denecker, Gabriel Gallezot, et al., Pratiques et usages des outils numériques dans les communautés scientifiques en France. [Rapport de recherche] Comité pour la science ouverte. 2022, 112 p. hal-03545512
  26. Martinez, et al. (2022), A Survey on Adoption Guidelines for the FAIR4RS Principles: Dataset (1.0) (dataset), Zenodo
  27. Michael Jackson. (2018). Software Deposit: Guidance for Researchers (1.0). Zenodo.
  28. Nicolas Suzor, What motivates free software developers to choose between copyleft and permissive licences?, August 8, 2013
  29. Open Source Initiative, Licenses & Standards
  30. Open Source Initiative, The Open Source Definition
  31. Popp, Karl. (2015). Best practices for commercial use of open source software: Business models, Processes and Tools for Managing Open Source Software
  32. Projects that use Berkeley Open Infrastructure for Network Computing (BOINC), (dataset)
  33. R. D. Cosmo, M. Gruenpeter and S. Zacchiroli, "Referencing Source Code Artifacts: A Separate Concern in Software Citation," in Computing in Science & Engineering, vol. 22, no. 2, pp. 33-43, March-April 2020, doi: 10.1109/MCSE.2019.2963148.
  34. Sanchez-P. Jorge-A. (2021). EOSC Portal Profiles v4.00 (v4.00). Zenodo.
  35. Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. 2016. Software citation principles. PeerJ Computer Science 2:e86
  36. SPDX License List
  37. Top open source licenses and legal risk for developers, July 13, 2022