Software and free and open-source software in open science
Software is one of the pillars of open science, along with publications and data. Software is nowadays omnipresent – it is used as a general scientific tool that facilitates research and supports and integrates various elements of open science (OS). Sometimes, software is an integral part of a specific scientific investigation, essential for its execution, reproducibility and follow-up research. It may even be a primary result of research.
Software is used to process data. Research data is better understood with software that provides analytical or comparative views of it. Software also facilitates access not only to data from research, but also data about research. It runs the open science infrastructure, including repositories, archives, catalogues, databases and collaboration platforms. Software-based OS services and infrastructure are so important that it is safe to say that without software, and most likely, without free and open-source software (FOSS), OS would not exist today.
Despite many differences in their subjects, methods and culture, open science and open software work harmoniously towards similar goals. It is therefore crucial for contemporary researchers to be familiar with the specifics of science-related software, its use in open science and the use of FOSS in particular.
FOSS solutions and platforms represent a prevalent part of generic and domain-specific solutions in NI4OS-Europe. Consequently, the end-users and service and resource providers are questioning the best attitude towards FOSS in their work and the relationship between OS and FOSS. The issues they are most concerned with are proper use of FOSS, licensing, referencing, preservation and sustainability. These issues are of concern both for research software used or developed within research communities and for the original NI4OS-Europe developments. This report aims to offer clear basic guidance on using FOSS in research, OS and related services, currently and in the near future, within the NI4OS-Europe community and beyond.
Use of software in science
Software in science is typically split into research software and software in research.
Research software includes source code, algorithms, scripts, computational workflows and executables created during the research process or with research as its purpose. It is typically domain-specific. Research also directly employs computational and data processing workflows, which may combine several pieces of software and other resources, and self-contained executable notebooks that integrate data, analytical methods, procedures, code, results, visualizations and narrative. Research software can be treated as an OS resource that is managed together with other parts of research but it still needs to be distinguished as a separate entity of a special type. Furthermore, all its elements may be aggregated in a standalone OS research product, especially if it is to be used by several research projects.
Software in research includes software that is used before, during or after research and that was specifically created to assist, track, share or manage research or to facilitate participation in OS. Such software is often provided as a service or locally used tool that interacts with external services. The provided services may combine several software components or involve human contribution or intervention. Sometimes, even a general-purpose software or service can be adopted by a community within a discipline and thus become a tool used in research.
A distinction between research software and software in research is often difficult to make, as some services are exclusively used in specific scientific domains and communities. Also, specialised research software may be integrated with more general services and resources.
Finally, both software in research and research software are just a thin layer on top of a huge stack of other software and platforms that research depends upon. Although the top layer fully depends on it, this other software is not considered software in research or research software. Still, the licences and rules imposed on that software may be applicable, given the nature of the dependence.
Publications are typically issued only once and are used as such for a long time, as they may be updated only with minor corrections. Even though publications may be overridden, relativized or partially replaced by subsequent research and publications, they remain usable and used in their original form. Research-related products other than software are typically linked to individual publications. Datasets can be reused in many research efforts but, apart from format adaptations, are substantially unchanged, even if combined and used together with other datasets. On the other hand, software is constantly modified, and even small changes in it may have a huge impact on its functioning. While the lifetime of one version of the software is generally shorter than that of data [https://content.iospress.com/articles/data-science/ds190026], the opposite is true if its lifecycle and patterns of use are considered across several versions.
Research software often has a lifecycle that is independent of the given research and of the rules for the governance of scientific artifacts, and it involves people who are not a direct part of the related research. Therefore, software typically needs to be handled differently than publications or data. However, these specificities are not yet consistently addressed, due to a huge difference in the way software is perceived and managed. As researchers, groups and organisations use software to support different activities and address different needs and have distinctive priorities and capabilities, they also handle various aspects of software management differently. The primary aspect of every piece of software is human, while machine interpretation and execution are just parts of its lifecycle, which is primarily marked by human reading, parsing and occasional modification. Software modification and maintenance are often very difficult and time-consuming. The code can be seen as the capture of precious knowledge, where the target discipline, logic, processing steps and technical implementation are integrated. There is also a sophisticated and evolving developer community with its conventions and culture, in which the expectations, conventions, standards and tastes change over time. Coding practices and those related to documentation, testing and quality assurance vary across scientific disciplines and researchers and often differ from those of professional software engineers, who are themselves not uniform in their approach. The licences and copyright law apply and must be adhered to even if the software is FOSS.
The most successful pieces of software are often used for many years and often go through dozens or even hundreds of modifications and versions, especially if an agile development process and automated delivery are used. Software is managed over years by many people and through steps and processes that are not a part of the scientific process; even organisations that govern, maintain and modify it may change. During that time, its earlier versions may become completely unusable or even dangerous. The history of software changes is crucial for its understanding and maintenance. As a piece of code evolves, the causes, reasoning and intent of a change, along with all affected locations, are often a key to its interpretation and a base for further modifications. The size of source code in successful software projects may grow dramatically and is often measured in millions of lines. At the same time, the code is fragile, due to many dependencies and to mistakes in understanding and implementing interactions between components, which are easy to make and extremely difficult to detect. A small mistake in a single line of code may invalidate or jeopardise a large system and impact many related software components. Besides the code itself, software metadata, configuration and customisations, licence information and corresponding service policies also need to be defined, captured, provided and tracked. Therefore, the structure of software is much more complex than that of a typical dataset, as it includes many dependencies and other relationships which for the most part are not directly related to the subject of research.
Software that is used in open science or by research infrastructures and services is not necessarily free and open source. All types of software (private, FOSS and proprietary) are commonly preserved and managed by using software repositories which conceptually and functionally significantly differ from repositories used in open science. Science expects research software artifacts to be properly archived, referenced, identified, described, cited and credited, but also discoverable, visible, accessible and reused when needed. Software quality, reproducibility and traceability have to be handled differently than other less mutable research-related products. Furthermore, software needs a policy framework for dissemination, reuse, evaluation and recognition that includes funding and software-related incentives, a sustainability framework with organisational schemes, legal tools and economic models. On top of all this, both science and software communities would benefit from a strategic framework that would combine various approaches and methods for consolidation across scientific disciplines and technical communities, which should include harmonisation, technology transfer and industry collaboration.
FOSS and open science
Open-source software is a key element of many OS tools and services. These services support OS and contribute to its digital ecosystem, by enabling, supporting or streamlining the exchange and use of research information and shared data. Although open science and open software share the same ethos and drive for openness of knowledge, information and opportunities and often support each other, they are substantially independent and have a different practical focus. Open source and permissive FOSS licences, in particular, are often used in open science, but they are not required to be used in open science. Although open-source software is popular due to its potential to be more easily applied and maintained in the scientific community, the actual settings and needed tools may differ and therefore include proprietary solutions. About half of the tools used in science are FOSS or at least free to use [SOSP-FR report], and even in social science, there is a similar number of paid and free tools available [https://group.sagepub.com/white-paper-archive/the-ecosystem-of-technologies-for-social-science-research], which is something that would be more easily presumed for natural and technical sciences.
Both FOSS and proprietary software used in science often use FOSS components. The concern about access to software and its licensing is initially related to the guidance on how to deposit [https://zenodo.org/record/1327310] and cite or reference [https://doi.org/10.7717/peerj-cs.86] research software. Software Heritage (SWH) and HAL [https://hal.archives-ouvertes.fr/] are among the related supporting initiatives and services aiming to address software in open science. Still, when depositing, most researchers rely on GitHub, while some use Zenodo as a catch-all archiving service. Depositing practices are associated with specifying the software location and providing licences, other metadata and citation information that make research software findable and accessible.
Poor citation practices contribute to inadequate visibility and accessibility of research software. The software tools and packages used are not mentioned or are insufficiently identified in academic papers, even when their names are unique enough. Often, researchers mention the software they use only in the methodology section or footnotes. This complicates finding the tool-related research and does not provide direct credit. But authors are often shy to go into such “technical details”. Even when they mention the software used, they may be asked by reviewers to remove that part. Creators of some tools ask for a specific paper to be cited, which helps prospective users find the tool and warrants credit to developers. However, this is not enough. Software references are still not standardised and refer to many kinds of sources, predominantly via URLs, a practice that is neither persistent nor interoperable in the long run. The FORCE11 Software Citation Working Group defined the basic software citation principles [https://doi.org/10.7717/peerj-cs.86]:
- Importance – Software should be considered a legitimate and citable product of research.
- Credit and attribution – Software citations should facilitate giving scholarly credit and legal attribution to all contributors to the software in a suitable way.
- Unique identification – A citation should include identification that is machine actionable, globally unique, and interoperable and is recognized by at least the discipline community.
- Persistence – Identifiers and metadata should persist beyond the lifespan of the software.
- Accessibility – Citations should facilitate access to the software and associated metadata, documentation and other materials necessary for informed use.
- Specificity – Citations should facilitate identification of, and access to, the used version of the software.
Starting from these principles, additional guidelines on how to cite software were developed [https://www.software.ac.uk/how-cite-software; Recognizing the value of software: a software citation guide].
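The principles above can be illustrated with a minimal sketch of a software citation record. The `SoftwareCitation` class, its field names and the example values are assumptions made only for illustration (the DOI is a placeholder, not a real identifier), and the output format does not follow any particular citation standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SoftwareCitation:
    """Hypothetical record that mirrors the citation principles listed above."""
    title: str
    authors: List[str]   # credit and attribution
    version: str         # specificity: the exact version that was used
    identifier: str      # unique and persistent identifier, for example a DOI
    year: int
    url: str = ""        # accessibility: where the software can be obtained

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that were left empty."""
        required = {
            "title": self.title,
            "authors": self.authors,
            "version": self.version,
            "identifier": self.identifier,
        }
        return [name for name, value in required.items() if not value]

    def as_text(self) -> str:
        """Render the record as a single human-readable reference."""
        authors = ", ".join(self.authors)
        return (f"{authors} ({self.year}). {self.title} "
                f"(version {self.version}) [Software]. {self.identifier}")


# Placeholder values, not a real piece of software.
citation = SoftwareCitation(
    title="Example Analysis Toolkit",
    authors=["A. Researcher", "B. Developer"],
    version="1.2.0",
    identifier="doi:10.0000/placeholder",
    year=2022,
)
print(citation.as_text())
print("Missing fields:", citation.missing_fields())
```

Keeping the version and the identifier as required fields reflects the specificity and unique identification principles; a bare URL to a project page would satisfy neither.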
It should be noted that the backgrounds, attitudes, motivations and goals of researchers and professional software developers differ. As both groups develop and operate research-related software, these differences should be accounted for.
Even the most popular licences significantly differ, although sometimes the licences that are popular in the OS world are applied to software. However, the most often used software licences are designed to support the goals of the open-source movement and were developed independently and with different objectives than those that are typically used in OS.
Typically, software developers lack the time, effort and knowledge necessary to address the FAIR principles for software, and even IPR and licensing in general. A joint RDA/FORCE11/ReSA working group on FAIR for Research Software (FAIR4RS), established in 2020, reviewed and redefined FAIR guiding principles for software and related computational code-based research products and published its adaptation of the general FAIR principles for research software in March 2022. Although the minimal software metadata has been discussed, defined and collected by often-used registries for a long time, the first comprehensive guide for describing and cataloguing software materials was developed in 2020 and published in February 2022 by Software Preservation Network’s Metadata Working Group [https://www.softwarepreservationnetwork.org/wp-content/uploads/2022/01/Software_Metadata_Recommended_Format_Guide-1.pdf].
The association between FOSS and OS extends to broader guidelines on FOSS and licence usage and governance, but also to the elaboration and establishment of strategic orientation in the management of tools and services. This relationship also includes shared governance: ensuring continual investment in software development, investment pay-off, control over software evolution, and its long-term usability, maintenance and sustainability. Therefore, the management of software as scientific assets is becoming a critical part of OS governance that should be fused with related practices coming from software engineering, where IPR governance, FOSS licensing and licence compatibility are a very important and current subject.
The many existing links between software and various aspects of research, the access to source code provided by FOSS, and the need to access other software for reproducibility illustrate why strengthening the productive relationship between research and software communities is so important. As many international organisations and collaborations call for closer use of FOSS in research and supporting infrastructure, much more needs to be done in terms of recognition of software as fundamentally different from research data, and establishment of associated conventions, infrastructure, rules, evaluation and support. At the same time, the members of the software community who are participating in OS development need to firmly adhere to its requirements and practices, which they, due to the desire to circumvent obstacles and a sense of entitlement, may try to evade.
Software in OS practices (SOSP-FR report)
The report “State of open science practices in France” [SOSP-FR, in French: Pratiques et usages des outils numériques dans les communautés scientifiques en France] from 2022 describes practices and the use of digital tools in scientific communities in France. It is based on a survey of 1089 researchers in various fields.
About two-thirds of respondents use free and open-source software (FOSS), and the same portion uses paid software. Other types of software are much less used. These include partially or completely free proprietary software (21%) and software created for research or resulting from research (17%). In particular, respondents who are 35 years old or younger are more inclined toward FOSS. Physics, mathematics and computer science, in particular, are resolutely oriented toward FOSS, with this inclination approaching 80% for the latter two. Literature, social, humanistic and life sciences are equally split in the use of FOSS and paid software, while chemistry, engineering sciences and medicine are more inclined to paid software (at around 55%).
Most used are general-purpose authoring tools (such as MS Excel and Word). Next are more technical languages and platforms commonly used in data science and analysis: R, Python and MATLAB. The reason for the high frequency of general tools (word processing, spreadsheets, visualisation and presentation) is that they are used by all specialities. Like the general public, researchers most often use common office applications. Similarly, the versatility and neutrality of analytical platforms are the reasons for their high ranking. The two most used tools for data analysis, Excel and R, symbolise the two major flavours of digital data analysis: Excel is a long-established paid software for general use with a very strong reputation, while R is an open-source programming language and environment for statistical computing and graphics. Still, R is more recent, is mentioned about half as often, and its audience is still limited to groups of (often younger) researchers.
The only well-ranked discipline-specific tool is QGIS, a FOSS geographic information system. Next, with almost equal use that jointly approaches that of R, are software developed by the researchers, in-house software of the organisation or laboratory, and LibreOffice, another office package. This strongly indicates that researchers and organisations do not want to develop dedicated software unless they have to. It is also interesting that all locally developed software is mentioned much less than any of R, Python and MATLAB, which suggests that the use of these three is not considered programming.
- MS Excel: 219
- MS Word: 143
- R: 112
- Python: 105
- Matlab: 80
- Qgis: 49
- Software designed by user: 35
- Libre Office: 33
- Internal software of the organisation: 33
- ImageJ: 32
- LateX: 27
- FileMaker: 27
- Origin: 26
- Photoshop: 26
- Lime survey: 26
- Powerpoint: 23
- Illustrator: 23
- SPSS: 22
- Arcgis: 19
- Access: 18
- C / C++: 18
- SAS: 16
- Zotero: 15
- Labview: 15
- NVivo: 14
- Sphinx: 14
- Stata: 14
- Prism: 13
- RStudio: 13
- Oxygen: 12
- GraphPad: 11
- Mathematica: 10
- Iramuteq: 10
- Chemdraw: 10
- Gimp: 10
- Inkscape: 10
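The comparisons made before the list can be checked directly against the listed counts. The sketch below is only an illustration: it copies the relevant mention counts from the list, and the dictionary name and keys are chosen here for convenience.

```python
# Mention counts copied from the list above (only the entries being compared).
mentions = {
    "MS Excel": 219,
    "R": 112,
    "Software designed by user": 35,
    "Libre Office": 33,
    "Internal software of the organisation": 33,
}

# Excel compared with R, the two most used tools for data analysis.
excel_to_r = mentions["MS Excel"] / mentions["R"]

# User-designed software, in-house software and LibreOffice taken together.
joint = (
    mentions["Software designed by user"]
    + mentions["Libre Office"]
    + mentions["Internal software of the organisation"]
)

print(f"Excel is mentioned {excel_to_r:.2f} times as often as R")
print(f"Joint mentions of the three entries: {joint} (R alone: {mentions['R']})")
```

With these numbers, the three entries together come to 101 mentions against 112 for R, and Excel is mentioned roughly twice as often as R.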
Next are various often-needed task-specific tools that can be used across communities. They serve for image processing, database management, preparation of publications, surveys, statistical analysis, data acquisition, and instrument automation and control. The most often mentioned open source ones are ImageJ, LaTeX, Lime Survey, Zotero, Sphinx, RStudio, Iramuteq (in French), GIMP and Inkscape. Proprietary tools are FileMaker, Origin, Photoshop, PowerPoint, Illustrator, SPSS, ArcGIS, Access, SAS, LabVIEW, NVivo, Stata, Prism & GraphPad, OXYGEN (by DEWETRON), Mathematica and ChemDraw. Among them are also C and C++, which is a far cry from their past glory. Most of these tools and platforms are decades old, so researchers stick to what they know how to use and are satisfied with. Some of these tools may be parts of the shared culture and hard to replace. Zotero, used for the management of bibliographic data and related materials, is the first mentioned software that is directly related to OS.
Almost no software tied to a particular scientific area is mentioned in the joint list. Such tools are too diverse and dispersed across communities. Still, some community-related patterns for the above-listed tools can be recognised:
- The physical sciences, mathematics, computer and engineering sciences are inclined to the tools they can adapt to their needs (MATLAB, Python, LaTeX, software designed by researchers)
- Humanities, arts and social sciences widely use software from two major publishers, Microsoft and Adobe; they also use QGIS and SPSS. Social sciences also use R.
- Biology and chemistry often use software specialised for image processing and graphical presentation. Life sciences also use R.
- Medicine has a low specificity of tools, with the most common use of graphic and statistical tools (ImageJ, SAS and GraphPad products such as Prism).
The authors of the study in their analysis also emphasize that:
- Python users often also use Linux (49%, compared to 22.5% in the respondent population) and prefer open and free software (91.2%, compared to 69.5% of all respondents). They are more often men (68.2%, compared to 51.9% of men among all respondents).
- Programmatic solutions and collaborative environments and tools are most used by younger researchers who work in small groups (2 to 5 members) and also typically use Linux.
- The use of FOSS software is related to its free availability, especially for early researchers and those without sufficient funding (the SOSP-FR report highlights this factor for humanities, arts and social sciences, but it is generally applicable). This may be a stronger driver for FOSS than its use in open science.
- Not partaking in the use of digital tools popular in the wider community locks researchers into a proprietary environment; they may not be aware of this before they try to use new software available only for an operating system they do not have practice with. Linux is popular, but software vendors should not disregard other systems.
- The use of open science tools and information or open-source software does not mean that a person is deliberately partaking in either movement. The practitioners are not necessarily fully aware that the practices or rules they follow belong to open science. The use of FOSS tools is both opportunistic and due to open-source philosophy, but it is unclear which of those two factors is more significant.
Based on differences between age groups, the authors of the report hypothesize that dissemination and use of research tools driven by teaching influence later digital research practices. Software like GitLab, programming languages and free software like R are present in university training, as teachers tend to develop training on freely available software because of the limited availability of licences in the educational context. Thus open-source tools become default research environments for future young researchers. Such tools, as well as collaborative environments and executable notebooks (also known as computable documents, such as those provided by Jupyter Notebook/JupyterLab), may therefore soon even more strongly influence the practices of publication and communication of research results. These tools are also made available to scientific communities via research infrastructures and online services. It would therefore be beneficial to better map accessibility and training needs on research infrastructures and quantify them for those who are not participating in large research collaborations or are at the start of their careers. The use of software tools, and open-source software in particular, as well as research infrastructures should be assessed independently from open science and related digital practices.
However, based on our less formal survey from December 2022, the use of FOSS in academic training is considered by researchers and developers to be less significant than factors such as availability without the need to pay, open source’s natural link to open science, and FOSS principles and development model. It is, in fact, in the second group of motivators, which includes avoiding bureaucratic obstacles associated with procurement, use by the community and maintainability.
Finding research software
A good list of research software registries, classified by scientific domains and other criteria, is available at [https://github.com/NLeSC/awesome-research-software-registries]. It covers:
- Astrophysics
- Computational Fluid Dynamics
- Grid Computing middleware
- Earth Sciences
- Humanities
- Life Sciences / Biology / Medical
- Mathematics
- Machine Learning
- Nano Technology
- Social and Ecological Sciences
- Generic tools
- Registries by country
- Registries by organization
- Registries by programming language
On the other hand, when it comes to general tools for scientific publishing, an analysis and catalogue of open-source publishing tools and platforms are available at [https://mindthegap.pubpub.org/]. A white paper produced by the OPERAS Special Interest Group on Tools Research and Development for Scholarly Communication is available at [https://doi.org/10.5281/zenodo.5654319].
Tools for social sciences
This document covers all scientific domains. But it can illustrate the use of FOSS in social sciences, as one domain where this software is less frequently used.
The Directory for Social Sciences summary [https://forrt.org/educators-corner/003-developing-tools/] and related white paper [https://static1.squarespace.com/static/5d5ad9e0100bdf0001af0f5e/t/5ed0ea0631c1a80efe375fe5/1590749710566/The+Ecosystem+of+Technologies+for+Social+Science+Research.pdf] describe and list many tools used for social science research and the related trends. For example, organizations are coming together in consortia to support the development and sustainable management of these tools. The number of research tools available has grown rapidly since 2004, from about 50 to more than 400 at the time of writing the white paper in 2019. This is probably due to researchers’ adoption of digital tools and software development skills and advances in computer science and its accessibility. The number of paid and free tools is similar, but the number of free tools is growing at a slightly faster pace. This may be due to the adoption of open source and a greater number of individuals developing their tools. This directory currently contains about 600 software packages/tools at [https://github.com/sagepublishing/sage_tools_social_science/blob/master/data/master_tools_current.csv].
Surveying and sourcing participants
- REDCap, https://www.project-redcap.org/
- Bristol Online Surveys
- SmartSurvey Academic
There are also many free and paid online platforms.
Annotating, labelling, and coding text
Open-source text annotation tools are:
- GATE, https://gate.ac.uk/
- Open Calais (running on Drupal), https://www.drupal.org/project/opencalais
- Brat, https://brat.nlplab.org/
- TAMS Analyze, https://tamsys.sourceforge.io/
- Gephi, visualization and exploration software for graphs and networks, https://gephi.org/
- RapidMiner, https://rapidminer.com/
Social media research
Some free and paid tools are listed at [https://socialmediatools.pory.app/].
Most of these tools work with Twitter. Facebook and Instagram have more active users, but Twitter provides an API that makes its data much more accessible than other platforms. Furthermore, LinkedIn and Facebook even have a policy against using their API for research purposes. Still, there are research tools that provide access to content from several social media platforms. Facebook’s reputational problems pushed it to set up Social Science One non-profit partnership and provide selected researchers with grants-based access to their data. Similarly, LinkedIn set up the Economic Graph Research Program.
More than half of the social media tools are free (in the form of applications or as freely available packages on GitHub, typically as open source). These tools offer analysis, data collection, monitoring, network visualisation, platform management, sentiment analysis, text analysis and visualisation. Only a few tools have limited free functionality.
Free tools that can access several social media platforms:
- NodeXL, Social Media Research Foundation
- SMaPP Toolkit, New York University
- Vader, MIT
- Social Feed Manager, George Washington University Libraries
- Webometric Analyst, University of Wolverhampton
- SOCRATES, NSF
- Just Twitter
A very popular commercial tool is NVivo. Some Twitter-specific tools are:
- academictwitteR, The University of Edinburgh
- DocNow- Shift Design, University of Maryland, University of Virginia
- rtweet, University of Missouri
Strategic recommendations for adoption of FOSS in science
- Start with task-oriented OS tools for which the habits are not very strong and which are available for various platforms, and see whether their growing adoption will be followed by the use of tools such as LibreOffice, which are currently facing entrenched proprietary packages.
- Initiatives oriented to women in science should also popularise FOSS. As women are currently less inclined to use it, the potential gains may be higher.
- Promote the use of Linux, as its regular use is closely correlated to the orientation toward open software and the use of associated collaborative tools.
- As the primary motivations behind the acceptance of OS and the use of FOSS vary, it is best to promote them by emphasizing both their practical benefits and deeper justification. For some researchers, one may work better than the other, while for those who are influenced by both, the synergetic effect may be multiplicative.
FOSS licences
This and the next two sections are based on the work that the author of this report conducted within the GN4 Phase 3 project funded by the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 856726 (GN4-3).
The topics of software licences, their compatibility and selection are complex and are therefore often neglected even by professional software developers, not to mention those who develop software to support their primary work, such as research.
- Our work is increasingly dependent on Free and Open-Source Software (FOSS) which always comes with a specific FOSS licence.
- FOSS licences help in keeping FOSS software alive.
- Licence compliance is important for legal reasons and to ensure better collaboration.
- Licensing considerations are far more significant when there is a distribution of software, as FOSS licences come with specific conditions.
Trust in research relies on the peer review system, which assumes the ability to reproduce the described experiment or analysis. As the software is often a fundamental part of this work, its validity cannot be confirmed without the opportunity to access and use that software. Furthermore, as part of the analytical process is codified in the software, the trustworthiness of the analysis and conclusions may depend on the ability to scrutinise and reuse not only the input data but also the used software and its internal logic. This does not mean that the source code must be reviewed by the reviewers – it is usually sufficient to make it available for an independent audit, which, in the case of often-used software tools with a large community of users and maintainers, is a side effect of joint software development, use and maintenance.
Furthermore, as stated in [https://doi.org/10.1515/itit-2019-0040], lack of access to the underlying software makes it significantly more difficult to build new results on top of the existing research, but open-source software practices and increasing adoption of this software have enabled the more general open science agenda. Open Science principles affect the research life-cycle, in the way science is performed, and its results – including software – published, assessed, discovered, and monitored.
The usage frequency of FOSS licences is tracked in the Black Duck Top List [https://www.synopsys.com/blogs/software-security/top-open-source-licenses/]:
Rank | Licence | Usage | Licensing Risk |
1 | MIT | 32% | Low |
2 | GPL 2.0 | 18% | High |
3 | Apache 2.0 | 14% | Low |
4 | GPL 3.0 | 7% | High |
5 | BSD 2 | 6% | Low |
6 | ISC | 5% | Low |
7 | Artistic (Perl) | 4% | Medium |
8 | LGPL 2.1 | 4% | High |
9 | LGPL 3.0 | 2% | High |
10 | Eclipse (EPL) | 1% | Medium |
Benefits of FOSS
The market is supposed to even out and equalise the total costs of ownership of equivalent software products. Still, vendors of proprietary software often claim that after some time the use of their products will be simpler and less costly than with FOSS. Some organisations do not want to invest in the expertise and costs they associate with the use and maintenance of FOSS, even when the needed effort is negligible. They may also want to have an external organisation they can hold accountable for software risks and problems. But experience has shown that it is very hard to hold a vendor of commercial software liable for it unless it has been tailor-made for a single customer and very strict contractual obligations were put in place.
On the other hand, the characteristics of FOSS make it a very attractive option for many users and organisations. Also, many commercial developers of open-source software often provide additional services as part of their business model. There are even third-party companies which provide support for open-source software.
The open-source model differs greatly from the one used for traditional licensed proprietary and commercial software by permitting both distribution and modification. The limitation on distribution is also lifted in freeware (free of charge) and shareware (free trial) software, at least for its binary form, but the possibility to modify the source code is crucial and has a huge impact, which has brought several substantial improvements over traditional proprietary licensing models.
Availability and lower costs – For highly advanced or specialised needs such as those related to research, new technologies and high load, the market is often not large enough for a suitable commercial product to be available and sustained. The immediate and cost-free availability of open source is also associated with the need to develop internal expertise, which organisations and individuals often want to do anyway, or to hire external support and maintenance. Many users – advanced ones in particular – want to be able to customise and adapt the software and are often happy to debug and maintain it. In research and education, the obligation to pay for software, overheads related to procurement and renewal of commercial licences, and related commitments are barriers that often cannot be overcome by students, early-career researchers or small research groups. Access to the source code may also reduce their learning curve and allow them to be more effective with the technology. Furthermore, the cost of highly specialised or tailor-made software is often unacceptable even for large organisations, especially if a working solution can be achieved by combining and customising several freely available building blocks. These factors have led to the wide adoption of open source in more advanced and specialised communities, the emergence of entirely new usage scenarios, and, along with the commodification and componentisation of resources and runtime environments, the introduction of large toolsets or architectures with many components, such as those based on microservices.
Innovation and flexibility – The easy access to the source code enables developers to add the features they need and incorporate novel ideas, algorithms and usage scenarios. They are not constrained by the organisational, commercial or strategic limitations of vendors of proprietary software, where the users would have to wait for the vendor to decide to provide and implement what they need. The more useful a new feature or solution is, the more developers and companies will join and contribute. The more programmers contribute to it, the better, more useful and more valuable the result is going to be. This has been proven to work even if the contributors are not required to share their derivative works. Furthermore, people who inherently need a specific improvement are more driven to provide a creative and innovative solution than those who are just paid to produce something.
Security and reliability through transparency – When the source code is available, many people can scrutinise or debug it. Security and functional flaws are more likely to be spotted, and suboptimal solutions or vulnerabilities fixed. Contributors are aware that other experts will be looking at and reviewing their code, and will therefore tend to stick to higher standards of quality. Every new reviewer can see what has been done before and what went wrong with it. If they try to fix it, they may be more incentivised to come up with a better solution than someone who had to adapt to strict deadlines and other limitations or priorities. Also, someone in the right context, due to their combination of prior knowledge, experience or earlier dealing with a similar problem, is often in a better position to detect and address an issue or bug than the original author. Many open-source projects have dedicated reviewers who check modifications before they are included in the main codebase. They are not testers, but experts who care about the quality of the code and can work at their own pace. All these factors greatly improve software security and reliability, and this is the main reason why the great majority of services on the internet are nowadays running on Linux. When a vulnerability is identified in an active open-source project, the interested community members usually swarm in and quickly patch it. However, only those who keep up with the latest recommended versions, including the libraries they rely on, will benefit from this dynamic activity.
Longevity – Commercially licensed software can be retired by its vendor who no longer supports it. The vendor may go out of business without selling its software to another company. In such situations, there is no way to update abandoned software, fix bugs, or adapt it to new uses or platforms. Support, patches and other related services are not available anymore. Therefore, its usability can deteriorate quite rapidly and the user must decide when to invest in new software and migration to it instead of living with the growing problems. In contrast, open-source software is free to evolve continuously as anyone can access the source code and contribute to it. Even after it has been abandoned for some time, anyone can revive, adapt, fix or repurpose it. Many useful and widely used FOSS have wide, active and stable user communities, the members of which include individuals and research groups, but also large and small companies.
Types of FOSS licences
Open source licences are licences that allow the software to be freely used, modified, and shared [https://opensource.org/licenses]. Although many "Free software" and "open-source software" licences are recognised by Free Software Foundation (FSF) and approved by Open Source Initiative (OSI), there are just a few licences that are popular, widely used, or have strong communities. However, a single software project may include several components, which include transitively other components, which may result in the presence of many software licences. To be approved as open by OSI, software licences must meet some criteria [https://opensource.org/osd]. But even open licences differ in terms of the rights and limitations they include, which are often on derivative works, such as modification of the original code or its use from the new code. Based on the scope of the code they apply to, there are two primary groups of FOSS licences – permissive and copyleft ones, which differ in whether modifications or the code using the licensed code must be released under the same licence or a similar licence may be used. The applicability of a licence may be limited to modifications of existing files, additions or even any uses of a software library. However, all FOSS licences to some extent require disclosure of some existing and new code to external users or when distributing the software but they do not require this for private use. Most licences require the licence text as well as the copyright notice to be included with the licensed material. Some also require documenting the changes made to the licensed material. It should be also noted that a few licences also consider access over the network as the use that implies the right to receive the source code.
The open-source rules are designed so that those who receive copies of the software must themselves be able to redistribute the original and make derivative works from the original, at the same time allowing others to do the same. Some licences prevent open-source code from “getting closed” and require that users of and contributors to the code propagate open-source values by sharing their modifications or additions (derivative works) on the same terms as the original. This means that those who receive copies of these works must be able to redistribute the original and make derivatives ad infinitum.
As opposed to Creative Commons CC-ND and CC-NC licences, an open-source software licence must allow modifications and commercial uses to be considered truly open. If a licence prohibits the licensed material and derivatives from being used for commercial or (for example) military purposes, it is not considered a free software licence, since it limits who can use a program or for what. The OSI does not allow discrimination against persons, groups, or fields of endeavour in using the software, so it can be used for any purpose, including any business.
Pieces of software may modify or extend fragments of already existing software, which is similar to the creation of “derivatives” in CC licences. To preserve the integrity and ensure maintenance of the original work, open-source licences often require derivative works to be distributed on the same terms under which the licensee was permitted access to the original work, such as the source code that was used (incorporated, copied or potentially modified) and the use of the resulting components, such as software libraries. While CC-ND licences permit sharing and reuse only as long as the content is left unchanged, a piece of software may use other software in many ways which do not require any modification or even actual inclusion of the used software. A piece of software may depend on other software by relying on its definitions, contained specifications, and interfaces, or by invoking it through dynamic or static linking, network communication and different types of interfaces and connectors. Therefore, software licences differ from those used for other types of works as they are focused on different ways software may be used, and, when a specific type of use occurs, how it affects the licensing of other software that is using it, and the scope of that impact. These terms include all conditions and obligations defined by the licence of used software. If demands are low, the licence is called permissive, as opposed to restrictive, or, more correctly, copyleft. If the scope is narrow, it does not extend to all extensions of the used work or the entire software that uses the licensed material. Instead, it is limited to requiring the availability of modifications of the original work or the changed existing files. The term that is used in this case is 'weak copyleft'. If the scope is broad, the licence of the used component and related conditions must be applied to all software that uses it. This is called 'strong copyleft'.
Public domain licences offer the most permissive model. Anyone can modify and use the software without any restrictions. But even if a component is free and comes without any legal strings attached, one should always make sure it is secure before adding it to the codebase.
Permissive licences contain minimal requirements about how the software can be modified or redistributed. Users do not have to republish any changes they make and typically only have to give attribution to the original authors. They provide a disclaimer and, often, require describing changes. This type of licence is used by almost two-thirds of open-source software in circulation [https://www.synopsys.com/blogs/software-security/top-open-source-licenses/]. Permissive licences are popular due to the flexibility they offer to those who use such licensed software and low IPR risk. These licences include the MIT (the most popular one, short and simple), Apache 2.0 (requires notice of changes, grants licence to patents unless litigating and mentions preservation of trademark rights), BSD (some versions require including the disclaimer) and ISC (along with its OpenBSD variant is a further simplification of MIT and BSD). Artistic License (used for Perl and in several variants of versions 1.0 and 2.0) is permissive, but it also includes compensation for damage.
Copyleft licences, also known as reciprocal licences or restrictive, protective, and even viral licences, allow modification of the code and distribution of new works based on it, as long as the requirements for redistribution under the same conditions are met. The intent is to ensure that the rights that the user or modifier has benefited from are preserved in derivative works, by preventing contributors from appropriating their changes and putting themselves in an asymmetric position relative to upstream contributors. This typically means that anyone who changes the code also has to release their modifications under the same licence. Copyleft licences are often considered to be riskier in commercial settings, as they can limit the potential business value or threaten the secrecy of intellectual property. Taken together, copyleft licences are used by more than one-third of open-source software.
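The proportions quoted for permissive and copyleft licences can be cross-checked against the top-ten table from the Black Duck list given earlier. The following is a minimal sketch of that arithmetic, not part of the original analysis. The usage figures are copied from the table, the family labels follow the grouping used in this report (Artistic as permissive, LGPL and EPL as copyleft), and normalising by the share covered by the top ten (93%) is an assumption made only for this illustration.

```python
# Usage shares (percent) copied from the top-ten table quoted in this report.
# Family labels follow the report's grouping; normalising by the top-ten
# total is an assumption of this sketch, not a figure from the source.
TOP_LICENCES = {
    "MIT": (32, "permissive"),
    "GPL 2.0": (18, "copyleft"),
    "Apache 2.0": (14, "permissive"),
    "GPL 3.0": (7, "copyleft"),
    "BSD 2": (6, "permissive"),
    "ISC": (5, "permissive"),
    "Artistic (Perl)": (4, "permissive"),
    "LGPL 2.1": (4, "copyleft"),
    "LGPL 3.0": (2, "copyleft"),
    "Eclipse (EPL)": (1, "copyleft"),
}


def family_totals(licences):
    """Sum the usage percentages per licence family."""
    totals = {}
    for usage, family in licences.values():
        totals[family] = totals.get(family, 0) + usage
    return totals


def family_shares(licences):
    """Express each family's total as a fraction of the covered usage."""
    totals = family_totals(licences)
    covered = sum(totals.values())
    return {family: value / covered for family, value in totals.items()}


if __name__ == "__main__":
    for family, share in family_shares(TOP_LICENCES).items():
        print(f"{family}: {share:.1%}")
```

Under these assumptions, the permissive entries add up to 61 percentage points and the copyleft entries to 32, which is consistent with the "almost two-thirds" and "more than one-third" figures once the totals are related to the 93% covered by the table.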
Weak copyleft licences have a library or file scope. Examples are the LGPL (GNU Lesser General Public License; 2.1 cleans text of 2.0 and allows dynamic linking without enforcing copyleft; 3.0 grants use of patents, it is not compatible with LGPL 2.0 but is with Apache 2.0 and the end-user must be able to install a modified version – it prohibits closed devices, DRM or hardware encryption or patents retaliation), EPL (Eclipse Public License 1.0 and 2.0), MPL (Mozilla Public License 1.0, 1.1 and 2.0 – it is simple, allows static linking and licence variants with additional terms), Ms-PL (Microsoft Public License), Ms-RL (Microsoft Reciprocal License), and CDDL (Common Development and Distribution License 1.0 and 1.1). These licences require releasing the modified code only, thus allowing the use of open-source libraries in proprietary software. MPL, Ms-RL and CDDL require this only for the modifications of existing files. Libraries under LGPL, EPL and Ms-RL allow proprietary licences for the code that is using them, but the original licence also extends to new files in a modified library.
On the other hand, strong copyleft licences often require releasing the entire project or product under the licence that is the same or similar to the one of the used work. Among the copyleft software, the use of strong copyleft licences significantly prevails. These licences intend to keep everyone on the same page and disallow ‘free ride’ which is still possible with permissive and weak copyleft licenses. By introducing these restrictions, the creators of strong copyleft licences wanted to expand the presence of open-source software, ensure the sustainability of the open-source software ecosystem and strengthen the open-source-software movement. The most common and widely used licence is the GPL (GNU General Public License; 2.0 is more often used; 3.0 grants the use of patents, it is compatible with Apache 2.0 and the end-user must be able to install modified software).
AGPL (Affero General Public License 3.0) is similar to GPL, but it is also network protective. Use over a network is considered a distribution, thus requiring modified code to be available to external users. It is becoming increasingly popular, as it closes the ‘ASP/SaaS loophole’ of GPL, which allows software under GPL to be exploited without disclosure, as SaaS software by its nature is not distributed to users. AGPL is, as stated in its preamble, “specifically designed to ensure cooperation with the community in the case of network server software”.
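The distinction between permissive, weak copyleft, strong copyleft and network-protective licences can be summarised as a small lookup. The sketch below is a deliberately simplified illustration, not a compliance tool: the licence identifiers are SPDX identifiers, while the `Family` enumeration, the `LICENCE_FAMILY` table, the `source_disclosure_scope` function and its return values are hypothetical names introduced here. It models only the source-disclosure obligation and ignores attribution, notices and patent clauses.

```python
from enum import Enum


class Family(Enum):
    PERMISSIVE = "permissive"
    WEAK_COPYLEFT = "weak copyleft"
    STRONG_COPYLEFT = "strong copyleft"
    NETWORK_COPYLEFT = "network copyleft"


# Illustrative subset of licences, keyed by SPDX identifier.
LICENCE_FAMILY = {
    "MIT": Family.PERMISSIVE,
    "Apache-2.0": Family.PERMISSIVE,
    "BSD-2-Clause": Family.PERMISSIVE,
    "ISC": Family.PERMISSIVE,
    "LGPL-2.1-only": Family.WEAK_COPYLEFT,
    "MPL-2.0": Family.WEAK_COPYLEFT,
    "EPL-2.0": Family.WEAK_COPYLEFT,
    "GPL-2.0-only": Family.STRONG_COPYLEFT,
    "GPL-3.0-only": Family.STRONG_COPYLEFT,
    "AGPL-3.0-only": Family.NETWORK_COPYLEFT,
}


def source_disclosure_scope(licence_id, distributed, offered_over_network=False):
    """Return which source code must be made available (simplified model).

    "none"               - no source has to be released
    "modified component" - only the changed library or files
    "entire work"        - the whole work that uses the component
    """
    family = LICENCE_FAMILY[licence_id]
    # Private use never triggers disclosure; network use triggers it
    # only for network-protective licences such as AGPL.
    triggered = distributed or (
        offered_over_network and family is Family.NETWORK_COPYLEFT
    )
    if not triggered or family is Family.PERMISSIVE:
        return "none"
    if family is Family.WEAK_COPYLEFT:
        return "modified component"
    return "entire work"


if __name__ == "__main__":
    # GPL code offered only as a service: the 'ASP/SaaS loophole'.
    print(source_disclosure_scope("GPL-3.0-only", False, offered_over_network=True))
    # The same scenario under AGPL.
    print(source_disclosure_scope("AGPL-3.0-only", False, offered_over_network=True))
```

In this model, the only difference between GPL and AGPL is that offering the software over a network counts as a trigger for AGPL, which is how the sketch represents the closing of the SaaS loophole.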
Source-available and ‘fauxpen’ licences
There are also non-FOSS restrictive licences which are often presented or perceived as being similar to FOSS, but which introduce limitations that prevent them from being open-source according to the Open Source Initiative (OSI) and free according to the Free Software Foundation. Source-available licences (or shared source licences) are proprietary licences that allow for source code to be viewed and only in some cases modified and redistributed. They make the code available for viewing to facilitate scenarios such as inspection, understanding of functioning, debugging, integration or testing of external components. Examples of such restrictive licences are Business Source License (BSL), Microsoft Limited Public License (Ms-LPL), Microsoft Limited Reciprocal License (Ms-LRL), and Microsoft Reference Source License (Ms-RSL). Some of those only grant rights to developers of Microsoft Windows-based software, while Ms-RSL allows viewing the source code for reference and debugging.
The user has no rights to use, share, modify or even compile the code. The point of FOSS is not mere access to the source code, but the full freedom to use it, including for commercial or disreputable purposes, as long as the same freedom is preserved for those who use or even possibly pay for the code in question.
‘Fauxpen’ licences are similar to source-available licences. They are presented as open, but under closer scrutiny, it becomes clear that licensed software or product is effectively under the strict control of its vendor. These hybrid licences are intentionally deceptive and confusing. Server Side Public License (SSPL) is a strong copyleft source available license that requires public release of the source code of service management layers when providing a service. This restricts cloud providers from offering SSPL-licensed software to third parties as a service as this requires them to release under the SSPL the entirety of their source code, APIs and other software required for a user to run an instance of their service. SSPL also makes it impossible to use the Linux kernel, which is under the incompatible GPLv2-only licence. SSPL is therefore discriminatory toward a specific field of use. ELv2 (Elastic License v2) is a non-copyleft license prohibiting providing the products to others as a managed service, circumventing the license key functionality or removing or disabling features protected by license keys.
Open Source’s rule 6 (“no discrimination against fields of endeavour”) and FSF’s freedom zero: “the freedom to run the program as you wish, for any purpose” clearly indicate that 'fauxpen' and source-available licences are not FOSS. Providers of software who switch to such licences effectively transition projects that started as open source into proprietary licences and admit that their business models are inconsistent with open source. They claim that they want to protect their work from unfair exploitation by cloud providers and other free riders who would use their software without backing its creation and maintenance. At the same time, they appropriate the contribution of external developers who have donated their time and energy by adding to projects while they still were open source. Also, these companies most often use code from other open-source projects to power their businesses.
When a provider is switching to a proprietary commercial license, it can choose the time, terms and cost. The future costs of software and even its future availability are unknowable, as with any other proprietary software. When previously open software is embedded in or changed into a proprietary product, its users have to agree to the terms of a proprietary license, be left with an unmaintained version, or fork the last open version of the software and carry the associated burden of maintenance forever.
Products subjected to this bait-and-switch move became popular because they had been marketed as being FOSS, as developers prefer to be able to control whatever runs in their programs, and fix it or to have other people fix it, although they are not affiliated with the original maker of the tool or component. Also, such platforms gain traction as they typically provide free and one-stop solutions, where expensive licenses for commercial alternatives tend to add up, and open-source substitutes are less integrated. Even the developers in big enterprises prefer using FOSS to going through the slow, bureaucratic and multi-layered approval and procurement procedures.
The only advantage of the prior FOSS status is the possibility to fork the prior version, but even this window of opportunity may effectively close after some time if the community sticks with the vendor and its proprietary changes. Forks are also hard due to the resources needed and the necessary switch in branding, whereas people do not switch easily from one brand to another.
All this implies that access to updates of software under permissive licences and those with the 'sublicense' option may be volatile in the long run if the software is controlled by a single entity, as demonstrated in the cases of Elasticsearch and MongoDB. This is why it is so important to carefully choose software that is guaranteed or at least most likely to remain FOSS.
Copyrights, patents and warranties
A copyright is a form of intellectual property which allows a legal creator of an original work to license that work to the extent governed by copyright law. Declaring copyright on some work does not require any registration or official notice; it is enough to declare the copyright clearly and visibly and to define its subject. The copyright can also be easily transferred to another subject, typically through a contract or statement. Since open-source licences by definition already open up the work to anyone under clear conditions, copyright as such is not the problem; what matters are the actual details of these conditions. The licensing concern related to copyright is whether the licence states it and how the text of the licence and copyright notice (with the original copyright and attribution marks) should be included with the licensed material and presented. For example, the inclusion of the copyright notice may be required only for the source code or may also extend to binaries.
Patents are a much more complex form of intellectual property. An organization or individual who invented something substantial, novel and useful proves this in a regulated, expensive and time-consuming patent registration procedure. If this process is successful, a patent owner is granted the right that excludes others from making, using, selling, offering or making available the patented invention for the predefined period such as 20 years, during which the patent may be subject to maintenance fees. A patent may provide the holder with the associated royalties in terms of monetary compensation for using it, while its infringements are internationally enforceable and punishable in courts. Patent owners try to extend the boundaries of their patent and, at the same time, scan for infringements to maximize the royalties and penalties and thus recover the costs of patent registration or purchasing and maintenance, scanning and litigation. Therefore, there is always a risk of possible and even unknowing infringement of a patent by the licensor of software and, subsequently, their licensees.
Some licences describe the handling of potentially applicable patents and royalties, which removes at least some of the patents-related uncertainty. A licence may state that it does not grant any rights to the contributors’ patents. Or it may explicitly grant contributors’ patent rights. Both models eliminate some uncertainties, but they do not resolve the patenting issue. The latter approach is an attempt to prevent the appropriation of innovation and software through patents. But no software license would protect a licensee against a claim of infringement brought by a third-party patent holder, as licensors can only license works that belong to them. Since software patents are often too vague, abstract and ambiguous, they may be easily weaponised; they may protect even concepts or methods of interacting with a system. Neither the licensor nor the licensee may be aware of such a patent, so a patent troll or a competitor with an applicable patent may appear at almost any moment.
Even if a patent holder has licensed that patent for use in open-source software or the applied FOSS licence waives any patents-related obligations, that patent may be later narrowed or cancelled through litigation by a holder of a rival patent. If this happens, even the software licensee who has fully complied with the terms of the original license and the licensor’s patent may become liable for infringement of a competing patent if it continues to use the affected software. Since the patent narrowing or cancellation would affect not only the holder of the original patent but also its licensees, licensees may want to participate in protecting the licensor’s patent. As this can become expensive, licensees may embark on such an endeavour only if their business would be seriously affected by the requirements of the competing patent holder.
Other constraints and rights
Most licences require the licence text as well as the copyright notice to be included with the licensed material. Some also require documenting the changes made to the licensed material.
Some licences:
- Describe circumstances in which the source code must be made available
- State whether the changes must be documented
- Describe the allowance or prohibition of using contributors' names, trademarks or logos
- Declare whether they include a limitation of liability. Some clearly state that there is no warranty and that the software creator cannot be charged for damages. They explicitly assert that they do not offer any warranties or guarantees for using the code so that the author cannot be held liable if the code does not work well in some usage.
- Are peculiar in what they consider software usage, or even constrain types of use (e.g., by prohibiting commercial, over-the-network or military usage), which disqualifies them from being considered real FOSS licences.
Contributor agreements
Copyleft licenses, in principle, prevent the code from being incorporated into or re-licenced to proprietary code. However, a licence change may still be possible with a loophole opened by contributor agreements. Terms that are typically used are Contributor License Agreement, Copyright Transfer Agreement or Copyright Assignment Agreement. These agreements are used by organisations that are guardians of software to own or use contributions. They may include a transfer of copyright. However, when these agreements include the transfer of unrestricted republishing rights (regardless of copyright transfer), allow distribution without restriction or explicitly permit relicensing and even sub-licensing, the contributed code can be relicensed at the discretion of the guardian.
Relicensing
Software relicensing is done for commercial reasons, or to improve licence compatibility. In the first case, the change is typically toward a proprietary licence that is often disguised as a ‘fauxpen’ or source-available licence. The result of such relicensing is the elimination of some previous uses or users.
Relicensing for better licence compatibility is conducted if the current licence is incompatible with those of other jointly used components, so that the greater combined work can be licensed at all.
Relicensing is possible:
- Due to prior use of a permissive FOSS licence or another licence that allows sublicensing;
- If it is allowed to the guardian organization by contributors through contributor or copyright agreements, by which they grant the organisation the right to sublicense or relicense, that is, redistribute the work under a different license;
- If it is decided by the owner of the proprietary code.
Adding an alternative licence is not relicensing, as the old licence remains fully valid for those who decide to stick to it. Multi-licensing is therefore a better way to improve licence compatibility than relicensing. Also, it does not require a prior permissive licence or contributor agreements signed by all contributors. “Or later” styled licences are a concisely expressed form of multi-licensing in which all subsequent versions of the mentioned licence are accepted in advance, including those which do not yet exist.
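The way an “or later” clause acts as compact multi-licensing can be illustrated with a short sketch. It assumes SPDX-style identifiers ending in `-only` or `-or-later`; the `PUBLISHED_VERSIONS` table and the `accepted_versions` function are hypothetical names introduced for this example, and the version "4.0" used in the usage part does not exist and only stands in for a possible future release.

```python
# Versions published so far for two licence families (illustrative subset).
PUBLISHED_VERSIONS = {
    "GPL": ["1.0", "2.0", "3.0"],
    "LGPL": ["2.0", "2.1", "3.0"],
}


def accepted_versions(spdx_id, published=PUBLISHED_VERSIONS):
    """List the licence versions under which the work may be used.

    '-only' pins a single version, while '-or-later' accepts the named
    version and every later one present in the `published` table.
    """
    for suffix in ("-or-later", "-only"):
        if spdx_id.endswith(suffix):
            base = spdx_id[: -len(suffix)]
            break
    else:
        raise ValueError(f"unsupported identifier: {spdx_id}")
    family, version = base.rsplit("-", 1)
    versions = published[family]
    start = versions.index(version)  # raises ValueError for unknown versions
    if suffix == "-only":
        return [version]
    return versions[start:]


if __name__ == "__main__":
    print(accepted_versions("GPL-2.0-only"))
    print(accepted_versions("GPL-2.0-or-later"))
    # A hypothetical future version is accepted without any relicensing step.
    future = {"GPL": ["1.0", "2.0", "3.0", "4.0"]}
    print(accepted_versions("GPL-2.0-or-later", future))
```

The last call shows the property described above: once a new version is published, code released under an “or later” clause is automatically usable under it, with no action required from the contributors.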
Governance of FOSS licences
Use of FOSS licenses depending on project intent
For internal use
- One can use any FOSS and not worry about licences – the code stays with its developers and is not given to anyone, which is acceptable under all FOSS licences.
- The code is kept private, but purely internal use is very limited – the use of software may easily evolve into sharing or use in commercial contexts that directly involve other parties.
- What happens when the creators later decide to offer the software to others? If they have not considered the licences of the used components, they may end up with components under incompatible licences and be unable to choose a licence for the product or project. Therefore, it is important to:
- Start early to consider licences and overall attitude towards FOSS licences.
- Learn about licences of used components and determine which licences are acceptable within the project.
- Determine the potential future licence if the way software is used is changed.
Sharing software with someone
- With permissive licences of components, the modifiers do not have to make any source code available.
- With copyleft components, access to some or all source code must be allowed.
- When sharing, the same or compatible licence for changed code or even the entire project must be used.
- With several strong copyleft components, the creators may not be able to pick up a licence that is compatible with all of them.
- Licence compatibility has become a major and very topical issue in the wider software community.
- One should think twice about software under a permissive licence that is effectively controlled by a single entity, especially if the software may be used in a service. Some modifiers or their customers may therefore prefer copyleft so that they would be protected from licence changes.
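The problem of picking a licence that is compatible with all used components can be sketched as a set intersection. The example below is a heavily simplified illustration under stated assumptions: the `ALLOWED_OUTBOUND` table and the `candidate_outbound_licences` function are hypothetical, the table covers only four licences, and it encodes just a few widely cited relationships (MIT code can be included in works under the other listed licences, Apache 2.0 is compatible with GPL 3.0 but not with GPL 2.0, and GPL 2.0-only and GPL 3.0-only cannot be combined). Real projects need a fuller analysis.

```python
# For each component licence (SPDX identifier), the licences under which
# a combined work containing that component could be distributed.
# Simplified, illustrative table; not a complete compatibility matrix.
ALLOWED_OUTBOUND = {
    "MIT": {"MIT", "Apache-2.0", "GPL-2.0-only", "GPL-3.0-only"},
    "Apache-2.0": {"Apache-2.0", "GPL-3.0-only"},
    "GPL-2.0-only": {"GPL-2.0-only"},
    "GPL-3.0-only": {"GPL-3.0-only"},
}


def candidate_outbound_licences(component_licences):
    """Return the sorted licences acceptable to every listed component."""
    candidates = None
    for licence_id in component_licences:
        allowed = ALLOWED_OUTBOUND[licence_id]
        candidates = set(allowed) if candidates is None else candidates & allowed
    return sorted(candidates or [])


if __name__ == "__main__":
    # Permissive components leave several options open.
    print(candidate_outbound_licences(["MIT", "Apache-2.0"]))
    # One strong copyleft component dictates the outcome.
    print(candidate_outbound_licences(["MIT", "Apache-2.0", "GPL-3.0-only"]))
    # Two incompatible strong copyleft components leave no option.
    print(candidate_outbound_licences(["GPL-2.0-only", "GPL-3.0-only"]))
```

An empty result corresponds to the situation described in the list above, where the creators are unable to pick a licence for the combined work and would have to replace a component or seek relicensing.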
For a service, the provider is safe when using any FOSS except software under AGPL. Even AGPL software can be used as long as the provider does not mind letting the users get its code and does not want to use the software from cloud providers, which may be forbidden from offering a service based on such software. The same applies to ‘fauxpen’ licences such as SSPL or ELv2.
Licence impact on community, quality, longevity and sustainability
Projects often follow a natural cycle of creation, a burst of intense activity, a long phase of steady use and productivity, and fading as they are replaced by new projects covering the same space but with a more advanced technology base; this happens through the slow or fast migration of the community. Factors that affect software sustainability and longevity are often analysed [https://opensource.com/life/14/1/evaluate-sustainability-open-source-project, https://repositum.tuwien.at/handle/20.500.12708/2820]. The longer a project has been alive, the more likely it is to continue to exist. The activity of the community (number of contributions and active contributors) and the quality of its core members are more significant than the size of the user base for the sustainability of the software. An analysis of the Ohloh data, now at [https://www.openhub.net/], about a large number of FOSS projects [https://redmonk.com/dberkholz/2013/04/22/the-size-of-open-source-communities-and-its-impact-upon-activity-licensing-and-hosting/] indicates that:
- The larger the project is, the more likely it is to work out the licensing issues and specify a licence. The portion of projects without a specified licence decreases with the number of monthly committers – it starts at about 50% for a single committer, decreases to 40% for five, and stabilises at about 20% for projects with more than contributors.
- Permissively licensed projects are evenly distributed regardless of their size. They start at 20% for up to 10 monthly contributors, peak at about 25% for 20-30 contributors, and then return to the baseline.
- The use of copyleft licences coincides with the size of the active community. It starts with about 20%, increases to about 35% after 10 committers, and ends up at about 40% for projects with many contributors.
The lack of a clear licence is an indication that the developers find licensing unimportant, confusing or too time-consuming for their purpose. Such projects do not tend to last long and establish a large community.
The utility of software is maximised if the widest possible set of users can appropriate its benefits. But FOSS, like many other parts of the digital infrastructure, suffers from a free-rider problem: “Resources are offered for free, and everybody (whether individual developer or large software company) uses them, so nobody is incentivised to contribute back, figuring that somebody else will step in.” [https://www.fordfoundation.org/work/learning/research-reports/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/]. A free rider has a competitive advantage, since it did not have to invest in the original development and can invest in developing additional benefits and services instead. While a free rider does not exclude others from using the code, which is not an exclusive resource, it may exhaust the original creator’s access to users. Users turn into customers; customers are an exhaustible common resource as they tend to stick to one provider. Customers contribute to provider income in various ways defined by its business model. Although most people perceive free riding as deeply unfair, it is still better to have someone using the creator’s open-source software than somebody else’s. The presence of a free rider makes it more likely that others will also use this software and that some of them will contribute back. Therefore, software free riders, including competitors who capitalise on others’ work, may have a positive overall effect as they act as mediators towards other contributors and customers. A large user community brings contributors and paying customers, and even sponsors who otherwise would not show up. Still, for this to happen it is necessary to prevent the competing free riders from suffocating the primary contributors. This means that the original contributor’s offering (beyond just software) must be made somewhat exclusive to incentivise users and customers to join.
- Permissively licensed software may start small, stay that way, or increase in terms of activity, but seems to be somewhat limited by the optionality of returning to the community by those who modify it.
- Weak copyleft licences are suitable for libraries and other components the popularity and utility of which would be significantly affected by expansive licensing rules of strong copyleft licences.
- Strong copyleft licences are suitable for large or standalone projects such as operating systems and specialised or productivity tools.
There is a growing number of companies whose business model is based on FOSS. This model is called commercial open-source software. Their commercial offerings usually take the form of proprietary or closed-source IP, which may include a combination of premium features and hosted services that offer performance, scalability, availability, productivity, and security assurances. This is known as the ‘open core business model’. Some of them also offer professional services, including maintenance and support assurances.
The obligation to keep all modifications under the same or compatible copyleft licence works exceptionally well in projects such as the Linux kernel. This is particularly the case when the licence does not preclude the use of software to run other software under other types of licences. Therefore, the use of a copyleft licence may be a great benefit to the software, especially if it does not reduce its use in normal usage scenarios. This is what weak copyleft licences such as LGPL were designed for; they are applied when it is more important to enlarge the number of contributors (as with research software) than to boost popularity through maximally liberal terms of use or to retain a competitive advantage by keeping the code proprietary. Furthermore, a combination of (often copyleft) open source with additional proprietary add-on components or services on top is a frequently applied approach that balances openness with sustainability.
If a large enough member of the community has a sufficient influence on the platform, it may decide to fork it under a ‘fauxpen’ proprietary licence that significantly constrains its use, at the same time redirecting most of the current users to the fork and taking full control over the new developments in it. The inadequacy of the original business model or the appearance of competing offerors is the reason why some makers of open-source products, which have been seen as their custodians by the communities, make this move. Of course, this is possible only with permissive licences that typically allow appropriation through relicensing. The appropriator does not even have to be the primary contributor to the software, but the one most users refer to, for example by providing support or popular commercial add-ons. A similar outcome may be caused by the extensive use of software as a part of a cloud offering, where the cloud provider effectively distributes and monetises open-source software without meaningfully contributing back to it or by providing proprietary add-ons, which are typically limited to facilitating access to the platform within its cloud offering. The original ‘open core business model’ provider may then move to a network protective or ‘fauxpen’ licence. However, some projects with permissive licences, such as the Apache web server, have an extremely long lifespan and a huge community.
It is not possible to empirically determine whether software longevity benefits more from copyleft or permissive licences, as this depends more on other circumstances. The choice of a licence supporting sustainability and longevity of software primarily depends on the attitude of the developers and community, as well as the primary usage scenarios. Of course, such a choice may not be available at all due to the requirements imposed by the organisation, funder or dependencies. Interestingly, when this choice is available, it may be more dependent on intrinsic motivations and views about fairness than on extrinsic motivations such as the expectation of reputation or economic gain [https://opensource.com/law/13/8/motivation-free-software-licensing]. On the other hand, if the developers invest in interoperability and open standards, this may greatly help project adoption regardless of the licence.
Multi-licensing under permissive and copyleft or copyleft and proprietary terms is also a viable solution, as it increases licensing compatibility when the software is to be combined with other components into new products. At the same time, it allows for a larger user base and, at least to some extent, stimulates future contributions.
The user base is also not to be neglected, and it greatly contributes to the sustainability of permissive FOSS. Users drive the functionality, identify the bugs, and shape the direction of a project to meet their needs. This may result in slick products that ‘just work’ without much configuration and customisation, as long as the target audience is large enough and there are other factors that contribute to the product ecosystem. Still, it is often very hard to determine the size and engagement of the user community. What is often much easier to assess (when choosing), but hard to incite (when developing), is the wider ecosystem around a project established by the engagement of other providers which may offer support, consultancy, customisation, hosting, or bundling with their products or services.
Besides often emphasised doubts about the FOSS business models, some authors [Lanier, Jaron, You Are Not a Gadget: A Manifesto] criticise open source and open content as an expropriation of intellectual production and a form of "Digital Maoism" which stifles small-scale entrepreneurship and destroys opportunities for the middle class to finance content creation, resulting in the concentration of wealth in a few corporations and individuals, who insert themselves as content and service concentrators. However, instead of FOSS, this criticism should rather be directed at the centralisation of distribution and advertising platforms and the model of “free services” paid for through reselling of personal data, user profiles and targeted marketing. The large concentrators depend on FOSS like anyone else, but their core components are always proprietary. On the other hand, big tech companies frequently create or appropriate FOSS platforms and tools for consumers and developers to tie the customers to their ecosystems and technologies. Typical examples are some very popular application development tools, run-time environments, non-SQL data storage and processing platforms and AI platforms which are typically conveniently tied to companies’ cloud offerings. Therefore, whenever a big tech company offers a sleek FOSS component or platform, developers should think twice about whether they want to become ‘products’ again and be recruited into the company’s camp.
Despite FOSS’s success, scaling and sustaining open-source projects remain challenging.
Licence selection and attaining compliance
Copyleft licences ensure licensing stability, while permissive software can be forked and relicensed by a major contributor or a company providing popular free or commercial services or products based on this software. Such an organisation can also strongly influence software evolution and usage patterns.
- Available options may be mandated or recommended by the institution, project management or funder.
- The constraints of other involved parties and coauthors must be respected.
- The constraints imposed by original authors and licences of dependencies must be respected.
- There may be some typical and established software licensing practices of the community.
- Personal preferences and attitudes of software authors, who should also consider desirable public messages and non-mandating institutional, project-level or funder preferences on software licensing and open source.
The choice is typically quite simple. The existing constraints most often mandate the type of licence. If these institutional or other policies prohibit the use of copyleft licences, this also means that the software must not use components under such licences. But if this is allowed and such components are needed and useful, then a compatible copyleft licence is to be used.
The opportunity for a relatively free choice exists in a situation where all important used components come with either permissive or weak copyleft licences. If components with weak copyleft licences are modified, these modifications must retain the original or use a compatible licence.
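The selection logic described in the two paragraphs above can be sketched as a small decision function. This is only an illustration of one reading of the text, not a legal tool: the `Kind` categories, the function name `licence_choice` and the returned strings are assumptions made for this example, and the "permissive only" branch is an inference from the statement that a policy may prohibit copyleft licences.

```python
from enum import Enum
from typing import Iterable


class Kind(Enum):
    """Coarse licence categories of the components in use (illustrative)."""
    PERMISSIVE = "permissive"
    WEAK_COPYLEFT = "weak copyleft"
    STRONG_COPYLEFT = "strong copyleft"


def licence_choice(component_kinds: Iterable[Kind], copyleft_allowed: bool) -> str:
    """Return the kind of out-licence suggested by the rules in the text."""
    kinds = set(component_kinds)
    has_copyleft = bool(kinds & {Kind.WEAK_COPYLEFT, Kind.STRONG_COPYLEFT})

    if not copyleft_allowed:
        # A policy that prohibits copyleft also rules out copyleft components.
        if has_copyleft:
            return "conflict: remove or replace copyleft components"
        return "permissive licence only"

    if Kind.STRONG_COPYLEFT in kinds:
        # Needed strong copyleft components dictate a compatible copyleft licence.
        return "compatible copyleft licence"

    # Only permissive or weak copyleft components: the authors can choose,
    # keeping modified weak copyleft components under their own licence.
    return "relatively free choice"


print(licence_choice([Kind.PERMISSIVE, Kind.WEAK_COPYLEFT], copyleft_allowed=True))
```

The function deliberately ignores the other factors listed above (funder mandates, coauthors, community practice), which would narrow the result further in a real project.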
Software licence management steps
Gather and document information
- Note the licence of the ‘product’ (entire bundle of created components) or ‘project’ (one program or stand-alone component), if set
- Create an open-source inventory of used components
- Detect vulnerable open-source components (to remove or replace)
- Identify outdated open-source libraries (to replace)
- Identify licences of used components (in-licences)
- Clarify ambiguities or doubts, such as those on the use or modification of libraries
- A tool may not be able to properly identify a licence – in Mend, some are suspect or ambiguous
- Information about the applied licence may be false, unclear or contradictory
- Some licences may be recognised under several names
- Some (permissive) licences (BSD, Artistic …) have unnumbered variants or are sometimes edited by authors
- Applicability of ‘or later’ licences may be unclear or even edited in the licence text
- Document gathered information – Mend does the above through reports, UI and data exports
- Document your decisions – some may be refined during remediation
Document
Remediate
- Choose a product/project licence (out-licence) compatible with key dependencies
- Initial improvements
- Remedy vulnerable open-source components
- Update outdated open-source libraries (where possible)
- Ask component authors to clarify their licence or to relicense
- Pay for the required proprietarily licensed software
- Choose among dual licences of components
- Identify remaining incompatible licences
- Decide what to do with components that use these licences
- Remove (component and corresponding functionality) if not necessary
- Replace with an existing equivalent
- Move to server-side (central service)
- Write your replacement
- Enforce open source licence compliance (e.g., provide all required compliance artifacts)
- Accept some risks
Create compliance artifacts (to ensure compliance)
- As required by the applied policy
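The core of the "identify licences" and "identify remaining incompatible licences" steps above can be sketched as a triage over a component inventory. This is a minimal sketch under stated assumptions: the `Component` class, the `triage` function and the component names are hypothetical, and the `ALLOWED_IN` table is a deliberately tiny placeholder rather than an authoritative compatibility matrix. A real project would take its rules from a vetted source and from legal advice.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Component:
    name: str
    licence: Optional[str]  # SPDX identifier, or None if it could not be identified


# Placeholder rules: in-licences accepted for a given out-licence.
# Illustrative only; not a complete or authoritative compatibility matrix.
ALLOWED_IN: Dict[str, Set[str]] = {
    "GPL-3.0-or-later": {"MIT", "BSD-3-Clause", "Apache-2.0", "GPL-3.0-or-later"},
    "Apache-2.0": {"MIT", "BSD-3-Clause", "Apache-2.0"},
}


def triage(inventory: List[Component], out_licence: str,
           allowed_in: Dict[str, Set[str]]) -> Tuple[List[str], List[str], List[str]]:
    """Split component names into (compatible, unclear, incompatible)."""
    accepted = allowed_in[out_licence]
    compatible, unclear, incompatible = [], [], []
    for component in inventory:
        if component.licence is None:
            unclear.append(component.name)       # clarify with the authors
        elif component.licence in accepted:
            compatible.append(component.name)
        else:
            incompatible.append(component.name)  # remove, replace or isolate
    return compatible, unclear, incompatible


inventory = [
    Component("parser-lib", "MIT"),
    Component("stats-lib", "GPL-3.0-or-later"),
    Component("legacy-tool", None),
]
print(triage(inventory, "Apache-2.0", ALLOWED_IN))
```

Components reported as unclear correspond to the "clarify ambiguities" step, while those reported as incompatible are candidates for the remediation options listed above.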
Software composition analysis – inventory tools
Ideally, compliance should be continuously monitored as a part of the build process.
Commercial services
- Mend
- Black Duck
- JFrog Xray (add-on for Artifactory)
- GitLab Ultimate licence compliance feature, https://docs.gitlab.com/ee/user/compliance/license_compliance/
FOSS solutions
- FOSSology, https://www.fossology.org/
- QMSTR (Quartermaster), toolchain and reports – its development was stalled for a time but has resumed, https://qmstr.org/
- Scancode-Toolkit, https://github.com/nexB/scancode-toolkit
- License Compliance Verifier (LCV), Demonstrator based on a subset of the compatibility rules from the Open Source Automation Development Lab (OSADL) matrix, https://github.com/fasten-project/fasten/wiki/License-compliance
Licence selection tools
- Choose an open-source license, https://choosealicense.com/
- Joinup Licensing Assistant – Find and compare software licenses, https://joinup.ec.europa.eu/collection/eupl/solution/joinup-licensing-assistant/jla-find-and-compare-software-licenses
- Creative Commons (CC) licence chooser, https://creativecommons.org/choose/, https://chooser-beta.creativecommons.org/
Sustainability of FOSS in science
Often, overwhelmed by the range of software tools available, unsure of their quality and questioning how well the computations they perform are implemented, researchers resort to developing tools tailored for their specific use cases. This leads to a large number of tools and packages, which have limited support, short lifetimes and a small number of users. Often, the developers simply do not want to look elsewhere as they are paid for programming, may justify it, or think that their case is singular or very special. Making software open-source alleviates, but does not resolve this problem.
Also, despite FOSS’s success, scaling and sustaining open-source projects remain challenging. Sometimes, researchers or developers manage to maintain their tools as a side project and sometimes build an entire community while keeping the tool free and open source. An example of this is Gephi. Also, small open-source communities can rely on volunteers and self-governance.
It is more interesting to look at large open-source projects to see how individuals and companies who make a living out of them can be financially sustainable. Increasingly, researchers are finding ways of developing tools that are both open source and capable of making revenue:
- The organisations can be funded through institutional membership and fundraising. Lyrasis, which is home to the ArchivesSpace, CollectionSpace, DSpace, Fedora and VIVO open-source platforms, has more than 1000 members and launched the DSpace Development Fund (DDF) in 2022.
- Some projects follow an open core model (for example RapidMiner) by licensing parts of the code that enable scaling to enterprise levels.
- Based on the strong base in its institutional community and influence stemming from its open-source software, an organisation may offer commercial services. Lyrasis provides certification of partners and other providers, hosting, consulting, training, digitization, preservation and fiscal services. It also mediates in content creation and acquisition and application for grants.
- The primary contributor to open source software may also use it or related services as a promotional and visibility device or a token of its participation in a larger collaboration. This requires funding for software and service operation, support, maintenance and development from other sources, such as its other businesses, institutional budget or national research projects.
- The project may provide its solution as a service that is used within a larger scientific infrastructure, platform or collaboration and in return share a part of its income, regardless of its business model.
- Access to venture capital (VC) and private investors may be suitable for teams that intend to commercialise or make a profit from tools that cover a wider market than the academic sector.
Scientific open source and related services cannot expect to generate enough traffic to sustain themselves from advertisements. Crowdsourcing may work for projects related to the research of more or less frequent diseases or therapies for them. It is also possible to seek small donations from interested individuals or large corporate ones if the subject is attractive and something many people are interested in or passionate about, such as climate change, astronomy or long-standing mathematical problems. This can be extended further by stimulating public participation, inviting individuals to engage as citizen scientists or to provide their computing resources, as is done by distributed computing projects. Examples of this are Folding@home, iThena, GPUGRID.net, PrimeGrid, World Community Grid (WCG), Rosetta@home, Cosmology@Home, SETI@Home, climateprediction.net (CPDN) and LHC@home, the majority of which are based on the LGPL-licensed Berkeley Open Infrastructure for Network Computing (BOINC) [https://boinc.berkeley.edu/projects.php]. Such platforms do not try to monetise even a part of the obtained resources, as this would repel their contributors, in particular those who bestow resources owned by someone else (e.g., an employer). It should also be noted that the processing load on these platforms is often intentionally proprietary to protect the integrity of the calculation, as the performed analyses may be prone to flooding with fake results from compromised worker nodes, which would then require everything to be verified in a controlled environment.
The open-source operational models outlined above provide different advantages and disadvantages and require varying levels of engagement. Therefore, there is no uniform approach that will solve the sustainability challenge. However, options that have already been successfully implemented, recipes and examples from a wider OS community and COSS (Commercial Open Source Software) in particular do help [Popp, Karl. (2015). Best practices for commercial use of open source software: Business models, Processes and Tools for Managing Open Source Software; https://flagsmith.com/podcast/joseph-jj-jacks-oss-capital/]. The Software Sustainability Institute provides guidance and support around open-source code for research. The Apereo Foundation is a membership organisation that offers guidance and incubation opportunities for teams working on open-source technologies for learning and research within higher education.
Governance of a growing FOSS project and one that is going through a change in its business model can be extremely difficult. Self-governance, centralization of originally distributed projects and privatization or commercialisation require clear rules around membership, contribution duties and appropriation rights. In the case of self-governance, these rules require monitoring and enforcement by a private agent or several members of the group, or an external agent for centralization and privatization.
Open-source software in science has a smaller user base, but it may have a stronger appeal and can more easily attract institutional sponsors. There are also many examples of organisations coming together in consortia or forming a non-profit organisation to support the development and sustainable management of scientific open-source tools. It is also good that open-source software for science can easily take off within international or university research projects, and after the initial incubation, maturation and proving, continue to operate in a wider environment. Software tools developed within NI4OS (LCT, RoLECT and RePol) are examples of this. In the long run, FOSS, which is currently being challenged by COSS, may emerge as more suitable for scientific software than for use in the commodified environment of commercial cloud-based services.
The list provided here is a blend of “Four simple recommendations to encourage best practices in research software” [https://f1000research.com/articles/6-876/v1] and “Five recommendations for fair software” [https://zenodo.org/record/4310217]:
- Make source code public and use a publicly accessible and versioned repository from the very beginning (related is the practice of depositing software in archives due to changes in journal policies – the primary goal of this is the reproducibility of results by preserving the research environment. This is why software is typically rather deposited in specialised repositories that have been developed and evolved independently from scientific ones. These platforms provide long-term benefits and support the improvement of software as living products maintained by several contributors by providing specific features, access mechanisms and integrations). But even putting software on GitHub may not do much for reusability without a clear licence and readme information as the primary enablers and indicators of reusability.
- Adopt a licence and comply with the licensing requirements of all dependencies and contributors
- Provide basic metadata by registering software in a relevant community registry to make it easy to discover (sometimes described in the documentation of the registry, but you can also see for yourself by installing a tool)
- Establish clear and transparent contribution, communication and governance workflows
- Enable citation of software (Some archiving services that meet these requirements)
- From FAIR4RS R3: meet the standards of the domain community. At a minimum, stick to community expectations and conventions on the formats used to read and write data, but also on provided functionality, terminology, and other conventions and practices of the domain, even if this is not needed for the immediate purpose, as it will increase adoption and reuse and the chances of sustainability and external contribution. Do not limit this to the software metadata and documentation standards of the domain-relevant community.
- Use a software quality checklist to assess components and your research software – [https://www.sciencedirect.com/science/article/pii/S0164121222000267]:
- Community support and adoption (with factors such as popularity, reputation, size, communication channels, and involvement)
- Documentation
- Costs (licence, training, support, etc.)
- Licensing conditions
- Operational characteristics such as independence from other software, development language, portability, compliance and testability
- Maturity
- Quality aspects such as reliability, performance, modularity, maintainability, code quality and architecture
- Perceived risks related to confidentiality, integrity, availability, etc.
- Trustworthiness of components, architecture and platform, provider reputation, 3rd-party assessments
Reporting open source in NI4OS-Europe Agora
Data about the use of open-source software and technologies are collected in NI4OS-Europe Agora following the EOSC Portal Profiles v4.00 specification [https://zenodo.org/record/5726890] in the field named ERP.MTI.5 or “Open Source Technologies” (in “EOSC Resource Profile Tables / Data Model / Maturity Information / Open Source Technologies”). This field is used to provide a “List of open source technologies incorporated into the Resource”. The specification states that this field is for specific technologies, not broad ones like HTTP or a Linux distribution. This field is optional and accepts multiple values of up to 100 characters each. The validation criterion is rather simple: “Check that the technologies mentioned/projects exist”.
However, since the use of free and open-source software (FOSS) is a significant facilitator in the advancement of Open Science and related services, in NI4OS-Europe we are asking you to be more detailed and verbose when it comes to reporting the use of FOSS by your services. Therefore, please provide a one-line description for each significant component of your service. If possible, follow the identifying name of your component (version numbers should preferably be avoided) with a comma and the name or SPDX code of the corresponding software licence [https://spdx.org/licenses/]. If you think it is needed, you can also provide related URLs; however, make sure that the entire line does not exceed 100 characters. If there is enough space available, you can also provide a short description of what the software is used for, separating it using ‘ – ’. Here are a few examples of valid descriptions:
- DSpace, BSD
- DSpace, BSD 3-Clause – Repository
- DSpace https://dspace.lyrasis.org/, BSD-3-Clause – Repository
- DSpace, BSD https://dspace.lyrasis.org/dspace-source-code-bsd-license/ – Repository
- DSpace https://dspace.lyrasis.org/, BSD https://dspace.lyrasis.org/dspace-source-code-bsd-license/
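The line format illustrated by the examples above can be sketched as a small helper that assembles an entry and checks the 100-character limit. This is an illustrative sketch, not part of Agora or the EOSC specification: the function name `format_entry`, its parameters and the `GENERIC` set of names to reject are assumptions made for this example.

```python
from typing import Optional

MAX_LEN = 100  # per-value limit of the "Open Source Technologies" field
# Illustrative set of broad technologies that should not be reported.
GENERIC = {"http", "linux", "xml"}


def format_entry(name: str, licence: str, url: Optional[str] = None,
                 licence_url: Optional[str] = None,
                 purpose: Optional[str] = None) -> str:
    """Build 'Name [URL], Licence [URL] – Purpose' and enforce the length limit."""
    if name.strip().lower() in GENERIC:
        raise ValueError(f"'{name}' is too generic to report")
    left = name if url is None else f"{name} {url}"
    right = licence if licence_url is None else f"{licence} {licence_url}"
    line = f"{left}, {right}"
    if purpose:
        line += f" – {purpose}"  # purpose is separated by ' – ' as in the examples
    if len(line) > MAX_LEN:
        raise ValueError(f"entry has {len(line)} characters, limit is {MAX_LEN}")
    return line


print(format_entry("DSpace", "BSD 3-Clause", purpose="Repository"))
# DSpace, BSD 3-Clause – Repository
```

When an entry exceeds the limit, the simplest remedy is to drop the optional purpose or one of the URLs and keep the component name and licence.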
To help you in providing this information, below are the names, licences, URLs, and purposes for “Open Source Technologies” that are most frequently mentioned in NI4OS-Europe Agora (as of July 2022):
NI4OS-Europe service admins very often provide ‘Linux’ and ‘XML’. As mentioned before, please do not indicate the use of such generic or general-purpose technologies.
Sometimes, the primary software product used by your service may rely on some other software, where a few alternatives may be available. Please, try to indicate these as well within separate “Open Source Technologies” entries. For example, repositories based on DSpace and related supporting tools use several components, where some choices are possible. Highlighted are the components that were selected for the repositories maintained by the UoB, along with their potential alternatives:
- Java environment: OpenJDK (GPL-2.0-only with linking exception) instead of Oracle's Java (Oracle No-Fee Terms and Conditions (NFTC))
- Web server: Apache Tomcat (Apache License 2.0), Jetty (Apache License 2.0 and Eclipse Public License 1.0), or Caucho Resin (GPLv3 or proprietary)
- Relational database: PostgreSQL (PostgreSQL License, similar to BSD or MIT); a less favoured alternative would be Oracle (Oracle Database XE comes without Java stored procedures and is limited in terms of data quantity and to the use of a single core)
- Reverse proxy: NGINX (2-clause BSD license or proprietary) or Apache (Apache License 2.0)
- Non-relational DB: Solr (Apache License 2.0) instead of Elasticsearch (Elastic License 2.0 (ELv2) or Server Side Public License (SSPL), which are a source-available ‘fauxpen’ licence and a usual proprietary licence)
Literature
- A Fresh Look at FAIR for Research Software, https://arxiv.org/ftp/arxiv/papers/2101/2101.10883.pdf
- Barthonnat, Céline, Blotière, Emilie, Gingold, Arnaud, Mas, François-Xavier, Stanić, Nikola, Pierno, Alessandro, Szulińska, Agnieszka, Armando, Lorenzo, Pochet, Bernard, de Santis, Luca, MacGregor, James, Pozzo, Riccardo, & Pogačnik, Aleš. (2021). OPERAS SIG on Tools for Open Scholarly Communication: White Paper 2021. Zenodo, https://doi.org/10.5281/zenodo.5654319
- Black Duck Open Hub, https://www.openhub.net/
- Chue Hong, N. P., Katz, D. S., Barker, M., Lamprecht, A-L, Martinez, C., Psomopoulos, F. E., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., Honeyman, T., et al. (2022). FAIR Principles for Research Software version 1.0. (FAIR4RS Principles v1.0). Research Data Alliance. https://doi.org/10.15497/RDA00068, https://rd-alliance.org/group/fair-research-software-fair4rs-wg/outcomes/fair-principles-research-software-fair4rs-0
- Discover your next tool for social media analysis – A list of tools and software to support the collection and analysis of social media data (dataset), https://socialmediatools.pory.app/
- Donnie Berkholz, The size of open-source communities and its impact upon activity, licensing, and hosting, April 22, 2013 https://redmonk.com/dberkholz/2013/04/22/the-size-of-open-source-communities-and-its-impact-upon-activity-licensing-and-hosting/
- Duca, D. (2019), The ecosystem of technologies for social science research (dataset). doi: 10.5281/zenodo.3555207, https://github.com/sagepublishing/SAGE_tools_social_science/blob/master/data/master_tools_current.csv
- Duca, D., & Metzler, K. (2019). The ecosystem of technologies for social science research (White paper). London, UK: Sage. doi: 10.4135/wp191101, https://static1.squarespace.com/static/5d5ad9e0100bdf0001af0f5e/t/5ed0ea0631c1a80efe375fe5/1590749710566/The+Ecosystem+of+Technologies+for+Social+Science+Research.pdf, https://group.sagepub.com/white-paper-archive/the-ecosystem-of-technologies-for-social-science-research, https://sagepublishing.github.io/sage_tools_social_science/
- Duca, D., Developing a comprehensive directory of tools and technologies for social science research methods, https://forrt.org/educators-corner/003-developing-tools/
- Five recommendations for fair software, https://fair-software.eu/, https://zenodo.org/record/4310217
- Ford Foundation, Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure, https://www.fordfoundation.org/work/learning/research-reports/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/
- HAL, https://hal.archives-ouvertes.fr/
- Hasselbring, Wilhelm, Carr, Leslie, Hettrick, Simon, Packer, Heather and Tiropanis, Thanassis. "From FAIR research data toward FAIR and open research software" it – Information Technology, vol. 62, no. 1, 2020, pp. 39-47., https://doi.org/10.1515/itit-2019-0040, https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html
- How to evaluate the sustainability of an open source project, January 22, 2014, https://opensource.com/life/14/1/evaluate-sustainability-open-source-project
- https://www.sciencedirect.com/science/article/pii/S0164121222000267
- https://www.softwarepreservationnetwork.org/wp-content/uploads/2022/01/Software_Metadata_Recommended_Format_Guide-1.pdf
- Ben Rometsch, Interview with Joseph "JJ" Jacks: Founder and General Partner, OSS Capital’s Vision for Open Source Software, May 25, 2021, https://flagsmith.com/podcast/joseph-jj-jacks-oss-capital/
- Jackson, 2019, https://www.software.ac.uk/how-cite-software
- Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software (version 1). F1000Research 2017, 6:876, https://doi.org/10.12688/f1000research.11407.1, https://f1000research.com/articles/6-876/v1
- John W Maxwell, Erik Hanson, Leena Desai, Carmen Tiampo, Kim O'Donnell, Avvai Ketheeswaran, Melody Sun, Emma Walter, Ellen Michelle, A Landscape Analysis of Open Source Publishing Tools and Platforms, Mind the Gap, Simon Fraser University, July 2019, https://mindthegap.pubpub.org/
- Katz DS, Chue Hong NP, Clark T et al. Recognizing the value of software: a software citation guide (version 2). F1000Research 2021, 9:1257, https://doi.org/10.12688/f1000research.26932.2
- Kiselka, B. (2015). Software project longevity – a case study on open source software development projects [Master Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2015.34133, https://repositum.tuwien.at/handle/20.500.12708/2820
- Lamprecht, Anna-Lena et al., Towards FAIR principles for research software, Data Science, vol. 3, no. 1, pp. 37-59, https://doi.org/10.3233/DS-190026, https://doi.org/10.5281/zenodo.6374598, https://content.iospress.com/articles/data-science/ds190026
- Lanier, Jaron, You Are Not a Gadget: A Manifesto. New York, Vintage Books, 2011.
- Mariannig Le Béchec, Aline Bouchard, Philippe Charrier, Claire Denecker, Gabriel Gallezot, et al., Pratiques et usages des outils numériques dans les communautés scientifiques en France. [Rapport de recherche] Comité pour la science ouverte. 2022, 112 p. hal-03545512, https://www.ouvrirlascience.fr/state-of-open-science-practices-in-france-sosp-fr/, https://hal-lara.archives-ouvertes.fr/OUVRIR-LA-SCIENCE/hal-03545512, https://hal-lara.archives-ouvertes.fr/hal-03545512/document
- Martinez, et al. (2022), A Survey on Adoption Guidelines for the FAIR4RS Principles: Dataset (1.0) (dataset), Zenodo, https://doi.org/10.5281/zenodo.6375540
- Michael Jackson. (2018). Software Deposit: Guidance for Researchers (1.0). Zenodo. https://doi.org/10.5281/zenodo.1327310, https://zenodo.org/record/1327310
- Nicolas Suzor, What motivates free software developers to choose between copyleft and permissive licences?, August 8, 2013, https://opensource.com/law/13/8/motivation-free-software-licensing
- Open Source Initiative, Licenses & Standards, https://opensource.org/licenses
- Open Source Initiative, The Open Source Definition, https://opensource.org/osd
- Popp, Karl. (2015). Best practices for commercial use of open source software: Business models, Processes and Tools for Managing Open Source Software
- Projects that use Berkeley Open Infrastructure for Network Computing (BOINC), (dataset) https://boinc.berkeley.edu/projects.php
- R. Di Cosmo, M. Gruenpeter and S. Zacchiroli, "Referencing Source Code Artifacts: A Separate Concern in Software Citation," in Computing in Science & Engineering, vol. 22, no. 2, pp. 33-43, March-April 2020, https://doi.org/10.1109/MCSE.2019.2963148, https://ieeexplore.ieee.org/document/8946737
- Sanchez-P. Jorge-A. (2021). EOSC Portal Profiles v4.00 (v4.00). Zenodo. https://doi.org/10.5281/zenodo.5726890, https://zenodo.org/record/5726890
- Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. 2016. Software citation principles. PeerJ Computer Science 2:e86, https://doi.org/10.7717/peerj-cs.86
- SPDX License List, https://spdx.org/licenses/
- Top open source licenses and legal risk for developers, July 13, 2022, https://www.synopsys.com/blogs/software-security/top-open-source-licenses/