Software and free and open source software in open science

From NI4OS wiki

Software is one of the pillars of open science, along with publications and data. Software is ubiquitous nowadays: it is used as a general scientific tool that facilitates research and supports and integrates various elements of open science (OS). Sometimes software is an integral part of a particular scientific investigation, essential for its conduct, reproducibility and follow-up research. It may even be a primary outcome of the research.

Software is used to process data. Research data can be better understood with software that provides an analytical or comparative view of it. Software also facilitates access not only to research data, but also to data about research. It runs the open science infrastructure, including repositories, archives, catalogues, databases and platforms for collaboration. The software-based services and infrastructure of OS are so important that it is safe to say that OS would not exist today without software and, to a large extent, without free and open-source software (FOSS).

Despite many differences in their topics, methods and culture, open science and open software work harmoniously towards similar goals. Therefore, it is crucial for contemporary researchers to be familiar with the specifics of science-related software, its use in open science and especially the use of FOSS.

FOSS solutions and platforms make up a large part of the generic and domain-specific solutions in NI4OS-Europe. Consequently, end users and service and resource providers are asking what the best attitude towards FOSS in their work is, and how OS and FOSS relate to each other. The issues they are most concerned with are the proper use of FOSS, licensing, referencing, preservation and sustainability. These issues concern both the research software used or developed within research communities and the original NI4OS-Europe developments. This report aims to offer clear basic guidance on using FOSS in research, OS and related services, now and in the near future, within the NI4OS-Europe community and beyond.

Use of software in science

Software in science is typically split into research software and software in research.

Research software includes source code, algorithms, scripts, computational workflows and executables created during the research process or with research as its purpose. It is typically domain-specific. Research also directly employs computational and data processing workflows, which may combine several pieces of software and other resources, and self-contained executable notebooks that integrate data, analytical methods, procedures, code, results, visualizations and narrative. Research software can be treated as an OS resource that is managed together with other parts of research, but it still needs to be distinguished as a separate entity of a special type. Furthermore, all its elements may be aggregated in a standalone OS research product, especially if it is to be used by several research projects.

Software in research includes software that is used before, during or after research and that was specifically created to assist, track, share or manage research or to facilitate participation in OS. Such software is often provided as a service or as a locally used tool that interacts with external services. The provided services may combine several software components or involve human contribution or intervention. Sometimes, even general-purpose software or a general-purpose service can be adopted by a community within a discipline and thus become a tool used in research.

A distinction between research software and software in research is often difficult to make, as some services are exclusively used in specific scientific domains and communities. Also, specialised research software may be integrated with more general services and resources.

Finally, both software in research and research software are just a thin layer on top of a huge stack of other software and platforms that research depends upon. Although the top layer fully depends on it, this underlying software is not considered software in research or research software. Still, the licences and rules imposed on that software may apply, depending on the nature of the dependence.

Publications are typically issued only once and are used as such for a long time, as they may be updated only with minor corrections. Even though publications may be overridden, relativized or partially replaced by subsequent research and publications, they remain usable and used in their original form. Research-related products other than software are typically linked to individual publications. Datasets can be reused in many research efforts but, apart from format adaptations, remain substantially unchanged, even if combined and used together with other datasets. On the other hand, software is constantly modified; even small changes may have a huge impact on its functioning. While the lifetime of one version of software is generally shorter than that of data [https://content.iospress.com/articles/data-science/ds190026], the opposite holds if its lifecycle and patterns of use are considered across several versions.

Research software often has a lifecycle that is independent of the research it serves and of the rules that govern scientific artifacts, and it involves people who are not directly part of the related research. Therefore, software typically needs to be handled differently from publications or data. However, these specificities are not yet consistently addressed, due to large differences in the way software is perceived and managed. As researchers, groups and organisations use software to support different activities and address different needs, and have distinctive priorities and capabilities, they also handle various aspects of software management differently. The primary aspect of every piece of software is human: machine interpretation and execution are just parts of a lifecycle that is primarily marked by human reading, parsing and occasional modification. Software modification and maintenance are often very difficult and time-consuming. The code can be seen as a capture of precious knowledge, in which the target discipline, logic, processing steps and technical implementation are integrated. There is also a sophisticated and evolving developer community with its own culture, in which expectations, conventions, standards and tastes change over time. Practices related to coding, documentation, testing and quality assurance vary across scientific disciplines and researchers and often differ from those of professional software engineers, who are not uniform in their approach either. Licences and copyright law apply and must be adhered to even when the software is FOSS.

The most successful pieces of software are often used for many years and go through dozens or even hundreds of modifications and versions, especially if an agile development process and automated delivery are used. Software is managed over years by many people and through steps and processes that are not part of the scientific process; even the organisations that govern, maintain and modify it may change. During that time, its earlier versions may become completely unusable or even dangerous. The history of software changes is crucial for its understanding and maintenance. As a piece of code evolves, the causes, reasoning and intent of a change, along with all affected locations, are often key to its interpretation and the basis for further modifications. The size of the source code in successful software projects may grow dramatically and is often measured in millions of lines. At the same time, the code is fragile because of its many dependencies, and because mistakes in understanding and implementing interactions between components are easy to make and extremely difficult to detect. A small mistake in a single line of code may invalidate or jeopardise a large system and affect many related software components. Besides the code itself, software metadata, configuration and customisations, licence information and corresponding service policies also need to be defined, captured, provided and tracked. Therefore, the structure of software is much more complex than that of a typical dataset, as it includes many dependencies and other relationships, most of which are not directly related to the subject of research.

Software that is used in open science or by research infrastructures and services is not necessarily free and open source. All types of software (private, FOSS and proprietary) are commonly preserved and managed using software repositories, which differ significantly, both conceptually and functionally, from the repositories used in open science. Science expects research software artifacts to be properly archived, referenced, identified, described, cited and credited, but also discoverable, visible, accessible and reusable when needed. Software quality, reproducibility and traceability have to be handled differently than for other, less mutable research-related products. Furthermore, software needs a policy framework for dissemination, reuse, evaluation and recognition that includes funding and software-related incentives, as well as a sustainability framework with organisational schemes, legal tools and economic models. On top of all this, both the science and software communities would benefit from a strategic framework that combines various approaches and methods for consolidation across scientific disciplines and technical communities, including harmonisation, technology transfer and industry collaboration.

FOSS and open science

Open-source software is a key element of many OS tools and services. These services support OS and contribute to its digital ecosystem by enabling, supporting or streamlining the exchange and use of research information and shared data. Although open science and open software share the same ethos and drive for openness of knowledge, information and opportunities, and often support each other, they are substantially independent and have different practical focuses. Open-source and, in particular, permissive FOSS licences are often used in open science, but their use is not required. Although open-source software is popular due to its potential to be more easily applied and maintained in the scientific community, the actual settings and needed tools may differ and therefore include proprietary solutions. About half of the tools used in science are FOSS or at least free to use [SOSP-FR report], and even in social science the numbers of available paid and free tools are similar [https://group.sagepub.com/white-paper-archive/the-ecosystem-of-technologies-for-social-science-research], a balance one would more readily expect in the natural and technical sciences.

Both FOSS and proprietary software used in science often include FOSS components. Concerns about access to software and its licensing initially arose in connection with guidance on how to deposit [https://zenodo.org/record/1327310] and cite or reference [https://doi.org/10.7717/peerj-cs.86] research software. Software Heritage (SWH) and HAL [https://hal.archives-ouvertes.fr/] are among the related initiatives and services that aim to address software in open science. Still, when depositing, most researchers rely on GitHub, while some use Zenodo as a catch-all archiving service. Depositing practices are associated with specifying the software location and providing licences, other metadata and citation information that make research software findable and accessible.

Poor citation practices contribute to inadequate visibility and accessibility of research software. The software tools and packages used are often not mentioned in academic papers, or are insufficiently identified, even when their names are unique enough. Researchers often mention the software they use only in the methodology section or in footnotes. This complicates finding tool-related research and does not give direct credit. Authors are also often reluctant to go into such “technical details”, and even when they do mention the software used, reviewers may ask them to remove that part. Creators of some tools ask for a specific paper to be cited, which helps prospective users find the tool and ensures credit to its developers. However, this is not enough. Software references are still not standardised and point to many kinds of sources, predominantly via URLs, a practice that is neither persistent nor interoperable in the long run. The FORCE11 Software Citation Working Group defined the following basic software citation principles [https://doi.org/10.7717/peerj-cs.86]:

  • Importance – Software should be considered a legitimate and citable product of research.
  • Credit and attribution – Software citations should facilitate giving scholarly credit and legal attribution to all contributors to the software in a suitable way.
  • Unique identification – A citation should include identification that is machine actionable, globally unique, and interoperable and is recognized by at least the discipline community.
  • Persistence – Identifiers and metadata should persist beyond the lifespan of the software.
  • Accessibility – Citations should facilitate access to the software and associated metadata, documentation and other materials necessary for informed use.
  • Specificity – Citations should facilitate identification of, and access to, the used version of the software.
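
As an illustration of how several of these principles could be applied in practice, the following Python sketch assembles a citation string from a small metadata record and refuses to produce one when the version or a persistent identifier is missing. The record fields (authors, title, version, year, doi, url), the citation layout and the placeholder DOI are hypothetical choices for this example, not a format prescribed by FORCE11.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoftwareRecord:
    """Hypothetical minimal metadata record for a piece of research software."""
    authors: List[str]
    title: str
    version: str
    year: int
    doi: Optional[str] = None  # persistent identifier, e.g. a Zenodo DOI
    url: Optional[str] = None  # repository location (not persistent on its own)


def format_citation(rec: SoftwareRecord) -> str:
    """Build a citation string; the layout is illustrative, not a standard style."""
    # Specificity: the citation must identify the version that was used.
    if not rec.version:
        raise ValueError("citation must name the version that was used")
    # Unique identification and persistence: require a DOI, not just a URL.
    if not rec.doi:
        raise ValueError("citation needs a persistent identifier such as a DOI")
    # Credit and attribution: list all contributors.
    authors = "; ".join(rec.authors)
    citation = (
        f"{authors} ({rec.year}). {rec.title} (Version {rec.version}) "
        f"[Computer software]. https://doi.org/{rec.doi}"
    )
    # Accessibility: optionally point to where the code and documentation live.
    if rec.url:
        citation += f". Source code: {rec.url}"
    return citation


# Example with placeholder values (the DOI below is not a real record).
example = SoftwareRecord(
    authors=["Doe, J.", "Roe, R."],
    title="ExampleTool",
    version="1.2.0",
    year=2022,
    doi="10.5281/zenodo.0000000",
)
print(format_citation(example))
# Doe, J.; Roe, R. (2022). ExampleTool (Version 1.2.0) [Computer software]. https://doi.org/10.5281/zenodo.0000000
```

Making the version and identifier mandatory, rather than optional, reflects the point above that URL-only references are neither persistent nor specific to the version actually used.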

Starting from these principles, additional guidelines on how to cite software were developed [https://www.software.ac.uk/how-cite-software; Recognizing the value of software: a software citation guide].

It should be noted that the backgrounds, attitudes, motivations and goals of researchers and professional software developers differ. As both groups develop and operate research-related software, these differences should be accounted for.

Even the most popular licences differ significantly, and licences that are popular in the OS world are sometimes applied to software. However, the most often used software licences are designed to support the goals of the open-source movement and were developed independently, with objectives different from those of the licences typically used in OS.

Software developers typically lack the time, effort and knowledge necessary to address FAIR with regard to software, and even IPR and licensing in general. A joint RDA/FORCE11/ReSA working group on FAIR for Research Software (FAIR4RS), established in 2020, reviewed and redefined the FAIR guiding principles for software and related computational code-based research products, and published its adaptation of the general FAIR principles for research software in March 2022. Although minimal software metadata has been discussed, defined and collected by often-used registries for a long time, the first comprehensive guide for describing and cataloguing software materials was developed in 2020 and published in February 2022 by the Software Preservation Network’s Metadata Working Group [https://www.softwarepreservationnetwork.org/wp-content/uploads/2022/01/Software_Metadata_Recommended_Format_Guide-1.pdf].

The association between FOSS and OS extends to broader guidelines on FOSS and licence usage and governance, but also to the elaboration and establishment of a strategic orientation in the management of tools and services. This relationship also includes shared governance: ensuring continual investment in software development, investment pay-off, control over software evolution, and its long-term usability, maintenance and sustainability. Therefore, the management of software as a scientific asset is becoming a critical part of OS governance that should be fused with related practices from software engineering, where IPR governance, FOSS licensing and licence compatibility are very important and current subjects.

The many links between software and various aspects of research, the access to source code provided by FOSS, and the need to access other software for reproducibility illustrate why strengthening the productive relationship between research and software communities is so important. As many international organisations and collaborations call for wider use of FOSS in research and supporting infrastructure, much more needs to be done to recognise software as fundamentally different from research data and to establish the associated conventions, infrastructure, rules, evaluation and support. At the same time, members of the software community who participate in OS development need to firmly adhere to its requirements and practices, which they may be tempted to evade out of a desire to circumvent obstacles or a sense of entitlement.

Software in OS practices (SOSP-FR report)

The report “State of open science practices in France” [SOSP-FR, in French: Pratiques et usages des outils numériques dans les communautés scientifiques en France] from 2022 describes practices and the use of digital tools in scientific communities in France. It is based on a survey of 1089 researchers in various fields.

About two-thirds of respondents use free and open-source software (FOSS), and the same portion uses paid software. Other types of software are much less used. These include partially or completely free proprietary software (21%) and software created for research or resulting from research (17%). In particular, respondents who are 35 years old or younger are more inclined toward FOSS. Physics, mathematics and computer science, in particular, are resolutely oriented toward FOSS, with this inclination approaching 80% for the latter two. Literature, social, humanistic and life sciences are evenly split between FOSS and paid software, while chemistry, engineering sciences and medicine are more inclined to paid software (at around 55%).

Most used are general-purpose authoring tools (such as MS Excel and Word). Next are more technical languages and platforms commonly used in data science and analysis: R, Python and MATLAB. The reason for the high frequency of general tools (word processing, spreadsheets, visualisation and presentation) is that they are used by all specialities. Like the general public, researchers most often use common office applications. Similarly, the versatility and neutrality of analytical platforms are the reasons for their high ranking. The two most used tools for data analysis, Excel and R, symbolise the two major flavours of digital data analysis: Excel is long-established paid software for general use with a very strong reputation, while R is an open-source programming language and environment for statistical computing and graphics. Still, R is more recent and less used, and its audience is still limited to groups of (often younger) researchers.

The only well-ranked discipline-specific tool is QGIS, a FOSS geographic information system. Next, with almost equal use that jointly roughly matches that of R, come software developed by the researchers, in-house software of the organisation or laboratory, and LibreOffice, another office package. This strongly indicates that researchers and organisations do not want to develop dedicated software unless they have to. It is also interesting that all locally developed software is mentioned much less often than any of R, Python and MATLAB, which means that respondents do not consider their use of these platforms to be programming.

Most frequently mentioned tools in the SOSP-FR survey (number of mentions):

  • MS Excel: 219
  • MS Word: 143
  • R: 112
  • Python: 105
  • MATLAB: 80
  • QGIS: 49
  • Software designed by user: 35
  • LibreOffice: 33
  • Internal software of the organisation: 33
  • ImageJ: 32
  • LaTeX: 27
  • FileMaker: 27
  • Origin: 26
  • Photoshop: 26
  • LimeSurvey: 26
  • PowerPoint: 23
  • Illustrator: 23
  • SPSS: 22
  • ArcGIS: 19
  • Access: 18
  • C / C++: 18
  • SAS: 16
  • Zotero: 15
  • LabVIEW: 15
  • NVivo: 14
  • Sphinx: 14
  • Stata: 14
  • Prism: 13
  • RStudio: 13
  • OXYGEN: 12
  • GraphPad: 11
  • Mathematica: 10
  • Iramuteq: 10
  • ChemDraw: 10
  • GIMP: 10
  • Inkscape: 10

Next are various often-needed task-specific tools that can be used across communities. They serve for image processing, database management, preparation of publications, surveys, statistical analysis, data acquisition, and instrument automation and control. The most often mentioned open-source ones are ImageJ, LaTeX, LimeSurvey, Zotero, Sphinx, RStudio, Iramuteq (in French), GIMP and Inkscape. Proprietary tools are FileMaker, Origin, Photoshop, PowerPoint, Illustrator, SPSS, ArcGIS, Access, SAS, LabVIEW, NVivo, Stata, Prism & GraphPad, OXYGEN (by DEWETRON), Mathematica and ChemDraw. Among them are also C and C++, which is a far cry from their past glory. Most of these tools and platforms are decades old, so researchers stick to what they know how to use and are satisfied with. Some of these tools may be part of a shared culture and hard to replace. Zotero, used for managing bibliographic data and related materials, is the first software in the list that is directly related to OS.

Almost no software tied to a particular scientific area is mentioned in the joint list. Such tools are too diverse and dispersed across communities. Still, some community-related patterns for the above-listed tools can be recognised:

  • The physical sciences, mathematics, computer and engineering sciences are inclined to the tools they can adapt to their needs (MATLAB, Python, LaTeX, software designed by researchers).
  • Humanities, arts and social sciences widely use software from two major vendors, Microsoft and Adobe; they also use QGIS and SPSS. Social sciences also use R.
  • Biology and chemistry often use software specialised for image processing and graphical presentation. Life sciences also use R.
  • Medicine has a low specificity of tools, with the most common use of graphic and statistical tools (ImageJ, SAS and GraphPad products such as Prism).

In their analysis, the authors of the study also emphasize that:

  • Python users also often use Linux (49%, compared to 22.5% in the respondent population) and prefer open and free software (91.2%, compared to 69.5% of all respondents). They are more often men (68.2%, compared to 51.9% among all respondents).
  • Programmatic solutions and collaborative environments and tools are most used by younger researchers who work in small groups (2 to 5 members) and also typically use Linux.
  • The use of FOSS software is related to its free availability, especially for early researchers and those without sufficient funding (the SOSP-FR report highlights this factor for humanities, arts and social sciences, but it is generally applicable). This may be a stronger driver for FOSS than its use in open science.
  • Not partaking in the use of digital tools popular in the wider community locks researchers into a proprietary environment; they may not be aware of this until they try to use new software available only for an operating system they have no experience with. Linux is popular, but software vendors should not disregard other systems.
  • The use of open science tools and information or open-source software does not mean that a person is deliberately partaking in either movement. The practitioners are not necessarily fully aware that the practices or rules they follow belong to open science. The use of FOSS tools is both opportunistic and due to the open-source philosophy, but it is unclear which of those two factors is more significant.

Based on differences between age groups, the authors of the report hypothesize that the dissemination and use of research tools driven by teaching influence later digital research practices. Software like GitLab, programming languages and free software like R are present in university training, as teachers tend to develop training on freely available software because of the limited availability of licences in the educational context. Thus, open-source tools become default research environments for future young researchers. Such tools, as well as collaborative environments and executable notebooks (also known as computable documents, such as those provided by Jupyter Notebook/JupyterLab), may therefore soon have an even stronger influence on the practices of publication and communication of research results. These tools are also made available to scientific communities via research infrastructures and online services. It would therefore be beneficial to better map accessibility and training needs on research infrastructures and to quantify them for researchers who do not participate in large research collaborations or who are at the start of their careers. The use of software tools, and of open-source software in particular, as well as of research infrastructures, should be assessed independently from open science and related digital practices.

However, based on our less formal survey from December 2022, researchers and developers consider the use of FOSS in academic training to be less significant than factors such as availability without the need to pay, open source’s natural link to open science, and FOSS principles and the FOSS development model. It in fact belongs to the second group of motivators, which includes avoiding bureaucratic obstacles associated with procurement, use by the community and maintainability.

Finding research software

A good list of research software registries, classified by scientific domain and other criteria, is available at [https://github.com/NLeSC/awesome-research-software-registries]. It covers:

  • Astrophysics
  • Computational Fluid Dynamics
  • Grid Computing middleware
  • Earth Sciences
  • Humanities
  • Life Sciences / Biology / Medical
  • Mathematics
  • Machine Learning
  • Nano Technology
  • Social and Ecological Sciences
  • Generic tools
  • Registries by country
  • Registries by organization
  • Registries by programming language

On the other hand, when it comes to general tools for scientific publishing, an analysis and catalogue of open-source publishing tools and platforms are available at [https://mindthegap.pubpub.org/]. A white paper produced by the OPERAS Special Interest Group on Tools Research and Development for Scholarly Communication is available at [https://doi.org/10.5281/zenodo.5654319].

Tools for social sciences

This document covers all scientific domains, but the social sciences, as a domain where FOSS is less frequently used, can serve to illustrate its use.

The Directory for Social Sciences summary [https://forrt.org/educators-corner/003-developing-tools/] and related white paper [https://static1.squarespace.com/static/5d5ad9e0100bdf0001af0f5e/t/5ed0ea0631c1a80efe375fe5/1590749710566/The+Ecosystem+of+Technologies+for+Social+Science+Research.pdf] describe and list many tools used for social science research, together with related trends. For example, organizations are coming together in consortia to support the development and sustainable management of these tools. The number of research tools available has grown rapidly since 2004, from about 50 to more than 400 at the time the white paper was written in 2019. This is probably due to researchers’ adoption of digital tools and software development skills, and to advances in computer science and its accessibility. The numbers of paid and free tools are similar, but the number of free tools is growing at a slightly faster pace. This may be due to the adoption of open source and a greater number of individuals developing their own tools. This directory currently contains about 600 software packages and tools, available at [https://github.com/sagepublishing/sage_tools_social_science/blob/master/data/master_tools_current.csv].

Surveying and sourcing participants

There are also many free and paid online platforms.

Annotating, labelling, and coding text

Open-source text annotation tools are

Social media research

Some free and paid tools are listed at [https://socialmediatools.pory.app/].

Most of these tools work with Twitter. Facebook and Instagram have more active users, but Twitter provides an API that makes its data much more accessible than other platforms. Furthermore, LinkedIn and Facebook even have a policy against using their API for research purposes. Still, there are research tools that provide access to content from several social media platforms. Facebook’s reputational problems pushed it to set up the Social Science One non-profit partnership and to provide selected researchers with grant-based access to its data. Similarly, LinkedIn set up the Economic Graph Research Program.

More than half of the social media tools are free, either as applications or as freely available packages on GitHub (typically open source). These tools offer analysis, data collection, monitoring, network visualisation, platform management, sentiment analysis, text analysis and visualisation. Only a few tools have limited free functionality.

Free tools that can access several social media platforms:

  • NodeXL, Social Media Research Foundation
  • SMaPP Toolkit, New York University
  • Vader, MIT
  • Social Feed Manager, George Washington University Libraries
  • Webometric Analyst, University of Wolverhampton
  • SOCRATES, NSF
  • Just Twitter

A very popular commercial tool is NVivo. Some Twitter-specific tools are:

  • academictwitteR, The University of Edinburgh
  • DocNow- Shift Design, University of Maryland, University of Virginia
  • rtweet, University of Missouri

Recommendations on research software and engineering in open science

  • Software must be recognised as a first-class citizen of the research ecosystem and adequate software-related research practices established.
  • As the primary motivations behind the acceptance of OS and the use of FOSS vary, it is best to promote them by emphasizing both the practical benefits and deeper motivations. For some researchers, one may work better than the other, while for those who are influenced by both the synergetic effect may be multiplicative.
  • To promote the adoption of FOSS, start with task-oriented OS tools for which habits are not yet strong and which are available for various platforms, and see whether their growing adoption is followed by the use of tools such as LibreOffice, which currently face entrenched proprietary packages.
  • Initiatives oriented to women in science should also popularise FOSS. Women are less inclined to use it, so the gain can be higher.
  • Promote the use of Linux, as its regular use is closely correlated to the orientation toward open software and the use of associated collaborative tools.
  • As a dynamic entity, software needs to be adequately cited and identified in references in a way that more practically and reliably links research and software, including used and newer software versions in dedicated software repositories.
  • Establish and promote a dedicated infrastructure for research software, as developers need to use various tools and services along with the services that are used in the context of OS.
  • Standardise and automate interaction between repositories, cross-referencing and checks, along with the exchange of metadata and provenance information.
  • The existing tools need to be harmonised and updated and new tools established to address the emerging problems.
  • Modern software engineering and runtime management put data, software, configuration and other elements of the execution context into containers and employ continuous delivery and high-availability technologies. This requires additional technical skills related to both software development and infrastructure management, as well as a more effective combination of the skills and norms of researchers, software developers and other IT professionals.
  • Researchers need to be acquainted with software industry governance practices; software engineers need to be educated in the needs and practices of OS; both groups need to be trained on the forthcoming norms for software in OS.

FOSS licences

This and the next section are based on work carried out by the author of this report as part of the GN4 Phase 3 project funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 856726 (GN4-3).

The issues of software licences, their compatibility and selection are complex and therefore often neglected even by professional software developers, let alone those who develop software to support their actual work, such as research.

  • Our work is increasingly dependent on Free and Open Source Software (FOSS), which always comes with a specific FOSS licence.
  • FOSS licences help keep FOSS software alive.
  • Licence compliance is important for legal reasons and to ensure better cooperation.
  • Licensing considerations matter most when software is distributed, as distribution is what triggers most FOSS licence conditions.

Trust in research is based on the peer review system, which requires the ability to reproduce the experiment or analysis described. As software is often a fundamental part of this work, its validity cannot be confirmed without the ability to access and use it. Furthermore, since part of the analysis process is codified in the software, the trustworthiness of the analysis and conclusions may depend on the ability to examine and reuse not only the input data but also the software used and its internal logic. This does not mean that reviewers have to review the source code; it is usually sufficient to make it available for independent audit. For commonly used tools with a large community of users and maintainers, such auditing happens as a side effect of shared software development, use and maintenance.

As pointed out in [https://doi.org/10.1515/itit-2019-0040], the lack of access to the underlying software makes it much more difficult to develop new results based on existing research, while the practices of open source software and their increasing acceptance have enabled the broader open science agenda. The principles of open science affect the life cycle of research: the way science is conducted and the way its results, including software, are published, evaluated, discovered and monitored.

The frequency of use of FOSS licences is tracked in the Black Duck Top List [https://www.synopsys.com/blogs/software-security/top-open-source-licenses/].

Rank | Licence        | Usage | Licensing risk
-----|----------------|-------|---------------
1    | MIT            | 32%   | Low
2    | GPL 2.0        | 18%   | High
3    | Apache 2.0     | 14%   | Low
4    | GPL 3.0        | 7%    | High
5    | BSD 2          | 6%    | Low
6    | ISC            | 5%    | Low
7    | Artistic (Perl)| 4%    | Medium
8    | LGPL 2.1       | 4%    | High
9    | LGPL 3.0       | 2%    | High
10   | Eclipse (EPL)  | 1%    | Medium
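To relate this table to the shares of permissive and copyleft licences quoted further on, the following Python sketch sums the usage figures by licence family. The family assignment is an assumption based on the licence descriptions in this report (Artistic counted as permissive, LGPL and EPL as weak copyleft, GPL as strong copyleft), and the ten listed licences cover only 93% of usage.

```python
# Usage shares (%) taken from the Black Duck top-10 table.
USAGE = {
    "MIT": 32, "GPL-2.0": 18, "Apache-2.0": 14, "GPL-3.0": 7, "BSD-2-Clause": 6,
    "ISC": 5, "Artistic": 4, "LGPL-2.1": 4, "LGPL-3.0": 2, "EPL": 1,
}

# Assumed grouping, following the licence descriptions in this report.
FAMILY = {
    "MIT": "permissive", "Apache-2.0": "permissive", "BSD-2-Clause": "permissive",
    "ISC": "permissive", "Artistic": "permissive",
    "LGPL-2.1": "weak copyleft", "LGPL-3.0": "weak copyleft", "EPL": "weak copyleft",
    "GPL-2.0": "strong copyleft", "GPL-3.0": "strong copyleft",
}


def share_by_family(usage, family):
    """Sum the usage shares of all licences that belong to the same family."""
    totals = {}
    for licence, share in usage.items():
        group = family[licence]
        totals[group] = totals.get(group, 0) + share
    return totals


totals = share_by_family(USAGE, FAMILY)
print(totals)               # {'permissive': 61, 'strong copyleft': 25, 'weak copyleft': 7}
print(sum(USAGE.values()))  # 93: the share covered by the top-10 list
```

Under this grouping, permissive licences account for 61% and copyleft licences for 32% of the listed usage, with the remaining 7% outside the list and therefore unclassified.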

Advantages of FOSS

The market is supposed to equalise the total costs of equivalent software products. Nevertheless, proprietary software vendors often claim that, over time, their products are easier and cheaper to use than FOSS. Some organisations do not want to invest in the expertise and costs they often associate with using and maintaining FOSS, even when the effort actually required is small. They often want an external organisation they can hold accountable for software risks and problems. However, experience has shown that it is very difficult to hold a commercial software provider liable unless the software has been customised for a single client and very strict contractual commitments have been made.

On the other hand, the features of FOSS make it very attractive to many users and organisations. In addition, many commercial developers of open source software offer additional services as part of their business model, and there are even third-party companies that offer support for open source software.

The open source model is very different from that of traditional licensed proprietary and commercial software because it allows both redistribution and modification. Freeware (free) and shareware (free trial) software can usually also be redistributed, at least in binary form, but the ability to modify the source code is crucial. It has had a huge impact and has brought several significant improvements over traditional proprietary licensing models.

Availability and lower costs – For sophisticated or specialised requirements, such as those encountered in research, new technologies and heavy workloads, the market is often not large enough for a suitable commercial product to appear and survive. The immediate and free availability of open source is also associated with the need either to develop in-house expertise, which organisations and individuals often want to do anyway, or to hire external support and maintenance. Many users, especially advanced ones, want to customise the software and are often happy to debug and maintain it. In research and education, the obligation to pay for software, the overheads of obtaining and renewing commercial licences, and the associated obligations are barriers that students, early career researchers and small research groups often cannot overcome. Access to source code can also shorten their learning curve and enable them to use the technology more effectively. In addition, the cost of highly specialised or tailor-made software is often unacceptable even for large organisations, especially when a working solution can be achieved by combining and customising several freely available building blocks. These factors have led to the widespread adoption of open source in more advanced and specialised communities, the emergence of entirely new usage scenarios, and the introduction of large toolsets and microservice-based architectures with many components, alongside the commercialisation and componentisation of resources and runtime environments.

Innovation and flexibility – The easy access to source code allows developers to add the features they need and bring in new ideas, algorithms and usage scenarios. They are not constrained by the organisational, commercial or strategic limitations of proprietary software vendors, where users would have to wait for the vendor to decide to provide and implement what they need. The more useful a new feature or solution is, the more developers and companies will participate and contribute. The more programmers contribute, the better, more useful and more valuable the result will be. This has been shown to work even when contributors are not required to share their work. Moreover, people who inherently need a certain improvement are more likely to provide a creative and innovative solution than those who are just paid to produce something.

Security and reliability through transparency – When source code is available, many people can review and debug it, so security and functional flaws are more likely to be uncovered, and suboptimal solutions and vulnerabilities are more likely to be fixed. Contributors are aware that other experts will look at and review their code, and are therefore more likely to adhere to higher quality standards. Any new reviewer can see what has been done before and what went wrong. If they try to fix the problem, they may have a greater incentive to find a better solution than someone who has had to work under strict deadlines and other constraints or priorities. Also, someone who is in the right context is often in a better position to identify and fix a problem or bug than the original author, because of their prior knowledge, experience or previous involvement with a similar problem. In many open source projects, designated reviewers review changes before they are incorporated into the main codebase. They are not testers, but experts who care about the quality of the code and can work at their own pace. All these factors greatly improve the security and reliability of software, and this is a major reason why the vast majority of services on the Internet today run on Linux. When a security vulnerability is discovered in an active open source project, interested community members usually fan out and patch it quickly. However, only those who keep up to date with the latest recommended versions, including the libraries they depend on, benefit from this dynamic activity.

Longevity – Commercially licensed software may be discontinued by a vendor that no longer wants to support it, or the vendor may go out of business without selling the software to another company. In such situations, there is no way to update the discontinued software, fix bugs or adapt it to new applications or platforms, and support, patches and other related services are no longer available. Its usability can therefore deteriorate quite quickly, and the user has to decide when to invest in and switch to new software rather than live with the growing problems. In contrast, open source software can evolve continuously because anyone can access and contribute to the source code. Even after it has not been used for a while, anyone can revive, adapt, repair or repurpose it. Many useful and widely used FOSS projects have large, active and stable user communities whose members include individuals and research groups, as well as large and small companies.

Types of FOSS licences

Open source licences allow software to be freely used, modified and redistributed [https://opensource.org/licenses]. Although many “free software” and “open source software” licences are recognised by the Free Software Foundation (FSF) and approved by the Open Source Initiative (OSI), only a few are popular, widely used or backed by strong communities. Moreover, a single software project may contain several components, which in turn include other components, so many software licences may apply to it. To be recognised as open by the OSI, software licences must meet certain criteria [https://opensource.org/osd]. But even open licences differ in terms of the rights and restrictions they contain. These often relate to derivative works, such as modifications of the original code or new code that uses it. Based on the scope of the code to which they apply, there are two main groups of FOSS licences, permissive and copyleft licences, which differ in whether modifications, or code that uses the licensed code, must be published under the same licence or whether a different licence may be used. The applicability of a licence may be limited to modifications of existing files, may cover additions, or may even extend to any use of a software library. Where FOSS licences require the disclosure of existing and new code, this applies when the software is distributed to external users, not to private use. Most licences require that the licence text and copyright notice accompany the licensed material, and some also require documentation of the changes made to it. It should also be noted that some licences treat access over a network as distribution, which gives network users the right to obtain the source code.

The open source rules are designed so that those who receive copies of the software must themselves be able to redistribute the original and create derivative works from it while allowing others to do the same. Some licences prevent open source code from “getting closed” and require that users and contributors to the code accept the values of open source by redistributing their modifications or additions (derivative works) on the same terms as the original. This means that those who receive copies of these works must be able to redistribute the original and make derivatives under the same conditions.

Unlike the Creative Commons CC-ND and CC-NC licences, an open source software licence must allow both modification and commercial use to be considered truly open. If a licence prohibits the use of the licensed material and derivatives for commercial or (for example) military purposes, it is not considered a free software licence, because it restricts who can use a program or for what. The OSI does not accept licences that discriminate against any person, group or field of endeavour, so open source software can be used for any purpose, including any business.

Portions of new software may modify or extend fragments of existing software, which is similar to the creation of “derivatives” under CC licences. To preserve the integrity of the original work and ensure its maintenance, open source licences often require that derivative works be distributed under the same conditions under which the licensee was granted access to the original work. This can apply both to source code that is incorporated, copied or modified and to the use of resulting components such as software libraries. While the CC-ND licences allow sharing and reuse of content on condition that it remains unchanged, software may be used in many ways without modifying or even incorporating the software used. A piece of software may depend on other software by relying on its definitions, specifications and interfaces, or by invoking it through dynamic or static linking, network communication and various types of interfaces and connectors. Therefore, software licences differ from those for other types of works in that they address the different ways in which software is used and, when a particular type of use occurs, whether and to what extent it affects the licensing of the software that uses it. The resulting conditions include all the terms and obligations set out in the licence of the software being used. If the requirements are few, the licence is called permissive, as opposed to restrictive or, more correctly, copyleft. If the scope of a copyleft licence is narrow, it does not extend to all extensions of the work used or to all software that uses the licensed material; instead, it only requires the availability of modifications to the original work or to the modified existing files. This is referred to as ‘weak copyleft’. When the scope is broad, the licence of the component used and its associated terms must be applied to all software that uses it. This is referred to as ‘strong copyleft’.

Public domain licences offer the most permissive model. Anyone can modify and use software without any restrictions. But even if a component is free and without legal restrictions, one should always make sure it is secure before including it in the codebase.

Permissive licences contain minimal requirements on how the software may be modified or redistributed. Users do not have to republish the changes they make and usually only need to credit the original authors. These licences contain a disclaimer, and some also require that modifications be described. This type of licence is used by almost two-thirds of the open source software in circulation [https://www.synopsys.com/blogs/software-security/top-open-source-licenses/]. Permissive licences are popular because of the flexibility they offer to those using software licensed under them and the low IPR risk. These licences include MIT (the most popular, short and simple), Apache 2.0 (requires notice of changes, grants a patent licence that terminates if the licensee initiates patent litigation over the work, and does not grant rights to the licensor’s trademarks), BSD (some versions require the inclusion of a disclaimer) and ISC (which, along with its OpenBSD variant, is a further simplification of MIT and BSD). The Artistic Licence (used for Perl, in several variants of versions 1.0 and 2.0) is permissive but includes compensation for damage.

Copyleft licences, also known as reciprocal licences or restrictive, protective and even viral licences, allow the modification of the code and the distribution of new works based on it as long as the requirements for redistribution under the same conditions are met. This ensures that the rights from which the user or modifier has benefited are preserved in derivative works, by preventing contributors from appropriating their modifications, which would place them in an asymmetrical position vis-à-vis upstream contributors. This usually means that anyone who modifies the code must also release their modifications under the same licence. In commercial settings, copyleft licences are often considered riskier, as they can limit potential business value or jeopardise the secrecy of intellectual property. Taken together, copyleft licences are used by more than a third of open source software.

Weak copyleft licences have a library or file scope. Examples are the LGPL (GNU Lesser General Public License; version 2.1 cleans up the text of 2.0 and allows dynamic linking without enforcing copyleft; version 3.0 allows patent use, is not compatible with LGPL 2.0 but is compatible with Apache 2.0, and requires that the end user be able to install a modified version, so it prohibits closed devices, DRM or hardware encryption, and patent retaliation), the EPL (Eclipse Public License 1.0 and 2.0), the MPL (Mozilla Public License 1.0, 1.1 and 2.0, which is simple and allows static linking and licence variants with additional conditions), the Ms-PL (Microsoft Public License), the Ms-RL (Microsoft Reciprocal License) and the CDDL (Common Development and Distribution License 1.0 and 1.1). They require only the release of the modified code, which allows open source libraries to be used in proprietary software. The MPL, Ms-RL and CDDL require this only for modifications to existing files. Libraries under the LGPL, EPL and Ms-RL allow proprietary licences for the code that uses them, but the original licence extends to new files in a modified library.

On the other hand, strong copyleft licences often require releasing the entire project or product under the same licence as the work used, or, in some cases, a compatible one. Among copyleft software, strong copyleft licences significantly prevail. These licences are intended to keep everyone on the same footing and to prevent the ‘free riding’ that is still possible with permissive and weak copyleft licences. By introducing these restrictions, the creators of strong copyleft licences wanted to expand the presence of open source software, ensure the sustainability of the open source software ecosystem and strengthen the open source software movement. The most common and widely used licence is the GPL (GNU General Public License). Version 2.0 is more often used; version 3.0 grants the use of patents, is compatible with Apache 2.0 and requires that the end user be able to install modified software.

AGPL (GNU Affero General Public License 3.0) is similar to the GPL, but it is also network protective: use over a network is considered distribution, so modified code must be made available to external users. It is becoming increasingly popular because it closes the ‘ASP/SaaS loophole’ of the GPL, which allows software under the GPL to be modified and offered as a service without disclosure, since SaaS software is by its nature not distributed to users. As its preamble states, the AGPL is “specifically designed to ensure cooperation with the community in the case of network server software”.
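The disclosure triggers described above can be summarised in a simplified decision function. The Python sketch below is a hypothetical model, not legal advice: it covers only the disclosure of source code (not notice or attribution requirements), assumes a single licence family per component, and ignores how components are technically combined.

```python
from enum import Enum


class Family(Enum):
    """Licence families as described in this report (simplified)."""
    PERMISSIVE = "permissive"              # e.g. MIT, BSD, Apache 2.0
    WEAK_COPYLEFT = "weak copyleft"        # e.g. LGPL, MPL, EPL
    STRONG_COPYLEFT = "strong copyleft"    # e.g. GPL
    NETWORK_COPYLEFT = "network copyleft"  # e.g. AGPL


def disclosure_scope(family: Family, distributed: bool,
                     offered_over_network: bool, modified: bool = True) -> str:
    """Hypothetical model of which source code must be made available to others.

    Only source disclosure is modelled; notice duties, exact licence wording
    and the way components are linked are deliberately ignored.
    """
    # Distribution triggers copyleft; network use does so only for
    # network-protective licences, and only for modified code.
    network_trigger = (family is Family.NETWORK_COPYLEFT
                       and offered_over_network and modified)
    if family is Family.PERMISSIVE or not (distributed or network_trigger):
        return "none"
    if family is Family.WEAK_COPYLEFT:
        return "licensed component and any modifications to it"
    return "entire combined work"


# A modified GPL program offered only as a service: nothing to disclose.
print(disclosure_scope(Family.STRONG_COPYLEFT, distributed=False, offered_over_network=True))
# The same program under the AGPL: the source must be offered to network users.
print(disclosure_scope(Family.NETWORK_COPYLEFT, distributed=False, offered_over_network=True))
```

The first call prints `none` and the second `entire combined work`, which is the ‘ASP/SaaS loophole’ and its closure in miniature.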

Source-available and ‘fauxpen’ licences

There are also non-FOSS restrictive licences, often presented or perceived as similar to FOSS, which impose restrictions that prevent them from being considered open source according to the Open Source Initiative (OSI) and free according to the Free Software Foundation. Source-available licences (or shared source licences) are proprietary licences that allow the source code to be viewed and, in some cases, modified and redistributed. They make the code available for viewing to facilitate scenarios such as inspection, understanding how it works, debugging, integration, or testing of external components. Examples of such restrictive licences include the Business Source License (BSL), Microsoft Limited Public License (Ms-LPL), Microsoft Limited Reciprocal License (Ms-LRL), and Microsoft Reference Source License (Ms-RSL). Some of these licences grant rights only to developers of Microsoft Windows-based software, while the Ms-RSL allows viewing of source code for reference and debugging purposes.

Under such licences, the user often has no right to use, redistribute or modify the code, and sometimes not even to compile it. FOSS, on the other hand, is not just about access to the source code, but about full freedom to use it, even for commercial or objectionable purposes, as long as the same freedom is preserved for those who use or even pay for the code in question.

‘Fauxpen’ licences are similar to source-available licences. They are presented as open, but a closer look reveals that the licensed software or product is actually under the strict control of the vendor. These hybrid licences are intentionally deceptive and confusing. The Server Side Public License (SSPL) is a strong copyleft licence that requires the public release of the source code of service management layers when a service is provided. This prevents cloud providers from offering software licensed under the SSPL to third parties as a service, as they would have to release under the SSPL all the source code, APIs and other software required for a user to run an instance of their service. As this would extend to the Linux kernel, which is under the incompatible GPLv2-only licence, the SSPL effectively makes it impossible to use the kernel in such a service. Thus, the SSPL discriminates against a particular field of use. ELv2 (Elastic License v2) is a non-copyleft licence that prohibits making the products available to others as a managed service, circumventing the functionality of licence keys, or removing or disabling features protected by licence keys.

Criterion 6 of the Open Source Definition (“no discrimination against fields of endeavour”) and the FSF’s Freedom Zero (“the freedom to run the program as you wish, for any purpose”) indicate that ‘fauxpen’ and source-available licences are not FOSS. Software vendors who switch to such licences effectively convert projects that started as open source to proprietary licences and admit that their business models are not compatible with open source. They claim that they want to protect their work from unfair exploitation by cloud providers and other free riders who would use their software without paying for its creation and maintenance. At the same time, they appropriate the contributions of outside developers who donated their time and energy by contributing to the projects when they were open source. In addition, these companies often use code from other open source projects to run their business.

When a provider switches to a proprietary commercial licence, it can choose the time, terms and cost. The future costs of the software and even its future availability are unknowable, as with any other proprietary software. When previously open software is embedded in or turned into a proprietary product, its users have to agree to the terms of a proprietary licence, stay with an unmaintained version, or fork the last open version of the software and carry the associated burden of maintenance forever.

Products subject to this bait-and-switch became popular because they were marketed as FOSS, and developers prefer to have control over what runs in their programs and to be able to fix it, or have other people fix it, even if they are not affiliated with the original maker of the tool or component. In addition, such platforms gain traction because they tend to offer free, one-stop solutions, while expensive licences for commercial alternatives add up and open source replacement solutions are often less integrated. Even developers in large enterprises prefer to use FOSS rather than go through slow, bureaucratic and multi-layered approval and procurement processes.

The only advantage of the prior FOSS status is the possibility to fork the last open version, but even this window of opportunity may effectively close after some time if the community sticks with the vendor and its proprietary changes. Forks are also hard because of the resources needed and the necessary change of branding, as people do not easily switch from one brand to another.

All this means that access to updates of software under permissive licences, or under licences with a ‘sublicense’ option, can be volatile in the long run if the software is controlled by a single company, as the cases of Elasticsearch and MongoDB show. This is why it is so important to choose software that is guaranteed, or at least highly likely, to remain FOSS.

Copyrights, patents and warranties

Copyright is a form of intellectual property that allows the creator of an original work to license that work to the extent governed by copyright law. No registration or official notice is required to claim copyright in a work, although a clear and visible copyright statement that identifies its subject is common practice. Copyright can also easily be transferred to another party, typically by contract or statement. Since open source software licences by definition make the work available to everyone under clear conditions, copyright as such is rarely the issue; what matters are the details of the licensing conditions. These include whether copyright is stated in the licence text, and how the licence text and the copyright notice (containing the original copyright and attribution notices) should be included and presented in the licensed material. The licence may require the copyright notice for source code only, or also for binaries.

Patents are a much more complex form of intellectual property. An organisation or individual that has invented something substantial, new and useful must demonstrate this through a regulated, expensive and time-consuming patent registration procedure. If this process is successful, the patent holder is granted the right to exclude others from making, using, selling, offering or making available the patented invention for a predetermined period (e.g., 20 years), and may charge fees for the use of the patent. A patent can earn its owner royalties as financial compensation for the use of the invention, while infringements can be prosecuted in court in the jurisdictions where the patent is in force. Patent owners try to extend the boundaries of their patents and look for infringements to maximise royalties and penalties, which cover the costs of developing the invention, filing or acquiring and maintaining the patent, scanning for infringements and litigation. Therefore, there is always a risk of possible and even unwitting infringement of a patent by the licensor of software, and subsequently by its licensees.

Some licences describe how to deal with potentially applicable patents and royalties, which removes at least some of the patent-related uncertainty. A licence may state that it does not grant rights to contributors’ patents, or it may explicitly grant a licence to contributors’ patents. Both models remove some uncertainty, but neither solves the problem of patenting. The latter approach is an attempt to prevent the appropriation of innovation and software through patents. However, no software licence can protect a licensee from being sued by a third-party patent holder, since licensors can only license rights that they hold. Because software patents are often vague, abstract and ambiguous, they can easily be used as a weapon; they may even protect concepts or methods of interacting with a system. Often neither the licensor nor the licensee is aware of such a patent, so a patent troll or a competitor holding a corresponding patent can appear at almost any moment.

Even if a patent holder has licensed the patent for use in open source software, or the applied FOSS licence waives all patent-related obligations, that patent may later be narrowed or revoked through litigation by a holder of a competing patent. In this case, even a software licensee who has fully complied with the terms of the original licence and the licensor’s patent may be held liable for infringement of the competing patent if it continues to use the affected software. Since the narrowing or cancellation of the original patent would affect not only its owner but also its licensees, licensees may wish to participate in the defence of the licensor’s patent. Since this can be expensive, licensees should only engage in such an endeavour if their business would be seriously affected by the competing patent holder’s claims.

Other constraints and rights

Most licences require that the licence text and copyright notice accompany the licensed material. Some also require documenting the changes made to the licensed material.

Some licences:

  • Describe the circumstances under which the source code must be made available
  • Indicate whether the changes must be documented
  • Describe the allowance or prohibition of using contributors’ names, trademarks or logos
  • Declare a limitation of liability: many state explicitly that there is no warranty or guarantee for the use of the code and that the software producer cannot be held liable for damages, for example if the code does not work well in a particular case.
  • Restrict the type or field of use (e.g., by prohibiting commercial or military use, or use over a network), which prevents them from being considered true FOSS licences.

Contributor agreements

Copyleft licences in principle prevent code from being incorporated into or relicensed as proprietary code. However, a licence change may still be possible, as contributor agreements open a loophole. Such agreements are typically called Contributor License Agreements, Copyright Transfer Agreements or Copyright Assignment Agreements. They are used by organisations that act as custodians of the software and own or use the contributions, and they often involve a transfer of copyright. If these agreements transfer unrestricted reproduction rights, permit unrestricted distribution, or expressly permit relicensing or even sublicensing, the contributed code may be relicensed at the custodians’ discretion, regardless of who holds the copyright.

Relicensing

Software relicensing is done for commercial reasons or to improve licence compatibility. In the first case, the change is typically towards a proprietary licence, often a ‘fauxpen’ or source-available licence. The consequences of such relicensing are the elimination of some previously allowed uses, the appropriation of prior contributions and the restriction of access to further improvements of the software.

Relicensing for better licence compatibility is done when the current licence is incompatible with those of other jointly used components, so that a larger combined work can be licensed.

Relicensing is possible:

  • Due to the previous use of a permissive FOSS licence or another licence that allows sublicensing;
  • When the contributors grant the custodian organisation the right to sublicense or relicense software through contributor or copyright agreements which allow redistributing the work under a different licence;
  • If the owner of the proprietary code so decides.

Adding an alternative licence is not relicensing, as the old licence remains fully valid for those who decide to stick to it. Multi-licensing is therefore a better way to improve licence compatibility than relicensing; it also requires neither a prior permissive licence nor contributor agreements signed by all contributors. “Or later” licence grants are a concise form of multi-licensing in which all subsequent versions of the named licence are accepted in advance, including versions that do not yet exist.
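Both mechanisms are commonly written as SPDX licence expressions, where `OR` separates alternative licences and identifiers such as `GPL-2.0-or-later` and `GPL-2.0-only` state whether later versions are accepted. The Python sketch below expands a simple expression into the concrete licences a recipient may choose from. The `KNOWN_VERSIONS` table is a hypothetical snapshot that would need updating whenever a new licence version is published, and expressions with parentheses, `AND` or `WITH` are not handled.

```python
# Hypothetical snapshot of published versions per licence family.
KNOWN_VERSIONS = {"GPL": ["2.0", "3.0"], "LGPL": ["2.1", "3.0"]}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.0' into (2, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def licence_options(expression: str, known_versions: dict[str, list[str]]) -> set[str]:
    """Expand a flat 'A OR B' expression, including '-or-later' grants."""
    choices: set[str] = set()
    for term in expression.split(" OR "):
        term = term.strip()
        if term.endswith("-or-later"):
            # 'GPL-2.0-or-later' -> name 'GPL', minimum version '2.0'
            name, version = term[: -len("-or-later")].rsplit("-", 1)
            for candidate in known_versions.get(name, [version]):
                if parse_version(candidate) >= parse_version(version):
                    choices.add(f"{name}-{candidate}")
        else:
            # A plain or '-only' identifier offers exactly one licence.
            choices.add(term.removesuffix("-only"))
    return choices


print(sorted(licence_options("GPL-2.0-or-later", KNOWN_VERSIONS)))   # ['GPL-2.0', 'GPL-3.0']
print(sorted(licence_options("MIT OR Apache-2.0", KNOWN_VERSIONS)))  # ['Apache-2.0', 'MIT']
print(sorted(licence_options("GPL-2.0-only", KNOWN_VERSIONS)))       # ['GPL-2.0']
```

For example, code offered as `GPL-2.0-or-later` may be used under GPL 3.0, which is compatible with Apache 2.0, whereas `GPL-2.0-only` code offers no such option. The sketch requires Python 3.9 or later for `str.removesuffix` and the built-in generic annotations.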

Governance of FOSS licences

Use of FOSS licenses depending on project intent

For internal use

  • One can use any FOSS without worrying about licences: the code stays with its users and is not given to anyone, which all FOSS licences allow.
  • However, while the code can be kept private, purely internal use is rarely permanent: the use of software may easily evolve into sharing or use in commercial contexts that directly involve other parties.
  • What happens when the creators later decide to offer the software to others? If they have not considered the licences of the used components, they may end up with components under incompatible licences and be unable to choose a licence for the product/project. Therefore, it is important to:
    • Start early to consider licences and the overall attitude towards FOSS licences.
    • Learn about the licences of used components and determine which licences are acceptable within the project.
    • Determine the potential future licence in case the way the software is used changes.
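For Python-based projects, a first overview of the licences of used components can be obtained from the metadata of installed packages through the standard `importlib.metadata` module. The following sketch is a best-effort starting point only: declared metadata may be missing, vague or wrong (as discussed under licence management), and the fallback order it uses (SPDX expression, then licence classifiers, then the free-text field) is an assumption.

```python
# Minimal sketch: list the licences declared by installed Python packages.
# Declared metadata can be missing, vague or wrong, so the result is a
# starting point for review, not a legal determination.
from importlib.metadata import distributions


def declared_licence(dist) -> str:
    """Best-effort licence string for one installed distribution."""
    meta = dist.metadata
    if meta is None:
        return "UNKNOWN"
    # Newer packages may declare an SPDX expression (PEP 639).
    expression = meta.get("License-Expression")
    if expression:
        return expression
    # Otherwise use trove classifiers such as
    # "License :: OSI Approved :: MIT License".
    classifiers = [
        c for c in (meta.get_all("Classifier") or []) if c.startswith("License ::")
    ]
    if classifiers:
        return "; ".join(c.split(" :: ")[-1] for c in classifiers)
    # Finally, the free-text "License" field, which may hold a whole licence text.
    text = (meta.get("License") or "").strip()
    return text.splitlines()[0] if text else "UNKNOWN"


def licence_inventory() -> dict[str, str]:
    """Map each installed distribution name to its declared licence."""
    inventory = {}
    for dist in distributions():
        name = dist.metadata.get("Name") if dist.metadata is not None else None
        if name:
            inventory[name] = declared_licence(dist)
    return inventory


if __name__ == "__main__":
    for name, licence in sorted(licence_inventory().items()):
        print(f"{name}: {licence}")
```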

Sharing software with someone

  • With permissively licensed components, the modifiers do not have to make any source code available.
  • With copyleft components, access to some or all of the source code must be provided.
  • When sharing, the same or a compatible licence must be used for the changed code or even for the entire project.
  • With several strong copyleft components, the creators may not be able to pick a licence that is compatible with all of them.
  • Licence compatibility has become a major and pressing issue in the wider software community.
  • One should think twice about software under a permissive licence that is effectively controlled by a single entity, especially if the software may be used in a service, since that entity can relicense future versions. Some modifiers or their customers may therefore prefer copyleft so that they are protected from licence changes.
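Compatibility questions like these reduce to checking whether every inbound licence may be included in a work distributed under the chosen outbound licence. The sketch encodes a heavily simplified, hypothetical table for a handful of SPDX identifiers (for instance, Apache-2.0 code can go into a GPL-3.0 work but not into a GPL-2.0-only one); real answers also depend on how components are combined, so the table should be checked against the licence texts before relying on it.

```python
# Hypothetical, heavily simplified table: for each outbound licence, the
# inbound licences whose code may be combined into a work distributed under it.
MAY_INCLUDE = {
    "MIT": {"MIT", "BSD-3-Clause"},
    "Apache-2.0": {"MIT", "BSD-3-Clause", "Apache-2.0"},
    "GPL-2.0-only": {"MIT", "BSD-3-Clause", "GPL-2.0-only", "GPL-2.0-or-later"},
    "GPL-3.0-only": {
        "MIT", "BSD-3-Clause", "Apache-2.0",
        "LGPL-3.0-only", "GPL-2.0-or-later", "GPL-3.0-only",
    },
}


def incompatible(out_licence, in_licences):
    """Inbound licences that the table does not allow under out_licence."""
    allowed = MAY_INCLUDE.get(out_licence, set())
    return sorted(set(in_licences) - allowed)


def candidate_out_licences(in_licences):
    """Outbound licences compatible with all inbound ones (may be empty)."""
    needed = set(in_licences)
    return sorted(out for out, allowed in MAY_INCLUDE.items() if needed <= allowed)
```

An empty result from `candidate_out_licences`, for example for a mix of GPL-2.0-only and Apache-2.0 components, corresponds to the situation above in which no licence compatible with all components can be picked.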

For a service, the provider is safe using any FOSS except software under the AGPL. Even AGPL software can be used as long as the provider does not mind letting users obtain its code; note, however, that such software may not be available as a service from cloud providers, which may be forbidden from offering services based on it. The same applies to ‘fauxpen’ licences such as SSPL or ELv2.

Licence impact on community, quality, longevity and sustainability

Projects often follow a natural cycle: creation, a burst of intense activity, a long phase of steady use and productivity, and fading as they are replaced by new projects covering the same space with a more advanced technology base; this happens through the slow or fast migration of the community. Factors that affect software sustainability and longevity are often analysed [https://opensource.com/life/14/1/evaluate-sustainability-open-source-project, https://repositum.tuwien.at/handle/20.500.12708/2820]. The longer a project has been alive, the more likely it is to continue to exist. For the sustainability of the software, the activity of the community (number of contributions and active contributors) and the quality of its core members are more significant than the size of the user base. An analysis of Ohloh data (now at [https://www.openhub.net/]) about a large number of FOSS projects [https://redmonk.com/dberkholz/2013/04/22/the-size-of-open-source-communities-and-its-impact-upon-activity-licensing-and-hosting/] indicates that:

  • The larger the project, the more likely it is to work out the licensing issues and specify a licence. The share of projects without a specified licence decreases with the number of monthly committers: it starts at about 50% for a single committer, decreases to 40% for five, and stabilises at about 20% for larger projects.
  • Permissively licensed projects make up a roughly constant share regardless of project size. They start at 20% for up to 10 monthly contributors, peak at about 25% for 20-30 contributors, and then return to the baseline.
  • The use of copyleft licences grows with the size of the active community. It starts at about 20%, increases to about 35% after 10 committers, and ends up at about 40% for projects with many contributors.
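The figures above are shares of licence categories within project-size buckets. Purely to show how such shares are derived, the sketch groups a hypothetical list of (monthly committers, licence category) records into size buckets and computes the share of each category per bucket; the bucket edges are an assumption, and the sketch does not reproduce the cited analysis or its data.

```python
from collections import Counter, defaultdict


def size_bucket(committers, edges=(1, 5, 10, 30)):
    """Label a project by the largest edge (edges ascending) not above its committer count."""
    label = None
    for edge in edges:
        if committers >= edge:
            label = f"{edge}+"
    return label


def licence_shares(projects, edges=(1, 5, 10, 30)):
    """Return {bucket: {category: share}} from (monthly_committers, category) pairs."""
    counts = defaultdict(Counter)
    for committers, category in projects:
        bucket = size_bucket(committers, edges)
        if bucket is not None:  # skip projects without active committers
            counts[bucket][category] += 1
    return {
        bucket: {cat: n / sum(c.values()) for cat, n in c.items()}
        for bucket, c in counts.items()
    }
```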

The lack of a clear licence indicates that the developers find licensing unimportant, confusing or too time-consuming for their purpose. Such projects tend neither to last long nor to establish a large community.

The utility of software is maximised if the widest possible set of users can appropriate its benefits. But FOSS, like many other parts of the digital infrastructure, suffers from a free-rider problem: “Resources are offered for free, and everybody (whether individual developer or large software company) uses them, so nobody is incentivised to contribute back, figuring that somebody else will step in.” [https://www.fordfoundation.org/work/learning/research-reports/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/]. A free rider has a competitive advantage, since it did not have to invest in the original development and can invest in developing additional benefits and services instead. While a free rider does not exclude others from using the code, which is not an exclusive resource, it may exhaust the original creator’s access to users. Users turn into customers, and customers are an exhaustible common resource, as they tend to stick to one provider. Customers contribute to provider income in various ways defined by its business model. Although most people perceive free riding as deeply unfair, it is still better for creators to have someone using their open-source software than somebody else’s. The presence of a free rider makes it more likely that others will also use this software and that some of them will contribute back. Therefore, software free riders, including competitors who capitalise on others’ work, may have a positive overall effect, as they act as mediators towards other contributors and customers. A large user community brings contributors and paying customers, and even sponsors who otherwise would not show up. Still, for this to happen, the competing free riders must be prevented from suffocating the primary contributors. This means that the original contributor’s offering (beyond just the software) must be made somewhat exclusive to incentivise users and customers to join.

  • Permissively licensed software may start small and stay that way, or grow in activity, but its growth seems somewhat limited by the fact that those who modify it may choose not to contribute their changes back to the community.
  • Weak copyleft licences are suitable for libraries and other components whose popularity and utility would be significantly reduced by the far-reaching licensing rules of strong copyleft licences.
  • Strong copyleft licences are suitable for large or standalone projects such as operating systems and specialised or productivity tools.

There is a growing number of companies whose business model is based on FOSS. This model is called commercial open-source software (COSS). Their commercial offerings usually take the form of proprietary or closed-source IP, which may include a combination of premium features and hosted services that offer performance, scalability, availability, productivity and security assurances. This is known as the ‘open core business model’. Some of them also offer professional services, including maintenance and support assurances.

The obligation to keep all modifications under the same or a compatible copyleft licence works exceptionally well in projects such as the Linux kernel. This is particularly the case when the licence does not preclude the use of the software to run other software under other types of licences. Therefore, the use of a copyleft licence may be a great benefit to the software, especially if it does not reduce its use in normal usage scenarios. This is what weak copyleft licences such as the LGPL were designed for; they are applied when it is more important to enlarge the number of contributors (as with research software) than to boost popularity through maximally liberal terms of use or to retain a competitive advantage by keeping the code proprietary. Furthermore, combining (often copyleft) open source with proprietary add-on components or services on top is a frequently applied approach that balances openness with sustainability.

If a sufficiently large member of the community has enough influence on the platform, it may decide to fork it under a ‘fauxpen’ proprietary licence that significantly constrains its use, at the same time redirecting most of the current users to the fork and taking full control over new developments in it. Some makers of open-source products, who have been seen as their custodians by the communities, make this move because their original business model has proved inadequate or because competing offerors have appeared. Of course, this is possible only with permissive licences, which typically allow appropriation through relicensing. The appropriator does not even have to be the primary contributor to the software, but rather the one most users refer to, for example by providing support or popular commercial add-ons. A similar outcome may be caused by the extensive use of software as part of a cloud offering, where the cloud provider effectively distributes and monetises open-source software without meaningfully contributing back to it, or contributes only proprietary add-ons that are typically limited to facilitating access to the platform within its cloud offering. The original ‘open core business model’ provider may then move to a network protective or ‘fauxpen’ licence. However, some projects with permissive licences, such as the Apache web server, have an extremely long lifespan and a huge community.

It is not possible to empirically determine whether software longevity benefits more from copyleft or permissive licences, as this depends more on other circumstances. The choice of a licence supporting the sustainability and longevity of software primarily depends on the attitude of the developers and community, as well as on the primary usage scenarios. Of course, such a choice may not be available at all due to the requirements imposed by the organisation, funder or dependencies. Interestingly, when this choice is available, it may depend more on intrinsic motivations and views about fairness than on extrinsic motivations such as the expectation of reputation or economic gain [https://opensource.com/law/13/8/motivation-free-software-licensing]. On the other hand, if the developers invest in interoperability and open standards, this may greatly help project adoption regardless of the licence.

Multi-licensing under permissive and copyleft or copyleft and proprietary terms is also a viable solution, as it increases licence compatibility when the software is to be combined with other components into new products. At the same time, it allows for a larger user base and, at least to some extent, stimulates future contributions.

The user base is also not to be neglected, and it greatly contributes to the sustainability of permissive FOSS. Users drive the functionality, identify the bugs, and shape the direction of a project to meet their needs. This may result in slick products that ‘just work’ without much configuration and customisation, as long as the target audience is large enough and there are other factors that contribute to the product ecosystem. Still, it is often very hard to determine the size and engagement of the user community. What is often much easier to assess (when choosing), but hard to incite (when developing), is the wider ecosystem around a project established by the engagement of other providers which may offer support, consultancy, customisation, hosting, or bundling with their products or services.

Besides the often emphasised doubts about FOSS business models, some authors [Lanier, Jaron, You Are Not a Gadget: A Manifesto] criticise the expropriation of intellectual production by open source and open content as a form of "Digital Maoism" that stifles small-scale entrepreneurship and destroys opportunities for the middle class to finance content creation, resulting in the concentration of wealth in a few corporations and individuals who insert themselves as content and service concentrators. However, this criticism should rather be directed not at FOSS but at the centralisation of distribution and advertising platforms and the model of “free services” paid for through the reselling of personal data, user profiles and targeted marketing. The large concentrators depend on FOSS like anyone else, but their core components are always proprietary. On the other hand, big tech companies frequently create or appropriate FOSS platforms and tools for consumers and developers to tie customers to their ecosystems and technologies. Typical examples are some very popular application development tools, run-time environments, NoSQL data storage and processing platforms and AI platforms, which are typically conveniently tied to the companies’ cloud offerings. Therefore, whenever a big tech company offers a sleek FOSS component or platform, developers should think twice about whether they want to become ‘products’ again and be recruited into the company’s camp.


Licence selection and attaining compliance

Copyleft licences ensure licensing stability, while permissive software can be forked and relicensed by a major contributor or a company providing popular free or commercial services or products based on this software. Such an organisation can also strongly influence software evolution and usage patterns.

The choice of a licence is shaped by several factors:

  • Available options may be mandated or recommended by the institution, project management or funder.
  • The constraints of other involved parties and coauthors must be respected.
  • The constraints imposed by original authors and licences of dependencies must be respected.
  • There may be some typical and established software licensing practices of the community.
  • The personal preferences and attitudes of software authors matter, but authors should also consider desirable public messages and non-mandatory institutional, project-level or funder preferences on software licensing and open source.

The choice is typically quite simple. The existing constraints most often mandate the type of licence. If these institutional or other policies prohibit the use of copyleft licences, this also means that the software must not use components under such licences. But if this is allowed and such components are needed and useful, then a compatible copyleft licence is to be used.

The opportunity for a relatively free choice exists in a situation where all important used components come with either permissive or weak copyleft licences. If components with weak copyleft licences are modified, these modifications must retain the original or use a compatible licence.
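The rules in the last two paragraphs can be written down as a small decision function. This is a sketch: the mapping of SPDX identifiers to licence families is a hypothetical, partial example, the returned codes are illustrative names, and it treats weak copyleft as prohibited whenever copyleft is, which some policies may relax.

```python
# Hypothetical, partial mapping from SPDX identifiers to licence families.
FAMILY = {
    "MIT": "permissive", "BSD-3-Clause": "permissive", "Apache-2.0": "permissive",
    "LGPL-2.1-only": "weak copyleft", "MPL-2.0": "weak copyleft", "EPL-2.0": "weak copyleft",
    "GPL-2.0-only": "strong copyleft", "GPL-3.0-only": "strong copyleft",
    "AGPL-3.0-only": "strong copyleft",
}


def licence_options(in_licences, copyleft_allowed=True):
    """Return an illustrative code describing the out-licence situation."""
    families = {FAMILY.get(lic, "unknown") for lic in in_licences}
    if "unknown" in families:
        return "clarify"  # resolve unidentified licences first
    uses_copyleft = bool(families & {"weak copyleft", "strong copyleft"})
    if uses_copyleft and not copyleft_allowed:
        return "remove-copyleft"  # policy forbids copyleft components
    if "strong copyleft" in families:
        return "copyleft-required"  # pick a compatible copyleft licence
    # Only permissive or weak copyleft components: relatively free choice,
    # but modified weak copyleft parts keep their original or a compatible licence.
    return "free-choice"
```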

Software licence management steps

Gather and document information

  • Note the licence of the ‘product’ (entire bundle of created components) or ‘project’ (one program or stand-alone component), if set
  • Create an open-source inventory of used components
  • Detect vulnerable open-source components (to remove or replace)
  • Identify outdated open-source libraries (to replace)
  • Identify licences of used components (in-licences)
  • Clarify ambiguities or doubts, such as those on the use or modification of libraries:
    • A tool may not be able to properly identify a licence; in Mend, some are reported as suspect or ambiguous
    • Information about the applied licence may be false, unclear or contradictory
    • Some licences may be recognised under several names
    • Some (permissive) licences (BSD, Artistic …) have unnumbered variants or are sometimes edited by authors
    • The applicability of ‘or later’ licences may be unclear, or the licence text may even have been edited
  • Document gathered information; Mend does the above through reports, UI and data exports
  • Document your decisions; some may be refined during remediation
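Much of the ‘gather and document’ step consists of normalising declared licence names and recording what could not be resolved. The sketch shows the idea with a deliberately tiny, hypothetical alias table: it maps a few common spellings to SPDX identifiers, marks unnumbered names such as ‘BSD’ or ‘GPL’ as ambiguous, and writes the result as CSV so that decisions can be documented next to it. SCA tools such as Mend rely on far larger, curated databases for this.

```python
import csv
import io

# Hypothetical, deliberately incomplete alias table (lower-case keys).
ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache 2.0": "Apache-2.0",
    "apache license, version 2.0": "Apache-2.0",
    "asl 2.0": "Apache-2.0",
    "gplv2+": "GPL-2.0-or-later",
    "postgresql license": "PostgreSQL",
}
# Names that are recognised but do not identify a single licence text.
AMBIGUOUS = {"bsd", "bsd-style", "gpl", "lgpl", "artistic"}


def normalise(declared):
    """Return (SPDX id or None, status) for a declared licence name."""
    key = declared.strip().lower()
    if key in AMBIGUOUS:
        return None, "ambiguous"
    if key in ALIASES:
        return ALIASES[key], "ok"
    return None, "unknown"


def inventory_csv(components):
    """components: iterable of (component name, declared licence) pairs."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["component", "declared licence", "SPDX id", "status"])
    for name, declared in components:
        spdx, status = normalise(declared)
        writer.writerow([name, declared, spdx or "", status])
    return out.getvalue()
```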

Remediate

  • Choose a product/project licence (out-licence) compatible with key dependencies
  • Initial improvements:
    • Remedy vulnerable open-source components
    • Update outdated open-source libraries (where possible)
    • Ask component authors to clarify their licence or to relicense
    • Pay for the required proprietarily licensed software
    • Choose among dual licences of components
  • Identify remaining incompatible licences
  • Decide what to do with components that use these licences:
    • Remove (component and corresponding functionality) if not necessary
    • Replace with an existing equivalent
    • Move to server-side (central service)
    • Write your own replacement
    • Enforce open-source licence compliance (e.g., provide all required compliance artifacts)
    • Accept some risks

Create compliance artifacts (to ensure compliance)

  • As required by the applied policy
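What counts as a compliance artifact depends on the policy and the licences involved; commonly cited examples are the full licence texts, the NOTICE file that the Apache License 2.0 asks redistributors to keep, and, for copyleft components, access to the corresponding source code. As a hedged illustration, the sketch assembles a plain-text third-party notices file from hypothetical component records; the keys `name`, `version`, `licence`, `copyright` and `source_url` are assumptions.

```python
def third_party_notices(components):
    """Build a plain-text notices file from hypothetical component dicts."""
    lines = ["THIRD-PARTY SOFTWARE NOTICES", ""]
    for c in sorted(components, key=lambda c: c["name"].lower()):
        lines.append(f"{c['name']} {c.get('version', '')}".rstrip())
        lines.append(f"  Licence: {c['licence']}")
        if c.get("copyright"):
            lines.append(f"  {c['copyright']}")
        if c.get("source_url"):
            # Copyleft licences may require offering the corresponding source code.
            lines.append(f"  Source: {c['source_url']}")
        lines.append("")
    return "\n".join(lines)
```

In practice, the full licence texts would usually be shipped alongside such a summary rather than replaced by it.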

Software composition analysis (SCA) and licence selection tools

There are a number of tools that analyse software projects' dependencies and used libraries, their licences, and the licences declared in or distributed with the source code. Ideally, software composition and the licences of components and parts of the code should be continuously monitored as part of the build process.
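Continuous monitoring can be as simple as a build step that fails when the component inventory contains a licence the policy does not allow. The sketch assumes a hypothetical JSON export (a list of objects with `name` and `licence` fields) produced by whatever SCA tool is used; the denied set, which here mirrors the AGPL and ‘fauxpen’ cases discussed for services, is only an example policy.

```python
import json
import sys

# Example policy only: adjust to the rules of your institution or project.
DENIED = {"AGPL-3.0-only", "AGPL-3.0-or-later", "SSPL-1.0", "Elastic-2.0"}


def violations(inventory):
    """Entries whose licence is denied or could not be identified."""
    return [
        f"{c['name']}: {c['licence']}"
        for c in inventory
        if c["licence"] in DENIED or c["licence"] == "UNKNOWN"
    ]


def main(path):
    """Return a non-zero exit code if the inventory violates the policy."""
    with open(path, encoding="utf-8") as fh:
        inventory = json.load(fh)
    problems = violations(inventory)
    for problem in problems:
        print("licence policy violation:", problem, file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Treating unidentified licences as violations is a deliberately strict choice; a project may prefer to report them as warnings instead.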

Commercial SCA services

FOSS SCA solutions

Licence selection tools

Sustainability of FOSS in science

Overwhelmed by the range of available software tools, unsure of their quality and often questioning the quality of the implementation of the computations they perform, researchers often resort to developing tools tailored to their specific use cases. This leads to a large number of tools and packages with limited support, short lifetimes and few users. Often, the developers simply do not want to look elsewhere, as they are paid for programming, can justify it, or think that their case is singular or very special. Making software open source alleviates, but does not resolve, this problem.

Also, despite FOSS’s success, scaling and sustaining open-source projects remain challenging. Sometimes, researchers or developers manage to maintain their tools as a side project, and some even build an entire community while keeping the tool free and open source; an example of this is Gephi. Small open-source communities can also rely on volunteers and self-governance.

It is more interesting to look at large open-source projects to see how individuals and companies who make a living out of them can be financially sustainable. Increasingly, researchers are finding ways of developing tools that are both open source and capable of generating revenue:

  • The organisations can be funded through institutional membership and fundraising. Lyrasis, which hosts the ArchivesSpace, CollectionSpace, DSpace, Fedora and VIVO open-source platforms, has more than 1000 members, and it launched the DSpace Development Fund (DDF) in 2022.
  • Some projects follow an open core model (for example RapidMiner) by commercially licensing the parts of the code that enable scaling to enterprise levels.
  • Building on a strong base in its institutional community and the influence stemming from its open-source software, an organisation may offer commercial services. Lyrasis provides certification of partners and other providers, hosting, consulting, training, digitisation, preservation and fiscal services. It also mediates in content creation and acquisition and in applications for grants.
  • The primary contributor to open source software may also use it or related services as a promotional and visibility device or a token of its participation in a larger collaboration. This requires funding for software and service operation, support, maintenance and development from other sources, such as its other businesses, institutional budget or national research projects.
  • The project may provide its solution as a service that is used within a larger scientific infrastructure, platform or collaboration and in return share a part of its income, regardless of its business model.
  • Access to venture capital (VC) and private investors may be suitable for teams that intend to commercialise or make a profit from tools that cover a wider market than the academic sector.

Scientific open-source software and related services cannot expect to generate enough traffic to sustain themselves from advertisements. Crowdsourcing may work for projects related to research on more or less common diseases or therapies for them. It is also possible to seek small donations from interested individuals or large ones from corporations if the subject is attractive and something many people are interested in or passionate about, such as climate change, astronomy or long-standing mathematical problems. This can be extended further by stimulating public participation, inviting individuals to engage as citizen scientists or to provide their computing resources, as is done by distributed computing projects. Examples of this are Folding@home, iThena, GPUGRID.net, PrimeGrid, World Community Grid (WCG), Rosetta@home, Cosmology@Home, SETI@Home, climateprediction.net (CPDN) and LHC@home, the majority of which are based on the LGPL-licensed Berkeley Open Infrastructure for Network Computing (BOINC) [https://boinc.berkeley.edu/projects.php]. Such platforms do not try to monetise even a part of the obtained resources, as this would repel their contributors, in particular those who donate resources owned by someone else (e.g., their employer). It should also be noted that the application code run on the worker nodes of these platforms is often intentionally proprietary to protect the integrity of the calculations, as the analyses could otherwise be flooded with fake results from compromised worker nodes, which would then require everything to be verified in a controlled environment.

The open-source operational models outlined above provide different advantages and disadvantages and require varying levels of engagement. Therefore, there is no uniform approach that will solve the sustainability challenge. However, options that have already been successfully implemented, as well as recipes and examples from the wider open-source community and COSS (commercial open-source software) in particular, do help [Popp, Karl. (2015). Best practices for commercial use of open source software: Business models, Processes and Tools for Managing Open Source Software; https://flagsmith.com/podcast/joseph-jj-jacks-oss-capital/]. The Software Sustainability Institute provides guidance and support around open-source code for research. The Apereo Foundation is a membership organisation that offers guidance and incubation opportunities for teams working on open-source technologies for learning and research within higher education.

Governing a growing FOSS project, or one that is going through a change in its business model, can be extremely difficult. Self-governance, the centralisation of originally distributed projects, and privatisation or commercialisation require clear rules around membership, contribution duties and appropriation rights. In the case of self-governance, these rules require monitoring and enforcement by a private agent or several members of the group; in the case of centralisation and privatisation, by an external agent.

Open-source software in science has a smaller user base, but it may have a stronger appeal and can more easily attract institutional sponsors. There are also many examples of organisations coming together in consortia or forming a non-profit organisation to support the development and sustainable management of scientific open-source tools. It also helps that open-source software for science can easily take off within international or university research projects and, after initial incubation, maturation and proving, continue to operate in a wider environment. Software tools developed within NI4OS (LCT, RoLECT and RePol) are examples of this. In the long run, FOSS, which is currently being challenged by COSS, may prove better suited to scientific software than to the commodified environment of commercial cloud-based services.

FAIR-related guidelines for software creators

The following list blends “Four simple recommendations to encourage best practices in research software” [https://f1000research.com/articles/6-876/v1] and “Five recommendations for fair software” [https://zenodo.org/record/4310217]:

  1. Make the source code public and use a publicly accessible, versioned repository from the very beginning. (A related practice is depositing software in archives, driven by changes in journal policies, whose primary goal is the reproducibility of results through preservation of the research environment. This is why software is typically deposited in specialised repositories that have developed and evolved independently of scientific ones. These platforms provide long-term benefits and support the improvement of software as a living product maintained by several contributors, by providing specific features, access mechanisms and integrations.) But even putting software on GitHub may not do much for reusability without a clear licence and readme information, which are the primary enablers and indicators of reusability.
  2. Adopt a licence and comply with the licensing requirements of all dependencies and contributors.
  3. Provide basic metadata by registering the software in a relevant community registry to make it easy to discover (this is sometimes described in the registry's documentation, but you can also see for yourself by installing a tool).
  4. Establish clear and transparent contribution, communication and governance workflows.
  5. Enable citation of the software (some archiving services meet these requirements).
  6. From FAIR4RS R3: meet the standards of the domain community. At a minimum, follow community expectations and conventions on the formats used to read and write data, and also on provided functionality, terminology and other domain conventions and practices, even if these are not needed for the immediate purpose, as this will increase adoption, reuse, the chances of sustainability and external contributions. Do not limit this to the software metadata and documentation standards of the domain-relevant community.
  7. Use a software quality checklist to assess components and your research software – [https://www.sciencedirect.com/science/article/pii/S0164121222000267]:
    1. Community support and adoption (with factors such as popularity, reputation, size, communication channels, and involvement)
    2. Documentation
    3. Costs (licence, training, support, etc.)
    4. Licensing conditions
    5. Operational characteristics such as independence from other software, development language, portability, compliance and testability
    6. Maturity
    7. Quality aspects such as reliability, performance, modularity, maintainability, code quality and architecture
    8. Perceived risks related to confidentiality, integrity, availability, etc.
    9. Trustworthiness of components, architecture and platform, provider reputation, 3rd-party assessments
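Recommendations 3 and 5 both rely on machine-readable metadata. One widely used option is a `codemeta.json` file following the CodeMeta vocabulary, which some archives and registries can harvest; whether a particular registry does so should be checked in its documentation. The sketch builds a minimal CodeMeta 2.0 record with a handful of common fields; every value in the usage part (tool name, repository URL, author) is a hypothetical placeholder.

```python
# Minimal sketch of a CodeMeta 2.0 record; example values are placeholders.
import json


def codemeta(name, version, licence_spdx, repository, authors, description=""):
    """Return a small CodeMeta 2.0 dictionary describing a piece of software."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "description": description,
        # CodeMeta accepts a URL here; the SPDX licence page is a common choice.
        "license": f"https://spdx.org/licenses/{licence_spdx}",
        "codeRepository": repository,
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


if __name__ == "__main__":
    record = codemeta(
        "example-tool", "0.1.0", "MIT",
        "https://example.org/example-tool",
        [("Ada", "Lovelace")],
        "Hypothetical research tool",
    )
    # This output would be saved as codemeta.json in the repository root.
    print(json.dumps(record, indent=2))
```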

Reporting open source in NI4OS-Europe Agora

Data about the use of open-source software and technologies are collected in NI4OS-Europe Agora following the EOSC Portal Profiles v4.00 specification [https://zenodo.org/record/5726890], in the field named ERP.MTI.5 or “Open Source Technologies” (“EOSC Resource Profile Tables / Data Model / Maturity Information / Open Source Technologies”). This field is used to provide a “List of open source technologies incorporated into the Resource”. The specification states that this field is for specific technologies, not broad ones like HTTP or a Linux distribution. The field is optional and accepts multiple values of up to 100 characters each. The validation criterion is rather simple: “Check that the technologies mentioned/projects exist”.

However, since the use of free and open-source software (FOSS) is a significant facilitator in the advancement of Open Science and related services, in NI4OS-Europe we are asking you to be more detailed when reporting the use of FOSS by your services. Therefore, please provide a one-line description for each significant component of your service. If possible, follow the identifying name of your component (preferably without version numbers) with a comma and the name or SPDX code of the corresponding software licence [https://spdx.org/licenses/]. If you think it is needed, you can also provide related URLs; however, make sure that the entire line does not exceed 100 characters. If there is enough space, you can also add a short description of what the software is used for, separated by ‘ – ’. Here are a few examples of valid descriptions:

To help you provide this information, below are the names, licences, URLs and purposes of the “Open Source Technologies” most frequently mentioned in NI4OS-Europe Agora (as of July 2022):

Name URL Licence Licence URL Purpose
DSpace https://dspace.lyrasis.org/ BSD-3-Clause (permissive) https://dspace.lyrasis.org/dspace-source-code-bsd-license/ Repository
PostgreSQL https://www.postgresql.org/ PostgreSQL (permissive) https://www.postgresql.org/about/licence/ Relational database
Apache HTTP Server https://httpd.apache.org/ Apache License 2.0 (permissive) https://httpd.apache.org/docs/current/license.html Web server
Java EE (now Jakarta EE) https://jakarta.ee/ Eclipse Public License 2.0 (weak copyleft) or GNU General Public License 2 with the GNU Classpath Exception (weak copyleft) https://www.eclipse.org/legal/epl-2.0/, https://projects.eclipse.org/license/secondary-gpl-2.0-cp, https://www.gnu.org/software/classpath/license.html Enterprise Java
MongoDB https://www.mongodb.com/atlas/database Server Side Public License (SSPL) v1.0 after October 16, 2018 (proprietary); GNU AGPL v3.0 before October 16, 2018 (network protective strong copyleft) https://www.mongodb.com/community/licensing, https://www.mongodb.com/licensing/server-side-public-license; old versions: https://www.gnu.org/licenses/agpl-3.0.html Non-relational DB
MySQL https://www.mysql.com/products/ GPL v2+ (strong copyleft) or proprietary https://downloads.mysql.com/docs/licenses/mysqld-8.0-gpl-en.pdf, https://downloads.mysql.com/docs/licenses/mysqld-8.0-com-en.pdf Relational database
Spring Boot Framework https://spring.io/projects/spring-boot Apache License 2.0 (permissive) https://github.com/spring-projects/spring-boot/blob/main/LICENSE.txt Application framework
Angular https://angular.io/ MIT (permissive) https://angular.io/license Application framework
Apache Tomcat https://tomcat.apache.org/ Apache License 2.0 (permissive) https://www.apache.org/licenses/LICENSE-2.0 Web server
Google BERT https://github.com/google-research/bert Apache License 2.0 (permissive) https://github.com/google-research/bert/blob/master/LICENSE Language model for NLP
NumPy https://numpy.org/ BSD (permissive) https://github.com/numpy/numpy/blob/main/LICENSE.txt Scientific computing library
OpenStack https://www.openstack.org/ Apache License 2.0 (permissive) https://github.com/openstack/openstack/blob/master/LICENSE Cloud platform
Python https://www.python.org/ PSF Licence Agreement, GPL compatible (permissive) https://docs.python.org/3/license.html Programming language
PyTorch https://pytorch.org/ BSD (permissive) https://github.com/pytorch/pytorch/blob/master/LICENSE Machine learning framework
Scikit-Learn https://github.com/scikit-learn/scikit-learn BSD 3-Clause ("New" or "Revised") (permissive) https://github.com/scikit-learn/scikit-learn/blob/main/COPYING Machine learning library
TensorFlow https://www.tensorflow.org Apache License 2.0 (permissive) https://github.com/tensorflow/tensorflow/blob/master/LICENSE Machine learning platform/library
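The line format requested for Agora can also be assembled and checked mechanically. The sketch builds ‘name, licence[, URL][ – purpose]’ from pieces such as those in the table above and omits the optional URL and purpose when they would push the line past the 100-character limit; the function name and the order in which optional parts are dropped are illustrative choices, not part of the EOSC or Agora specification.

```python
MAX_LEN = 100  # limit for one "Open Source Technologies" value


def agora_entry(name, licence, url=None, purpose=None):
    """Build 'Name, Licence[, URL][ – purpose]', dropping optional parts that do not fit."""
    entry = f"{name}, {licence}"
    if url and len(entry) + len(", ") + len(url) <= MAX_LEN:
        entry += f", {url}"
    if purpose and len(entry) + len(" – ") + len(purpose) <= MAX_LEN:
        entry += f" – {purpose}"
    if len(entry) > MAX_LEN:
        # Even the mandatory part is too long: shorten the name or licence.
        raise ValueError(f"entry longer than {MAX_LEN} characters: {entry!r}")
    return entry
```

For example, `agora_entry("DSpace", "BSD-3-Clause", purpose="Repository")` yields `DSpace, BSD-3-Clause – Repository`.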


NI4OS-Europe service admins very often provide ‘Linux’ and ‘XML’. As mentioned before, please do not indicate the use of such generic or general-purpose technologies.

Sometimes, the primary software product used by your service may rely on other software for which a few alternatives are available. Please try to indicate these as well, in separate “Open Source Technologies” entries. For example, repositories based on DSpace and related supporting tools use several components where some choices are possible. In the following list, the components selected for the repositories maintained by the UoB are named first, followed by their potential alternatives:

  • Java-environment: OpenJDK (GPL-2.0-only with linking exception) instead of Oracle's Java (Oracle No-Fee Terms and Conditions" (NFTC))
  • Web server: Apache Tomcat (Apache License 2.0), Jetty (Apache License 2.0 and Eclipse Public License 1.0), or Caucho Resin (GPLv3 or proprietary)
  • Relational database: PostgreSQL (PostgreSQL License, similar to BSD or MIT); a led favoured alternative would be Oracle (Oracle Database XE cos without Java stored procedures and has limited in terms of data quantity and use of only a single core)
  • Reverse proxy: NGINX (2-clause BSD license or proprietary) or Apache HTTP Server (Apache License 2.0)
  • Non-relational DB: Solr (Apache License 2.0) instead of Elasticsearch (Elastic License 2.0 (ELv2) or Server Side Public License (SSPL), both source-available ‘fauxpen’ licences that are not approved by the Open Source Initiative)
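Service providers who keep such component lists in machine-readable form may find SPDX licence identifiers useful. The sketch below is a hypothetical illustration, not an NI4OS-Europe tool: it records a stack based on the first option of each entry above, drops generic technologies such as Linux, and flags licences outside a small, hand-picked subset of OSI-approved identifiers. The names `Component`, `OSI_APPROVED`, `is_osi_approved` and `technology_entries` are assumptions made for this example, and only simple SPDX expressions (`A OR B`, `X WITH exception`) are handled.

```python
from dataclasses import dataclass

# Generic, general-purpose technologies that should not get their own
# "Open Source Technologies" entry (examples from the guidance above).
GENERIC_TECHNOLOGIES = {"linux", "xml"}

# Hand-picked subset of OSI-approved SPDX identifiers, for illustration only;
# real checks should use the full SPDX License List and the OSI list.
OSI_APPROVED = {
    "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "EPL-1.0",
    "GPL-2.0-only", "GPL-3.0-only", "MIT", "PostgreSQL",
}


@dataclass
class Component:
    role: str     # function of the component in the service, e.g. "Reverse proxy"
    name: str     # product name, e.g. "NGINX"
    licence: str  # SPDX licence expression


def is_osi_approved(expression: str) -> bool:
    """True if at least one alternative of a simple 'A OR B' expression is OSI-approved.

    AND and parentheses are not supported. An exception added with WITH
    (e.g. Classpath-exception-2.0) is ignored, because only the base licence
    is checked against OSI_APPROVED.
    """
    for alternative in expression.split(" OR "):
        base = alternative.split(" WITH ")[0].strip()
        if base in OSI_APPROVED:
            return True
    return False


def technology_entries(components):
    """Return (role, name, licence, osi_approved) for every non-generic component."""
    return [
        (c.role, c.name, c.licence, is_osi_approved(c.licence))
        for c in components
        if c.name.lower() not in GENERIC_TECHNOLOGIES
    ]


# Illustrative stack based on the first option of each entry above, plus the
# rejected Elasticsearch alternative; "Linux" is included only to show filtering.
EXAMPLE_STACK = [
    Component("Operating system", "Linux", "GPL-2.0-only"),
    Component("Java environment", "OpenJDK", "GPL-2.0-only WITH Classpath-exception-2.0"),
    Component("Web server", "Apache Tomcat", "Apache-2.0"),
    Component("Relational database", "PostgreSQL", "PostgreSQL"),
    Component("Reverse proxy", "NGINX", "BSD-2-Clause"),
    Component("Non-relational DB", "Solr", "Apache-2.0"),
    Component("Non-relational DB", "Elasticsearch", "Elastic-2.0 OR SSPL-1.0"),
]

if __name__ == "__main__":
    for role, name, licence, ok in technology_entries(EXAMPLE_STACK):
        status = "OSI-approved" if ok else "not OSI-approved"
        print(f"{role}: {name} ({licence}) -> {status}")
```

Run as a script, this prints six entries (Linux is filtered out), and only Elasticsearch is reported as not OSI-approved. Using `LicenseRef-` identifiers (for example, a hypothetical `LicenseRef-Oracle-NFTC`) is the SPDX convention for licences, such as Oracle's NFTC, that have no entry in the SPDX License List.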

Literature

  1. A Fresh Look at FAIR for Research Software, https://arxiv.org/ftp/arxiv/papers/2101/2101.10883.pdf
  2. Barthonnat, Céline, Blotière, Emilie, Gingold, Arnaud, Mas, François-Xavier, Stanić, Nikola, Pierno, Alessandro, Szulińska, Agnieszka, Armando, Lorenzo, Pochet, Bernard, de Santis, Luca, MacGregor, James, Pozzo, Riccardo, & Pogačnik, Aleš. (2021). OPERAS SIG on Tools for Open Scholarly Communication: White Paper 2021. Zenodo, https://doi.org/10.5281/zenodo.5654319
  3. Black Duck Open Hub, https://www.openhub.net/
  4. Chue Hong, N. P., Katz, D. S., Barker, M., Lamprecht, A-L, Martinez, C., Psomopoulos, F. E., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., Honeyman, T., et al. (2022). FAIR Principles for Research Software version 1.0. (FAIR4RS Principles v1.0). Research Data Alliance. https://doi.org/10.15497/RDA00068, https://rd-alliance.org/group/fair-research-software-fair4rs-wg/outcomes/fair-principles-research-software-fair4rs-0
  5. Discover your next tool for social media analysis – A list of tools and software to support the collection and analysis of social media data (dataset), https://socialmediatools.pory.app/
  6. Donnie Berkholz, The size of open-source communities and its impact upon activity, licensing, and hosting, April 22, 2013 https://redmonk.com/dberkholz/2013/04/22/the-size-of-open-source-communities-and-its-impact-upon-activity-licensing-and-hosting/
  7. Duca, D. (2019), The ecosystem of technologies for social science research (dataset). doi: 10.5281/zenodo.3555207, https://github.com/sagepublishing/SAGE_tools_social_science/blob/master/data/master_tools_current.csv
  8. Duca, D., & Metzler, K. (2019). The ecosystem of technologies for social science research (White paper). London, UK: Sage. doi: 10.4135/wp191101, https://static1.squarespace.com/static/5d5ad9e0100bdf0001af0f5e/t/5ed0ea0631c1a80efe375fe5/1590749710566/The+Ecosystem+of+Technologies+for+Social+Science+Research.pdf, https://group.sagepub.com/white-paper-archive/the-ecosystem-of-technologies-for-social-science-research, https://sagepublishing.github.io/sage_tools_social_science/
  9. Duca, D., Developing a comprehensive directory of tools and technologies for social science research methods, https://forrt.org/educators-corner/003-developing-tools/
  10. Five Recommendations for FAIR Software, https://fair-software.eu/, https://zenodo.org/record/4310217
  11. Ford Foundation, Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure, https://www.fordfoundation.org/work/learning/research-reports/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/
  12. HAL, https://hal.archives-ouvertes.fr/
  13. Hasselbring, Wilhelm, Carr, Leslie, Hettrick, Simon, Packer, Heather and Tiropanis, Thanassis. "From FAIR research data toward FAIR and open research software" it – Information Technology, vol. 62, no. 1, 2020, pp. 39-47, https://doi.org/10.1515/itit-2019-0040, https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html
  14. How to evaluate the sustainability of an open source project, January 22, 2014, https://opensource.com/life/14/1/evaluate-sustainability-open-source-project
  15. https://www.sciencedirect.com/science/article/pii/S0164121222000267
  16. Software Preservation Network, Software Metadata Recommended Format Guide, https://www.softwarepreservationnetwork.org/wp-content/uploads/2022/01/Software_Metadata_Recommended_Format_Guide-1.pdf
  17. Ben Rometsch, Interview with Joseph "JJ" Jacks: Founder and General Partner, OSS Capital’s Vision for Open Source Software, May 25, 2021, https://flagsmith.com/podcast/joseph-jj-jacks-oss-capital/
  18. Jackson, 2019, https://www.software.ac.uk/how-cite-software
  19. Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software (version 1). F1000Research 2017, 6:876, https://doi.org/10.12688/f1000research.11407.1, https://f1000research.com/articles/6-876/v1
  20. John W Maxwell, Erik Hanson, Leena Desai, Carmen Tiampo, Kim O'Donnell, Avvai Ketheeswaran, Melody Sun, Emma Walter, Ellen Michelle, A Landscape Analysis of Open Source Publishing Tools and Platforms, Mind the Gap, Simon Fraser University, July 2019, https://mindthegap.pubpub.org/
  21. Katz DS, Chue Hong NP, Clark T et al. Recognizing the value of software: a software citation guide (version 2). F1000Research 2021, 9:1257, https://doi.org/10.12688/f1000research.26932.2
  22. Kiselka, B. (2015). Software project longevity – a case study on open source software development projects [Master Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2015.34133, https://repositum.tuwien.at/handle/20.500.12708/2820
  23. Lamprecht, Anna-Lena et al., Towards FAIR principles for research software, Data Science, vol. 3, no. 1, pp. 37-59, https://doi.org/10.3233/DS-190026, https://doi.org/10.5281/zenodo.6374598, https://content.iospress.com/articles/data-science/ds190026
  24. Lanier, Jaron, You Are Not a Gadget: A Manifesto. New York, Vintage Books, 2011.
  25. Mariannig Le Béchec, Aline Bouchard, Philippe Charrier, Claire Denecker, Gabriel Gallezot, et al., Pratiques et usages des outils numériques dans les communautés scientifiques en France. [Rapport de recherche] Comité pour la science ouverte. 2022, 112 p. hal-03545512, https://www.ouvrirlascience.fr/state-of-open-science-practices-in-france-sosp-fr/, https://hal-lara.archives-ouvertes.fr/OUVRIR-LA-SCIENCE/hal-03545512, https://hal-lara.archives-ouvertes.fr/hal-03545512/document
  26. Martinez, et al. (2022), A Survey on Adoption Guidelines for the FAIR4RS Principles: Dataset (1.0) (dataset), Zenodo, https://doi.org/10.5281/zenodo.6375540
  27. Michael Jackson. (2018). Software Deposit: Guidance for Researchers (1.0). Zenodo. https://doi.org/10.5281/zenodo.1327310, https://zenodo.org/record/1327310
  28. Nicolas Suzor, What motivates free software developers to choose between copyleft and permissive licences?, August 8, 2013, https://opensource.com/law/13/8/motivation-free-software-licensing
  29. Open Source Initiative, Licenses & Standards, https://opensource.org/licenses
  30. Open Source Initiative, The Open Source Definition, https://opensource.org/osd
  31. Popp, Karl. (2015). Best practices for commercial use of open source software: Business models, Processes and Tools for Managing Open Source Software
  32. Projects that use Berkeley Open Infrastructure for Network Computing (BOINC), (dataset) https://boinc.berkeley.edu/projects.php
  33. R. Di Cosmo, M. Gruenpeter and S. Zacchiroli, "Referencing Source Code Artifacts: A Separate Concern in Software Citation," in Computing in Science & Engineering, vol. 22, no. 2, pp. 33-43, March-April 2020, doi: 10.1109/MCSE.2019.2963148, https://ieeexplore.ieee.org/document/8946737
  34. Sanchez-P. Jorge-A. (2021). EOSC Portal Profiles v4.00 (v4.00). Zenodo. https://doi.org/10.5281/zenodo.5726890, https://zenodo.org/record/5726890
  35. Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. 2016. Software citation principles. PeerJ Computer Science 2:e86, https://doi.org/10.7717/peerj-cs.86
  36. SPDX License List, https://spdx.org/licenses/
  37. Top open source licenses and legal risk for developers, July 13, 2022, https://www.synopsys.com/blogs/software-security/top-open-source-licenses/